[Mono-bugs] [Bug 480152] string.Normalize() frequently produces incorrect output

Tue Apr 20 16:28:56 EDT 2010

http://bugzilla.novell.com/show_bug.cgi?id=480152

http://bugzilla.novell.com/show_bug.cgi?id=480152#c34

--- Comment #34 from Damien Diederen <dd at crosstwine.com> 2010-04-20 20:28:55 UTC ---
Created an attachment (id=355741)
 --> (http://bugzilla.novell.com/attachment.cgi?id=355741)
Normalization.cs: Follow the spec when checking composition pairs.

Figure 7 in section 1.3 of http://unicode.org/reports/tr15/ shows
that, when composing, one has to examine the successive
(starter, candidate) pairs in order, and combine a pair whenever
a canonical decomposition matching it exists.

The original algorithm instead iterated over the canonical
decompositions and, for each one, tried to match a sequence
of (starter, non-starter, ...).  This does not produce the same
results, however, because it violates some implicit ordering
constraints in the Unicode tables.

E.g., when composing the following sequence of codepoints, the
original algorithm was picking:

  03B7 0313 0300 0345
  ^^^^      ^^^^
  1F74 0313      0345
  ^^^^           ^^^^
  1FC2 0313

and would stop at 1FC2 0313, as no canonical decomposition
matches that pair.  The new algorithm, which follows the guidance
of the pretty Figure 7, ends up doing:

  03B7 0313 0300 0345
  ^^^^ ^^^^
  1F20      0300 0345
  ^^^^      ^^^^
  1F22           0345
  ^^^^           ^^^^
  1F92

resulting in the correct 1F92.
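
The pairwise walk above can be sketched in a few lines of Python.
This is only an illustration of the (starter, candidate) approach,
not the code from the attached Normalization.cs patch.  The names
COMPOSITES, PAIRS and compose() are made up for this example, the
pair table covers only the five composites mentioned in this
comment, and the input is assumed to be decomposed and in canonical
order already.

```python
import unicodedata

# Hypothetical mini table: only the five composites named in this comment.
# A real implementation needs every canonical pair minus the exclusions.
COMPOSITES = [0x1F20, 0x1F22, 0x1F74, 0x1F92, 0x1FC2]
PAIRS = {}
for composite in COMPOSITES:
    # decomposition() returns e.g. "03B7 0313" for U+1F20.
    first, second = (
        int(h, 16) for h in unicodedata.decomposition(chr(composite)).split()
    )
    PAIRS[(first, second)] = composite


def compose(codepoints):
    """Compose by testing successive (starter, candidate) pairs.

    Assumes `codepoints` is already decomposed and canonically ordered.
    """
    out = []
    starter = None  # index in `out` of the most recent starter
    last_ccc = 0    # combining class of the last code point appended
    for cp in codepoints:
        ccc = unicodedata.combining(chr(cp))
        if starter is not None:
            # A candidate next to the starter is never blocked.  Otherwise
            # it is blocked when the last mark kept has a class >= its own.
            adjacent = len(out) - 1 == starter
            composite = PAIRS.get((out[starter], cp))
            if composite is not None and (adjacent or last_ccc < ccc):
                out[starter] = composite  # the candidate is absorbed
                continue
        if ccc == 0:
            starter = len(out)
        last_ccc = ccc
        out.append(cp)
    return out


result = compose([0x03B7, 0x0313, 0x0300, 0x0345])
print(" ".join(f"{cp:04X}" for cp in result))  # prints "1F92"
```

The same input run through Python's own
unicodedata.normalize("NFC", ...) also yields U+1F92.  With a mark
that has no composite with the starter, e.g. 03B7 0308 0300, the
sketch leaves the sequence alone: 0300 is blocked by 0308, which has
the same combining class, so 1F74 is never formed.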
