[Mono-bugs] [Bug 480152] string.Normalize() frequently produces incorrect output
bugzilla_noreply at novell.com
Tue Apr 20 16:28:56 EDT 2010
http://bugzilla.novell.com/show_bug.cgi?id=480152
http://bugzilla.novell.com/show_bug.cgi?id=480152#c34
--- Comment #34 from Damien Diederen <dd at crosstwine.com> 2010-04-20 20:28:55 UTC ---
Created an attachment (id=355741)
--> (http://bugzilla.novell.com/attachment.cgi?id=355741)
Normalization.cs: Follow the spec when checking composition pairs.
Figure 7 in section 1.3 of http://unicode.org/reports/tr15/ shows
that, when doing composition, one has to examine the successive
(starter, candidate) pairs, and combine each pair for which a
matching canonical decomposition exists.
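
For illustration only, the pair-by-pair approach can be sketched
roughly as follows. This is Python rather than the C# of
Normalization.cs, and it is not the attached patch: the names
build_pair_table, compose and PAIRS are made up for this sketch, the
input is assumed to be already canonically decomposed and ordered,
and Hangul (which is composed algorithmically) is left out.

```python
import unicodedata


def build_pair_table():
    """Map (first, second) -> composite, from canonical decompositions.

    Assumption of this sketch: characters that NFC does not keep as-is
    (composition exclusions) are skipped by asking the library itself.
    Hangul syllables are not covered.
    """
    table = {}
    for cp in range(0x110000):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)
        # Skip characters without a decomposition, and compatibility
        # decompositions, which carry a "<tag>" prefix.
        if not decomp or decomp.startswith("<"):
            continue
        parts = decomp.split()
        if len(parts) != 2:
            continue
        if unicodedata.normalize("NFC", ch) != ch:
            continue
        table[(int(parts[0], 16), int(parts[1], 16))] = cp
    return table


def compose(cps, table):
    """Compose a canonically ordered list of code points pair by pair."""
    out = []
    starter_idx = None  # position in `out` of the most recent starter
    last_ccc = 0        # combining class of the last code point kept
    for cp in cps:
        ccc = unicodedata.combining(chr(cp))
        if starter_idx is not None:
            # The candidate is blocked if a kept code point sits between
            # the starter and the candidate with a class >= its own.
            blocked = len(out) - 1 > starter_idx and last_ccc >= ccc
            composite = table.get((out[starter_idx], cp))
            if composite is not None and not blocked:
                out[starter_idx] = composite  # combine and drop candidate
                continue
        if ccc == 0:
            starter_idx = len(out)
        out.append(cp)
        last_ccc = ccc
    return out


PAIRS = build_pair_table()
```

With this sketch, compose([0x03B7, 0x0313, 0x0300, 0x0345], PAIRS)
goes through 1F20 and 1F22 before ending up at [0x1F92].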
The original algorithm instead iterated over the canonical
decompositions and, for each one, tried to match a sequence
of (starter, non-starter, ...). This, however, does not produce
the same results, as it violates some implicit ordering
constraints in the Unicode tables.
E.g., when composing the following sequence of codepoints, the
original algorithm was picking:
03B7 0313 0300 0345
^^^^      ^^^^
1F74 0313 0345
^^^^      ^^^^
1FC2 0313
and would stop at 1FC2 0313 as there is no decomposition matching
it. The new algorithm, which follows the guidance of the pretty
figure 7, ends up doing:
03B7 0313 0300 0345
^^^^ ^^^^
1F20 0300 0345
^^^^ ^^^^
1F22 0345
^^^^ ^^^^
1F92
resulting in the correct 1F92.
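
The mappings used in both traces can be cross-checked against an
independent implementation. The snippet below is only a sanity check,
assuming Python's standard unicodedata module as the reference; it
says nothing about how Mono itself behaves. The variable names are
made up for this check.

```python
import unicodedata

# Canonical decompositions quoted in the two traces above.
expected_decompositions = {
    0x1F74: "03B7 0300",
    0x1FC2: "1F74 0345",
    0x1F20: "03B7 0313",
    0x1F22: "1F20 0300",
    0x1F92: "1F22 0345",
}
for cp, decomp in expected_decompositions.items():
    assert unicodedata.decomposition(chr(cp)) == decomp

# 0313 and 0300 share combining class 230, so 0313 blocks 0300 from
# combining with the starter 03B7 directly.
ccc_0313 = unicodedata.combining("\u0313")
ccc_0300 = unicodedata.combining("\u0300")
ccc_0345 = unicodedata.combining("\u0345")

# Reference NFC result for the sequence 03B7 0313 0300 0345.
nfc = unicodedata.normalize("NFC", "\u03b7\u0313\u0300\u0345")
print(" ".join(f"{ord(c):04X}" for c in nfc))  # prints 1F92
```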