[Mono-devel-list] Patch idea for previous message

Atsushi Eno atsushi at ximian.com
Tue Jun 7 23:23:02 EDT 2005


Hi,

Kornél Pál wrote:
>>> String.Compare ("\u00E6\u0304", "\u01E3")
> .NET 1.1 returns -1

BTW, "\u01E3".Normalize(NormalizationForm.FormD) is "\u00E6\u0304"
in .NET 2.0, i.e. they are canonically equivalent.
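
For anyone who wants to check the canonical equivalence outside .NET, here is a minimal sketch using Python's standard unicodedata module. The helper name canonically_equivalent is made up for this illustration. It only shows the Unicode decomposition and says nothing about how any collator orders the two strings.

```python
import unicodedata

precomposed = "\u01E3"       # LATIN SMALL LETTER AE WITH MACRON
decomposed = "\u00E6\u0304"  # LATIN SMALL LETTER AE + COMBINING MACRON

def canonically_equivalent(a, b):
    """Hypothetical helper: true when both strings share one NFD form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# The code points differ, but the canonical decompositions match.
print(precomposed == decomposed)                        # plain comparison
print(canonically_equivalent(precomposed, decomposed))  # NFD comparison
```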

>> Oops, I mean String.Compare ("A\u0308\u0301", "\u1EA6") .
> .NET 1.1 returns 0
> 
> I have no idea whether these characters exist in real life. Collations
> should be based on the rules of an existing language, and it's quite
> undefined how characters not in the language should be sorted. I think
> this function is intended for sorting human-readable text and not for
> matching case-insensitive file names, user names, element and attribute
> names, ... And this is why OrdinalIgnoreCase was introduced in .NET 2.0.
>
> Windows XP displays "A\u0308\u0301" as a compound character and a separated
> accent, but both "A\u0308" and "A\u0301" display as a single compound
> character, so this may not be a bug, but I'm not experienced enough in
> Unicode to tell whether Windows XP should display "A\u0308\u0301" as a
> single compound character or .NET should not treat it as a single
> character. And of course it is possible that both of these things are
> allowed by Unicode.

Note that "culture-sensitive" comparison never means that it may
treat a pair of strings as equal just because one (or both) of them
is not a "real" string in the culture, unless some characters in the
strings are ignorable. You will get a different result if \u0301 is
replaced with \u0302.

This happens because Windows has no concept of "blocking" combining
marks; it just sums the diacritical weights up, ignoring overflow.
It is a design failure of Windows.
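
To make the summing effect concrete, here is a toy sketch in Python. The weight table, and the names summed_key and sequence_key, are assumptions invented for this illustration. They are not taken from the actual Windows tables or from any Mono code. The sketch only shows how adding per-mark weights (wrapping at one byte) can make two different mark sequences collide, while keeping the marks as an ordered sequence does not.

```python
import unicodedata

# Hypothetical diacritical weights, invented for this illustration only.
HYPOTHETICAL_WEIGHTS = {
    "\u0300": 0x0F,  # COMBINING GRAVE ACCENT
    "\u0301": 0x0E,  # COMBINING ACUTE ACCENT
    "\u0302": 0x12,  # COMBINING CIRCUMFLEX ACCENT
    "\u0308": 0x13,  # COMBINING DIAERESIS
}

def summed_key(text):
    """Each base letter paired with the byte-wrapped sum of its mark weights."""
    key = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in HYPOTHETICAL_WEIGHTS and key:
            base, weight = key[-1]
            # Summing discards which marks were present and drops overflow.
            key[-1] = (base, (weight + HYPOTHETICAL_WEIGHTS[ch]) & 0xFF)
        else:
            key.append((ch, 0))
    return key

def sequence_key(text):
    """Each base letter paired with the ordered tuple of its marks."""
    key = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and key:
            base, marks = key[-1]
            key[-1] = (base, marks + (ch,))
        else:
            key.append((ch, ()))
    return key

a = "A\u0308\u0301"  # A + diaeresis + acute
b = "\u1EA6"         # decomposes to A + circumflex + grave

print(summed_key(a) == summed_key(b))      # the two sums collide
print(sequence_key(a) == sequence_key(b))  # the mark sequences differ
```

With these made-up weights, swapping \u0301 for \u0302 in the first string changes the sum, so the collision disappears.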

I am going to introduce that crappy comparison into Mono, though :-/

You can check that java.text.Collator in the JDK never regards them
as equal.

Atsushi Eno


