[Mono-list] String comparisons slow

Jonathan Pryor jonpryor at vt.edu
Wed Jul 21 22:22:04 EDT 2010


On Wed, 2010-07-21 at 15:30 -0400, David S wrote:
> Ok. Now I'm confused. How come "CurrentCulture" for US/ENG doesn't
> just run the Ordinal?

This may be hard to believe, but en-US (and en-GB) are more than just
ASCII.  Consider loan words from French such as résumé or crème brûlée.
Or the "long s" [0] which, while not commonly used anymore, was used in
no less than the US Bill of Rights...

So, consider è: there are (at least) two ways to express it:

  - Precomposed, as U+00E8 (LATIN SMALL LETTER E WITH GRAVE)
  - With combining chars, as U+0065 U+0300 (LATIN SMALL LETTER E
    followed by COMBINING GRAVE ACCENT)

Presumably when sorting entries, you would like \u00e8 to sort with
\u0065\u0300, not...elsewhere [1]; at least, this is usually what users
expect, as they (hopefully) don't know or care about ASCII, they just
want to use their data.
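
To make the difference concrete, here is a minimal sketch in Python (used
only as a neutral illustration; it is not the .NET/Mono API, and the
variable names are made up for this example).  It shows that a pure
code-point ("ordinal") comparison treats the two spellings of è as
different strings and sorts them far apart, while canonical normalization
recognizes them as the same text:

```python
import unicodedata

precomposed = "\u00e8"   # è as a single code point
combining = "e\u0300"    # 'e' followed by COMBINING GRAVE ACCENT

# Ordinal (code point by code point) comparison: the strings differ.
ordinal_equal = precomposed == combining

# After canonical composition (NFC) both spellings become the same string.
normalized_equal = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", combining)
)

# An ordinal sort puts the combining form before "f" (it starts with 'e'),
# but the precomposed form after "z" (U+00E8 is greater than U+007A).
words = ["f", precomposed, "z", combining]
ordinal_sorted = sorted(words)

print(ordinal_equal, normalized_equal)
print([w.encode("unicode_escape").decode("ascii") for w in ordinal_sorted])
```

A culture-aware comparison has to do at least this much extra work (and
usually a good deal more) on every call, which is part of why it costs
more than an ordinal one.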

The other reason has to do with Windows collation [2].  This may or may
not matter for Mono, and certainly won't matter for Silverlight, in
which the underlying platform's collation support is used.  The default
collation table contains collation data for 70 languages [3], and (of
course!) English uses the default table, so it gets a lot of this
collation information "for free".

 - Jon

[0] http://en.wikipedia.org/wiki/Long_s
[1] And "elsewhere" varies a lot; it could be intermingled with some 
    other character, placed after all other characters, placed before 
    all other characters...
[2] See also the years of articles written by Michael Kaplan, in which 
    the default table is frequently mentioned: 
    http://blogs.msdn.com/b/michkap/
[3] http://blogs.msdn.com/b/michkap/archive/2005/04/08/406413.aspx



