[Mono-list] What affects collation order in a given culture?
clockworksaint at gmail.com
Mon Jun 3 12:22:36 UTC 2013
I should preface this by saying that I don't know Mandarin, so I'm
working rather blind. I want to make sure that a list of
filenames/track names/artists/albums is sorted correctly for Chinese
users. My understanding is that the most commonly expected sort order
is based on the pinyin transcription of the characters.
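To make "sorted by pinyin" concrete, here is a minimal Python sketch. It assumes a tiny hand-written table, PINYIN, which is hypothetical and covers only a few example characters. A real application would need a complete character-to-pinyin source and would also have to deal with tones and characters that have more than one reading. The sketch sorts strings by the pinyin of each character instead of by the characters themselves:

```python
# Hypothetical, hand-written pinyin table covering only the example characters.
# Tone marks are omitted; a real mapping must also handle characters with
# more than one reading.
PINYIN = {
    "爱": "ai",
    "北": "bei",
    "国": "guo",
    "京": "jing",
    "中": "zhong",
}


def pinyin_key(text):
    """Return a sort key: one pinyin syllable per character.

    Characters missing from the table (e.g. Latin letters) fall back to
    themselves, lower-cased, so mixed strings still get a usable key.
    """
    return [PINYIN.get(ch, ch.lower()) for ch in text]


def sort_by_pinyin(items):
    """Sort strings by their pinyin transcription, syllable by syllable."""
    return sorted(items, key=pinyin_key)


if __name__ == "__main__":
    names = ["中国", "北京", "爱", "京"]
    print(sort_by_pinyin(names))
```

With this table, "中国" gets the key ["zhong", "guo"] and "北京" gets ["bei", "jing"], so the list comes out as 爱, 北京, 京, 中国 (ai, bei jing, jing, zhong guo).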
I've been investigating how strings are sorted in various cultures in
.NET, and for the "zh-Hans" culture Mono gives me different results
from .NET. From what I've read, "zh-Hans" should just be another name
for the "zh-CHS" culture, so I'd expect the same results for both, but
Mono sorts them differently.
Here's a link to my short test program:
Here's my output on .NET:
On .NET, in both the zh-Hans and the zh-CHS culture, the example
strings are sorted in an order consistent with their pinyin
transcriptions, which is what I expect.
Here's my output on mono 3.0.10, running on Ubuntu:
With zh-CHS I get the same result as on .NET. With zh-Hans, however, I
get a different order: it *looks* as if the strings are simply ordered
by Unicode code point. I'm surprised both that zh-CHS and zh-Hans sort
differently on the same setup and that the result differs from .NET.
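For comparison, a plain ordinal sort orders strings by Unicode code point, which for CJK ideographs has nothing to do with pronunciation. Python's default string comparison is ordinal, so this small sketch shows what code-point order looks like for a handful of characters whose pinyin order is ai, bei, guo, jing, zhong. It only illustrates the suspicion above; it is not a reproduction of what Mono does internally.

```python
def code_point_sort(items):
    """Sort strings ordinally: Python compares str values code point by code point."""
    return sorted(items)


def describe(items):
    """Format each string together with the hex code points of its characters."""
    out = []
    for s in items:
        cps = " ".join("U+%04X" % ord(c) for c in s)
        out.append(f"{s} ({cps})")
    return out


if __name__ == "__main__":
    # The same characters in pinyin order: ai, bei, guo, jing, zhong.
    pinyin_order = ["爱", "北", "国", "京", "中"]
    ordinal = code_point_sort(pinyin_order)
    for line in describe(ordinal):
        print(line)
    print("same as pinyin order:", ordinal == pinyin_order)
```

Ordinal sorting puts 中 (U+4E2D) first and 爱 (U+7231) last, almost the reverse of the pinyin order, so an output that follows code points is easy to spot.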
I also tried the older Mono 2.10.8 that is distributed with Ubuntu:
This gives me the expected sort order for both zh-Hans and zh-CHS, but
it reports the culture name as simply "Chinese" in each case, instead
of the expected "Chinese (Simplified) Legacy" and "Chinese (Simplified)".
Finally, I've summarized the results in a table:
Runtime       Requested culture  Culture display name                    Collation order
-----------   -----------------  --------------------------------------  ---------------
.NET 4.0      invariant          Invariant Language (Invariant Country)
.NET 4.0      zh-CHS             Chinese (Simplified) Legacy             pinyin
.NET 4.0      zh-Hans            Chinese (Simplified)                    pinyin
Mono 2.10.8   invariant          Invariant Language (Invariant Country)
Mono 2.10.8   zh-CHS             Chinese                                 pinyin
Mono 2.10.8   zh-Hans            Chinese                                 pinyin
Mono 3.0.10   invariant          Invariant Language (Invariant Country)
Mono 3.0.10   zh-CHS             Chinese (Simplified) Legacy             pinyin
Mono 3.0.10   zh-Hans            Chinese (Simplified)                    code point (apparently)
(In case the formatting is screwed up in email, here it is monospaced:
If you're still following me (thank you!), I have a few questions:
1. Am I correct to expect that zh-CHS and zh-Hans should have the same
collation behaviour as each other?
2. Am I correct to expect that zh-Hans will have a pinyin-based collation order?
3. What systems/libraries are involved here? Does Mono depend on some
system library for its collation order, or does it implement this
itself? Are there particular configuration options I need to be aware
of if I am compiling mono myself?
4. How does Mono pick the default culture on its various platforms?
Will it ever pick 'zh-Hans' as the default culture? Or would it always
prefer 'zh-CHS'? I'm worried that if it defaults to 'zh-Hans' for some
Chinese users they will get a surprising and unhelpful sort order.