[Mono-devel-list] The first (attempt to checkin) managed collation patch

Thu Jul 21 14:18:48 EDT 2005

Hello,

> 180 KB is reasonable for now. It would be good to be able to
> build a configuration that allows just simple us-ascii collation
> for embedded systems that really care.
> We can look at optimizing the size when the speed issues are sorted out.

Well, 180 KB is the size of just one file. I did shrink that file from
180 KB to 120 KB today, but there are additional files that are up to
220 KB (all *.bin files in http://monkey.workarea.jp/tmp/20050720/ ).

>>The corresponding code is already in svn,
>>mcs/class/corlib/Mono.Globalization.Unicode/MSCompatUnicodeTable.cs.
>>It creates a BinaryReader instance for each manifest resource stream,
>>and for byte arrays it does Read(array, 0, size).
> 
> This is fine while you do testing, but for the release it will need to
> be changed to not copy the data in managed arrays. Just make sure access to
> the level*, categories, etc fields is encapsulated so you can easily change
> them later to be unmanaged byte and ushort pointers.

The architecture is one of my top concerns right now, so I will simply
introduce such changes from now on.
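
To show what I understand by "encapsulated", here is a minimal sketch.
It is written in C only because the pointers would eventually come from
the runtime; the managed side would do the same through methods. The
names byte_table and byte_table_get are hypothetical and are not in svn.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view over one collation table (level*, categories, ...).
   Callers never touch the storage directly, so "data" can later point
   into memory owned by the runtime instead of a copied array. */
typedef struct {
    const uint8_t *data;   /* borrowed pointer, not owned here */
    size_t length;         /* number of entries */
} byte_table;

/* Returns the entry at index i, or 0 when i is out of range or the
   table has no data. */
static uint8_t byte_table_get(const byte_table *t, size_t i)
{
    return (t->data != NULL && i < t->length) ? t->data[i] : 0;
}
```

With this shape, switching from a copied array to a runtime-provided
pointer only changes how the struct is filled in, not the callers.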

So you mean I should avoid managed resources and instead acquire those
pointers from the runtime via an icall, right?

>>>It might be nice to put the files in /usr/share. A few things we win by
>>>doing that:
>>
>>How can we get the precise file location, especially when we specify
>>different GAC to reference mscorlib?
> 
> Don't worry about that, the runtime will load the file for you, this is the last
> of the issues. It would be good if the file contained a version id that you can
> check for consistency, so please add that.

Does this mean that we already have such functionality in the runtime?
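
For the version id, I imagine a small header in front of the data,
checked before anything else is read. The following is only a sketch:
the "MCOL" signature, the 8-byte layout, the version value and the
function name are all my assumptions, and nothing here is an existing
runtime API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COLL_MAGIC "MCOL"     /* hypothetical 4-byte signature */
#define COLL_VERSION 1u       /* hypothetical format version */
#define COLL_HEADER_SIZE 8    /* 4 bytes signature + 4 bytes version */

/* Returns a pointer to the payload that follows the header, or NULL
   when the buffer is too short, the signature does not match, or the
   version is not the one this code understands. */
static const uint8_t *coll_check_header(const uint8_t *buf, size_t len)
{
    uint32_t version;

    if (buf == NULL || len < COLL_HEADER_SIZE)
        return NULL;
    if (memcmp(buf, COLL_MAGIC, 4) != 0)
        return NULL;

    /* The version is read as little-endian byte by byte, so the result
       does not depend on the host byte order. */
    version = (uint32_t)buf[4] | ((uint32_t)buf[5] << 8) |
              ((uint32_t)buf[6] << 16) | ((uint32_t)buf[7] << 24);
    if (version != COLL_VERSION)
        return NULL;

    return buf + COLL_HEADER_SIZE;
}
```

A mismatch would then be treated the same way as a missing file.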

> What I'd like is a small data file embedded inside the mono binary that
> contains support for the most common locales, so it's always available.
> The rest could be loaded at runtime, if it's installed. How hard would it be
> for your code to deal with this case?

Actually, except for CJK mapping for zh-CHS/zh-CHT/ja/ko cultures,
"locale dependent" mapping data is tiny, since the largest 120KB
file is for InvariantCulture.
The latest code already skips the special CJK processing when the
corresponding CJK table is not available, and the tables are not loaded
unless the corresponding collator instance is created.
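
The logic is roughly the following. This is a C sketch of the idea and
not the managed code in svn: every name is hypothetical, the two loaders
are stand-ins for illustration, and it loads on first use rather than on
collator creation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loader: returns the table for a culture, or NULL when
   the data for that culture is not installed. */
typedef const uint8_t *(*cjk_loader)(const char *culture);

typedef struct {
    const char *culture;    /* e.g. "ja" */
    cjk_loader load;
    const uint8_t *table;   /* NULL until loaded, or if unavailable */
    int tried;              /* nonzero once a load has been attempted */
} cjk_slot;

/* Stand-in loaders, for illustration only. */
static const uint8_t demo_table[] = {10, 20, 30};
static const uint8_t *demo_loader(const char *culture)
{
    (void)culture;
    return demo_table;
}
static const uint8_t *missing_loader(const char *culture)
{
    (void)culture;
    return NULL;            /* simulates data that is not installed */
}

/* Attempts the load on first use only; later calls reuse the result. */
static const uint8_t *cjk_slot_get(cjk_slot *s)
{
    if (!s->tried) {
        s->table = s->load ? s->load(s->culture) : NULL;
        s->tried = 1;
    }
    return s->table;
}

/* Uses the CJK weight when a table exists; otherwise returns the
   fallback, i.e. the CJK-specific processing is skipped. The caller
   must keep index within the table; no bounds check is done here. */
static uint8_t cjk_weight(cjk_slot *s, size_t index, uint8_t fallback)
{
    const uint8_t *t = cjk_slot_get(s);
    return t ? t[index] : fallback;
}
```

So a build without the CJK data still collates, only without the
culture-specific ordering.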

I can provide a minimal ASCII-only support resource, but I'm not sure
exactly what that means. For example, would it always ignore
CompareOptions.IgnoreWidth and CompareOptions.IgnoreKanaType?
(There are no full-width characters or Kana in ASCII.)

Atsushi Eno