[Mono-devel-list] How to handle huge string collation resources?

Atsushi Eno atsushi at ximian.com
Tue Jun 21 15:26:13 EDT 2005


Hello,

I finally got my managed collation engine working, though it is far
from the complete form I am aiming for and is mostly conceptual for
now (there are many things it does not handle, and it performs badly).
For now it handles ASCII case sensitivity, a large part of the
CompareOptions flags, and a large part of diacritical mark processing.

Here are the steps to make it available:

	1. apply attached patch against mcs/class/corlib.
	2. go to mcs/class/corlib/Mono.Globalization.Unicode
	3. run "make". It will automatically download some files
	   from some sites. For now the build breaks without this
	   step.
	4. make corlib as usual.
	5. set the MONO_USE_MANAGED_COLLATION environment variable
	   to "yes".

There is a serious problem. Step 3 generates a 1.2MB C# source
file that results in a 500KB increase in the size of mscorlib.dll.
It could instead be generated as a C header, i.e. runtime source,
like the existing culture-info-table.h, but it would still be huge.
Moreover, about 200KB of the data is only for CJK cultures, so it
won't be used unless we use those cultures to handle
culture-sensitive CJK collation. That is mostly a waste of memory.

One possible solution is to create a separate assembly and
load the tables like this:

	- CompareInfo (or whatever class) holds those tables in
	  static variables.
	- If a variable is null, it tries to load the "internally
	  stored table" via a runtime icall_1. At this stage,
	  however, the icall returns null, since nothing is stored
	  yet.
	- Then CompareInfo (or whatever class) loads the
	  "table-only assembly" via reflection, loads the table
	  into memory, and then invokes an icall_2 that sets the
	  table as the runtime's internal table.
	- The next time CompareInfo tries to fill the table, icall_1
	  will return the table.
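
The steps above can be sketched roughly as follows. This is only a
minimal simulation in plain C, not actual Mono runtime code: the
function names (icall_get_table, icall_set_table,
load_table_assembly, get_collation_table, reset_managed_cache) and
the four-byte demo table are all hypothetical, and the reflection
load of the "table-only assembly" is replaced by a stub that simply
counts how often it is called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "Runtime" side: one process-wide slot for the table (hypothetical). */
static const uint8_t *runtime_table = NULL;
static size_t runtime_table_len = 0;

/* Stand-in for icall_1: returns the stored table, or NULL if none yet. */
static const uint8_t *icall_get_table(size_t *len)
{
    *len = runtime_table_len;
    return runtime_table;
}

/* Stand-in for icall_2: registers a table as the runtime's internal table. */
static void icall_set_table(const uint8_t *table, size_t len)
{
    runtime_table = table;
    runtime_table_len = len;
}

/* Stand-in for loading the "table-only assembly" via reflection.
 * The demo data is made up; load_count records how often we pay the cost. */
static int load_count = 0;

static const uint8_t *load_table_assembly(size_t *len)
{
    static const uint8_t demo_table[] = { 1, 2, 3, 4 };
    load_count++;
    *len = sizeof demo_table;
    return demo_table;
}

/* "Managed" side: the static variable held by CompareInfo or whatever. */
static const uint8_t *cached_table = NULL;
static size_t cached_len = 0;

static const uint8_t *get_collation_table(size_t *len)
{
    assert(len != NULL);
    if (cached_table == NULL) {
        /* First ask the runtime (icall_1). */
        cached_table = icall_get_table(&cached_len);
        if (cached_table == NULL) {
            /* Nothing stored yet: load the table, then hand it to the
             * runtime (icall_2) so later lookups can reuse it. */
            cached_table = load_table_assembly(&cached_len);
            icall_set_table(cached_table, cached_len);
        }
    }
    *len = cached_len;
    return cached_table;
}

/* Hypothetical helper: simulates a consumer whose static is still null. */
static void reset_managed_cache(void)
{
    cached_table = NULL;
    cached_len = 0;
}
```

In this sketch the first call to get_collation_table() pays for the
load once. After reset_managed_cache(), a second call gets the same
pointer back from icall_get_table() without loading again.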

In fact the same discussion also applies to string Normalization
tables (to support String.Normalize() introduced in .NET 2.0).

Any good ideas for this problem?

Thanks,
Atsushi Eno
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: managed-collation-20050621.diff
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20050622/a9a0cee5/attachment.pl 

