[Mono-dev] Replacing/Removing I18N

Andreas Nahr ClassDevelopment at A-SoftTech.com
Mon Oct 9 17:43:15 EDT 2006


Hi *,

I've been looking into this topic for quite some time now. However, my 
knowledge (especially of the internals of the VM) may not be sufficient, 
so please correct me if I make wrong statements or assumptions:

Current situation:
I18N is located in multiple separate assemblies that contain autogenerated 
encoding classes. The single-byte encodings (my current focus) use a 
potentially large switch/case structure to compute the output.

Problems:
* I18N is loaded through reflection mechanisms, which triggers a HUGE amount 
of other code, so Mono loads/JITs and instantiates a considerable portion of 
the entire corelib when I18N is used.
* Initializing the I18N is slow and creates a huge amount of objects.
* (At least on Windows) I18N is basically ALWAYS used, because a single 
access to Console (which even the corelib itself uses a lot for debug output) 
is enough to instantiate it. I'm assuming this doesn't happen on Linux?
* The I18N classes themselves consist of a considerable amount of IL code, 
making them slow to JIT and using a considerable amount of memory to hold 
the IL and the produced native code.
* The I18N classes are potentially slow (especially when assuming that 
optimization is probably sub-optimal because of the often huge IL-size).
* Pressure on the GC because of the large number of generated objects.

Considerations for change:
* Data must not be in private memory but shared as much as possible 
(currently most is shareable)
* If possible avoid internal-calls and other direct runtime-support 
(currently does not need any)

Goals:
* Drop additional libraries
* Use less memory
* Make instantiation faster
* Try to limit the needed JITted code to a reasonable amount
* Possibly make encoding faster

I started with the single-byte encodings, because these are the simplest 
ones and the most numerous:

The idea is to use only a single class for all single-byte encodings, 
instantiated with lookup tables suited to the relevant codepage.
The lookup tables would be binary data that should be shared. A solution 
seems to be embedding these as resource files into corelib and then getting 
a pointer to the binary data through Assembly.GetManifestResourceStream. The 
data would be uncompressed and complete for maximum performance and minimal 
code requirement (about 65kb per encoding). The data should already be 
memmapped if embedded as a resource (how big would one of the pages be?).
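
A minimal sketch of the table-driven idea above, written in Python purely 
for illustration (the actual class would be C# inside corelib). Everything 
named here is hypothetical: SingleByteEncoding, build_tables and the '?' 
fallback byte are assumptions, and Python's own codecs stand in for the VM 
that the table data would be generated from.

```python
FALLBACK = ord("?")  # assumed replacement byte for unmappable characters


def build_tables(codec_name):
    """Return (decode_table, encode_table) for a single-byte codec."""
    # Byte -> code point: 256 entries.
    decode_table = []
    for b in range(256):
        try:
            decode_table.append(ord(bytes([b]).decode(codec_name)))
        except UnicodeDecodeError:
            decode_table.append(0xFFFD)  # byte is undefined in this codepage
    # Code point -> byte: complete and uncompressed, one byte per BMP code point.
    encode_table = bytearray([FALLBACK]) * 0x10000
    for b, cp in enumerate(decode_table):
        if cp != 0xFFFD:
            encode_table[cp] = b
    return decode_table, bytes(encode_table)


class SingleByteEncoding:
    """One class for every single-byte codepage; only the tables differ."""

    def __init__(self, decode_table, encode_table):
        self._decode = decode_table
        self._encode = encode_table

    def get_bytes(self, text):
        enc = self._encode
        # Code points outside the BMP can never be mapped, so they fall back too.
        return bytes(
            enc[ord(ch)] if ord(ch) < 0x10000 else FALLBACK for ch in text
        )

    def get_string(self, data):
        dec = self._decode
        return "".join(chr(dec[b]) for b in data)


cp1250 = SingleByteEncoding(*build_tables("cp1250"))
print(cp1250.get_bytes("Hello"))
```

The encode table is 65536 bytes and the decode table would be 256 two-byte 
entries if stored as binary data, which is where a figure of roughly 65kb 
per encoding comes from.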

Some ballpark figures as rationale (see the attached text files, which show 
an app using only I18N, one with String and Globalization, and one without 
any of these):
For String and Globalization Mono uses about 15kb memory for code and 7kb 
for data in 246 objects.
With I18N Mono uses (Codepage 1250) > 140kb for code and 81kb for private 
data in 1800 objects (potentially putting pressure on the GC).
So the 65kb we need for an encoding (how much of that could be saved through 
memmapping, given that every single-byte encoding leaves part of the Unicode 
range unused?) is actually far less than the current overhead for I18N 
(obviously that may not hold true if a lot of different encodings are used). 
And obviously the lookup would be faster and we would not need to spend any 
time in the JIT.
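
One rough way to approach the memmapping question is to count how many 
pages of a 64k encode table contain at least one mapped character. The 
sketch below is in Python for illustration only. The 4096-byte page size 
is an assumption (the real value depends on OS and architecture), 
touched_pages is a hypothetical helper, and Python's codec data stands in 
for the real tables.

```python
PAGE_SIZE = 4096  # assumption: the real page size is platform-dependent


def touched_pages(codec_name, page_size=PAGE_SIZE):
    """Pages of a one-byte-per-code-point table that hold a mapped character."""
    pages = set()
    for b in range(256):
        try:
            cp = ord(bytes([b]).decode(codec_name))
        except UnicodeDecodeError:
            continue  # byte is undefined in this codepage
        pages.add(cp // page_size)
    return sorted(pages)


total_pages = 0x10000 // PAGE_SIZE
used = touched_pages("cp1250")
print(f"{len(used)} of {total_pages} pages hold a mapped character")
```

This is only a lower bound on what gets paged in: looking up a character 
the codepage cannot represent still reads the fallback entry on whatever 
page that character falls on.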

Open questions:
* Creating the binary data should be simple when generating from a .Net VM. 
Would it be allowed to generate directly from MS.Net? From Portable.Net? 
(obviously generating from Mono is no problem, but would not allow us to ADD 
data)
* Size of a memmapped page?
* Is the growth in *file* size for corelib acceptable? Altogether probably 
5-10MB
* Other side effects possible?


Greetings
Andreas 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: With String&Globalization.txt
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20061009/5b2b6da7/attachment.txt 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: With I18N.txt
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20061009/5b2b6da7/attachment-0001.txt 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Without all.txt
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20061009/5b2b6da7/attachment-0002.txt 

