[Mono-dev] Replacing/Removing I18N
Andreas Nahr
ClassDevelopment at A-SoftTech.com
Mon Oct 9 17:43:15 EDT 2006
Hi *,
I've been looking into this topic for quite some time now. However my
knowledge (especially of the internals of the VM) is potentially not enough,
so please correct me if I make wrong statements or assumptions:
Current situation:
I18N is located in multiple separate assemblies that contain encoding
classes that are autogenerated. The single-byte encodings (my current focus)
use a potentially big CASE-Structure to compute the output.
Problems:
* I18N is loaded through Reflection mechanisms, which triggers a HUGE amount
of other code, so Mono loads/JITs and instantiates a considerable portion of
the entire corelib when I18N is used.
* Initializing the I18N is slow and creates a huge amount of objects.
* (At least on Windows) I18N is basically ALWAYS used, because a single
access to Console (which even corelib itself uses a lot for debug output)
is enough to instantiate it. I'm assuming this doesn't happen on Linux?
* The I18N classes themselves consist of a considerable amount of IL code,
making them slow to JIT and using a considerable amount of memory to hold
the IL and the native code produced from it.
* The I18N classes are potentially slow (especially when assuming that
optimization is probably sub-optimal because of the often huge IL-size).
* Pressure on the GC because of the large number of generated objects.
Considerations for change:
* Data must not be in private memory but shared as much as possible
(currently most is shareable)
* If possible avoid internal-calls and other direct runtime-support
(currently does not need any)
Goals:
* Drop additional libraries
* Use less memory
* Make instantiation faster
* Try to limit the needed JITted code to a reasonable amount
* Possibly make encoding faster
I started with the single-byte encodings, because these are the simplest
ones and the most numerous:
The idea is to only use a single class for all single-byte encodings that is
instantiated with lookuptables suited for the relevant codepage.
The lookuptables would be binary data that should be shared. A solution
seems to be embedding these as resource-files into corelib and then getting
a pointer to the binary data through Assembly.GetManifestResourceStream. The
data would be uncompressed and complete for maximum performance and minimal
code requirement (about 65kb per encoding). The data should already be
memory-mapped if it is embedded as a resource (how big would one of the
pages be?).
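
A minimal sketch of the single-class idea, written in Python rather than C#
purely for illustration. The names build_tables and SingleByteEncoding are
hypothetical, and the tables are derived here from Python's own cp1250 codec;
in the proposal they would instead be read from a resource embedded in
corelib. The "?" fallback for unmapped characters is also an assumption.

```python
from array import array

def build_tables(codec_name):
    """Build full lookup tables for one single-byte code page (hypothetical helper)."""
    # Decode table: 256 entries, byte value -> Unicode code point.
    decode = array("H", [0xFFFD] * 256)
    # Encode table: 65536 entries, code point -> byte value ("?" if unmapped).
    encode = bytearray(b"?" * 0x10000)
    for b in range(256):
        try:
            ch = bytes([b]).decode(codec_name)
        except UnicodeDecodeError:
            continue  # this byte is undefined in the code page
        decode[b] = ord(ch)
        encode[ord(ch)] = b
    return decode, bytes(encode)

class SingleByteEncoding:
    """One class for every single-byte code page; only the tables differ."""

    def __init__(self, decode_table, encode_table):
        self._decode = decode_table
        self._encode = encode_table

    def get_bytes(self, text):
        enc = self._encode
        # A plain table lookup per character, no per-codepage branching.
        return bytes(enc[ord(c)] if ord(c) < 0x10000 else 0x3F for c in text)

    def get_string(self, data):
        dec = self._decode
        return "".join(chr(dec[b]) for b in data)

dec_table, enc_table = build_tables("cp1250")
cp1250 = SingleByteEncoding(dec_table, enc_table)
print(cp1250.get_bytes("kůň"))
```

The point of the design is that adding another code page means adding another
table, not another class with its own IL to load and JIT.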
Some ballpark figures as rationale (See the attached text-files which show
an app using only I18N, one with String and Globalization and one without
all):
For String and Globalization Mono uses about 15kb memory for code and 7kb
for data in 246 objects.
With I18N mono uses (Codepage 1250) > 140kb for code and 81kb for private
data in 1800 objects (potentially putting pressure on the GC)
So the 65kb we need for an encoding (how much of that could be saved through
memory mapping, given that every single-byte encoding leaves part of the
Unicode range unused?) is actually far less than the current overhead for
I18N (obviously that may not hold true if a lot of different encodings are
used). And obviously the lookup would be faster and we would not need to
spend any time in the JIT.
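
To get a rough feel for the memory-mapping question, here is a small Python
sketch that counts how many pages of a 65536-byte encode table contain at
least one real mapping. The 4096-byte page size is an assumption (the actual
size is platform dependent), pages_with_mappings is a hypothetical helper,
and the mappings are taken from Python's cp1250 codec. Whether the remaining
pages really stay out of memory depends on the OS paging the resource in on
demand and on no lookup ever touching them.

```python
PAGE_SIZE = 4096      # assumed page size; platform dependent
TABLE_SIZE = 0x10000  # one output byte per 16-bit code unit

def pages_with_mappings(codec_name, page_size=PAGE_SIZE):
    """Return the indices of table pages that hold at least one mapped character."""
    pages = set()
    for b in range(256):
        try:
            ch = bytes([b]).decode(codec_name)
        except UnicodeDecodeError:
            continue  # byte is undefined in this code page
        # The table is indexed by code point, so this is the page the entry lands on.
        pages.add(ord(ch) // page_size)
    return sorted(pages)

touched = pages_with_mappings("cp1250")
total_pages = TABLE_SIZE // PAGE_SIZE
print(f"{len(touched)} of {total_pages} pages hold at least one mapping")
```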
Open questions:
* Creating the binary data should be simple when generating from a .Net VM.
Would it be allowed to generate directly from MS.Net? From Portable.Net?
(Generating from Mono is obviously no problem, but would not allow us to ADD
data.)
* Size of a memory-mapped page?
* Is the growth in *file* size for corelib acceptable? Altogether probably
5-10MB.
* Other possible side effects?
Greetings
Andreas
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: With String&Globalization.txt
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20061009/5b2b6da7/attachment.txt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: With I18N.txt
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20061009/5b2b6da7/attachment-0001.txt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Without all.txt
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20061009/5b2b6da7/attachment-0002.txt