[Mono-devel-list] String constants and localization

Mon Jul 14 18:40:21 EDT 2003

Hello,

> By the time Mono class libraries are complete they will probably contain
> about 1-2MB of hardcoded strings.
> It is not possible to do ANY optimization or improvement on that other than
> removing these.
> For a normal PC, 1-2MB is negligible today.
> For a memory limited device (e.g. a Palm or a PocketPC or a Cell-Phone) 1-2
> MB permanently lost is HUGE (ok maybe not for a PocketPC ;)

Understood.  But given that our user population is mostly desktop and
server users, this is not really a problem.

For an embedded device, you will likely make other cuts:

	* Remove unneeded classes; in fact, tune the libraries to the
	  particular classes that you will use, removing everything
	  you won't use.

	* Remove chunks of the runtime that are not needed.

	* Modify the images to store strings in UTF-8 (string literals
	  in the metadata are stored as UTF-16).

	* If space is at a premium, you could even gzip chunks of the
	  image.
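To put a rough number on the UTF-8 item: ECMA-335 stores string literals (the #US heap that `ldstr` reads from) as UTF-16, so mostly-ASCII English messages take about half the space in UTF-8. The sketch below only measures that difference; Java is used as a runnable stand-in for C#, and the sample strings are made up. Non-Latin text gains little or nothing: most CJK characters take 3 bytes in UTF-8 versus 2 in UTF-16.

```java
import java.nio.charset.StandardCharsets;

// Byte cost of the string data only; ignores the length prefix and trailing
// flag byte that each #US entry also carries.
final class EncodingCost {
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length; // UTF_16LE: no byte-order mark
    }

    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = { "Hello World", "Hola Mundo", "\u4f60\u597d" };
        for (String s : samples) {
            System.out.println(s + ": UTF-16 " + utf16Bytes(s) + " bytes, UTF-8 " + utf8Bytes(s) + " bytes");
        }
    }
}
```

For "Hello World" this gives 22 bytes in UTF-16 and 11 in UTF-8.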

And if you go down that path, you might as well modify the compiler to
cope specially with any strings "labeled" for translation, for example:

	throw new Exception (Resource.GetText("Hello World"));

The compiler could replace the above with:

	throw new Exception (Resource.GetTextFromID (765));
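A minimal sketch of that substitution, assuming the compiler keeps a table of every distinct string passed to GetText and gives each one the next sequential ID. `Assigner`, `Resource`, `idFor` and `getTextFromId` are hypothetical names that only echo the example above, and Java stands in for C#. At run time only the table survives, and a lookup is a plain array index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StringIds {
    // Compile-time side (hypothetical): gives each distinct string the next free ID.
    static final class Assigner {
        private final Map<String, Integer> ids = new HashMap<>();
        private final List<String> table = new ArrayList<>();

        int idFor(String text) {
            Integer id = ids.get(text);
            if (id == null) {
                id = table.size();
                table.add(text);
                ids.put(text, id);
            }
            return id;
        }

        // The table the compiler would embed in (or ship beside) the assembly.
        String[] emitTable() {
            return table.toArray(new String[0]);
        }
    }

    // Run-time side (hypothetical): only the table is kept; lookup is an array index.
    static final class Resource {
        private final String[] table;

        Resource(String[] table) {
            this.table = table;
        }

        String getTextFromId(int id) {
            return table[id];
        }
    }

    public static void main(String[] args) {
        Assigner compiler = new Assigner();
        // GetText("Hello World") in the source would become GetTextFromID(hello).
        int hello = compiler.idFor("Hello World");
        compiler.idFor("Goodbye");
        Resource runtime = new Resource(compiler.emitTable());
        System.out.println(hello + " -> " + runtime.getTextFromId(hello)); // prints "0 -> Hello World"
    }
}
```

The compile-time map is thrown away after compilation; only the string table reaches the device.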

Or we could even add support to translate directly into a specific
language:

	mcs -adapt:myLocale.File.spanish sample.cs

which would produce:

	throw new Exception (Resource.GetText ("Hola Mundo"));
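One way such a pass could work, sketched as a source-to-source rewrite: each `Resource.GetText ("...")` literal found in a translation catalog is replaced by its translation, and anything untranslated is left exactly as written. The catalog format, the regular expression and the `Adapt` class are assumptions for illustration, and `-adapt` itself is only the hypothetical flag from the example above; a real compiler would do this on its parse tree rather than with a regex.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class Adapt {
    // Matches Resource.GetText("...") with optional whitespace; group 1 is the
    // literal's contents as written in the source (escapes left as-is).
    private static final Pattern CALL = Pattern.compile(
            "Resource\\.GetText\\s*\\(\\s*\"((?:[^\"\\\\]|\\\\.)*)\"\\s*\\)");

    // Catalog values are assumed to be valid C# string-literal contents already.
    static String adapt(String source, Map<String, String> catalog) {
        Matcher m = CALL.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String translated = catalog.get(m.group(1));
            String replacement = translated == null
                    ? m.group()                                   // untranslated: keep as written
                    : "Resource.GetText (\"" + translated + "\")";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> spanish = new HashMap<>();
        spanish.put("Hello World", "Hola Mundo");
        String src = "throw new Exception (Resource.GetText(\"Hello World\"));";
        System.out.println(adapt(src, spanish));
        // prints: throw new Exception (Resource.GetText ("Hola Mundo"));
    }
}
```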

> They obviously have two different code bases.
> .Net framework uses a short string identifier to identify strings (e.g.
> GetString("Get_Error_NotValid"))
> .Net compact framework seems to use an Int value to identify strings (e.g.
> GetString(45) )

Today Mono is used by the same audience that uses the .NET Framework,
and nobody building a more compact framework has approached us yet;
but, as I said, I am willing to add specific changes to the compiler to
address these memory-usage patterns.

> The absolute minimum size you can achieve (removing all strings, assumed 1MB
> strings without changing the code base):
> Mono: 1000KB (cannot remove without removing every single string)
> MS: estimated 250KB (assuming the identifier is average 1/4 of the string
> itself)
> MS Compact: about 40KB
> Suggestion: about 40KB (assuming you remove the enumeration after compiling)

We cannot remove the enumeration after compiling; as I said before, you
can always do:

	object a = MyEnum.Value;
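Here is a Java analogue of that line, with a hypothetical `MyEnum`. Once the value sits in an `object`, code that never names the enum can still ask for its type and member name, so the compiler cannot tell that the enum is dead. In C#, `a.ToString ()` on the boxed value likewise returns "Value".

```java
// Hypothetical enum standing in for MyEnum in the C# example.
enum MyEnum { VALUE, OTHER }

final class EnumStillNeeded {
    // Nothing here mentions MyEnum, yet its type and member name are
    // recovered at run time, so both must survive compilation.
    static String describe(Object a) {
        return a.getClass().getSimpleName() + "." + a;
    }

    public static void main(String[] args) {
        Object a = MyEnum.VALUE;
        System.out.println(describe(a)); // prints "MyEnum.VALUE"
    }
}
```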

And given the compiler's structure, it is a lot easier to implement what
I described above than to remove an enum after it has been used.

Removing the enum would also be a lot more hackish than my proposal
above.

> The minimum size you can achieve when using localization (one localized
> resource set, assumed 1MB strings):
> Mono: 3000KB
> MS: estimated 1500KB (assuming the identifier is average 1/4 of the string
> itself)
> MS Compact: about 1040KB
> Suggestion: about 1290KB (assuming the enumeration entry is average 1/4 of
> the string itself) (assuming you remove the enumeration after compiling)
> Suggestion: about 1040KB (assuming you remove the enumeration after
> compiling)

Not quite correct.  Mono would include 1MB of strings that are already
in one language (English or, as I suggested previously, any language of
your liking).

We are not likely to be able to remove the enumeration entries.

> RAM need at runtime when using localization for getting ONE/The first entry
> (one localized resource set, full memory cache, assumed 1MB strings):
> Mono: 2000KB (Hashtable implementation)

Wrong.

As I have repeated a number of times: you do *not* need to use a
Hashtable.  In fact, Microsoft .NET does *not* use a Hashtable; it uses
an "internal" method that maps strings to their index using a binary
search.
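The idea can be sketched as follows. This is not Microsoft's internal code, only an illustration under the assumption that a build step emits the keys pre-sorted together with a parallel array of values. A lookup is a binary search that returns the key's index, and no hash buckets are allocated. The key names and messages in `main` are made up.

```java
import java.util.Arrays;

final class SortedResources {
    private final String[] keys;   // must be sorted by String.compareTo
    private final String[] values; // values[i] is the text for keys[i]

    SortedResources(String[] sortedKeys, String[] values) {
        if (sortedKeys.length != values.length) {
            throw new IllegalArgumentException("keys and values differ in length");
        }
        this.keys = sortedKeys;
        this.values = values;
    }

    // Binary search maps the key to its index in O(log n); returns null if absent.
    String getString(String key) {
        int i = Arrays.binarySearch(keys, key);
        return i >= 0 ? values[i] : null;
    }

    public static void main(String[] args) {
        SortedResources res = new SortedResources(
                new String[] { "Arg_Null", "Arg_Range", "Get_Error_NotValid" },
                new String[] { "Value cannot be null.", "Index is out of range.", "The value is not valid." });
        System.out.println(res.getString("Get_Error_NotValid")); // prints "The value is not valid."
    }
}
```

Beyond the strings themselves, the per-entry cost is just two array slots.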

Also, the strings do *not* have to be loaded until they are requested.
In fact, if they come from the file, they are mmapped, so the kernel
won't load a page unless it is needed, and can always drop the page
from memory, since it has backing store associated with it.  So it is
*not* an issue.
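The mmap point can be sketched with Java's `FileChannel.map`. The file layout here (a count, a table of byte offsets, then the UTF-8 string data) is a hypothetical format chosen for illustration, not Mono's image format. Mapping the file does not read it; the OS pages in only the parts a lookup touches, and since those pages are clean and file-backed, they can be dropped and re-read later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical layout: [int count][int offsets[count + 1]][UTF-8 string bytes].
// offsets[i] is the absolute file position where string i starts, and
// offsets[count] marks the end of the data.
final class MappedStrings {
    private final MappedByteBuffer buf;
    private final int count;

    MappedStrings(Path file) {
        MappedByteBuffer mapped;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        this.buf = mapped;
        this.count = buf.getInt(0);
    }

    // Touches only two offset-table entries and the bytes of the requested string.
    String get(int id) {
        if (id < 0 || id >= count) {
            throw new IndexOutOfBoundsException("no string with id " + id);
        }
        int start = buf.getInt(4 + 4 * id);
        int end = buf.getInt(4 + 4 * (id + 1));
        byte[] bytes = new byte[end - start];
        ByteBuffer view = buf.duplicate(); // own position, shared content
        view.position(start);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Demo helper: writes the strings to a temporary file in the layout above.
    static Path writeTemp(List<String> strings) {
        int n = strings.size();
        byte[][] encoded = new byte[n][];
        int dataStart = 4 + 4 * (n + 1);
        int total = dataStart;
        for (int i = 0; i < n; i++) {
            encoded[i] = strings.get(i).getBytes(StandardCharsets.UTF_8);
            total += encoded[i].length;
        }
        ByteBuffer out = ByteBuffer.allocate(total); // big-endian, like the mapped buffer
        out.putInt(n);
        int offset = dataStart;
        for (byte[] b : encoded) {
            out.putInt(offset);
            offset += b.length;
        }
        out.putInt(offset);
        for (byte[] b : encoded) {
            out.put(b);
        }
        try {
            Path file = Files.createTempFile("strings", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, out.array());
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = writeTemp(Arrays.asList("Hello World", "Hola Mundo"));
        MappedStrings strings = new MappedStrings(file);
        System.out.println(strings.get(1)); // prints "Hola Mundo"
    }
}
```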

> Mono would use the most memory of all implementations (For the compiled
> assembly as well as RAM for execution)

It would use the most memory if you *insist* on the hashtable
implementation, as opposed to the binary-search implementation. 

If you do the *right* thing (as I pointed out before), the memory use is
minimal, and does not even show up.

> The memory need of the Mono implementation will never allow Mono to run on a
> memory limited device. And there is NO way to do any optimization on the
> assembly size.

Yes, there is; I listed several above.

Miguel