[Mono-devel-list] String constants and localization

Mon Jul 14 04:39:15 EDT 2003

Hi,

I've read your answer, but it seems that at several points you overlooked
some advantages (maybe I'm wrong about some of these, but I don't think so).
So I have added some additional comments below.

> > Right now there is almost no localization support in the Mono class
> > libraries, and all strings (mainly for errors) are hardcoded into the
> > source files.
>
> Thanks for this proposal, I have some comments in this email about the
> specifics of the proposal.
>
> Initially, I wanted to use it, but it meant that we would have to:
>
> * deviate from the standard practice (something I would not
>   mind, if there were strong enough arguments for)

The basic arguments are:
* Much faster lookups
* Much smaller assembly size (see below)
* Much smaller RAM requirements
* Safer programming, because typos in keys become compile-time errors

> * Create and maintain a new infrastructure for localization.
>   Not bad per-se, but it would minimize the reuse of existing
>   knowledge that people might acquire or obtain from the NET.
>
> * Reimplement the chunks we already have for handling resources
>   in corlib to cope with all the CultureInfo bits.

This is not necessarily true. My sample implementation uses the
System.Resources namespace internally to get its localized data. More
specifically, it ALLOWS you to use System.Resources if you want, but does
not force you to if somebody wants to implement a better solution. And you
can switch to a different backend at any time without changing anything in
the calling sources. You could even keep the string tables used now and
still save a little memory (see below: strings stored as 16-bit).
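
To make the idea concrete, here is a minimal sketch of an enum-keyed lookup
with a swappable backend. It is written in Python for illustration only (the
actual proposal is C#). MS, MonoString and GenericENullNotAllowed come from
the call quoted below; GenericEOutOfRange, the backend classes, set_backend
and the message texts are hypothetical.

```python
from enum import IntEnum
from typing import Dict, List, Protocol


class MonoString(IntEnum):
    # Message IDs are plain ints at runtime.
    # GenericEOutOfRange is a hypothetical second entry.
    GenericENullNotAllowed = 0
    GenericEOutOfRange = 1


class StringBackend(Protocol):
    def get(self, msg_id: int) -> str: ...


class ArrayBackend:
    """Hypothetical backend: one culture's strings in a list indexed by ID."""

    def __init__(self, table: List[str]) -> None:
        self._table = list(table)

    def get(self, msg_id: int) -> str:
        return self._table[msg_id]  # O(1) array index, no string comparison


class NamedResourceBackend:
    """Hypothetical backend mapping IDs back to resource names, standing in
    for a System.Resources-style lookup keyed by string."""

    def __init__(self, resources: Dict[str, str]) -> None:
        self._resources = resources

    def get(self, msg_id: int) -> str:
        return self._resources[MonoString(msg_id).name]


class MS:
    # Callers only ever use get_string(); the backend can be replaced freely.
    _backend: StringBackend = ArrayBackend(
        ["Null is not allowed.", "Value is out of range."]
    )

    @classmethod
    def set_backend(cls, backend: StringBackend) -> None:
        cls._backend = backend

    @classmethod
    def get_string(cls, msg_id: MonoString) -> str:
        return cls._backend.get(int(msg_id))
```

With the default backend, MS.get_string(MonoString.GenericENullNotAllowed)
returns "Null is not allowed."; after MS.set_backend(NamedResourceBackend(...))
the same call goes through the name-keyed backend with no change at the call
site.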

> I also want to avoid loading all the strings in memory, but it is
> possible to do so:
>
> > with a call like
> > Print (MS.GetString (MonoString.GenericENullNotAllowed));
>
> We should use the Resource infrastructure in .NET here: there are many
> issues related to loading the proper assembly given the selected
> CultureInfo, and the code is mostly implemented.
>
> The file format for resources allows for this case: it is possible to
> fetch the information without having to load all the strings to
> localize.
>
> What we need to do is improve the implementation of
> ResourceSet.GetObject.  Basically we should define an internal method in
> the ResourceReader that can do lookups based on strings, without having
> to use the resource enumerator.
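
One part of the CultureInfo handling mentioned in the quoted reply is the
fallback order: a lookup for a specific culture falls back to its neutral
parent and finally to the invariant (neutral) resources. A simplified Python
sketch; splitting the name at "-" only approximates CultureInfo.Parent, and
the per-culture tables and function names are hypothetical.

```python
from typing import Dict, List, Optional


def culture_chain(name: str) -> List[str]:
    """Fallback chain for a culture name, most specific first.

    'de-AT' -> ['de-AT', 'de', ''], where '' stands for the invariant culture.
    Dropping trailing subtags is a simplification of CultureInfo.Parent.
    """
    parts = name.split("-") if name else []
    chain = []
    while parts:
        chain.append("-".join(parts))
        parts.pop()
    chain.append("")
    return chain


def lookup(tables: Dict[str, Dict[str, str]], culture: str, key: str) -> Optional[str]:
    """Return the first translation of key found along the fallback chain."""
    for c in culture_chain(culture):
        table = tables.get(c)
        if table is not None and key in table:
            return table[key]
    return None
```

For example, with a German table and an invariant table, a lookup for
"de-AT" finds the German string, while "fr-FR" falls through to the
invariant one.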

OK, but IMHO your solution has two flaws:
* It still means reimplementing the chunks we already have for handling
  resources in corlib to cope with all the CultureInfo bits (which is
  exactly what you wanted to avoid above).
* Sooner or later you always end up in the GetResourceStream function,
  which returns a memory stream, i.e. everything gets loaded into memory.
  (And if you want to provide a complete second infrastructure for
  strings, the work needed would IMHO be FAR more than anything required
  to implement something like my suggested solution.)

> We already have an API that can load a string from an index, so the only
> thing we have to do is perform a binary search on the strings in the
> file (like Monodoc does now for its help).
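
A rough sketch of that approach in Python: names and values are read from a
stream one record at a time during the binary search, and only the offset
tables are held in memory. The record layout (length-prefixed UTF-16) and all
function names are assumptions for illustration; this is not the real
.resources file format.

```python
import io
import struct
from typing import List, Optional, Tuple


def _write_str(buf: io.BytesIO, s: str) -> None:
    data = s.encode("utf-16-le")               # 16-bit characters, like UCS-2
    buf.write(struct.pack("<I", len(data)))    # 4-byte little-endian length prefix
    buf.write(data)


def _read_str(buf: io.BytesIO, offset: int) -> str:
    buf.seek(offset)                           # on a real file: one seek per probe
    (n,) = struct.unpack("<I", buf.read(4))
    return buf.read(n).decode("utf-16-le")


def build(entries: List[Tuple[str, str]]) -> Tuple[io.BytesIO, List[int], List[int]]:
    """Write (name, value) pairs sorted by name; return stream and offset tables."""
    buf = io.BytesIO()
    name_offs: List[int] = []
    value_offs: List[int] = []
    for name, value in sorted(entries):
        name_offs.append(buf.tell())
        _write_str(buf, name)
        value_offs.append(buf.tell())
        _write_str(buf, value)
    return buf, name_offs, value_offs


def lookup(buf: io.BytesIO, name_offs: List[int], value_offs: List[int],
           key: str) -> Optional[str]:
    """Binary search over names stored in the stream, without enumerating entries."""
    lo, hi = 0, len(name_offs)
    while lo < hi:
        mid = (lo + hi) // 2
        name = _read_str(buf, name_offs[mid])
        if name < key:
            lo = mid + 1
        elif name > key:
            hi = mid
        else:
            return _read_str(buf, value_offs[mid])
    return None
```

Each probe costs one seek and read plus one string comparison.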

Sorry, but IMHO this is total overkill. You want to perform a binary search
DIRECTLY on a file containing an estimated 200KB of string values EVERY time
we do a string lookup. Are you sure this won't thrash your HDD? And what if
the assembly we are accessing is on, e.g., a network share with slow access
times?
IMHO you will need to load the string index into memory in any case to
perform a binary search (or probably ANY other search).

As I already said: even with a binary search you only get lookups in
O(log n), while my solution is O(1), and that is without taking into
account that the binary search compares STRINGS, not ints.
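
To put numbers on this, a small Python illustration over 4096 hypothetical
keys: the binary search needs at most floor(log2 n) + 1 = 13 probes, each a
full string comparison, while the int-keyed lookup is a single array index.
The key names and values are made up for the example.

```python
import math
from typing import List, Tuple


def binary_search(keys: List[str], key: str) -> Tuple[int, int]:
    """Return (index or -1, number of probes) for key in the sorted list keys."""
    lo, hi, probes = 0, len(keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1                     # each probe compares whole strings
        if keys[mid] < key:
            lo = mid + 1
        elif keys[mid] > key:
            hi = mid
        else:
            return mid, probes
    return -1, probes


n = 4096
# Hypothetical resource names; zero padding keeps them sorted in numeric order.
keys = [f"Key_{i:05d}" for i in range(n)]
values = [f"message {i}" for i in range(n)]     # values[i] belongs to keys[i]

# String-keyed lookup: O(log n) probes.
idx, probes = binary_search(keys, "Key_03000")
by_name = values[idx]

# Int-keyed lookup (the enum proposal): one O(1) index, no comparisons.
msg_id = 3000
by_id = values[msg_id]

worst = max(binary_search(keys, k)[1] for k in keys)
print(probes, worst, math.floor(math.log2(n)) + 1)
```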

> > The advantages are:
> > * Smaller assemblies (probably also leads to faster runtime performance
> > in the JIT, because JITing a constant int should be faster than JITing
> > a constant string)
>
> Well, the space that you save on strings, say the string:
>
> "Null not provided"
>
> Would be encoded into an enumeration:
>
> Null_Not_Provided
>
> And that would end up in the metadata as well, so the change in size is
> only half the size (strings are stored in 16-bit ucs-2 encoding).
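
The halving can be checked directly. In CLI metadata (ECMA-335), string
literals live in the #US heap as UTF-16, while identifiers such as enum
member names live in the #Strings heap as UTF-8. Ignoring the small
per-entry overhead (length prefix, terminators), for the example above:

```python
literal = "Null not provided"      # C# string literal: #US heap, UTF-16
enum_name = "Null_Not_Provided"    # enum member name: #Strings heap, UTF-8

literal_bytes = len(literal.encode("utf-16-le"))
name_bytes = len(enum_name.encode("utf-8"))

print(literal_bytes, name_bytes)   # 34 17
```

At each call site the enum member itself is compiled into an integer
constant, so the name only has to be stored once, in the enum definition.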

I hadn't even thought about the savings from not having to store the keys
as Unicode ;) That adds even more to the savings :)

I think you are overlooking a LOT of things here:

First example:
1. Mono now: Key = "Null not provided", Translation = "Null not provided"
2. Suggestion: Key = Null_Not_Provided, Translation = "Null not provided"

In that case the key is about the same size as the translation. As you
said, the enum name needs only half the space. So we need:
1. Mono now: SuggestionKey * 2 * 2 (16-bit characters, and we also need the
key in the lookup table) + Translation
2. Suggestion: SuggestionKey * 1 + Translation
SAVING: SuggestionKey * 3
If you store the key string in one central place instead of hardcoding it
into each individual class (to prevent, e.g., spelling errors; this seems
to be what MS does), the saving grows to:
SuggestionKey * 4
and with inlining (?) active at compile time, to:
SuggestionKey * 6

Second example (IMHO roughly what it would look like in reality):
1. Mono now: Key = "Null not provided because we have never provided null",
Translation = "Null not provided because we have never provided null"
2. Suggestion: Key = Null_Never_Provided, Translation = "Null not provided
because we have never provided null"

This shows the savings when the SuggestionKey is about half the size of the
string itself (which I would estimate to be a good overall average):
1. Mono now: SuggestionKey * 2 * 2 (we also need it in the lookup table) * 2
(the string key is twice as long) + Translation
2. Suggestion: SuggestionKey * 1 + Translation
SAVING: SuggestionKey * 7 !!!!!!
For the other options described above it would even save:
SuggestionKey * 8 or
SuggestionKey * 12

A saving of SuggestionKey * 7 (with the settings of the second example)
would in practice mean a saving of about 70% of the TOTAL size (including
the translation). In the first example we would save about 60%.
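
The percentages follow from the size model above. Writing K for
SuggestionKey and T for the translation, and taking T ≈ K in the first
example (the key is about as long as the translation) and T ≈ 2K in the
second (the key is about half the string), a quick check in Python; the
function name is only for illustration:

```python
from fractions import Fraction
from typing import Tuple


def sizes(k: int, key_ratio: int, t: int) -> Tuple[int, int]:
    """Size model from the two examples.

    current:  string key (key_ratio * k chars) * 2 bytes (16-bit)
              * 2 copies (source + lookup table) + translation
    proposed: enum name (k) + translation
    """
    current = k * key_ratio * 2 * 2 + t
    proposed = k + t
    return current, proposed


k = 1
cur1, new1 = sizes(k, key_ratio=1, t=k)       # first example: T ≈ K
cur2, new2 = sizes(k, key_ratio=2, t=2 * k)   # second example: T ≈ 2K

saving1 = Fraction(cur1 - new1, cur1)
saving2 = Fraction(cur2 - new2, cur2)
print(cur1 - new1, saving1, cur2 - new2, saving2)   # 3 3/5 7 7/10
```

So the 60% and 70% figures correspond to savings of 3K out of 5K and 7K out
of 10K under these assumptions.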

Also, for extremely memory-limited devices you could probably remove the
enumeration completely after compiling (all uses of enum members are
compiled into ints), which increases the savings even more.

Everything I have described so far is just savings in assembly size. At
runtime the savings are EVEN HIGHER!
At runtime Mono should never need to access the enumeration names
(everything is an int by then), so the RAM needed is probably about 80%
LESS than with the current solution!!!!
And all that with safer programming, much faster lookups and lower CPU
usage.

Andreas