[Mono-devel-list] String constants and localization

Andreas Nahr ClassDevelopment at A-SoftTech.com
Thu Jul 3 12:00:34 EDT 2003


Hi,

right now there is nearly no localization support in the Mono class libraries and all strings (mainly for errors) are hardcoded into the source-files.
If someone wants to localize these strings, it is nearly impossible if we do not add the required support now. Moreover coding all the strings into the source - files and compiling them adds to the size of the libraries, wich not only (probably) degrades general performance, but also will make it hard to make extra-small versions of Mono (e.g. for mobile devices).

As rationale just a simple example:
Say you want to localize the whole libraries and use a technique like currently used in the System Assembly. Something like:
  public static string GetText (string msg)
  {
// Do localization
   return LocalizedString;
  }
with a call like:
Print (FromWherever.GetText ("This text will be printed somewhere"));
and assuming that all strings in the class libraries are about 500KB (and I personally really like verbose error messages, so there might even be more - 1 MB?)

Using the scheme above you will need: the strings hardcoded in the sources, a hashtable to lookup with the original values and the replace values, per supported fallback level (e.g. en-US -> en) at least one additional set which makes 4 Times all the strings -> 2 MB - 4 MB RAM just wasted the first time you try to access a single string of a few bytes size.
If you write a small tool it is guaranteed to use megs of RAM
Moreover we are not just talking about memory, but the required lookups will need an possibly considerable amount of time.

I would therefore suggest the following:

Replace all hard-coded strings with an enumeration value. This gets compiled into an int in IL and allows much more efficient lookups ( O(1) instead of X * O (ln n) for the previous solution.
The strings are stored in an external XML file and are compiled through a simple program into a binary data format (using an integer lookup index)
This reduces (compiled) code size and RAM usage and improves speed cosiderably, also is much more extensible and easily translatable.

 public static string GetString (MonoString stringResource)
 {
  return StringForMonoStringEnumValue (MonoString);
 }

 public enum MonoString
 {
  GenericENullNotAllowed,
  GenericEValueMustBeLarger0,
  GenericEValueMustBeLargerOrEqual0
 }

with a call like
Print (MS.GetString (MonoString.GenericENullNotAllowed));

All strings

The Advantages are:
* Smaller Assemblies (probably leads to faster runtime performance in Jit also because Jiting a constant int should be faster than Jiting a constant string)
* Far faster String lookups
* Far less memory usage (ideally we practicly wouldn't need ANY memory overhead)
* More flexible solution (allows simple changes from entirely removing all strings for extra-small versions, to fully cached versions)
* Even in worst case with full caching of all strings memory usage is is only a fraction of what it would be now
* Easily extensible
* Data would be created outside from an XML file, which e.g. is easily translatable
* Comparable speeds when using localization or not

The Disadvantages:
* Need to change existing code (for most strings this is true anyway if localization is needed)
* More programing work needed:
    To add a new string you would have to:
        Add a code line (that is the code line you would also have to write normally
        Add an enumeration member to MS.Strings (additional)
        Add the key/string value to an XML file (additional)
* Compiling gets a (little) more complicated because the string resources file has to be (auto)generated in addition with each compile

I love like to hear some suggestions and comments.
If everybody is positive I will start to add this to an assembly (probably System) to try out how it works

Open questions are:
* Should every class get its own internal version of the localization support (would help to not change the public signature of the assembly, but possibly values between assemblies would have to be stored multiple times (things like GenericErrorNullNotAllowed will probably be needed in every assembly)
* Should data files be added to the assemblies as resources or stay independently (independently would allow easy switch of language by simply overwriting the file, but also adds the risk of invalid/ tampered files)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20030703/9347f758/attachment.html 


More information about the Mono-devel-list mailing list