[Mono-devel-list] Why not to use gettext ()

Thu Jun 16 18:29:47 EDT 2005

Hi,

Sorry for answering so late, but I didn't have much time this week.
I was the one who wrote the stub for Locale, after we had quite a long 
discussion about this topic about 2 or 2.5 years ago that did not reach 
any result.
In fact, back then I proposed a solution (including a nearly finished 
implementation) based on a binary lookup mechanism, which IMHO would still 
be the best solution.

But let's start from the beginning (with some of the things we already found 
out back then).
In mono we basically have three types of strings:
1. Exception texts (the biggest share). Will usually not be displayed to the 
user; translation is probably not critical. Performance may not be critical 
if the text is only used in an exception that stops the app anyway.
2. Object descriptions (e.g. in attributes). Will be displayed to the user 
with some probability. Performance is somewhat critical; translation should 
probably be done.
3. Strings used for user display (Windows Forms, globalization info, 
others). Will very likely be displayed to the user; translation critical, 
performance critical.

And we have at least three options:
1. gettext
2. ResourceManager
3. Custom solution

Gettext has the advantage that, as long as no translation is done or needed 
and English is used, it offers the best performance and memory use. If a 
translation is needed, it offers the worst performance and the worst memory 
use of all options.
A ResourceManager-based solution needs fewer tools and less code than the 
other solutions, but likely has high memory use, abysmal initialization 
performance and a relatively high runtime cost.
A custom solution using constant/enum lookup (see the discussion back then) 
means the most work and the lowest reuse, but gives the fastest lookup (by 
constant) and low memory use in all situations (it would even allow ripping 
the texts out completely to trim down the class libraries), and it would 
allow sharing strings between libraries.
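
To make the difference between the three options concrete, here is a minimal 
sketch. It is written in Python only for brevity; all names, identifiers and 
message texts are made up for illustration and are taken neither from the 
mono sources nor from the earlier proposal.

```python
from enum import IntEnum


class Msg(IntEnum):
    """Hypothetical message constants (custom solution)."""
    ARG_NULL = 0
    BAD_INDEX = 1


# Custom solution: one flat table per language, indexed by the constant.
TABLE_EN = ["Argument cannot be null.", "Index out of range."]
TABLE_DE = ["Argument darf nicht null sein.", "Index ausserhalb des Bereichs."]


def by_constant(table, msg):
    """Plain array indexing, no hashing and no string comparison."""
    return table[msg]


def gettext_style(catalog, english):
    """English text is the key; without a catalog it is returned as-is."""
    if catalog is None:
        return english
    return catalog.get(english, english)


def resource_style(resources, identifier):
    """A short identifier is the key; every language needs a hash lookup."""
    return resources[identifier]


# Assumed sample data for the two hash-based variants.
CATALOG_DE = {"Argument cannot be null.": "Argument darf nicht null sein."}
RESOURCES_DE = {"Arg_Null": "Argument darf nicht null sein."}

print(by_constant(TABLE_DE, Msg.ARG_NULL))
print(gettext_style(None, "Argument cannot be null."))
print(gettext_style(CATALOG_DE, "Argument cannot be null."))
# One changed character in the English source text misses the catalog
# and silently falls back to English:
print(gettext_style(CATALOG_DE, "Argument can not be null."))
print(resource_style(RESOURCES_DE, "Arg_Null"))
```

The untranslated gettext case does no lookup at all, which is why it shines 
for English. The constant-indexed table never hashes anything and can be 
replaced or removed per language.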

The numbers from back then were that we probably need about 2000 strings in 
corlib and up to 10000 in all libraries.

ResourceManager would need to construct a hashtable using more than 500,000 
function calls (resulting in many millions or even billions of operations), 
consuming about 1-2 MB of ADDITIONAL RAM (in addition to the strings 
themselves, assuming an identifier length of 10-20 chars). This would be 
managed RAM, so the GC would need to run several times to promote all of it 
to the oldest generation.
Gettext would be even worse (if using hash lookup), but shines when not 
translated (will the exceptions get translated anyway?).
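
As a rough illustration of the initialization cost, the following sketch 
counts how often keys get hashed while a table is built. A Python dict stands 
in for a Hashtable here, and the CountingKey class and the identifier pattern 
are hypothetical. It does not try to reproduce the numbers above; it only 
shows that the build cost grows with the total number of strings, even if the 
app later asks for a single one.

```python
class CountingKey:
    """String-like key that counts how often it gets hashed."""
    hash_calls = 0

    def __init__(self, text):
        self.text = text

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.text)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.text == other.text


def build_table(n):
    """Insert n identifier/text pairs, as a hash-based loader has to do."""
    return {CountingKey("Message_%05d" % i): "text %d" % i for i in range(n)}


N = 10000  # upper estimate of strings in all libraries, from the text above
table = build_table(N)
calls_after_build = CountingKey.hash_calls

# A single lookup afterwards hashes only the one key that is asked for.
text = table[CountingKey("Message_00042")]
calls_for_lookup = CountingKey.hash_calls - calls_after_build

print(calls_after_build, calls_for_lookup, text)
```

A table indexed by constants needs neither the build step nor the per-lookup 
hash.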

I personally don't like the ResourceManager in this context, because nearly 
everybody with a non-trivial app will get the full impact, and with this 
number of strings the approach could easily consume several megabytes of 
RAM, which even small apps would have to pay. It also means that nobody 
using mono could opt out of this.

Best regards,
A.Nahr
----- Original Message ----- 
From: "Kornél Pál" <kornelpal at hotmail.com>
To: "Miguel de Icaza" <miguel at ximian.com>; 
<mono-devel-list at lists.ximian.com>
Sent: Wednesday, June 15, 2005 10:52 PM
Subject: Re: [Mono-devel-list] Why not to use gettext ()

>> From: Miguel de Icaza
>>> So I think none of these issues are critical.
>>
>> None are critical, but there is no compelling reason to use something
>> different.
>
> I changed the subject of the message to separate this topic from my
> Locale.cs implementation, which uses gettext ()-style resource management.
>
> My reasons against using English texts as identifiers:
>
> 1. Every character modified in the English text has to be modified in all
> of the other language files as well. This prevents distributing satellite
> assemblies separately or using satellite assemblies that are not fully up
> to date.
>
> As translations will probably be done by a different set of people than the
> code, I can imagine that satellite assemblies will be released
> independently of the code.
>
> This will result in the English text being used although the text has
> already been translated, just because a single forgotten character was
> added to the English text.
>
> While an English text can be reworded or corrected for a lot of reasons,
> it is very unlikely that any of the string resource identifiers will be
> modified.
>
> 2. Mono itself has a complete infrastructure for localization
> (ResourceManager, ResourceSet, ...) designed for identifier based resource
> files.
>
> 3. It wastes disk space, as identifiers are stored in UTF-16 while
> texts are stored in UTF-8. So it requires less disk space to use relatively
> short string identifiers than to use whole sentences as identifiers.
>
> Note that returning the identifier parameter as the text is faster when
> using English messages, because they are not looked up in a Hashtable, but I
> think this does not affect overall performance very much, and Hashtables
> have to be used for all other languages anyway.
>
> Kornél