[Mono-devel-list] Why not to use gettext ()
Andreas Nahr
ClassDevelopment at A-SoftTech.com
Thu Jun 16 18:29:47 EDT 2005
Hi,
sorry for answering so late, but I didn't have much time this week.
I wrote the stub for Locale after we had quite a long discussion about
this topic about 2 or 2.5 years ago, which did not reach any result.
In fact back then I had proposed a solution (including a nearly finished
implementation) based on a binary lookup mechanism, which IMHO would still
be the best solution.
But let's start from the beginning (some of these things we already found
out back then).
In Mono we have basically three types of strings:
1. Exception texts (the biggest share). Will usually not be displayed to the
user, translation is probably not critical. Performance may not be critical
if only used in an exception that stops the app anyway.
2. Object descriptions (e.g. in attributes). Will be displayed to the user
with some probability. Performance is somewhat critical, translation should
probably be done.
3. Strings used for user display. (Windows Forms, Globalization Info,
others). Will be very likely displayed to the user, translation critical,
performance critical.
And we have at least three options:
1. gettext
2. ResourceManager
3. Custom solution
Gettext has the advantage that, as long as no translation is done or needed
and English is used, it offers the best performance and memory use. If a
translation is needed, it offers the worst performance and worst memory use
of all options.
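
To illustrate the gettext trade-off, here is a minimal sketch in Python (not
Mono code). The names `translate` and `german` are hypothetical, and it
assumes a translation catalog is simply a hash table keyed by the English
text:

```python
from typing import Optional


def translate(msgid: str, catalog: Optional[dict] = None) -> str:
    """Return the translation of msgid, or msgid itself if none is available."""
    if catalog is None:
        # Untranslated (English) case: no table is built and nothing is looked up.
        return msgid
    # Translated case: the whole English sentence serves as the hash key.
    return catalog.get(msgid, msgid)


# Hypothetical catalog for one language.
german = {"File not found.": "Datei nicht gefunden."}

print(translate("File not found."))          # File not found.
print(translate("File not found.", german))  # Datei nicht gefunden.
```

The untranslated path is a plain return, while the translated path has to
hash a full sentence on every call.
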
A ResourceManager-based solution needs fewer tools and less code than the
other solutions, but likely has high memory use, abysmal initialization
performance and relatively high runtime cost.
Custom solution using constant/enum lookup (see the discussion back then):
most work and lowest reuse, but fastest constant lookup and low memory use in
all situations (it would even allow ripping the texts out completely to trim
down the class libraries), and it would allow sharing strings between
libraries.
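
A rough sketch of what such a constant/enum lookup could look like, written in
Python for brevity. This is not the implementation proposed back then. The
identifiers in `Msg`, the tables and `get_text` are all hypothetical, and it
assumes each language ships one table ordered by identifier value:

```python
from enum import IntEnum


class Msg(IntEnum):
    # Hypothetical message identifiers; the values double as table indexes.
    ARGUMENT_NULL = 0
    INDEX_OUT_OF_RANGE = 1
    FILE_NOT_FOUND = 2


# One table per language, ordered by identifier value.
ENGLISH = ("Argument cannot be null.", "Index is out of range.", "File not found.")
GERMAN = ("Argument darf nicht null sein.", "Index liegt außerhalb des Bereichs.",
          "Datei nicht gefunden.")


def get_text(msg: Msg, table=ENGLISH) -> str:
    """Look up a message by constant index; no string hashing is involved."""
    if table is None:
        # Texts ripped out of the library: fall back to the identifier name.
        return msg.name
    return table[msg]


print(get_text(Msg.FILE_NOT_FOUND))          # File not found.
print(get_text(Msg.FILE_NOT_FOUND, GERMAN))  # Datei nicht gefunden.
print(get_text(Msg.ARGUMENT_NULL, None))     # ARGUMENT_NULL
```

The lookup cost is a single index operation whichever language is active,
and passing no table models a build with the texts stripped out.
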
The numbers from back then were that we probably need 2000 strings in corlib
and up to 10000 in all libraries.
ResourceManager would need to construct a hashtable using more than 500,000
function calls (resulting in many millions or even billions of operations),
consuming about 1-2 MB of ADDITIONAL RAM (in addition to the strings,
assuming an identifier length of 10-20 chars). This would be managed RAM, so
the GC would need to run several times to bring that all up to gen 3.
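
A back-of-the-envelope version of that memory estimate, as a Python sketch.
Only the string count (10000) and the identifier length (10-20 chars) come
from the numbers above. The 2 bytes per char reflects UTF-16 strings, and the
per-string and per-entry overheads are placeholder assumptions, not
measurements of any runtime:

```python
def hashtable_key_overhead(strings: int, id_chars: int,
                           bytes_per_char: int = 2,
                           per_string_overhead: int = 20,
                           per_entry_overhead: int = 24) -> int:
    """Estimate the extra bytes needed for identifier keys plus table entries."""
    key_bytes = id_chars * bytes_per_char + per_string_overhead
    return strings * (key_bytes + per_entry_overhead)


for id_chars in (10, 20):
    total = hashtable_key_overhead(10_000, id_chars)
    print(f"{id_chars} chars: {total / 1_000_000:.2f} MB")
```

With these placeholder overheads the result stays a bit under 1 MB, the same
order of magnitude as the figure above. The real number depends on the
runtime's object layout and the table's load factor.
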
Gettext would be even worse (if using hash lookup), but shines when not
translated (will the exceptions get translated anyway?).
I personally don't like the ResourceManager in this context, because nearly
everybody with a non-trivial app will get the full impact, and together with
the number of strings this approach could easily consume several megs of
RAM, which even small apps would have to take. Also this means that nobody
using Mono could opt out of this.
Best regards
A.Nahr
----- Original Message -----
From: "Kornél Pál" <kornelpal at hotmail.com>
To: "Miguel de Icaza" <miguel at ximian.com>;
<mono-devel-list at lists.ximian.com>
Sent: Wednesday, June 15, 2005 10:52 PM
Subject: Re: [Mono-devel-list] Why not to use gettext ()
>> From: Miguel de Icaza
>>> So I think none of these issues are critical.
>>
>> None are critical, but there is no compelling reason to use something
>> different.
>
> I changed the subject of the message to separate this topic from my
> Locale.cs implementation that uses gettext () style resource management.
>
> My reasons against using English texts as identifiers:
>
> 1. Every modified character in the English text has to be modified in all
> of the other language files as well. This prevents distributing satellite
> assemblies separately or using satellite assemblies that are not fully up
> to date.
>
> As translations will probably be done by a different set of people than the
> code, I can imagine that satellite assemblies will be released
> independently from the code.
>
> And this will result in English text being used although the text is
> already translated, just because a single forgotten character was added to
> the English text.
>
> While an English text can be reworded or just corrected for a lot of
> reasons, it's very unlikely that any of the string resource identifiers
> will be modified.
>
> 2. Mono itself has a complete infrastructure for localization
> (ResourceManager, ResourceSet, ...) designed for identifier based resource
> files.
>
> 3. It wastes disk space, as identifiers are stored in UTF-16 while
> texts are stored in UTF-8. So it requires less disk space to use relatively
> short string identifiers than to use whole sentences as identifiers.
>
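
The disk-space argument in point 3 can be checked with a small Python sketch.
It takes the premise from the message (identifiers stored as UTF-16, texts as
UTF-8) as given. The sample strings and the resource name `FileNotFound` are
hypothetical:

```python
def size_on_disk(identifier: str, text: str) -> int:
    """Bytes used when the identifier is stored as UTF-16 and the text as UTF-8."""
    return len(identifier.encode("utf-16-le")) + len(text.encode("utf-8"))


english = "The specified file could not be found."
german = "Die angegebene Datei wurde nicht gefunden."

# gettext style: the English sentence is the identifier.
sentence_key = size_on_disk(english, german)
# Identifier style: a short resource name is the identifier.
short_key = size_on_disk("FileNotFound", german)

print(sentence_key, short_key)
```

Every character saved in the identifier saves two bytes per entry in each
language file.
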
> Note that returning the identifier parameter as the text is faster when
> using English messages because they are not looked up in a Hashtable, but I
> think this does not affect overall performance very much, and Hashtables
> have to be used for any other language anyway.
>
> Kornél
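
Point 1 of the quoted message (a single changed character in the English text
invalidates an existing translation) can be sketched in Python as follows.
The catalogs and the `lookup` helper are hypothetical, and the catalog is
again assumed to be a plain hash table:

```python
def lookup(msgid: str, catalog: dict) -> str:
    """gettext-style lookup: fall back to the English text if no entry matches."""
    return catalog.get(msgid, msgid)


# Hypothetical satellite catalog built against an older release.
old_catalog = {"File not found": "Datei nicht gefunden"}

# The code later adds a full stop to the English text.
print(lookup("File not found", old_catalog))   # Datei nicht gefunden
print(lookup("File not found.", old_catalog))  # File not found.

# With a stable identifier, rewording the English text does not break the lookup.
old_by_id = {"FileNotFound": "Datei nicht gefunden"}
print(old_by_id.get("FileNotFound", "File not found."))  # Datei nicht gefunden
```
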