[Mono-list] A couple of questions...
A Rafael D Teixeira
Tue, 14 Aug 2001 15:10:00 -0300
>Miguel de Icaza <email@example.com> wrote:
>>2. I am fairly certain that the native type for "string" in Microsoft's
>>.NET will be UNICODE. Will this be the case with Mono or will we opt for
>>ANSI strings to be the native "string" type?
>We could support both. For now, we will use 16-bit encoded Unicode
>(that is what Microsoft uses). In the future, we might want to store
>our runtime strings as UTF-8 and provide conversion operators when
>needed (for example for mobile devices).
>But in the particular case of mobile devices, it would be good to
>have the ECMA people allow for both the current encoding and UTF-8 to
>minimize the size of executables (consider again, palm-like devices,
>where you do not have a lot of room).
In truth it's more complex than that: the compiler/runtime should "OPTIMIZE"
the strings, choosing for each one the encoding that generates the
smallest representation, and then tagging it with the encoding used.
For example: if you have a Japanese version of a piece of software that
uses mostly kanji 'characters', UTF-16 will probably have a shorter
representation (2 bytes for most kanji, 4 for the rarer ones outside the
basic plane) than UTF-8 (3 bytes for most kanji, 4 for those same rarer
ones). That holds true for any device, memory constrained or not. But in
the same application some messages may be in English, and those will be
best represented as UTF-8.
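
To make the idea concrete, here is a rough sketch in Python of choosing the
encoding per string and tagging the result. The tag values and the names
pack_string / unpack_string are made up for illustration; nothing here is
an actual Mono or .NET format, it only shows the size comparison.

```python
# Hypothetical one-value tags; the proposal does not define a tag format.
TAG_UTF8 = 0
TAG_UTF16 = 1


def pack_string(text):
    """Return (tag, payload) using whichever encoding is shorter."""
    utf8 = text.encode("utf-8")
    utf16 = text.encode("utf-16-le")  # little-endian, no byte-order mark
    # Ties go to UTF-8 (an arbitrary choice made for this sketch).
    if len(utf8) <= len(utf16):
        return TAG_UTF8, utf8
    return TAG_UTF16, utf16


def unpack_string(tag, payload):
    """Decode a payload according to the tag stored with it."""
    codec = "utf-8" if tag == TAG_UTF8 else "utf-16-le"
    return payload.decode(codec)


if __name__ == "__main__":
    for sample in ("Hello, world", "日本語の文字列"):
        tag, payload = pack_string(sample)
        name = "UTF-8" if tag == TAG_UTF8 else "UTF-16"
        print(sample, "->", name, len(payload), "bytes")
```

With these two samples the English message should come out as UTF-8 and the
Japanese one as UTF-16, which is the mixed case described above.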
I would keep the number of choices small, just UTF-8 and UTF-16. If we
extend the concept, we would probably end up with many encoders/decoders
(ANSI, UTF-32, EBCDIC...) in the minimal runtime that small devices need,
increasing its size without adding enough benefit.
I'm just guessing, but I think Microsoft is using UCS-2; they will probably
support Unicode 3.1 via UTF-16 soon, because it's needed for