[Mono-list] unicode trouble
Mon, 09 Feb 2004 07:16:52 -0500
On Mon, 2004-02-09 at 02:22, gabor wrote:
> i just can't understand why the designers of dotnet didn't look at the unicode
> standards. i can understand that java has this problem, but java is much older
> than dotnet.
> maybe it's because winapi uses 16-bit characters?
I imagine it's due to a memory trade-off. The easiest way for the
programmer to deal with things would be to just use UTF-32 (UCS-4) for
all Unicode strings. You wouldn't have to worry about surrogate pairs or
anything else like that.
It would also mean that all strings would require 32 bits for each
character, which would eat up *lots* of memory for all strings. The
most common code points -- US, Europe, Asia -- all easily fit within
16 bits, *by design*. So the designers had a choice: use 32-bit
characters internally everywhere, forcing nearly all users to "waste"
16-24 bits/character, or 1/2 - 3/4 of all memory dedicated to strings,
or use 16-bit characters internally, which would suit the needs of most
current users (probably > 80%), while only "wasting" 8 bits/character
for the US and parts of Europe, a minority of the world population.
16-bit characters were considered to be a decent trade-off, I would
imagine.
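
To put rough numbers on that trade-off, here is a small sketch. It uses
Python purely for the arithmetic; the sample strings and the sizes()
helper are my own illustrative assumptions, not anything from .NET.

```python
# Sample strings are illustrative assumptions, not taken from the thread.
samples = {
    "ascii": "hello",
    "greek": "αβγδε",
    "cjk": "日本語の文",
    "supplementary": "\U0001D11E",  # MUSICAL SYMBOL G CLEF, beyond 16 bits
}


def sizes(text):
    """Return (code points, UTF-16 bytes, UTF-32 bytes) for text."""
    return (
        len(text),                      # Python counts code points here
        len(text.encode("utf-16-le")),  # 2 bytes per unit, 4 for a surrogate pair
        len(text.encode("utf-32-le")),  # always 4 bytes per code point
    )


for name, text in samples.items():
    points, utf16, utf32 = sizes(text)
    print(f"{name:14} code points={points} utf16={utf16}B utf32={utf32}B")
```

The five-character samples take 10 bytes as 16-bit units and 20 bytes
as 32-bit units; only the character outside the 16-bit range costs the
same in both, because UTF-16 needs a surrogate pair for it.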
An alternative approach could have been for the string to do on-the-fly
conversion between Unicode UTF-32 code points and an internal
representation, such as UTF-16. This would imply that System.Char is a
32-bit structure, and that System.String wouldn't conceptually store a
char array, but rather some implementation-defined encoding of the
char array, to save memory. This could be argued to complicate
things, but I don't see any other reason this strategy wouldn't work.
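
A minimal sketch of that idea, in Python rather than C#, might look
like the following. CodePointString and storage_bytes() are hypothetical
names made up for illustration; this is not how System.String works.

```python
import struct


class CodePointString:
    """Hypothetical string: stores UTF-16 code units, exposes code points."""

    def __init__(self, text):
        raw = text.encode("utf-16-le")
        # Internal storage: a tuple of 16-bit code units.
        self._units = struct.unpack("<%dH" % (len(raw) // 2), raw)

    def __iter__(self):
        """Yield full code points, combining surrogate pairs on the fly."""
        units = self._units
        i = 0
        while i < len(units):
            unit = units[i]
            is_high = 0xD800 <= unit <= 0xDBFF
            if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                low = units[i + 1]
                yield 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
                i += 2
            else:
                yield unit
                i += 1

    def __len__(self):
        # Length in code points; needs a full scan in this naive version.
        return sum(1 for _ in self)

    def storage_bytes(self):
        """Bytes used by the internal 16-bit representation."""
        return 2 * len(self._units)


s = CodePointString("a\U0001D11Eb")
print([hex(cp) for cp in s], len(s), s.storage_bytes())
```

The caller only ever sees three code points, while the storage is four
16-bit units (8 bytes instead of 12). The cost is that length and
indexing become linear scans in this naive version, which is probably
the sort of complication one could object to.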