[Mono-list] unicode trouble

Fergus Henderson fjh@cs.mu.oz.au
Mon, 9 Feb 2004 15:37:23 +1100


On 08-Feb-2004, max <aranym@adelphia.net> wrote:
> Hi Gabor,
> I think you're confused. Characters in .NET are 16 bits BECAUSE they are 
> unicode. 16 bits = 2 bytes = 65536 values.

No, Gabor is not confused.  Unicode has grown.  Code points now run up
to U+10FFFF, which needs 21 bits, not 16.
See for example <http://www.terena.nl/library/multiling/unicode/utf16.html>
(which I just found by googling; it looks a bit out-of-date).

Unfortunately Windows, Java, and .NET all use 16-bit characters.
That means that they must either (a) use UCS-2 encoding, i.e.
not support the new Unicode characters such as "OLD ITALIC LETTER A";
or (b) use UTF-16 encoding, in which characters that don't fit in
16 bits are represented as a pair of 16-bit code units (a surrogate pair).
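
To make option (b) concrete, here is a minimal sketch of how UTF-16 splits
such a character into two 16-bit code units.  It is written in Python purely
for illustration (not .NET or Java), and the function name to_surrogate_pair
is made up for this example.  It uses OLD ITALIC LETTER A, which is U+10300.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into two UTF-16 code units.

    Hypothetical helper for illustration; not part of any library.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point fits in 16 bits or is out of range")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


ch = "\U00010300"  # OLD ITALIC LETTER A
high, low = to_surrogate_pair(ord(ch))
print(hex(high), hex(low))            # 0xd800 0xdf00

# Cross-check against Python's built-in UTF-16 encoder (big-endian, no BOM).
print(ch.encode("utf-16-be").hex())   # d800df00
```

A system that treats each 16-bit unit as one "character" will count this
single character as two, which is the root of the trouble discussed here.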

-- 
Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
The University of Melbourne         |  of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.