[Mono-list] unicode trouble
Fergus Henderson
fjh@cs.mu.oz.au
Mon, 9 Feb 2004 15:37:23 +1100
On 08-Feb-2004, max <aranym@adelphia.net> wrote:
> Hi Gabor,
> I think you're confused. Characters in .NET are 16 bits BECAUSE they are
> unicode. 16 bits = 2 bytes = 65536 values.
No, Gabor is not confused. Unicode has grown. Code points now run up to U+10FFFF, which takes 21 bits, not 16.
See for example <http://www.terena.nl/library/multiling/unicode/utf16.html>
(which I just found by googling; it looks a bit out-of-date).
Unfortunately Windows, Java, and .NET all use 16-bit characters.
That means they must either (a) use the UCS-2 encoding, i.e. not support
the newer Unicode characters such as "OLD ITALIC LETTER A"; or (b) use the
UTF-16 encoding, in which each character that doesn't fit in 16 bits is
represented as a pair of 16-bit code units (a "surrogate pair").
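To make option (b) concrete, here is a rough sketch of how UTF-16 splits a character above U+FFFF into a surrogate pair. It is written in Python only for brevity; the function names to_utf16_units and from_utf16_units are hypothetical helpers made up for this illustration, not part of any Windows, Java, or .NET API. The idea is that the offset above 0x10000 fits in 20 bits, which gets split into two 10-bit halves stored in the reserved ranges 0xD800-0xDBFF (high) and 0xDC00-0xDFFF (low). "OLD ITALIC LETTER A" is U+10300.

```python
from typing import List


def to_utf16_units(cp: int) -> List[int]:
    """Encode one Unicode code point as a list of 16-bit UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if cp < 0x10000:
        # Fits in one 16-bit unit; this is the only case UCS-2 can handle.
        return [cp]
    v = cp - 0x10000            # offset above the 16-bit range, 0x00000..0xFFFFF (20 bits)
    high = 0xD800 + (v >> 10)   # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits -> low (trail) surrogate
    return [high, low]


def from_utf16_units(units: List[int]) -> int:
    """Decode a one- or two-unit UTF-16 sequence back to a code point."""
    if len(units) == 1:
        return units[0]
    high, low = units
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


OLD_ITALIC_A = 0x10300
units = to_utf16_units(OLD_ITALIC_A)
print([hex(u) for u in units])                        # ['0xd800', '0xdf00']
# Cross-check against Python's built-in UTF-16 codec.
print(chr(OLD_ITALIC_A).encode("utf-16-be").hex())    # d800df00
print(hex(from_utf16_units(units)))                   # 0x10300
# A 16-bit-char string type (Java String.length(), .NET String.Length)
# counts code units, so it would report 2 for this single character.
print(len(chr(OLD_ITALIC_A).encode("utf-16-le")) // 2)  # 2
```

The practical consequence for a 16-bit char type is that one "character" in the Unicode sense may occupy two array slots, so indexing, length, and substring operations can land in the middle of a pair unless the code is surrogate-aware.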
--
Fergus Henderson <fjh@cs.mu.oz.au> | "I have always known that the pursuit
The University of Melbourne | of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh> | -- the last words of T. S. Garp.