[Mono-devel-list] Unicode
A Rafael D Teixeira
rafaelteixeirabr at hotmail.com
Tue Sep 16 14:01:06 EDT 2003
>From: Chris Seaton <chris at chrisseaton.com>
>
> > What exactly is System.Text.Unicode supposed to be? It seems to work as
> > UTF-16, but not quite as it assumes two bytes for a character, so can't
> > work with surrogate pairs.
>Bah! I mean System.Text.UnicodeEncoding.
> >
Well, any string in .NET/Mono is a sequence of 2-byte (16-bit) Unicode code
units, and that representation DOES WORK with surrogate pairs, so in truth
we have UTF-16 rather than UCS-2. That seems like an afterthought on MS's
part (Unicode grew beyond what fits in 2 bytes around the time .NET was
being developed), and I guess that is the reason the name doesn't reflect
the current status quo of the standard.
> > What does Mono do if I'm using characters in the giddy heights of the
> > uppermost planes?
You can embed any Unicode character in C# string literals, using the \u
and \U escape codes, for example:
"This is a specific character: \u034F"
"This is an upper-plane character: \U00013344"
If you look at the array returned by calling ToCharArray() on the second
string, you'll see the correct surrogate pair in the two final positions.
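A minimal sketch of that check, in case you want to try it yourself. The
class and method names (SurrogateDemo, HexUnits) are made up for
illustration; only ToCharArray() and the \U escape come from the text above.

```csharp
using System;

// Hypothetical demo class; the name is invented for this sketch.
public static class SurrogateDemo
{
    // Returns the UTF-16 code units of a string as space-separated hex.
    public static string HexUnits(string s)
    {
        char[] units = s.ToCharArray();
        string[] parts = new string[units.Length];
        for (int i = 0; i < units.Length; i++)
        {
            parts[i] = ((int)units[i]).ToString("X4");
        }
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        // One code point above U+FFFF is stored as two chars.
        Console.WriteLine(HexUnits("\U00013344")); // D80C DF44
        Console.WriteLine("\U00013344".Length);    // 2
    }
}
```

The pair follows from the UTF-16 rule: subtract 0x10000 from the code point,
put the top 10 bits on 0xD800 and the low 10 bits on 0xDC00.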
So the problem is that programmatically you will have to deal with surrogate
pairs by hand, with some small exceptions, as System.Char is defined as a
2-byte-wide entity.
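As a rough sketch of what "by hand" can look like, the loop below walks a
string and joins each surrogate pair into one code point. CodePointWalker
and CodePoints are hypothetical names, not part of any class library, and
the handling of unpaired surrogates (passed through unchanged) is just one
possible choice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of the framework.
public static class CodePointWalker
{
    // Decodes a UTF-16 string into a list of Unicode code points.
    public static List<int> CodePoints(string s)
    {
        var result = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            bool isHigh = c >= 0xD800 && c <= 0xDBFF;
            bool nextIsLow = i + 1 < s.Length
                && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
            if (isHigh && nextIsLow)
            {
                // Combine the two 10-bit halves and add the 0x10000 offset.
                result.Add(0x10000 + ((c - 0xD800) << 10) + (s[i + 1] - 0xDC00));
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                // BMP character, or an unpaired surrogate passed through as-is.
                result.Add(c);
            }
        }
        return result;
    }

    public static void Main()
    {
        foreach (int cp in CodePoints("A\U00013344"))
        {
            Console.WriteLine("U+" + cp.ToString("X4"));
        }
    }
}
```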
> > The name of this class implies it is the be all and
> > end all of Unicode, able to represent everything, unlike UTF8 and UTF7
> > which can only work with a limited part of Unicode.
Well, UTF-8, UTF-7 and also UTF-16 can each encode the whole set of 16
high planes (everything above the BMP) currently defined, so I don't see
any of them as very restrictive...
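To illustrate, here is a small sketch that encodes the same upper-plane
character with two of these encodings and round-trips it. EncodingDemo and
Hex are made-up names; Encoding.UTF8, Encoding.Unicode and
BitConverter.ToString are the standard class library calls.

```csharp
using System;
using System.Text;

// Hypothetical demo class; the name is invented for this sketch.
public static class EncodingDemo
{
    // Formats bytes as dash-separated hex, e.g. "F0-93-8D-84".
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    public static void Main()
    {
        string s = "\U00013344";

        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        byte[] utf16 = Encoding.Unicode.GetBytes(s); // little-endian UTF-16

        Console.WriteLine(Hex(utf8));  // F0-93-8D-84
        Console.WriteLine(Hex(utf16)); // 0C-D8-44-DF

        // Both encodings decode back to the original string.
        Console.WriteLine(Encoding.UTF8.GetString(utf8) == s);     // True
        Console.WriteLine(Encoding.Unicode.GetString(utf16) == s); // True
    }
}
```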
> > Shouldn't System.Text.Unicode really be System.Text.UTF16?
The name was cast in stone by Microsoft more than 3 years ago, and is now
part of the ECMA/ISO standard, so we can't change that.
>And where's System.Text.UTF32?
We can ask ECMA what to do about it; they could possibly add it in some
future revision of the standard.
Meanwhile, if you really need it, you can contribute a
Mono.Text.UTF32Encoding in a separate Mono.Text assembly.
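Just to give an idea of the core of such a class, here is a very rough
sketch of the encoding direction only. Utf32Sketch, Encode and IsPair are
hypothetical names; this does not derive from System.Text.Encoding, it
assumes little-endian output with no byte order mark, and it passes
unpaired surrogates through unchanged, which a real encoder would have to
handle properly.

```csharp
using System;

// Hypothetical sketch; not an actual Mono.Text.UTF32Encoding.
public static class Utf32Sketch
{
    // Encodes a UTF-16 string as little-endian UTF-32, four bytes per code point.
    public static byte[] Encode(string s)
    {
        // First pass: count code points so the output can be sized exactly.
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (IsPair(s, i)) i++;
            count++;
        }

        byte[] output = new byte[count * 4];
        int pos = 0;
        for (int i = 0; i < s.Length; i++)
        {
            int cp = s[i];
            if (IsPair(s, i))
            {
                cp = 0x10000 + ((s[i] - 0xD800) << 10) + (s[i + 1] - 0xDC00);
                i++; // skip the low surrogate
            }
            // Write the code point least significant byte first.
            output[pos++] = (byte)cp;
            output[pos++] = (byte)(cp >> 8);
            output[pos++] = (byte)(cp >> 16);
            output[pos++] = (byte)(cp >> 24);
        }
        return output;
    }

    // True when s[i] and s[i + 1] form a valid high/low surrogate pair.
    static bool IsPair(string s, int i)
    {
        return i + 1 < s.Length
            && s[i] >= 0xD800 && s[i] <= 0xDBFF
            && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Encode("\U00013344"))); // 44-33-01-00
    }
}
```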
>--
>Chris Seaton
Best hackings,
Rafael Teixeira
Brazilian Polymath
WEBforAll Ltda.
Mono, MonoQLE, #Wiki Hacker