[Mono-devel-list] Unicode

A Rafael D Teixeira rafaelteixeirabr at hotmail.com
Tue Sep 16 14:01:06 EDT 2003


>From: Chris Seaton <chris at chrisseaton.com>
>
> > What exactly is System.Text.Unicode supposed to be? It seems to work as
> > UTF-16, but not quite as it assumes two bytes for a character, so can't
> > work with surrogate pairs.
>Bah! I mean System.Text.UnicodeEncoding.
> >

Well, any string in .NET/Mono is a sequence of 2-byte Unicode code units 
that DOES WORK with surrogate pairs, so in truth we have UTF-16 rather than 
UCS-2. That seems like an afterthought on Microsoft's part (the first 
characters outside the original 16-bit range were only assigned while .NET 
was being developed), and I guess that is the reason the name doesn't 
reflect the current state of the standard.
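
Since each System.Char holds a single 16-bit code unit, a character outside 
the BMP has to be split into two of them. Below is a minimal sketch of the 
standard UTF-16 surrogate-pair arithmetic, written in Python only as a 
neutral illustration (it is not the .NET or Mono API; the function name 
to_utf16_units is made up for the example):

```python
def to_utf16_units(cp: int) -> list:
    """Return the UTF-16 code units (16-bit values) for one Unicode code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x10000:
        # BMP: a single code unit, identical to the old UCS-2 form
        return [cp]
    # Upper planes: subtract 0x10000 to get a 20-bit value, then split it
    # into a high (lead) surrogate and a low (trail) surrogate, 10 bits each.
    v = cp - 0x10000
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]

print([hex(u) for u in to_utf16_units(0x034F)])   # ['0x34f']
print([hex(u) for u in to_utf16_units(0x13344)])  # ['0xd80c', '0xdf44']
```

A UCS-2-only implementation would simply stop at the first branch and 
reject everything above U+FFFF.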

> > What does Mono do if I'm using characters in the giddy heights of the
> > uppermost planes?

You can embed any Unicode character in C# string literals using the \u 
and \U escape sequences, for example:

"This is a specific character: \u034F"
"This is an upper-plane character: \U00013344"

If you look at the array returned by calling ToCharArray() on the second 
string, you'll see the correct surrogate pair (\uD80C followed by \uDF44) 
in the last two positions.
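
To see the same thing outside of C#, here is a rough Python emulation of 
ToCharArray(): it encodes the string as UTF-16 (little-endian, no byte 
order mark) and reads it back as 16-bit units. The helper name 
to_char_array is hypothetical and only approximates what the .NET method 
returns.

```python
import struct

def to_char_array(s: str) -> list:
    """Mimic String.ToCharArray(): one entry per 16-bit UTF-16 code unit."""
    data = s.encode("utf-16-le")  # little-endian, no BOM
    return list(struct.unpack(f"<{len(data) // 2}H", data))

s = "This is an upper-plane character: \U00013344"
chars = to_char_array(s)
print([hex(c) for c in chars[-2:]])  # ['0xd80c', '0xdf44']
print(len(s), len(chars))            # one code point, but two code units
```

Note that Python counts the upper-plane character as one element of the 
string, while the UTF-16 view (like a .NET char[]) has one extra element.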

So the catch is that, programmatically, you will have to deal with 
surrogate pairs by hand (with a few small exceptions), because System.Char 
is defined as a 2-byte-wide entity.
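
Handling pairs by hand mostly means walking the char array and gluing each 
high surrogate to the low surrogate that follows it. A minimal Python 
sketch of that loop, working on plain integers that stand in for 
System.Char values (the function name code_points is hypothetical, and 
passing unpaired surrogates through unchanged is just one possible policy):

```python
def code_points(units):
    """Yield code points from UTF-16 code units, combining surrogate pairs."""
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate followed by a low surrogate: rebuild the code point
            yield 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # BMP character (or an unpaired surrogate, passed through as-is)
            yield u
            i += 1

units = [0x41, 0x034F, 0xD80C, 0xDF44]
print([hex(cp) for cp in code_points(units)])  # ['0x41', '0x34f', '0x13344']
```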

> > The name of this class implies it is the be all and
> > end all of Unicode, able to represent everything, unlike UTF8 and UTF7
> > which can only work with a limited part of Unicode.

Well, UTF-8, UTF-7 and UTF-16 can all encode the whole code space 
currently defined (the Basic Multilingual Plane plus the 16 upper planes), 
so I don't see them as restrictive at all...
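
A quick way to convince yourself is a round-trip check. The Python sketch 
below pushes a plain ASCII letter, a BMP character and two upper-plane 
characters through Python's own utf-8, utf-7 and utf-16 codecs (standing 
in for the .NET encodings, not calling them) and prints the byte counts.

```python
samples = ["A", "\u034F", "\U00013344", "\U0010FFFF"]

for ch in samples:
    for enc in ("utf-8", "utf-7", "utf-16-le"):
        # Each encoding must give back exactly the character we started with
        assert ch.encode(enc).decode(enc) == ch
    print(f"U+{ord(ch):04X}: UTF-8 = {len(ch.encode('utf-8'))} bytes, "
          f"UTF-16 = {len(ch.encode('utf-16-le'))} bytes")
```

The upper-plane samples take 4 bytes in both UTF-8 and UTF-16 (a surrogate 
pair in the latter case); only the per-character cost differs, not what 
can be represented.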

> > Shouldn't System.Text.Unicode really be System.Text.UTF16?

The name was cast in stone by Microsoft more than 3 years ago, and is now 
part of the ECMA/ISO standard, so we can't change that.

>And where's System.Text.UTF32?

We could ask ECMA what to do about it; they might add it in some future 
revision of the standard.

Meanwhile, if you really need it, you can contribute a 
Mono.Text.UTF32Encoding in a separate Mono.Text assembly.
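
For whoever picks that up, the core of UTF-32 is small: one 4-byte unit 
per code point. A rough Python sketch of the byte-level part, with 
hypothetical function names (this is not the API a 
Mono.Text.UTF32Encoding class would expose):

```python
import struct

def utf32_encode(s: str, big_endian: bool = False) -> bytes:
    """Encode a string as UTF-32: each code point becomes one 4-byte unit."""
    fmt = ">I" if big_endian else "<I"
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if 0xD800 <= cp <= 0xDFFF:
            # Lone surrogates are not Unicode scalar values, so reject them
            raise ValueError(f"cannot encode lone surrogate U+{cp:04X}")
        out += struct.pack(fmt, cp)
    return bytes(out)

def utf32_decode(data: bytes, big_endian: bool = False) -> str:
    """Decode UTF-32 bytes; the length must be a multiple of 4."""
    if len(data) % 4:
        raise ValueError("UTF-32 data length must be a multiple of 4")
    fmt = ">I" if big_endian else "<I"
    chars = []
    for (cp,) in struct.iter_unpack(fmt, data):
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"invalid UTF-32 code unit 0x{cp:08X}")
        chars.append(chr(cp))
    return "".join(chars)
```

Because a Python str is already a sequence of code points, the sketch 
skips one step: in C# the encoder would first have to combine each 
high/low surrogate pair from the char[] into a single code point before 
writing it out, and the decoder would have to split upper-plane values 
back into pairs.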

>--
>Chris Seaton

Best hackings,


Rafael Teixeira
Brazilian Polymath
WEBforAll Ltda.
Mono, MonoQLE, #Wiki Hacker
