[Mono-list] Just to clarify it: strings in .NET are in UTF-16 not UCS-2

A Rafael D Teixeira rafaelteixeirabr@hotmail.com
Thu, 04 Oct 2001 08:30:45 -0300


>
> > C# code like this:
> >
> >   string x = "\U00010001Test";
> >   foreach(char c in x.ToCharArray())
> >     System.Console.Write(" " + ((int)c).ToString("X4"));
> >
> > When compiled with csc and run in MS runtime, will output:
> > D800 DC01 0054 0065 0073 0074
>
>Uh oh.  I am starting to get confused.
>
>Maybe they do encode the \U00010001 as two characters in the stream?
>
>Miguel.

That is what UTF-16 means: any character in the 0x10000 to 0x10FFFF range is
coded as two 16-bit chars (a 'surrogate pair').
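
To make the arithmetic concrete, here is a minimal sketch of how a code point
above 0xFFFF can be split into its two 16-bit halves by hand. The class and
method names (SurrogateSketch, ToSurrogatePair) are made up for this example
and are not part of the .NET class library; the runtime does this for you when
it handles the \U escape.

```csharp
using System;

static class SurrogateSketch
{
    // Hypothetical helper: split a code point in 0x10000..0x10FFFF
    // into its high and low UTF-16 surrogate code units.
    public static char[] ToSurrogatePair(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException("codePoint");

        int offset = codePoint - 0x10000;               // leaves 20 significant bits
        char high = (char)(0xD800 + (offset >> 10));    // top 10 bits
        char low  = (char)(0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return new char[] { high, low };
    }

    static void Main()
    {
        // Same code point as in the quoted example: U+10001
        foreach (char c in ToSurrogatePair(0x10001))
            Console.Write(" " + ((int)c).ToString("X4"));
        Console.WriteLine();
    }
}
```

For U+10001 the offset is 1, so the high half is 0xD800 + 0 and the low half
is 0xDC00 + 1, which matches the D800 DC01 printed by the MS runtime above.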

Rafael Teixeira
Brazilian Developer


