[Mono-list] string encoding

A Rafael D Teixeira rafaelteixeirabr@hotmail.com
Sun, 22 Jun 2003 10:41:36 -0300


>As best I can tell from C# docs a string is a sequence of char, and a
>char is a 16-bit Unicode character. So strings are in UCS-2
>encoding. Trying to figure out then how to marshal/unmarshal UTF-8 via
>PInvoke.

In truth, strings are held in memory as UTF-16, not UCS-2. I tested this 
myself on .NET more than a year ago; I'll have to test with the current 
mcs/Mono to see whether they handle it properly.

But, yes, the documentation is misleading.
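
To make the UTF-16 versus UCS-2 distinction concrete, here is a minimal
sketch. It is written in Python rather than C# only so that it runs anywhere;
the helper name utf16_code_units and the sample character U+1D11E are
assumptions chosen for illustration, not anything taken from the .NET or Mono
docs. A character outside the Basic Multilingual Plane needs two 16-bit code
units (a surrogate pair) in UTF-16, which UCS-2 cannot express.

```python
# Illustration only: how many 16-bit code units a character needs in UTF-16.
# utf16_code_units is a hypothetical helper, not a .NET or Mono API.
import struct


def utf16_code_units(text):
    """Return the 16-bit code units of text when encoded as UTF-16."""
    raw = text.encode("utf-16-le")  # little-endian, no byte order mark
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


# A BMP character fits in a single code unit, so UCS-2 could hold it too.
bmp_units = utf16_code_units("A")

# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP. UTF-16 stores it
# as a high surrogate followed by a low surrogate; UCS-2 has no such form.
astral_units = utf16_code_units("\U0001D11E")

print([hex(u) for u in bmp_units])     # ['0x41']
print([hex(u) for u in astral_units])  # ['0xd834', '0xdd1e']
```

So a "16-bit char" is really a UTF-16 code unit, and a string's length in
chars can exceed its length in Unicode characters.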

Best regards,

Rafael Teixeira
Brazilian Polymath
Mono Hacker since 16 Jul 2001



>From: Havoc Pennington <hp@redhat.com>
>To: mono-list@lists.ximian.com
>Subject: [Mono-list] string encoding
>Date: Sun, 22 Jun 2003 00:33:38 -0400
>
>Hi,
>
>As best I can tell from C# docs a string is a sequence of char, and a
>char is a 16-bit Unicode character. So strings are in UCS-2
>encoding. Trying to figure out then how to marshal/unmarshal UTF-8 via
>PInvoke.
>
>I looked at GTK# for an example.  However, GTK# seems to use "string"
>for the type to pass in and out of GTK, and GTK is wanting UTF-8, not
>UCS-2.
>
>DllImport has this CharSet parameter that's used to convert native
>strings to UCS-2, but it doesn't have UTF-8 as a possible value, and
>anyway GTK# doesn't specify CharSet.
>
>So is GTK# broken, if not why not, if yes how do I do it properly?
>Basically, how is string encoding handled?
>
>The clean solution to me seems to be that CharSet would contain UTF-8
>as a value and CharSet=Auto would imply UTF-8 on UNIX, but I imagine
>this would be an unacceptable extension of standard APIs.
>
>Havoc
