[Mono-list] string encoding

Havoc Pennington hp@redhat.com
Sun, 22 Jun 2003 00:33:38 -0400


Hi,

As best I can tell from the C# docs, a string is a sequence of char,
and a char is a 16-bit Unicode character. So strings are in a 16-bit
encoding (UCS-2, or strictly speaking UTF-16, since surrogate pairs
are allowed). I'm trying to figure out how to marshal/unmarshal UTF-8
via PInvoke.

I looked at GTK# for an example.  However, GTK# seems to use "string"
as the type for text passed in and out of GTK, and GTK wants UTF-8,
not UCS-2.

DllImport has this CharSet parameter that's used to convert native
strings to UCS-2, but it doesn't have UTF-8 as a possible value, and
anyway GTK# doesn't specify CharSet.
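
Failing a suitable CharSet value, one workaround might be to declare
the native parameter as IntPtr and do the conversion by hand. Here is
a rough sketch of that idea. The class and method names (Utf8Marshal,
StringToUtf8Ptr, Utf8PtrToString) are made up for illustration and are
not part of GTK# or Mono; only the System.Text.Encoding and
System.Runtime.InteropServices.Marshal calls are existing APIs.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper for illustration only; not part of GTK# or Mono.
class Utf8Marshal
{
    // Copy a managed string into unmanaged memory as NUL-terminated UTF-8.
    // The caller must release the result with Marshal.FreeHGlobal.
    public static IntPtr StringToUtf8Ptr(string s)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(s);
        IntPtr ptr = Marshal.AllocHGlobal(bytes.Length + 1);
        Marshal.Copy(bytes, 0, ptr, bytes.Length);
        Marshal.WriteByte(ptr, bytes.Length, 0); // terminating NUL
        return ptr;
    }

    // Read a NUL-terminated UTF-8 string from unmanaged memory into a
    // managed string. Does not free the native buffer.
    public static string Utf8PtrToString(IntPtr ptr)
    {
        if (ptr == IntPtr.Zero)
            return null;

        // Find the terminating NUL to learn the byte length.
        int len = 0;
        while (Marshal.ReadByte(ptr, len) != 0)
            len++;

        byte[] bytes = new byte[len];
        Marshal.Copy(ptr, bytes, 0, len);
        return Encoding.UTF8.GetString(bytes);
    }
}
```

A binding would then take IntPtr where it currently takes string,
call StringToUtf8Ptr before the native call, and free the buffer
afterwards. Whether the caller or the library owns a returned pointer
depends on the native function, so this sketch leaves freeing to the
caller.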

So is GTK# broken? If not, why not? If so, how do I do it properly?
Basically, how is string encoding handled?

The clean solution to me seems to be that CharSet would contain UTF-8
as a value and CharSet=Auto would imply UTF-8 on UNIX, but I imagine
this would be an unacceptable extension of standard APIs.

Havoc