[Mono-list] C -> C# strings

Miguel de Icaza miguel@ximian.com
Mon, 13 Dec 2004 16:55:24 -0500


Hello,

> The primary difference between Ansi and Unicode under Mono is that Ansi
> uses a char* while Unicode is an "unsigned short*" -- that is, 8-bit vs.
> 16-bit character strings.  The actual string encoding has nothing to do
> with it (though unfortunately Microsoft chose Ansi to mean "local code
> page", unnecessarily tying the two concepts).  For example, Ansi could
> be codepage 1252, 1256, or UTF-8 encoding, while Unicode could use
> either the UCS-2 or UTF-16 encodings, which are (subtly) different.
> 
> Further confusing things, Mono chooses not to pay attention to the code
> page at all, and assumes that all Ansi strings are in UTF-8, period.
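
To make the 8-bit vs. 16-bit distinction above concrete, here is a minimal
sketch in C that encodes one Unicode code point both ways. The function
names utf8_encode and utf16_encode are hypothetical helpers written for
this illustration; they are not part of Mono or of any runtime API. The
sketch also shows where UCS-2 and UTF-16 part ways: code points above
U+FFFF need a surrogate pair, which UCS-2 cannot express.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True for code points that may be encoded: in range, not a surrogate. */
static int is_scalar_value(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* "Ansi"-style 8-bit string data, assuming UTF-8 as Mono does.
 * Writes 1 to 4 bytes into out and returns how many were written. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(is_scalar_value(cp));
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* "Unicode"-style 16-bit string data, encoded as UTF-16.
 * Writes 1 or 2 code units into out and returns how many were written.
 * A return value of 2 means a surrogate pair, which UCS-2 lacks. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(is_scalar_value(cp));
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
    return 2;
}
```

For U+00E9 the two helpers produce the bytes C3 A9 and the single unit
00E9; for U+1D11E they produce F0 9D 84 9E and the pair D834 DD1E. The
element width (char vs. unsigned short) and the encoding are separate
choices, which is the point made in the quoted text.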

The good news: In the upcoming ECMA specification, we are going to get
two new encodings.

The bad news: the bits are "implementation specific" and won't be
guaranteed to work compatibly across platforms.

Miguel.