[Mono-list] C -> C# strings
Miguel de Icaza
miguel@ximian.com
Mon, 13 Dec 2004 16:55:24 -0500
Hello,
> The primary difference between Ansi and Unicode under Mono is that Ansi
> uses a char* while Unicode is an "unsigned short*" -- that is, 8-bit vs.
> 16-bit character strings. The actual string encoding has nothing to do
> with it (though unfortunately Microsoft chose Ansi to mean "local code
> page", unnecessarily tying the two concepts). For example, Ansi could
> be codepage 1252, 1256, or UTF-8 encoding, while Unicode could use
> either the UCS-2 or UTF-16 encodings, which are (subtly) different.
>
> Further confusing things, Mono chooses not to pay attention to the code
> page at all, and assumes that all Ansi strings are in UTF-8, period.
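
To make the 8-bit vs. 16-bit distinction above concrete, here is a minimal
sketch in C that converts a zero-terminated UTF-16 string (the
"unsigned short*" view) into UTF-8 (the "char*" view). The names
utf16_units and utf16_to_utf8 are made up for this illustration; this is
not Mono's marshalling code, only an example of the two representations
and of the surrogate pairs that separate UTF-16 from UCS-2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the 16-bit code units in a zero-terminated UTF-16 string. */
size_t utf16_units(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/*
 * Convert zero-terminated UTF-16 to zero-terminated UTF-8.
 * Returns the number of bytes written (excluding the terminator),
 * or (size_t)-1 on an unpaired surrogate or if `out` is too small.
 */
size_t utf16_to_utf8(const uint16_t *in, char *out, size_t cap)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);

    while (*in != 0) {
        uint32_t cp = *in++;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: a low surrogate must follow.
               Plain UCS-2 has no such pairs; UTF-16 does. */
            if (*in < 0xDC00 || *in > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*in++ - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1; /* low surrogate without a high one */
        }

        size_t len = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + len + 1 > cap)
            return (size_t)-1;

        if (len == 1) {
            out[n++] = (char)cp;
        } else if (len == 2) {
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else if (len == 3) {
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[n++] = (char)(0xF0 | (cp >> 18));
            out[n++] = (char)(0x80 | ((cp >> 12) & 0x3F));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }

    if (n + 1 > cap)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

For instance, U+00E9 is one 16-bit unit in UTF-16 but two bytes in UTF-8,
while U+1D11E needs two 16-bit units (a surrogate pair) in UTF-16 and four
bytes in UTF-8. A UCS-2 reader would see that pair as two separate values.
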
The good news: In the upcoming ECMA specification, we are going to get
two new encodings.
The bad news: the bits are "implementation specific" and won't be
guaranteed to work compatibly across platforms.
Miguel.