[Mono-list] C -> C# strings

Jonathan Pryor jonpryor@vt.edu
Sun, 12 Dec 2004 21:18:35 -0500


On Mon, 2004-12-13 at 10:24 +1000, Bryan Buchanan wrote:
> Hi,
> 
> Can someone please tell me why I cannot return 8 bit strings from C to C#?

You can return 8-bit strings, but they need to be specially encoded
8-bit strings.  UTF-8 encoded strings, in particular.

An example of a valid 8-bit string would be "\xE8\xAA\x9E" (which is
\u8A9E, Japanese "Go", meaning language).
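
To show where those three bytes come from, here is a minimal sketch of a
UTF-8 encoder.  The name utf8_encode is made up for illustration; it is not
part of Mono or of any library mentioned in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * out must have room for 4 bytes.  Returns the number of bytes written,
 * or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx + 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Calling utf8_encode(0x8A9E, buf) writes 0xE8 0xAA 0x9E into buf and
returns 3.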

<snip/>

> If the string in C has a high bit character set (eg xFE) the string
> returned is always null. If I change it to, say, x4E, a valid string is
> returned.

In your particular example, 0xFE is an invalid UTF-8 octet, which is why
null is returned.  0x4E is a valid UTF-8 octet, which is why it works.  
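
If you want to check on the C side whether a buffer is well-formed UTF-8
before handing it back, something like the following would do.  This is a
generic sketch, not Mono's actual marshaling code, and utf8_is_valid is a
made-up name.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the NUL-terminated byte string s is
 * well-formed UTF-8, 0 otherwise. */
int utf8_is_valid(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        uint32_t cp, min;
        int extra;

        if (*p < 0x80)                { cp = *p;        extra = 0; min = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; min = 0x80; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; min = 0x800; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; min = 0x10000; }
        else
            return 0;   /* stray continuation byte, or 0xF8..0xFF (includes 0xFE) */
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80)
                return 0;   /* missing continuation byte; also catches truncation */
            cp = (cp << 6) | (uint32_t)(*p & 0x3F);
            p++;
        }

        if (cp < min)
            return 0;       /* overlong encoding */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;       /* outside Unicode, or a surrogate */
    }
    return 1;
}
```

With this, utf8_is_valid("\x4E") accepts the string and
utf8_is_valid("\xFE") rejects it, because 0xFE matches none of the four
lead-byte patterns.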

>  I've tried CharSet Auto, Ansi, Unicode and none make any
> difference.

The primary difference between Ansi and Unicode under Mono is that Ansi
uses a char* while Unicode is an "unsigned short*" -- that is, 8-bit vs.
16-bit character strings.  The actual string encoding has nothing to do
with it (though unfortunately Microsoft chose Ansi to mean "local code
page", unnecessarily tying the two concepts).  For example, Ansi could
be codepage 1252, 1256, or UTF-8 encoding, while Unicode could use
either the UCS-2 or UTF-16 encodings, which are (subtly) different.
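
The subtle difference is that UCS-2 is limited to one 16-bit unit per
character, so it stops at U+FFFF, while UTF-16 reaches the rest of Unicode
with surrogate pairs.  A minimal sketch follows; utf16_encode is a made-up
name, not a Mono or libc function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16 code units.
 * Returns the number of units written (1 or 2), or 0 if cp is invalid.
 * A result of 2 means the character cannot be represented in UCS-2. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                         /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;            /* same unit in UCS-2 and UTF-16 */
        return 1;
    }
    if (cp <= 0x10FFFF) {
        cp -= 0x10000;                    /* 20 bits left, split 10 and 10 */
        out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
        out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
        return 2;
    }
    return 0;
}
```

U+8A9E comes out as the single unit 0x8A9E in either encoding, whereas
U+1F600 needs the pair 0xD83D 0xDE00, which only UTF-16 can express.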

Further confusing things, Mono chooses not to pay attention to the code
page at all, and assumes that all Ansi strings are in UTF-8, period.
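
So if your C code holds text in some local code page, one option is to
convert it to UTF-8 before returning it.  Here is a sketch that assumes the
source bytes are ISO-8859-1 (Latin-1); that is only my guess at your data,
and other code pages need a real conversion table or a library such as
iconv.  The name latin1_to_utf8 is made up, and who frees the returned
buffer across the P/Invoke boundary is not addressed here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated ISO-8859-1 string to a
 * newly allocated UTF-8 string.  Returns NULL if allocation fails.
 * In this sketch the caller releases the result with free(). */
char *latin1_to_utf8(const char *in)
{
    size_t n = strlen(in);
    /* Worst case: every input byte becomes two output bytes. */
    unsigned char *out = malloc(2 * n + 1);
    unsigned char *q = out;
    const unsigned char *p;

    if (out == NULL)
        return NULL;

    for (p = (const unsigned char *)in; *p; p++) {
        if (*p < 0x80) {
            *q++ = *p;                                /* ASCII is unchanged */
        } else {
            /* Latin-1 byte N is code point U+00NN: a two-byte sequence. */
            *q++ = (unsigned char)(0xC0 | (*p >> 6));
            *q++ = (unsigned char)(0x80 | (*p & 0x3F));
        }
    }
    *q = '\0';
    return (char *)out;
}
```

Under that assumption, the single byte 0xFE becomes the two bytes 0xC3
0xBE, which is valid UTF-8.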

 - Jon