[Mono-dev] Marshalling of strings

Jonathan Pryor jonpryor at vt.edu
Tue Sep 20 21:45:46 EDT 2005


On Wed, 2005-09-21 at 00:16 +0100, Chris Seaton wrote:
> Hows marshalling of strings to native functions work in Mono and .NET in 
> general?

"Implementation Defined" comes to mind -- things differ between Mono
and .NET.

> What does ANSI mean? I thought ANSI was a family of character sets?

On Mono, ANSI means UTF-8, regardless of platform.

On .NET, you thought correctly -- ANSI means "the default non-Unicode
character set for this Windows installation", which could be latin1,
latin2, latin3, S-JIS, BIG5...

In short, ANSI means "whatever the legacy Windows code page happens to
be for this installation", which (of course) can change on Windows
NT-based platforms for both the system and the user.

> What does Unicode mean exactly? UTF-16? UTF-32? What about endianess?

On both Mono and .NET, Unicode means UTF-16 in platform byte order.  This
can be problematic on Mono/Linux, where wchar_t is 32 bits wide and many
apps therefore use UTF-32, not UTF-16.
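
To make the size mismatch concrete, here is a small sketch that compares
how many bytes the same string takes in UTF-16 and UTF-32.  The class and
method names are made up for illustration; only the System.Text.Encoding
calls are real.  Note that Encoding.Unicode is always little-endian UTF-16,
which is fine for counting bytes:

```csharp
using System;
using System.Text;

static class EncodingWidths
{
	// Bytes needed for s as UTF-16 (two bytes per code unit).
	public static int Utf16Bytes (string s)
	{
		return Encoding.Unicode.GetByteCount (s);
	}

	// Bytes needed for s as UTF-32 (four bytes per code point).
	public static int Utf32Bytes (string s)
	{
		return Encoding.UTF32.GetByteCount (s);
	}
}
```

A C function that walks its argument four bytes at a time will misread a
buffer laid out two bytes at a time, and vice versa, which is why the two
cannot be mixed.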

> If I write a native function that uses UTF-8, I should set the CharSet 
> to be Unicode, but then how does it know that it's UTF-8 and not 
> anything else? Does it sniff the bytes?

If you're targeting only Mono, use CharSet.Ansi.

If you're targeting .NET, set your code page to 65001 using chcp (which
should set your code page to UTF-8), and use CharSet.Ansi.

If changing your code page on Windows isn't practical (and it probably
isn't), then you have to marshal the string manually.  You can look at
the current Gtk# sources for an example, or you can use this (untested)
code:

	string input = "this is the input string";
	// one extra byte for the terminating NUL
	byte[] marshal = new byte [
		System.Text.Encoding.UTF8.GetByteCount (input) + 1
	];
	// encode into the buffer; the byte count must match what we sized for
	if (System.Text.Encoding.UTF8.GetBytes (input, 0, 
			input.Length, marshal, 0) != (marshal.Length-1))
		throw new Exception ("WTF?");
	marshal [marshal.Length-1] = 0;
	IntPtr marshal_buf = Marshal.AllocHGlobal (marshal.Length);
	if (marshal_buf == IntPtr.Zero)
		throw new OutOfMemoryException ();
	try {
		Marshal.Copy (marshal, 0, marshal_buf, marshal.Length);
		your_pinvoke_function (marshal_buf);
	}
	finally {
		Marshal.FreeHGlobal (marshal_buf);
	}

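If you're brave and/or don't care about using unsafe code, you could
skip the copy and pin the managed array instead.  This needs an unsafe
context and a P/Invoke declaration that takes a byte* parameter: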

	fixed (byte* pb = marshal) {
		your_pinvoke_function (pb);
	}
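
The manual (safe) marshalling above can be wrapped in a reusable helper.
This is only a sketch: Utf8Marshal and StringToHGlobalUtf8 are names I
made up, not part of Mono or .NET, and the caller is assumed to release
the buffer with Marshal.FreeHGlobal.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class Utf8Marshal
{
	// Hypothetical helper: copies a managed string into unmanaged memory
	// as NUL-terminated UTF-8.  Returns IntPtr.Zero for a null string.
	// The caller must free the result with Marshal.FreeHGlobal.
	public static IntPtr StringToHGlobalUtf8 (string input)
	{
		if (input == null)
			return IntPtr.Zero;
		byte[] bytes = Encoding.UTF8.GetBytes (input);
		IntPtr buf = Marshal.AllocHGlobal (bytes.Length + 1);
		Marshal.Copy (bytes, 0, buf, bytes.Length);
		Marshal.WriteByte (buf, bytes.Length, 0);	// terminating NUL
		return buf;
	}
}
```

Call it, pass the IntPtr to your P/Invoke function inside a try block, and
call Marshal.FreeHGlobal in the finally block, as in the longer example.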

> What if my function uses UTF-7? I mean that's still Unicode but I bet 
> Mono isn't expecting my function to return it.

To call a function that takes UTF-7 strings as arguments, do what I
demonstrated above but use Encoding.UTF7 instead of Encoding.UTF8.

NEVER DECLARE System.String AS THE RETURN TYPE OF P/INVOKE FUNCTIONS.
The runtime is supposed to free the memory of the function return value,
with .NET using CoTaskMemFree() to free the memory and Mono using
g_free().  Since you normally don't want the runtime freeing this memory
for you (the native code probably didn't allocate it with the matching
allocator, or still owns it), declare the return type of functions
returning strings as IntPtr.  You will need to manually marshal the
returned string (untested):

	IntPtr r = your_pinvoke_function ();
	// count # bytes
	int i = 0;
	while (Marshal.ReadByte (r, i) != (byte) 0) ++i;
	byte[] s_buf = new byte [i];
	Marshal.Copy (r, s_buf, 0, s_buf.Length);
	string s = System.Text.Encoding.UTF8.GetString (s_buf);

Of course, you should use Encoding.UTF7 if necessary.

If the returned string is a UTF-16 string, just use
Marshal.PtrToStringUni().
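
The return-value handling above can also be packaged as a helper.  Again
a sketch under assumptions: Utf8Return and PtrToStringUtf8 are names I
invented, and the helper deliberately does not free the pointer, since
who owns it depends on the native library.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class Utf8Return
{
	// Hypothetical helper: decodes a NUL-terminated UTF-8 buffer that
	// native code returned.  Does not free p.
	public static string PtrToStringUtf8 (IntPtr p)
	{
		if (p == IntPtr.Zero)
			return null;
		// count bytes up to (not including) the terminating NUL
		int len = 0;
		while (Marshal.ReadByte (p, len) != 0)
			++len;
		byte[] buf = new byte [len];
		Marshal.Copy (p, buf, 0, len);
		return Encoding.UTF8.GetString (buf);
	}
}
```

Swap Encoding.UTF8 for another Encoding if the native code returns
something else.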

> What if my strings aren't null terminated? They're still Unicode if 
> they're not null terminated.

They're still in a Unicode encoding, but nobody can safely operate on
unterminated strings unless you pass the string length separately
(as you do with strncpy(3)).  Avoid unterminated strings.
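
If you are stuck with an unterminated buffer and the native side does
give you its length, you can decode exactly that many bytes.  A minimal
sketch, assuming UTF-8 data and a length in bytes; CountedString and its
method are hypothetical names:

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

static class CountedString
{
	// Decodes exactly byteCount bytes of UTF-8 starting at p.
	// No terminator is read or required.
	public static string PtrToStringUtf8 (IntPtr p, int byteCount)
	{
		if (p == IntPtr.Zero)
			return null;
		byte[] buf = new byte [byteCount];
		Marshal.Copy (p, buf, 0, byteCount);
		return Encoding.UTF8.GetString (buf);
	}
}
```

For UTF-16 data, Marshal.PtrToStringUni() has an overload that takes the
length in characters, so no helper is needed.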

> Can someone tell me explicitly what character encodings should be used 
> to do pinvoke?

It depends on what you're invoking.  If you control both the C# and C
code, I'd suggest using Unicode (UTF-16), as this is simplest and
requires the least marshaling effort to deal with.  Otherwise you need
to use whatever encoding the unmanaged code requires.
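
For the case where you control both sides, the declarations might look
roughly like this.  Everything here is hypothetical: "mylib", count_chars
and get_name are placeholder names, and the C signatures in the comments
are assumptions about what your native code would look like.

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeMethods
{
	// Assumed C side: int count_chars (const unsigned short *s);
	// CharSet.Unicode makes the runtime pass the string as UTF-16.
	[DllImport ("mylib", CharSet = CharSet.Unicode)]
	public static extern int count_chars (string s);

	// Assumed C side: const unsigned short *get_name (void);
	// The library keeps ownership of the string, so the return type is
	// IntPtr, to be decoded with Marshal.PtrToStringUni().
	[DllImport ("mylib")]
	public static extern IntPtr get_name ();
}
```

The library is only looked up when one of the functions is first called,
so the declarations compile without it being present.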

See also: http://www.mono-project.com/Interop_with_Native_Libraries

 - Jon




