[Mono-list] Marshaling bug?

Jonathan Pryor jonpryor@vt.edu
Sat, 25 Oct 2003 12:40:23 -0400


Pardon me for saying this, but your SWT code is borked.  Seriously.

Why?  Opaque pointer values should be expressed as a System.IntPtr, not
a System.Int32 ("int").  This is so that if you ever move to a platform
with a different sized pointer (say, 64-bit Opteron, or PowerPC 970, or
UltraSparc, or...), you won't kill all your pointer values.

Furthermore, you've got arrays where you shouldn't have arrays. 
Consider the prototype for g_utf8_to_utf16:

	gunichar* g_utf8_to_utf16 (const gchar *str,
		glong len,
		glong *items_read,
		glong *items_written,
		GError **error);

Then, consider how C code would call it:

	int items_read, items_written;
	const char* str = "this is my utf-8 string";
	gunichar* result = 
		g_utf8_to_utf16 (str, strlen (str),
			&items_read,
			&items_written,
			NULL /* ignore errors */);

Assuming that we don't want to handle wrapping GError in C#, this would
be a better wrapper:

	[DllImport(...)]
	static extern IntPtr g_utf8_to_utf16 (byte[] str, int len, 
		out int items_read, out int items_written,
		IntPtr error /* pass IntPtr.Zero for this */);

Notice that "items_read" and "items_written" are mapped to a "out"
parameter, instead of an array.  This is more appropriate for this
function (and for any function similar to it -- items_read and
items_written aren't holding arrays, they're just an "out" pointer for
various data).

To answer your last question: what's the advantage of this:

	string s = "Let's i18n, baby...do it hard!";
	IntPtr unmanaged_memory = Marshal.StringToHGlobalAnsi (s);

over the voluminous amounts of code you demonstrated before?  Well, it's
shorter.

But it's also seriously broken, at least from a
cross-platform/portability perspective.  Here's why:

  - Not all platforms support "HGLOBAL".  On Unix platforms, this is
    likely to be normal g_malloc/g_free, but on Windows, this should
    be using the GlobalAlloc and GlobalFree Win32 APIs.  Which means
    you have different functions to call on different platforms, which
    will be a portability headache.

  - Even worse, StringToHGlobalAnsi creates an "Ansi" string.  Ansi
    IS NOT Utf-8.  At least, you can't assume that it is, though it
    *could* be.  Ansi is, typically, the local code page, and if you've
    been paying attention to the file-name handling thread on 
    mono-devel-list, you'd know that trying to mix the current code
    page with Unicode handling is fraught with danger (and confusion,
    and annoyance, and users with Pitchforks complaining about your
    app not working right...).

So, how do you do string-interop, portably, between Mono & GTK+?  Well,
you could just use Gtk#, which will tackle this issue (eventually; it
appears to use Marshal.StringToHGlobalAnsi in some places, so it's
likely assuming that, under Mono, Ansi == UTF-8).  This is certainly the
easiest way to go, unless you're dead set on providing *another* GTK+
wrapper.  (Of course, this places a Gtk# dependency on SWT, which may be
undesirable.)

If you do it on your own, you're pretty much stuck doing what you're
doing in your first example.

As for why it doesn't work, it could be a regression.  On my system, it
appears to be correctly converting the .NET UTF-16 input string "str"
into a UTF-8 string -- I'm able to pass "data" to g_printf and see
unmanaged representation.

It's the return trip -- converting the UTF-8 unmanaged memory and
copying it into the CLI char[] array, that appears to be the problem. 
I'll need to write a small test case, and if this is a new marshalling
bug, I'll file it in bugzilla.

Thanks,
 - Jon

On Sat, 2003-10-25 at 09:52, pbaena@uol.com.ar wrote:
> I reported a bug (#50116) about this problem of mine (of SWT really), and I wanted to get help from the experts to see if the API can be improved.
> 
> SWT works this way to append and retrieve from a g_list:
> 
> --------------------------------CODE-------------------------
> using System;
> using System.Runtime.InteropServices;
> 
> class testbug {
> 
> public const string GLIB_LIBRARY        = "glib-2.0";
> public const string STRLEN_LIBRARY      = "pango-1.0";
> public const string MEMMOVE_LIBRARY     = "gtk-x11-2.0";
> 
> [DllImport(GLIB_LIBRARY, CharSet = CharSet.Unicode)]
> public static extern int g_utf16_to_utf8(char[] str, int len, int[]
> items_read, int[] items_written, int[] error);
> [DllImport(GLIB_LIBRARY, CharSet = CharSet.Unicode)]
> public static extern int g_utf8_to_utf16(byte[] str, int len, int[]
> items_read, int[] items_written, int[] error);
> 
> [DllImport(STRLEN_LIBRARY, CharSet = CharSet.Unicode)]
> public static extern int strlen(int str);
> 
> [DllImport(MEMMOVE_LIBRARY, CharSet = CharSet.Unicode)]
> public static extern void memmove(int dest, int[] src, int size);
> [DllImport(MEMMOVE_LIBRARY, CharSet = CharSet.Unicode)]
> public static extern void memmove(int dest, byte[] src, int size);
> [DllImport(MEMMOVE_LIBRARY, CharSet = CharSet.Unicode)]
> public static extern void memmove(int[] dest, byte[] src, int size);
> [DllImport(MEMMOVE_LIBRARY, CharSet = CharSet.Unicode)]
> public static extern void memmove(byte[] dest, int src, int size);
> [DllImport(MEMMOVE_LIBRARY, CharSet = CharSet.Unicode)]
> public static extern void memmove(char[] dest, int src, int size);
> [DllImport(MEMMOVE_LIBRARY, CharSet = CharSet.Unicode)]
> public static extern void memmove(int[] dest, int src, int size); 
> 
> [DllImport(GLIB_LIBRARY, CharSet = CharSet.Unicode)]
> public static extern void g_free(int mem);
> [DllImport(GLIB_LIBRARY, CharSet = CharSet.Unicode)]
> public static extern int g_malloc(int size); 
> 
> [DllImport(GLIB_LIBRARY, CharSet = CharSet.Unicode)]
> public static extern int g_list_append(int list, int data); 
> [DllImport(GLIB_LIBRARY, CharSet = CharSet.Unicode)]
> public static extern int g_list_nth_data(int list, int n);
> 
>  
> public static void Main ()
> { 
>         string str = "Let's i18n, baby...do it hard!";
>         int glist = 0; 
>         bool terminate = true;
>         char [] strchar = str.ToCharArray(); 
> 
>         int [] items_read = new int [1], items_written = new int [1];
>         int ptr = g_utf16_to_utf8 (strchar, str.Length, items_read, 
> items_written, null);
>  
>         int written = items_written [0];
>         //TEMPORARY CODE - convertion stops at the first NULL 
>         if (items_read [0] != strchar.Length) written++;
>         byte [] buffer = new byte [written + (terminate ? 1 : 0)]; 
>         memmove (buffer, ptr, written);
>         g_free (ptr); 
> 
>         int data = g_malloc (buffer.Length);
>         memmove (data, buffer, buffer.Length); 
>         glist = g_list_append (glist, data);
>  
>         data = g_list_nth_data (glist, 0);
>         int length = strlen (data); 
>         byte [] buffer1 = new byte [length];
>         memmove (buffer1, data, length); 
> 
>         ptr = g_utf8_to_utf16 (buffer1, buffer1.Length, null,
> items_written, null); 
> 
>         length = items_written [0]; 
>         char [] chars = new char [length];
>         memmove (chars, ptr, length * 2); 
> 
>         Console.WriteLine (chars); 
> 
>         g_free (ptr); 
> }
> 
> }
> ------------------------------------------------------------------
> 
> That worked till mono 0.28, but doesn't work with current mono from CVS. Now I was testing things and found that this other approach to the problem works:
> 
> ------------------------------CODE--------------------------------
> using System;
> using System.Runtime.InteropServices;
> 
> class testbug {
> 
> public const string GLIB_LIBRARY        = "glib-2.0";
> 
> [DllImport(GLIB_LIBRARY, CharSet = CharSet.Unicode)] 
> public static extern int g_list_append(int list, IntPtr data);
> [DllImport(GLIB_LIBRARY, CharSet = CharSet.Unicode)] 
> public static extern string g_list_nth_data(int list, int n);
>  
> public static void Main ()
> { 
>         string str = "Let's i18n, baby...do it hard!";
>         int glist = 0; 
> 
> 	glist = g_list_append (glist, Marshal.StringToHGlobalAnsi (str)); 
>         string data2 = g_list_nth_data (glist, 0); 
> 	Console.WriteLine (data2);
> 
> 	return;
> }
> 
> }
> ----------------------------------------------------------------
> 
> Now I was wondering what are the advantages of the latest approach in contrast with SWT's. Can you give me some advice?
> 
> Thank you very much!
> Pablo
> _______________________________________________
> Mono-list maillist  -  Mono-list@lists.ximian.com
> http://lists.ximian.com/mailman/listinfo/mono-list