[Mono-list] Support for marshalling of C# string to unmanaged wchar_t on Linux

Jonathan Pryor jonpryor@vt.edu
Thu, 16 Dec 2004 21:22:10 -0500


Your code is buggy, but I'll tackle it anyway. :-)

On Thu, 2004-12-16 at 13:12 +0000, Kala B wrote:
> Hi,
> Does mono support marshalling of C# string to
> unmanaged wchar_t on Linux? 

No.  Actually, I was surprised it ran at all (I was expecting a
g_assert_not_reached() message), but once I ran it, the problem became
clear: when Mono marshals to CharSet.Unicode, it doesn't marshal to a
Unix wchar_t; it marshals to a Windows wchar_t (16-bit code units).

wchar_t on most Unix platforms is 4 bytes (32 bits), while it's 2 bytes
(16 bits) on Windows.  Producing the 16-bit form is easy for Mono (as it
stores all strings as 16-bit Unicode strings internally), but it's
difficult for any Unix code on the receiving end that expects a 32-bit
wchar_t.
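
To make the size mismatch concrete, here is a minimal sketch (not part of
the original code) of widening 16-bit code units into a native wchar_t
buffer on the C side.  The function name utf16_to_wcs is hypothetical, and
the sketch assumes that a 4-byte wchar_t holds UTF-32 code points, which is
the case with glibc but is not guaranteed by the C standard.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Widen a NUL-terminated UTF-16 string into a wchar_t buffer.
 * Assumption: when wchar_t is at least 4 bytes it stores UTF-32.
 * Returns the number of wchar_t written (excluding the terminator),
 * or (size_t)-1 if dst is too small. */
size_t utf16_to_wcs(wchar_t *dst, size_t dst_len, const uint16_t *src)
{
    size_t n = 0;

    while (*src != 0) {
        uint32_t cp = *src++;

        /* Combine a surrogate pair only if wchar_t can hold the result. */
        if (sizeof(wchar_t) >= 4 &&
            cp >= 0xD800 && cp <= 0xDBFF &&
            *src >= 0xDC00 && *src <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*src++ - 0xDC00);
        }

        /* Always leave room for the terminator. */
        if (n + 1 >= dst_len)
            return (size_t)-1;
        dst[n++] = (wchar_t)cp;
    }

    if (dst_len == 0)
        return (size_t)-1;
    dst[n] = L'\0';
    return n;
}
```

With a 32-bit wchar_t, a character outside the Basic Multilingual Plane
arrives as two 16-bit units and leaves as a single wchar_t, so the output
can be shorter than the input.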

That said, wchar_t has enough problems on Unix/Linux that I've been told
to avoid it.  It's more trouble than it's worth -- sticking with UTF-8 is
far easier to do.  (Then there's the immortal question of how to
portably convert between wchar_t* and char*.  You could use wcstombs,
but its output encoding depends on the current locale (LC_CTYPE) and the
standard doesn't mandate any particular one, which makes it nearly
useless for most practical purposes...)
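
As an illustration of the wcstombs caveat, a small sketch of a wrapper
around it follows.  The wrapper name wide_to_locale is made up for this
example.  Whatever bytes it produces are in the encoding of the current
locale, so the same call can yield UTF-8, Latin-1, or a failure depending
on what the caller passed to setlocale beforehand.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Convert a wide string to the multibyte encoding of the CURRENT locale.
 * Returns the number of bytes written (excluding the NUL), or (size_t)-1
 * if a character cannot be represented or dst is too small to hold the
 * terminated result. */
size_t wide_to_locale(char *dst, size_t dst_len, const wchar_t *src)
{
    size_t n = wcstombs(dst, src, dst_len);

    if (n == (size_t)-1)
        return n;               /* unrepresentable in this locale */
    if (n == dst_len)
        return (size_t)-1;      /* buffer filled, no room for the NUL */
    return n;
}
```

Only plain ASCII input gives the same bytes in every common locale; for
anything else the result is only as portable as the locale setting.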

> It does not seem to work. Consider the following
> sample code, 
> ( contents of 3 files ) 
> 1. chash.cs
>    This C# code makes a call to a C API, which takes a
> wchar_t[].
> 2. testlib.c
>    This is the C code which implements the C API.
> 3. makefile
>    to build the library and C# exe.

<snip/>

> Contents of testlib.c
> ---------------------
> #include <stdio.h>
> #include <wchar.h>
> 
> typedef struct _id
> {
>     int len;
>     wchar_t name[256];

Change "name" to a 16-bit type, which matches what Mono actually
marshals, and things work better:

	unsigned short name[256];

> }Id;
> 
> int TestFn(Id *id)
> {
>     printf("%s\t%S\t%d\n",__func__,id->name,id->len);

This printf is broken -- %S isn't standard C (it's an XSI extension
equivalent to %ls).  %ls is the correct standardized way to print a
wchar_t string, but since name isn't a wchar_t[] it won't work anyway.
So we'll change it to make it nicer:

	printf ("%s\t%d\n", __func__, id->len);
	for (int i = 0; i < id->len; ++i)
		printf ("\t%.3i: %c\n", i, (char) id->name[i]);

>     printf("wcslen returns.. %d\n",strlen(id->name));

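Besides, you're not even using wcslen here; you're using strlen.
OF COURSE it'll return "1" -- for ASCII text on a little-endian machine,
strlen stops at the zero byte in the upper half of the first 16-bit
character.  (strlen also returns a size_t, so the format should be %zu
rather than %d.)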
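
The effect is easy to reproduce without Mono.  The sketch below spells out
"hello" as little-endian UTF-16 bytes and compares a byte-oriented length
with one that counts 16-bit units; the helper names byte_len and
utf16le_len are invented for this example.

```c
#include <stddef.h>
#include <string.h>

/* "hello" as little-endian UTF-16: each ASCII character is followed by a
 * zero byte, and the string ends with a 16-bit zero terminator. */
static const unsigned char hello_utf16le[] = {
    'h', 0, 'e', 0, 'l', 0, 'l', 0, 'o', 0, 0, 0
};

/* Byte-oriented length: stops at the first zero byte. */
size_t byte_len(const unsigned char *p)
{
    return strlen((const char *)p);
}

/* Counts 16-bit units up to (not including) the 16-bit terminator. */
size_t utf16le_len(const unsigned char *p)
{
    size_t n = 0;

    while (p[2 * n] != 0 || p[2 * n + 1] != 0)
        ++n;
    return n;
}
```

Here byte_len(hello_utf16le) gives 1 and utf16le_len(hello_utf16le) gives
5, which is the same discrepancy seen in the output above.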

> Could you please help? 
> If marshalling is not supported, could you please
> suggest some alternate solution to solve the issue?

Alternate solution: Use UTF-8.  Lots of libraries support it, it
requires minimal changes and support, and most other libraries/platforms
are migrating toward it (see GNOME and KDE).  The only reason to not use
UTF-8 is portability with Windows, where the WCHAR type is easier to
use, but Windows still supports UTF-8 in its conversion functions
(MultiByteToWideChar and WideCharToMultiByte accept CP_UTF8).
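
For reference, a rough sketch of what the C side could look like with
UTF-8.  It assumes the managed side hands over "name" as NUL-terminated
UTF-8 bytes; the return value of 0 and the helper utf8_codepoints are
placeholders chosen for this example, not taken from the original
testlib.c.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct _id
{
    int  len;
    char name[256];     /* assumed: NUL-terminated UTF-8 */
} Id;

/* Count code points by skipping UTF-8 continuation bytes (10xxxxxx). */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; ++s)
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    return n;
}

int TestFn(Id *id)
{
    /* Plain %s and strlen work, because UTF-8 has no embedded zero bytes. */
    printf("%s\t%s\t%d\n", __func__, id->name, id->len);
    printf("bytes: %zu, code points: %zu\n",
           strlen(id->name), utf8_codepoints(id->name));
    return 0;           /* placeholder return value */
}
```

Note that strlen then reports bytes rather than characters, so the two
numbers differ as soon as the string contains non-ASCII text.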

 - Jon