[Mono-list] Support for marshalling of C# string to unmanaged wchar_t on Linux
Jonathan Pryor
jonpryor@vt.edu
Thu, 16 Dec 2004 21:22:10 -0500
Your code is buggy, but I'll tackle it anyway. :-)
On Thu, 2004-12-16 at 13:12 +0000, Kala B wrote:
> Hi,
> Does mono support marshalling of C# string to
> unmanaged wchar_t on Linux?
No. Actually, I was surprised it ran at all (I was expecting a
g_assert_not_reached() message), but once I ran it, the problem became
clear: when Mono marshals to CharSet.Unicode, it doesn't marshal to a
Unix wchar_t, it marshals to a Windows wchar_t.
wchar_t on most Unix platforms is 4 bytes (32 bits), while it's 2 bytes
(16 bits) on Windows. Marshalling to the 16-bit form is easy for Mono
(as it stores all strings as 16-bit Unicode strings internally), but it
is a problem for any C code that expects a 4-byte Unix wchar_t.
That said, wchar_t has enough problems on Unix/Linux that I've been told
to avoid it. It's more trouble than it's worth -- sticking with UTF-8
is far easier. (Then there's the immortal question of how to portably
convert between wchar_t* and char*. You could use wcstombs, but the
standard doesn't say what encoding it'll use (that depends on the
current locale), which makes it nearly useless for most practical
purposes...)
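
If you really do need a Unix wchar_t[] on the C side, the 16-bit code
units have to be widened by hand. Here is a minimal sketch of that,
assuming wchar_t is 32 bits (as it is with glibc) and that the input is
a zero-terminated UTF-16 string. The name utf16_to_wchar is made up for
this example; it is not part of Mono or libc.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Widen a zero-terminated UTF-16 string into dst (dst_cap elements).
 * Assumption: wchar_t is 32 bits wide, so one element holds any code
 * point. Returns the number of wchar_t written, not counting the
 * terminator, or (size_t)-1 on bad input or too small a buffer. */
size_t utf16_to_wchar(const uint16_t *src, wchar_t *dst, size_t dst_cap)
{
    size_t n = 0;

    if (src == NULL || dst == NULL || dst_cap == 0)
        return (size_t)-1;

    while (*src != 0) {
        uint32_t cp = *src++;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: the next unit must be a low surrogate. */
            uint32_t lo = *src;
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
            ++src;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;      /* stray low surrogate */
        }

        if (n + 1 >= dst_cap)       /* keep room for the terminator */
            return (size_t)-1;
        dst[n++] = (wchar_t)cp;
    }

    dst[n] = L'\0';
    return n;
}
```

The result can then be handed to wcslen, or to printf with %ls, like any
other Unix wide string.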
> It does not seem to work. Consider the following
> sample code,
> ( contents of 3 files )
> 1. chash.cs
> This C# code makes a call to a C API, which takes a
> wchar_t[].
> 2. testlib.c
> This is the C code which implements the C API.
> 3. makefile
> to build the library and C# exe.
<snip/>
> Contents of testlib.c
> ---------------------
> #include <stdio.h>
> #include <wchar.h>
>
> typedef struct _id
> {
> int len;
> wchar_t name[256];
Change "name" to the following and things work better:
unsigned short name[256];
> }Id;
>
> int TestFn(Id *id)
> {
> printf("%s\t%S\t%d\n",__func__,id->name,id->len);
This printf is broken -- %S isn't standard C (it's an XSI extension).
%ls is the standardized way to print a wchar_t string, but since name
isn't a wchar_t[] it won't work anyway. So we'll change it to make it
nicer:
printf ("%s\t%d\n", __func__, id->len);
for (int i = 0; i < id->len; ++i)
printf ("\t%.3i: %c\n", i, (char) id->name[i]);
> printf("wcslen returns.. %d\n",strlen(id->name));
Besides, you're not even using wcslen here, you're using strlen.
OF COURSE it'll return "1" -- strlen counts bytes, so it stops at the
zero byte that makes up the upper half of the first 16-bit character.
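
To make the difference concrete, here is a small sketch contrasting the
two ways of measuring the same buffer. It assumes the buffer holds
zero-terminated 16-bit code units; utf16_len and byte_strlen are
hypothetical helper names, not library functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count 16-bit code units up to (not including) the 0x0000 terminator. */
size_t utf16_len(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
}

/* What the original code effectively did: view the same memory as
 * bytes. For ASCII text every other byte is zero, so strlen stops
 * almost immediately (after one byte on a little-endian machine). */
size_t byte_strlen(const uint16_t *s)
{
    return strlen((const char *)s);
}
```

For a buffer holding "Hello" as 16-bit units, utf16_len reports 5, while
byte_strlen gives up at the first zero byte.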
> Could you please help?
> If marshalling is not supported, could you please
> suggest some alternate solution to solve the issue?
Alternate solution: Use UTF-8. Lots of libraries support it, it
requires minimal changes and support, and most other libraries/platforms
are migrating toward it (see GNOME and KDE). The only reason not to use
UTF-8 is portability with Windows, where the WCHAR type is easier to
use, but Windows still supports UTF-8 in its conversion functions
(MultiByteToWideChar and WideCharToMultiByte accept CP_UTF8).
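
A rough sketch of what the C side could look like with UTF-8. Utf8Id,
utf8_codepoints and TestFnUtf8 are names invented for this example, and
it assumes the managed side hands over a zero-terminated UTF-8 string
(as far as I know that is what Mono does for CharSet.Ansi on Unix, but
check that before relying on it).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical UTF-8 variant of the Id struct from testlib.c. */
typedef struct {
    int  len;
    char name[256];     /* zero-terminated UTF-8 bytes */
} Utf8Id;

/* Count code points by skipping UTF-8 continuation bytes (10xxxxxx). */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; ++s)
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++n;
    return n;
}

/* Hypothetical replacement for TestFn: plain %s passes the UTF-8
 * bytes through unchanged. Returns the number of code points. */
int TestFnUtf8(const Utf8Id *id)
{
    printf("%s\t%s\t%d\n", __func__, id->name, id->len);
    return (int)utf8_codepoints(id->name);
}
```

Since UTF-8 never puts a zero byte inside a character, strlen and the
rest of the char* functions keep working on it; they just count bytes
rather than characters.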
- Jon