[Mono-list] Trouble with utf-16 marshaling

Maser, Dan Dan.Maser at inin.com
Fri Jun 29 19:44:30 EDT 2007


    I don't think that's the issue.   There aren't any mutable strings
in my C library; I probably introduced that confusion accidentally.  My
C library has input UTF-16 parameters that are const, and has return
values that are UTF-16 as well.   Despite how the function signature
below looks, that "myArg" is really a const input.   My C code actually
does have "const unsigned short *" for the arguments and I just mistyped
it when I wrote the original message.
 
  I think I could have done the "CharSet=CharSet.Unicode" but (unless
I'm mistaken) that's an equivalent shortcut to putting the MarshalAs for
all the parameters.  In my case, I'm using SWIG so it's far easier to
make SWIG output the MarshalAs than the CharSet=CharSet.Unicode.
 
  But most importantly I am not trying to invoke a C function with a
mutable string buffer.  
 
________________________________

From: Andy Hume [mailto:andyhume32 at yahoo.co.uk] 
Sent: Friday, June 29, 2007 6:13 PM
To: Maser, Dan; mono-list at lists.ximian.com
Subject: RE: [Mono-list] Trouble with utf-16 marshaling


If the string argument is mutable then I believe one should use a
StringBuilder -- with its capacity set, and that length passed to the
native function too.  And if the native method writes more chars than
allocated then the heap will be corrupted. :-(
 
So with native method:
    void my_function(unsigned short* myArg, int maxLen); 
 
Do 
    [DllImport("myCLib", CharSet=CharSet.Unicode)]
    // I think <CharSet> on that attr is enough, no need for MarshalAs
    // on the param...
    public static extern void my_function(StringBuilder myArg,
        int maxLength);
...
    StringBuilder bldr = new StringBuilder(256);
    NativeMethods.my_function(bldr, bldr.Capacity);
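
For the native side of such a call, a minimal sketch might look like the following. The signature is the one given above; the body, the fixed "hello" message, and the choice to treat maxLen as the total number of 16-bit units available (terminator included) are assumptions made for illustration only.

```c
#include <stddef.h>

/* Hypothetical body for the my_function signature quoted above.
 * Copies a fixed UTF-16 message into the caller's buffer and never
 * writes more than max_len 16-bit units, terminating 0 included. */
void my_function(unsigned short *my_arg, int max_len)
{
    static const unsigned short msg[] = { 'h', 'e', 'l', 'l', 'o', 0 };
    int i = 0;

    if (my_arg == NULL || max_len <= 0)
        return;

    /* Leave room for the terminator: copy at most max_len - 1 units. */
    while (i < max_len - 1 && msg[i] != 0) {
        my_arg[i] = msg[i];
        i++;
    }
    my_arg[i] = 0;
}
```

Truncating instead of overrunning is what keeps the heap intact when the managed buffer is smaller than the data the native code would like to write.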

See a similar sample at
http://msdn2.microsoft.com/en-us/library/x3txb6xc(vs.80).aspx , and
reference material at
http://msdn2.microsoft.com/en-us/library/s9ts558h(VS.80).aspx etc.

Unless I'm much confused it shouldn't work (at least isn't guaranteed
to) on MSFT either: "Platform invoke copies string arguments, converting
from the .NET Framework format (Unicode) to the platform unmanaged
format. Strings are immutable and are not copied back from unmanaged
memory to managed memory when the call returns."

I suppose since it is Unicode on both sides the MSFT CLR skips the copy
and just passes the address of the String's content, whereas Mono
perhaps doesn't have that optimisation.

Does that solve it, or is something else the problem?

Andy


________________________________

	From: mono-list-bounces at lists.ximian.com
[mailto:mono-list-bounces at lists.ximian.com] On Behalf Of Maser, Dan
	Sent: 29 June 2007 23:23
	To: Maser, Dan; mono-list at lists.ximian.com
	Subject: Re: [Mono-list] Trouble with utf-16 marshaling
	
	
	   I have debugged this some more, and found this.  (I'm not yet
sure how to convert this information into something actionable).
	 
	I was browsing some of the mono source code and found this
function (and its sisters):
	      MonoString* mono_string_new_utf16 (MonoDomain *domain,
	                                         const guint16 *text, gint32 len);
	 
	which seem to be the function(s) that initialize internal C#
strings from C data.  This one in particular appears to be invoked when
internal C# strings are created from UTF-16 "C" data.   I hacked in a
simple loop that printf'd the hex values of the UTF-16 data (the 'text'
parameter).
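
A debugging loop of that kind could be sketched as below. This is not the code that was actually added to Mono; format_utf16_hex is a hypothetical helper name, and it formats into a caller-supplied buffer so the result can be printed or compared.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format `len` UTF-16 code units as 4-digit hex
 * values separated by two spaces. Returns the number of characters
 * written, or -1 if the output does not fit (dst may then hold a
 * truncated string). */
int format_utf16_hex(char *dst, size_t dst_size,
                     const unsigned short *text, int len)
{
    size_t used = 0;
    int i;

    if (dst == NULL || dst_size == 0)
        return -1;
    dst[0] = '\0';

    for (i = 0; i < len; i++) {
        int n = snprintf(dst + used, dst_size - used, "%s%04x",
                         i == 0 ? "" : "  ", (unsigned)text[i]);
        if (n < 0 || (size_t)n >= dst_size - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

Inside the function being inspected, the result could then be emitted with something like printf("DBG: %s\n", buf).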
	 
	  What I see in my console window is interesting.  (After a
bunch of unrelated stuff) I see my C library returning a UTF-16 string
that gets correctly interpreted as MonoString:
	 
	    DBG: invocation of mono_string_new_utf16 with data:
	                   002f  0068  006f  006d  0065  002f  0064
	                   0061  006e  006d  002f  0069  006e  0074 ...
	 
	which is the correct string.  The next thing I see in the
console window is this:
	 
	
	    DBG: invocation of mono_string_new_utf16 with data:
	                   682f  6d6f  2f65  6164  6d6e  692f  746e''
	 
	Notice that this second data is similar to the first: each 2 bytes of
the second string correspond to *4* bytes of the first string, re-ordered
as if there were some endian issue.  Clearly this second string is
supposed to be the same as the first string but it's been damaged by
some translation process.
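
One way to reproduce exactly this pattern, offered as an assumption about the arithmetic rather than a diagnosis of Mono, is to narrow each 16-bit unit to a single byte and then read the resulting bytes back as little-endian 16-bit units. The function names below are hypothetical.

```c
#include <stddef.h>

/* Narrow each UTF-16 unit to one byte by keeping its low 8 bits.
 * Only lossless for units <= 0xff, which holds for ASCII text. */
size_t narrow_to_bytes(const unsigned short *src, size_t len,
                       unsigned char *dst)
{
    size_t i;
    for (i = 0; i < len; i++)
        dst[i] = (unsigned char)(src[i] & 0xff);
    return len;
}

/* Reinterpret a byte string as little-endian 16-bit units.
 * An odd trailing byte is ignored. Returns the unit count. */
size_t bytes_as_utf16le(const unsigned char *src, size_t len,
                        unsigned short *dst)
{
    size_t i, n = len / 2;
    for (i = 0; i < n; i++)
        dst[i] = (unsigned short)(src[2 * i] | (src[2 * i + 1] << 8));
    return n;
}
```

Feeding the 14 units of the first dump (002f 0068 006f 006d ...) through both steps yields 682f 6d6f 2f65 6164 6d6e 692f 746e, the 7 units of the second dump.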
	 
	   Does that information mean anything to anyone?   As always,
thanks for any help.
	        Dan Maser.
	
	
________________________________

	From: Maser, Dan 
	Sent: Friday, June 29, 2007 1:10 PM
	To: Maser, Dan; 'mono-list at lists.ximian.com'
	Subject: RE: [Mono-list] Trouble with utf-16 marshaling
	
	
	   Furthermore, I see in the mono source code that there is a
test function in the mono/mono/tests/libtest.c
	 
	STDCALL unsigned short*
	test_lpwstr_marshal (unsigned short* chars, long length)
	{
	...
	}
	 
	   Which is basically the same thing I'm doing; further
indicating that this should work.

________________________________

	From: mono-list-bounces at lists.ximian.com
[mailto:mono-list-bounces at lists.ximian.com] On Behalf Of Maser, Dan
	Sent: Friday, June 29, 2007 9:13 AM
	To: mono-list at lists.ximian.com
	Subject: [Mono-list] Trouble with utf-16 marshaling
	
	


	   My situation is this:  I've got a C library that has a lot of
UTF-16 inputs and outputs.  The C type is always "unsigned short*" or
"const unsigned short*" (because clearly wchar_t* isn't portable because
it's 4 bytes on linux).   All of my C# code has the
"[MarshalAs(UnmanagedType.LPWStr)]" attribute specified.

	   It works properly in windows with MS .NET, but doesn't work
for me in linux with mono.   I've verified in gdb that the C library is
returning the correct string, but immediately after the C dll returns
and mono does the LPWStr marshaling the string is total garbage
characters.   I am under the impression from previous posts that 2-byte
UTF-16 should marshal properly to mono with the LPWStr attribute.  In
fact it looks like some of the gdiplus calls use that same thing and
work... any ideas what I can check on because mine doesn't?

	   For more clarification my C library has a function signature
like this: 

	void my_function(unsigned short* myArg); 
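
As a sketch of how a C library can work with 2-byte UTF-16 input without relying on wchar_t, the helper below takes a const input of the kind described above and counts its code units. The name utf16_len and the compile-time size check are assumptions for illustration, not part of the library being discussed.

```c
#include <stddef.h>

/* Compile-time check (C89-style): the build fails with a negative
 * array size if unsigned short is not exactly 2 bytes wide. */
typedef char utf16_unit_must_be_2_bytes[
    (sizeof(unsigned short) == 2) ? 1 : -1];

/* Hypothetical helper: number of 16-bit code units before the
 * terminating 0 in a UTF-16 string; a NULL input counts as empty. */
size_t utf16_len(const unsigned short *s)
{
    size_t n = 0;

    if (s == NULL)
        return 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

Because the unit type is fixed at 16 bits, the same code reads the same data on Windows and Linux, which is not true of a wchar_t-based version.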

	    And my C# code looks like this: 


	[DllImport("myCLib")] 
	public static extern void my_function(
	    [MarshalAs(UnmanagedType.LPWStr)] string myArg); 

	   Thanks in advance for any ideas on what to check! 
	      Dan Maser 
