[Mono-dev] PtrToStringAnsi

Joshua Tauberer tauberer at for.net
Thu Mar 9 09:09:06 EST 2006


Atsushi wrote:
> Mono does not support non-UTF8 multibyte conversion by design.

That's ok, but whatever we marshal out we should be able to marshal
back, yeah?

Okay, so after some more digging and realizing things are more
complicated than I thought, here's what I've learned:

PtrToStringAnsi does a UTF-8-to-UTF-16 conversion.

StringToHGlobalAnsi does exactly the reverse.

StringToCoTaskMemAnsi does something totally different!  It does
something kind of like a conversion to ANSI (or maybe it is ANSI, I'm
not sure).  There's no way to marshal such pointers back.

While the Ptr and HGlobal methods are icalls, the CoTaskMem methods are
in C#.

My confusion yesterday came from my assumption that
StringToCoTaskMemAnsi was simply wrapping StringToHGlobalAnsi, whose
implementation I was looking at.  StringToHGlobalAnsi eventually calls
the glib conversion function, and so I was expecting to see UTF-8 in the
resulting bytes whereas I was only seeing ANSI.
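
To make the byte-level difference concrete, here is a minimal sketch using
only System.Text, with no Marshal calls. The class and method names are made
up for illustration. ToLowBytes is an assumption that merely models the
"kind of like ANSI" output by keeping the low byte of each UTF-16 code unit;
it is not Mono's actual code. The strict decoder shows why a lone 0xFC cannot
be read back as UTF-8, which is at least consistent with the null result in
the original report.

```csharp
using System;
using System.Text;

public static class EncodingDemo {
    // UTF-16 string -> UTF-8 bytes (the conversion attributed above to
    // StringToHGlobalAnsi on Mono).
    public static byte[] ToUtf8(string s) {
        return Encoding.UTF8.GetBytes(s);
    }

    // Assumption: models the "kind of like ANSI" result by keeping only the
    // low byte of each UTF-16 code unit. Not Mono's implementation.
    public static byte[] ToLowBytes(string s) {
        byte[] bytes = new byte[s.Length];
        for (int i = 0; i < s.Length; i++) {
            bytes[i] = unchecked((byte)s[i]);
        }
        return bytes;
    }

    // Strict UTF-8 decode; returns null when the bytes are not valid UTF-8.
    public static string DecodeUtf8OrNull(byte[] bytes) {
        try {
            return new UTF8Encoding(false, true).GetString(bytes);
        } catch (DecoderFallbackException) {
            return null;
        }
    }

    public static void Main() {
        string s = "\u00FC"; // u with umlaut, escaped to survive re-encoding
        Console.WriteLine(BitConverter.ToString(ToUtf8(s)));        // C3-BC
        Console.WriteLine(BitConverter.ToString(ToLowBytes(s)));    // FC
        Console.WriteLine(DecodeUtf8OrNull(ToUtf8(s)) == s);        // True
        Console.WriteLine(DecodeUtf8OrNull(ToLowBytes(s)) == null); // True
    }
}
```

The two-byte sequence C3 BC decodes back to the original character, while
the single byte FC is not a valid UTF-8 sequence on its own.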

Anyway, I think StringToCoTaskMemAnsi should be changed to do exactly
the same thing as StringToHGlobalAnsi, right?  StringToCoTaskMemUni also
has a managed implementation, and while it looks OK, it strangely also
doesn't reuse the implementation of StringToHGlobalUni.
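
For comparison, this is roughly the round trip I would expect to hold for
any matched pair of methods, sketched here with the HGlobal pair only. The
class and method names are hypothetical. Whether a non-ASCII character such
as U+00FC survives depends on the runtime and platform encoding, so the
sketch reports the outcome rather than assuming it.

```csharp
using System;
using System.Runtime.InteropServices;

public static class RoundTrip {
    // Marshal a string out with StringToHGlobalAnsi and back with
    // PtrToStringAnsi, always freeing the unmanaged buffer.
    public static string ViaHGlobalAnsi(string s) {
        IntPtr p = Marshal.StringToHGlobalAnsi(s);
        try {
            return Marshal.PtrToStringAnsi(p);
        } finally {
            Marshal.FreeHGlobal(p);
        }
    }

    public static void Main() {
        string original = "\u00FC"; // u with umlaut
        string back = ViaHGlobalAnsi(original);
        // No fixed expectation here: the result is platform dependent.
        Console.WriteLine(back == original
            ? "round trip ok"
            : "round trip lost data");
    }
}
```

If StringToCoTaskMemAnsi produced the same bytes as StringToHGlobalAnsi, the
same check could be written with the CoTaskMem method and
Marshal.FreeCoTaskMem.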

-- 
- Joshua Tauberer

http://taubz.for.net

"Unfortunately, we're having this discussion. It's too bad,
because guess who listens to the discussion: the enemy."

Atsushi Eno wrote:
> Hello,
> 
> Mono does not support non-UTF8 multibyte conversion by design. We
> shouldn't change its current behavior. Actually, it is a pretty
> classic matter that has been stated since 2003.
> http://lists.ximian.com/archives/public/mono-list/2003-June/014500.html
> 
> It is Microsoft who should provide additional marshaling flags so that
> it will be truly functional on every platform (especially considering
> that there is also Gtk+ on Windows, which is apparently designed to work
> on Windows and uses UTF-8 based marshaling). AFAIK they are also aware
> of this matter through ECMA meetings.
> 
> Atsushi Eno
> 
> 
>> While debugging a SqliteClient issue, I came across an interesting bug.
>>  The following returns null when I'm pretty sure it should not (it
>> doesn't on Windows):
>>
>> Marshal.PtrToStringAnsi(Marshal.StringToCoTaskMemAnsi("ü"))
>>
>> In case the encoding of this email gets messed up, that's a u with
>> umlauts, (char)0xFC.
>>
>> The encoding half "works" (Marshal.ReadByte reports the bytes (0xFC
>> 0x00)), on the assumption that I'm supposed to get ANSI out of this
>> method.  Internally, g_utf16_to_utf8 is used, which means that (besides
>> being surprised this call doesn't actually do ANSI encoding) I would
>> actually expect a multibyte representation of that character.  That's
>> from a few minutes of Googling for info on UTF-8.
>>
>> So I'm confused.  Can someone with more knowledge about encodings tell
>> me whether this really doesn't make sense?
>>
>> I'm using the latest RPMs.  Here's a test program:
>>
>> using System;
>> using System.Runtime.InteropServices;
>>
>> public class Test {
>>     public static void Main() {
>>         Console.WriteLine(Marshal.PtrToStringAnsi(Marshal.StringToCoTaskMemAnsi("ü")));
>>     }
>> }
>>




