[Mono-list] Why UTF-16 strings in Mono.Unix?

Jonathan Pryor jonpryor at vt.edu
Mon Oct 17 20:31:06 EDT 2005


On Mon, 2005-10-17 at 17:04 -0700, Brion Vibber wrote:
> Jonathan Pryor wrote:
> > On Mon, 2005-10-17 at 19:03 +0200, Florian Weimer wrote:
> >>Why are UTF-16 strings used in Mono.Unix?  Doesn't this mean that some
> >>resources are inaccessible to programs running under Mono in a
> >>multibyte locale (such as one using UTF-8)?
> >
> > Care to elaborate?  System.String is always used to represent strings in
> > Mono.Unix and Mono.Unix.Native, but Mono's marshaler will convert the
> > strings to UTF-8 for the P/Invoke call.
> 
> A peek at Mono.Unix/UnixMarshal.cs hints that managed strings are
> marshalled to/from the locale encoding (assuming Encoding.Default is the
> locale encoding); so locales other than UTF-8 ought to also work.

Having written that code, I can assure you it's meaningless. :-)

In particular, UnixMarshal has functions to convert managed strings to
null-terminated byte strings in a given encoding (StringToAlloc()) and
back again (PtrToString()), but nothing else in Mono.Posix.dll uses
those facilities.  Furthermore, those methods were added only recently,
so they aren't part of any shipping Mono distribution.
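For readers who haven't looked at UnixMarshal, the round trip those two
methods perform can be sketched in Python.  This is an illustration
only, not Mono's code: string_to_alloc and ptr_to_string are
hypothetical stand-ins for StringToAlloc and PtrToString, Python's
str.encode/bytes.decode play the role of System.Text.Encoding, and the
sketch assumes a byte-oriented encoding (UTF-8, ISO-8859-1) in which a
single zero byte terminates the string.

```python
def string_to_alloc(s: str, encoding: str) -> bytes:
    """Hypothetical analogue of StringToAlloc: encode s, then append a NUL terminator."""
    data = s.encode(encoding)
    if b"\0" in data:
        # A C string cannot carry an embedded NUL; the unmanaged side would truncate it.
        raise ValueError("encoded string contains an embedded NUL byte")
    return data + b"\0"


def ptr_to_string(buf: bytes, encoding: str) -> str:
    """Hypothetical analogue of PtrToString: decode everything before the first NUL."""
    end = buf.index(b"\0")  # raises ValueError if the buffer is not NUL-terminated
    return buf[:end].decode(encoding)


# The same managed string yields different bytes depending on the encoding chosen.
print(string_to_alloc("café", "utf-8"))       # b'caf\xc3\xa9\x00'
print(string_to_alloc("café", "iso-8859-1"))  # b'caf\xe9\x00'
```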

In short, if you call Syscall.open() or Stdlib.fopen(), you'll be
calling (nearly) directly into the unmanaged function, so your string
arguments go through Mono's default string marshaler, which currently
converts strings only to UTF-8.
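To make Florian's concern concrete, here is a small Python sketch of
what UTF-8-only marshaling means in a non-UTF-8 locale.  It assumes a
file whose name was created by a program running in an ISO-8859-1
locale; marshal_utf8 is a hypothetical stand-in for Mono's default
string marshaler, not its actual implementation.

```python
def marshal_utf8(s: str) -> bytes:
    """Hypothetical stand-in for the default marshaler: always UTF-8, NUL-terminated."""
    return s.encode("utf-8") + b"\0"


# Bytes stored in the directory entry by a program running in an ISO-8859-1 locale.
name_on_disk = "café".encode("iso-8859-1")   # b'caf\xe9'

# Bytes that open("café") would actually hand to the kernel (terminator stripped).
name_requested = marshal_utf8("café")[:-1]   # b'caf\xc3\xa9'

# Most Unix filesystems compare names as opaque bytes, so the lookup misses the file.
print(name_on_disk == name_requested)        # False
```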

Strings coming back into managed code (such as from Syscall.getgrent())
are converted using UnixMarshal.PtrToString(), which previously was a
thin wrapper over Marshal.PtrToStringAnsi() (so on Mono it decoded
strings as UTF-8).  The recent changes to UnixMarshal make it decode
using Encoding.Default (the default encoding) instead, which may be an
improvement...or not.  I'd love some feedback on that decision.
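The trade-off in that decision can be sketched the same way.  The
Python below decodes a NUL-terminated string, such as a group name
returned by getgrent(), once as UTF-8 (the old behavior) and once with
an explicitly supplied locale encoding standing in for
Encoding.Default.  decode_c_string is hypothetical, the sample bytes
are made up for illustration, and it substitutes U+FFFD for
undecodable bytes rather than raising; what Mono's own decoder does
with invalid input is not shown here.

```python
def decode_c_string(buf: bytes, encoding: str) -> str:
    """Hypothetical helper: decode the bytes before the first NUL using `encoding`.

    Undecodable bytes become U+FFFD instead of raising an exception.
    """
    return buf[: buf.index(b"\0")].decode(encoding, errors="replace")


latin1_name = b"caf\xe9\0"      # written by a tool running in an ISO-8859-1 locale
utf8_name = b"caf\xc3\xa9\0"    # written by a tool running in a UTF-8 locale

# Old behavior (always UTF-8): the Latin-1 byte 0xE9 is invalid, so the é is lost.
print(decode_c_string(latin1_name, "utf-8") == "caf\ufffd")  # True

# New behavior in an ISO-8859-1 locale: that name now decodes correctly...
print(decode_c_string(latin1_name, "iso-8859-1"))            # café
# ...but UTF-8 bytes from elsewhere on the same system become mojibake.
print(decode_c_string(utf8_name, "iso-8859-1"))              # cafÃ©
```

Neither choice is right for every byte string on the system, which is
why the switch to Encoding.Default can cut both ways.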

Finally, a question: would it be useful to duplicate all Syscall and
Stdlib string-using APIs with an overload taking an Encoding, so that
callers can specify explicitly which encoding strings are marshaled as?
This would nearly double the API size, so I'm not sure it's worthwhile,
but I'd love some advice.
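For comparison, one way to get explicit control without a second entry
point per function is a single call whose encoding parameter defaults
to today's UTF-8 behavior.  The sketch below is Python purely for
illustration: syscall_open is hypothetical and is not a Mono.Unix API;
it encodes the path itself and hands the raw bytes to os.open().

```python
import os


def syscall_open(path: str, flags: int, mode: int = 0o644, encoding: str = "utf-8") -> int:
    """Hypothetical open() wrapper: marshal `path` in `encoding`, then call the OS.

    Defaulting `encoding` to UTF-8 keeps the current behavior for existing callers,
    while callers that know the on-disk encoding can pass it explicitly.
    """
    raw = path.encode(encoding)
    if b"\0" in raw:
        raise ValueError("path contains an embedded NUL byte")
    # On POSIX, os.open accepts a bytes path and passes it to open(2) unchanged.
    return os.open(raw, flags, mode)
```

For example, syscall_open("/tmp/café", os.O_CREAT | os.O_WRONLY,
encoding="iso-8859-1") would create a file named b"/tmp/caf\xe9"; the
caller is still responsible for closing the returned descriptor with
os.close().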

 - Jon



