[Mono-list] Why UTF-16 strings in Mono.Unix?

Florian Weimer fw at deneb.enyo.de
Tue Oct 18 07:18:26 EDT 2005


* Jonathan Pryor:

> This won't work with a great deal more than just Mono applications.
> This will likely also "break" for every app that uses a runtime (Java,
> Perl, Python),

It doesn't break for Perl or Python, nor Emacs or vi.

> and certainly won't work with GTK+/Gnome applications
> unless the user explicitly sets the G_FILENAME_ENCODING environment
> variable to contain the character set name that should instead be used
> (and how many users will know about G_FILENAME_ENCODING, much less set
> it?), or the user sets G_BROKEN_FILENAMES=1.

I just tested gedit.  It can access such files even without setting
those environment variables.  The file selection dialog warns that the
encoding is invalid.

>> A first step in a direction to fix that would be to use native strings
>> (multibyte strings) for accessing native APIs.
>
> What does that mean, exactly?  Mono is already generating multibyte
> strings for the Native APIs -- UTF-8 strings, yes, but UTF-8 is a
> multibyte encoding -- so your statement is effectively meaningless.

Native strings on UNIX are NUL-terminated arrays of bytes.  All
strings interpreted by the operating system (mainly file system paths)
are of this form.  At some point, you have to make a conversion.  I
hope that Mono will expose the byte strings in its Mono.Unix API
because the functionality you can implement using byte strings is a
strict superset of what is possible if you only permit UTF-16 strings.
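
To illustrate the superset claim, here is a minimal sketch in Python (chosen only for illustration; it is not Mono code, and the helper name survives_text_roundtrip is hypothetical). It assumes the text-only API decodes names as strict UTF-8, and shows that a Latin-1 encoded name cannot be carried through a text string without loss, while the byte string itself is unaffected.

```python
def survives_text_roundtrip(name: bytes) -> bool:
    """Return True if a byte file name survives bytes -> text -> bytes unchanged.

    Assumption: the text-only API decodes names as strict UTF-8.
    """
    try:
        text = name.decode("utf-8")  # strict decoding, rejects invalid sequences
    except UnicodeDecodeError:
        return False  # no text string corresponds to this name
    return text.encode("utf-8") == name


# The same visible name in two encodings.
utf8_name = "caf\u00e9.txt".encode("utf-8")      # b'caf\xc3\xa9.txt'
latin1_name = "caf\u00e9.txt".encode("latin-1")  # b'caf\xe9.txt'

# A lossy decode replaces the bad byte, so the original name is not recovered.
lossy = latin1_name.decode("utf-8", errors="replace").encode("utf-8")

print(survives_text_roundtrip(utf8_name))
print(survives_text_roundtrip(latin1_name))
print(lossy == latin1_name)
```

An API that accepts the byte string directly can pass either name to the operating system unchanged; an API that accepts only text can address the first name but not the second.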

(It is also possible to encode invalid byte strings (those that are
not valid UTF-8) using invalid surrogate sequences, but this is a kludge
and has potential security implications, because the conversion from
UTF-16 to UTF-8 can suddenly produce invalid UTF-8 sequences.)
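
The surrogate approach can be sketched with Python's "surrogateescape" error handler, which is one existing implementation of this idea (it is used here only as an illustration and is an assumption about how such a scheme might look; it is not something Mono provides). Each undecodable byte is mapped to a lone low surrogate, and the reverse conversion emits the raw byte again, which is exactly how invalid UTF-8 can reappear on output.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


raw = b"caf\xe9.txt"  # Latin-1 bytes, not valid UTF-8

# The undecodable byte 0xE9 is smuggled into the text as the lone surrogate U+DCE9.
escaped = raw.decode("utf-8", errors="surrogateescape")

# Converting back reproduces the original bytes exactly ...
restored = escaped.encode("utf-8", errors="surrogateescape")

# ... which means the "UTF-8" output of the conversion is not valid UTF-8.
print(restored == raw)
print(is_valid_utf8(restored))
```

Any code that assumes the output of the text-to-bytes conversion is well-formed UTF-8 (for example a validator that ran on the text before conversion) can be surprised by such strings.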

> It sounds like what you *really* want is for Mono's string marshaler to
> marshal to the user's preferred character set/encoding instead of UTF-8.

No, this is not what I want.  I want to be able to write code which
can access all files the user has read access to, irrespective of
their names.

