[Mono-list] Why UTF-16 strings in Mono.Unix?
fw at deneb.enyo.de
Tue Oct 18 07:18:26 EDT 2005
* Jonathan Pryor:
> This won't work with a great deal more than just Mono applications.
> This will likely also "break" for every app that uses a runtime (Java,
> Perl, Python),
It doesn't break for Perl or Python, nor for Emacs or vi.
> and certainly won't work with GTK+/Gnome applications
> unless the user explicitly sets the G_FILENAME_ENCODING environment
> variable to contain the character set name that should instead be used
> (and how many users will know about G_FILENAME_ENCODING, much less set
> it?), or the user sets G_BROKEN_FILENAMES=1.
I just tested gedit. It can access such files even without setting
those environment variables. The file selection dialog warns that the
encoding is invalid.
>> A first step in a direction to fix that would be to use native strings
>> (multibyte strings) for accessing native APIs.
> What does that mean, exactly? Mono is already generating multibyte
> strings for the Native APIs -- UTF-8 strings, yes, but UTF-8 is a
> multibyte encoding -- so your statement is effectively meaningless.
Native strings on UNIX are NUL-terminated arrays of bytes. All
strings interpreted by the operating system (mainly file system paths)
are of this form. At some point, you have to convert between these
byte strings and UTF-16 strings. I hope that Mono will expose the byte
strings in its Mono.Unix API, because the functionality you can
implement using byte strings is a strict superset of what is possible
if you only permit UTF-16 strings.
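Here is a minimal sketch of the superset argument. It is written in Python rather than C#, and the helper names (utf8_only_view, byte_view) are hypothetical stand-ins for a text-only API and a byte-string API. Under strict decoding, a name that is not valid UTF-8 has no text form at all. The byte form always survives unchanged.

```python
# Hypothetical raw filename as the kernel stores it: a NUL-free byte
# sequence. 0xE9 is "é" in ISO-8859-1 but is not valid UTF-8 here.
raw_name = b"caf\xe9.txt"

def utf8_only_view(name: bytes):
    """Model of a text-only API: decode strictly as UTF-8, or give up
    (return None) when the bytes are not valid UTF-8."""
    try:
        return name.decode("utf-8")
    except UnicodeDecodeError:
        return None

def byte_view(name: bytes) -> bytes:
    """Model of a byte-string API: the name is passed through untouched."""
    return name

print(utf8_only_view(raw_name))         # None: not representable as text
print(byte_view(raw_name) == raw_name)  # True: nothing is lost

# Every name the text-only view accepts can be encoded back and handed
# to the byte view, but the reverse does not hold.
ok = "café.txt".encode("utf-8")
assert utf8_only_view(ok).encode("utf-8") == byte_view(ok)
```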
(It is also possible to encode byte strings that are not valid UTF-8
using invalid surrogate sequences, but this is a kludge with potential
security implications, because the conversion from UTF-16 back to
UTF-8 can then suddenly produce invalid UTF-8 sequences.)
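Python's "surrogateescape" error handler (PEP 383) is one real instance of the surrogate trick. The sketch below uses it only as an illustration. Mono is not involved, and the filename is made up. With the matching handler, the lone-surrogate text round-trips to the original bytes. A strict encoder rejects it outright. A lenient encoder ("surrogatepass") produces bytes that are neither the original name nor valid UTF-8.

```python
# Hypothetical raw filename: 0xE9 ("é" in ISO-8859-1) is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# "surrogateescape" maps each undecodable byte 0xXX to the lone
# surrogate U+DCXX, so the text converts back to exactly the same bytes.
text = raw_name.decode("utf-8", errors="surrogateescape")
assert text == "caf\udce9.txt"
assert text.encode("utf-8", errors="surrogateescape") == raw_name

# A strict UTF-8 encoder refuses the lone surrogate outright.
try:
    text.encode("utf-8")
except UnicodeEncodeError:
    print("strict encoder: lone surrogate rejected")

# An encoder that passes surrogates through emits a byte sequence that
# differs from the original name and is not valid UTF-8.
leaked = text.encode("utf-8", errors="surrogatepass")
print(leaked)  # b'caf\xed\xb3\xa9.txt'
try:
    leaked.decode("utf-8")
except UnicodeDecodeError:
    print("result is not valid UTF-8")
```

The last step is the security concern: two components that disagree about how to treat lone surrogates end up naming different byte strings for what they believe is the same file.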
> It sounds like what you *really* want is for Mono's string marshaler to
> marshal to the user's preferred character set/encoding instead of UTF-8.
No, this is not what I want. I want to be able to write code which
can access all files the user has read access to, irrespective of