[Gtk-sharp-list] File name encodings
Federico Mena Quintero
federico@ximian.com
Thu, 17 Feb 2005 13:56:11 -0600
Hi,
There's a problem in the way file names are extracted from
FileChooserDialog and then represented internally:
1. The generator spits out something like
    public string Filename {
        get {
            IntPtr raw_ret = gtk_file_chooser_get_filename(Handle);
            string ret = GLib.Marshaller.PtrToStringGFree(raw_ret);
            return ret;
        }
    }
2. In turn, GLib.Marshaller.PtrToStringGFree() uses
Marshal.PtrToStringAnsi() internally.
3. PtrToStringAnsi() is implemented with mono_string_new().
4. mono_string_new() uses g_utf8_to_utf16(). If the conversion results
in an error (for example, if the source string is not valid UTF-8), no
string is created.
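
To make step 4 concrete, here is a minimal sketch in plain C of the kind of
well-formedness check a UTF-8 decoder has to apply. It is not the code from
g_utf8_to_utf16() or mono_string_new(); the function name is_valid_utf8() is
made up for illustration. It only shows why a Latin-1 name such as "caf\xE9"
cannot pass as UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of GLib or Mono.
 * Returns true if the n bytes at s form well-formed UTF-8.
 * Rejects stray continuation bytes, truncated sequences, overlong forms,
 * surrogates, and code points above U+10FFFF. */
static bool is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;

    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        unsigned long cp, min;

        if (c < 0x80) {                     /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {    /* lead byte of a 2-byte sequence */
            len = 2; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {    /* lead byte of a 3-byte sequence */
            len = 3; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {    /* lead byte of a 4-byte sequence */
            len = 4; cp = c & 0x07; min = 0x10000;
        } else {
            return false;                   /* stray continuation or invalid lead */
        }

        if (n - i < len)
            return false;                   /* sequence runs past the end */

        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;               /* expected a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;                   /* overlong, out of range, surrogate */

        i += len;
    }
    return true;
}
```

With this check, the bytes 'c' 'a' 'f' 0xC3 0xA9 (UTF-8 "café") are accepted,
while 'c' 'a' 'f' 0xE9 (the same name in ISO-8859-1) is rejected, because 0xE9
looks like the start of a three-byte sequence that never arrives.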
The problem is that gtk_file_chooser_get_filename() returns filenames in
the "GLib filename encoding" [1]. This is the same as the on-disk
representation, whose encoding is hopefully listed in the
G_FILENAME_ENCODING environment variable [2].
If a filename is not UTF-8 on disk, then mono_string_new() will fail and
no filename will be returned at the gtk-sharp level.
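
One possible direction is to convert the name from the filename encoding to
UTF-8 before it reaches Mono. GLib provides g_filename_to_utf8() for this.
The sketch below does not call GLib; it hand-rolls the conversion for a
single assumed encoding, ISO-8859-1, only to show what such a step does to
the bytes. The name latin1_to_utf8() is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper, assuming the on-disk encoding is ISO-8859-1.
 * Converts n Latin-1 bytes to a newly allocated, NUL-terminated UTF-8
 * string. The caller frees the result. Returns NULL if malloc fails. */
static char *latin1_to_utf8(const unsigned char *in, size_t n)
{
    /* Each Latin-1 byte becomes at most two UTF-8 bytes. */
    char *out = malloc(2 * n + 1);
    size_t j = 0;

    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            out[j++] = (char)c;                     /* ASCII maps to itself */
        } else {
            out[j++] = (char)(0xC0 | (c >> 6));     /* lead byte */
            out[j++] = (char)(0x80 | (c & 0x3F));   /* continuation byte */
        }
    }
    out[j] = '\0';
    return out;
}
```

A real fix would have to honour whatever encoding G_FILENAME_ENCODING names,
and would still need a policy for names that are not valid in that encoding
either.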
This is a general problem in Unix with respect to representing
filenames; they are really raw chunks of bytes rather than strings in a
known encoding.
(I only have an old copy of Mono, which may not reflect the current state
of things, but it had the same trouble in
mono/io-layer/io.c:FindNextFile(): if it can't convert the on-disk
filename to UTF-8, it ignores the filename.)
[1] http://developer.gnome.org/doc/API/2.0/gtk/GtkFileChooser.html#gtkfilechooser-encodings
[2] http://developer.gnome.org/doc/API/2.0/glib/glib-Character-Set-Conversion.html#file-name-encodings
http://developer.gnome.org/doc/API/2.0/glib/glib-running.html#G_FILENAME_ENCODING
Federico