[Mono-list] string encoding

Havoc Pennington hp@redhat.com
Sun, 22 Jun 2003 10:51:48 -0400


Hi,

Hmm, from reading the source code, the MonoString stuff looks like it
needs some love...

First, it looks like "ANSI" means Windows-1252, which isn't quite
Latin-1 and isn't ASCII either.
http://www.hclrss.demon.co.uk/demos/ansi.html
Hopefully "ANSI" doesn't mean "the 8-bit encoding for this local
version of Windows" and is always the 1252 flavor.
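
To illustrate how Windows-1252 differs from Latin-1 and ASCII, here is a
minimal sketch of a 1252-to-UTF-16 conversion. It is not Mono code; the
names cp1252_high, cp1252_to_utf16_unit and cp1252_to_utf16 are
hypothetical. It assumes the five bytes that the code page leaves
unassigned are mapped to the matching C1 control codes, which is what
makes the conversion total in this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Code points for bytes 0x80..0x9F in Windows-1252. Assumption: the five
 * unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to the C1 control
 * code with the same value, so every input byte has a result. */
const uint16_t cp1252_high[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

/* Convert one Windows-1252 byte to one UTF-16 code unit. */
uint16_t cp1252_to_utf16_unit(unsigned char b)
{
    if (b >= 0x80 && b <= 0x9F)
        return cp1252_high[b - 0x80];   /* the range where 1252 != Latin-1 */
    return b;   /* 0x00..0x7F is ASCII, 0xA0..0xFF matches Latin-1 */
}

/* Convert a NUL-terminated Windows-1252 string. The caller provides an
 * output buffer with room for strlen(src) code units. Returns the number
 * of code units written; there is no error path. */
size_t cp1252_to_utf16(const char *src, uint16_t *out)
{
    size_t n = 0;

    assert(src != NULL && out != NULL);
    while (src[n] != '\0') {
        out[n] = cp1252_to_utf16_unit((unsigned char)src[n]);
        n++;
    }
    return n;
}
```

Only the 0x80..0x9F range needs a table; every other byte converts to
the code unit with the same numeric value.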

Mono looks like it uses UTF-8 instead of ANSI; see the appended code
for an example.

A few other issues there:

 - the "ANSI to UTF-16" conversion can't fail, but the conversion from
   UTF-8 can, so Mono's PtrToStringAnsi has a failure mode that isn't
   in the docs.

 - making one copy of the data in utf8_to_utf16 then another copy of
   that string seems kind of inefficient.

 - you can pass in NULL for the GError** if you don't care which
   error occurs and are just going to free it; g_utf8_to_utf16
   returns NULL on failure.
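
The failure mode in the first point can be sketched with a small
standalone UTF-8 well-formedness check. This is an illustration only,
not GLib's or Mono's implementation, and utf8_is_valid is a hypothetical
name. The point is that a byte string which is perfectly good
Windows-1252 (for example "caf" followed by the byte 0xE9) is rejected
as UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the first len bytes of s are well-formed UTF-8. */
bool utf8_is_valid(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    while (i < len) {
        unsigned char c = p[i];
        size_t need;            /* continuation bytes expected */
        unsigned long cp, min;  /* decoded value, smallest legal value */

        if (c < 0x80) { i++; continue; }    /* plain ASCII */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return false;      /* stray continuation or invalid lead byte */

        if (len - i - 1 < need)
            return false;       /* sequence truncated by end of input */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = p[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return false;       /* overlong encoding */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;       /* out of range or a surrogate */

        i += need + 1;
    }
    return true;
}
```

For instance, utf8_is_valid("caf\xC3\xA9", 5) holds, while
utf8_is_valid("caf\xE9", 4) does not, because 0xE9 announces a
three-byte sequence that never arrives.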

I dunno. Anyhow, I guess I'll just use string for now without a custom
marshaller, and file a bug report.

If "ANSI" does change to mean a different encoding on different local
versions of Windows, then pretending Linux is just a strange Windows
version that uses UTF-8 maybe doesn't break things any more than they
already are. But I can't tell if that's how it works.

Havoc

MonoString*
mono_string_new (MonoDomain *domain, const char *text)
{
        GError *error = NULL;
        MonoString *o = NULL;
        guint16 *ut;
        glong items_written;
        int l;

        l = strlen (text);
        
        ut = g_utf8_to_utf16 (text, l, NULL, &items_written, &error);

        if (!error)
           o = mono_string_new_utf16 (domain, ut, items_written);
        else 
           g_error_free (error);

        g_free (ut);

        return o;
}

MonoString *
ves_icall_System_Runtime_InteropServices_Marshal_PtrToStringAnsi (char *ptr)
{
        MONO_ARCH_SAVE_REGS;

        return mono_string_new (mono_domain_get (), ptr);
}