[Mono-list] string encoding

Jaroslaw Kowalski jaroslaw.kowalski@atm.com.pl
Sun, 22 Jun 2003 20:38:34 +0200


Charset.Ansi depends on the Windows installation. On Polish Windows it uses
code page 1250.
All ANSI versions of Win32 APIs (like GetCurrentDirectoryA) use this
code page.
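To make the dependence concrete: the same byte means different characters under different ANSI code pages. Below is a minimal C sketch of a hypothetical, deliberately partial decoder. decode_ansi_byte is not a Windows or Mono function, and it lists only a few bytes where code pages 1250 and 1252 disagree. On Windows, a real program would call MultiByteToWideChar with CP_ACP and let the system apply whichever code page is installed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, deliberately partial decoder for a single "ANSI" byte.
   Only a few bytes where Windows code pages 1250 (Central European) and
   1252 (Western European) disagree are listed; bytes below 0x80 are ASCII
   in both. Returns U+FFFD for anything this sketch does not cover. */
uint16_t decode_ansi_byte(unsigned char b, int codepage)
{
    static const struct {
        unsigned char byte;
        uint16_t cp1250;
        uint16_t cp1252;
    } diff[] = {
        { 0x8C, 0x015A, 0x0152 },  /* S with acute   vs. OE ligature        */
        { 0x9C, 0x015B, 0x0153 },  /* s with acute   vs. oe ligature        */
        { 0xA3, 0x0141, 0x00A3 },  /* L with stroke  vs. pound sign         */
        { 0xB3, 0x0142, 0x00B3 },  /* l with stroke  vs. superscript three  */
        { 0xB9, 0x0105, 0x00B9 },  /* a with ogonek  vs. superscript one    */
        { 0xE6, 0x0107, 0x00E6 },  /* c with acute   vs. ae ligature        */
    };
    size_t i;

    if (codepage != 1250 && codepage != 1252)
        return 0xFFFD;             /* only these two code pages are modeled */
    if (b < 0x80)
        return b;                  /* ASCII range is identical */
    for (i = 0; i < sizeof diff / sizeof diff[0]; i++) {
        if (diff[i].byte == b)
            return codepage == 1250 ? diff[i].cp1250 : diff[i].cp1252;
    }
    return 0xFFFD;                 /* byte not in this partial table */
}
```

For example, byte 0xB9 is the Polish "ą" (U+0105) under code page 1250 but the superscript "¹" (U+00B9) under 1252.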

I don't think it's very easy to change this setting after Windows is
installed. I remember having serious problems with it after I mistakenly
installed WinNT 4.0 with English regional settings. But things may have
changed since then.

To make things more complex, there's also SetThreadLocale(), which can be
used to change the locale of the calling thread. I don't know whether it
affects the code page used by Win32 APIs. Maybe someone can verify it?

Jarek

----- Original Message -----
From: "Havoc Pennington" <hp@redhat.com>
To: "Marcus" <mathpup@mylinuxisp.com>
Cc: <mono-list@lists.ximian.com>
Sent: Sunday, June 22, 2003 4:51 PM
Subject: Re: [Mono-list] string encoding


> Hi,
>
> Hmm, from reading the source code, the MonoString stuff looks like it
> needs some love...
>
> First, it looks like "ANSI" means Windows-1252, which isn't quite
> Latin-1 and isn't ASCII either.
> http://www.hclrss.demon.co.uk/demos/ansi.html
> Hopefully "ANSI" doesn't mean "the 8-bit encoding for this local
> version of Windows" and is always the 1252 flavor.
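For reference, the gap between Windows-1252, Latin-1 and ASCII is small and well defined. All three agree on 0x00-0x7F, and 1252 and Latin-1 also agree on 0xA0-0xFF. Only 0x80-0x9F differs: Latin-1 has C1 control characters there, while 1252 has printable characters such as the euro sign. Below is a minimal C sketch; the function names are made up for illustration. It maps the five bytes that 1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) to U+FFFD. That is a choice made in this sketch and not necessarily what Windows converters do.

```c
#include <stdint.h>

/* Code page 1252 in the range 0x80..0x9F, where it differs from Latin-1.
   0 marks the five bytes that 1252 leaves undefined. */
static const uint16_t cp1252_high[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178,
};

/* Latin-1 (ISO-8859-1): every byte value is also its Unicode code point. */
uint16_t latin1_to_utf16(unsigned char b)
{
    return b;
}

/* Code page 1252: same as Latin-1 outside 0x80..0x9F. In this sketch the
   undefined bytes become U+FFFD. */
uint16_t cp1252_to_utf16(unsigned char b)
{
    if (b < 0x80 || b > 0x9F)
        return b;
    return cp1252_high[b - 0x80] ? cp1252_high[b - 0x80] : 0xFFFD;
}

/* ASCII defines only 0x00..0x7F. */
int is_ascii(unsigned char b)
{
    return b < 0x80;
}
```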
>
> Mono appears to use UTF-8 instead of ANSI; see the appended code for an
> example.
>
> A couple of other issues there:
>
>  - the "ANSI to UTF-16" conversion can't fail, but conversion from UTF-8
>    can, so Mono's PtrToStringAnsi has a failure mode that isn't in the docs.
>
>  - making one copy of the data in g_utf8_to_utf16 and then another copy
>    of that string seems kind of inefficient.
>
>  - you can pass in NULL for the GError** if you don't care which
>    error occurs and are just going to free it.
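The three points above can be illustrated together. Below is a minimal sketch in plain C. It is not Mono's or GLib's code, and the function names and error convention are hypothetical. A first pass validates the UTF-8 and counts UTF-16 code units. Then a single buffer is allocated and filled, so there is no intermediate copy; in the runtime that buffer would be the string object itself, and malloc only stands in for it here. The function returns NULL on invalid UTF-8, a failure that a fixed single-byte code page decoder never has. The err_offset argument may be NULL when the caller does not care where the error occurred, in the same spirit as passing NULL for a GError**.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Decode one UTF-8 sequence from s[0..len). On success store the code point
   in *cp and return the sequence length; return 0 for truncated or invalid
   input, overlong forms, surrogates and values above U+10FFFF. */
static size_t utf8_decode_one(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n, i;
    uint32_t c, min;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;                        /* stray continuation or bad lead byte */

    if (n > len)
        return 0;                         /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                     /* missing continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;
    *cp = c;
    return n;
}

/* Convert NUL-terminated UTF-8 to a newly allocated, NUL-terminated UTF-16
   buffer with exactly one allocation. Returns NULL on invalid input (storing
   the byte offset in *err_offset if it is not NULL) or on allocation failure.
   *units_out (may be NULL) receives the length without the terminator. */
uint16_t *utf8_to_utf16_once(const char *text, size_t *units_out, size_t *err_offset)
{
    const unsigned char *s = (const unsigned char *)text;
    size_t len = strlen(text), pos, n, k, units = 0;
    uint32_t cp;
    uint16_t *out;

    /* Pass 1: validate and count UTF-16 code units. */
    for (pos = 0; pos < len; pos += n) {
        n = utf8_decode_one(s + pos, len - pos, &cp);
        if (n == 0) {
            if (err_offset)
                *err_offset = pos;
            return NULL;
        }
        units += cp >= 0x10000 ? 2 : 1;   /* supplementary chars need a pair */
    }

    out = malloc((units + 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    /* Pass 2: write directly into the final buffer. */
    for (pos = 0, k = 0; pos < len; pos += n) {
        n = utf8_decode_one(s + pos, len - pos, &cp);
        if (cp >= 0x10000) {
            cp -= 0x10000;
            out[k++] = (uint16_t)(0xD800 + (cp >> 10));
            out[k++] = (uint16_t)(0xDC00 + (cp & 0x3FF));
        } else {
            out[k++] = (uint16_t)cp;
        }
    }
    out[k] = 0;
    if (units_out)
        *units_out = units;
    return out;
}
```

Decoding twice trades a little CPU for one allocation and one write of the final data, which is the trade-off the second point suggests.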
>
> I dunno. Anyhow, I guess I'll just use string for now without a custom
> marshaller, and file a bug report.
>
> If "ANSI" does change to mean a different encoding on different local
> versions of Windows, then pretending Linux is a strange Windows version
> that uses UTF-8 instead maybe doesn't break things any more than they
> already are. But I can't tell if that's how it works.
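As noted at the top of this message, on Windows the ANSI code page does vary with the installation. On the Linux side, the nearest counterpart is arguably the character set of the current locale, which POSIX exposes through nl_langinfo(CODESET). Below is a minimal sketch. The helper name locale_codeset is hypothetical, and this is not a claim about what Mono does.

```c
#include <langinfo.h>
#include <locale.h>

/* Return the name of the character set of the locale selected by the
   environment (LC_ALL, LC_CTYPE, LANG), for example "UTF-8" on many Linux
   systems. If the environment names an unusable locale, setlocale fails
   and the current (by default "C") locale is reported instead. */
const char *locale_codeset(void)
{
    setlocale(LC_CTYPE, "");   /* adopt the user's locale instead of "C" */
    return nl_langinfo(CODESET);
}
```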
>
> Havoc
>
> MonoString*
> mono_string_new (MonoDomain *domain, const char *text)
> {
>         GError *error = NULL;
>         MonoString *o = NULL;
>         guint16 *ut;
>         glong items_written;
>         int l;
>
>         l = strlen (text);
>
>         ut = g_utf8_to_utf16 (text, l, NULL, &items_written, &error);
>
>         if (!error)
>            o = mono_string_new_utf16 (domain, ut, items_written);
>         else
>            g_error_free (error);
>
>         g_free (ut);
>
>         return o;
> }
>
> MonoString *
> ves_icall_System_Runtime_InteropServices_Marshal_PtrToStringAnsi (char *ptr)
> {
>         MONO_ARCH_SAVE_REGS;
>
>         return mono_string_new (mono_domain_get (), ptr);
> }