[Mono-list] string encoding

Miguel de Icaza miguel@ximian.com
28 Jun 2003 23:41:44 -0400


Hello,

> First it looks like "ANSI" means Windows 1252, which isn't quite 
> Latin-1 and isn't ASCII either.  

This area is not very well specified in the ECMA spec.  We did discuss
this recently, and we proposed to use a few extra bits to identify
exactly the type of conversion required (we ran into problems with
Python's dual mode for supporting Unicode).

Microsoft has an action item to confirm if there are any reasons not to
use those extra bits.  Once that is approved, we should have better
control over this.

> If "ANSI" does change to mean a different encoding on different local
> versions of Windows, then just pretending Linux is a strange Windows
> version that uses UTF-8 instead maybe isn't breaking things more than
> they are already. But I can't tell if that's how it works.

In the discussion we had, it turned out that *today* the interpretations
of ANSI and Unicode are highly implementation-specific, even across
Microsoft implementations (.NET, SPOT, Rotor and Compact Framework): they
equate to "whatever the underlying platform considers ANSI or Unicode".
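
To make the ambiguity concrete, here is a rough Python sketch (an
illustration only, not taken from the spec or from any runtime's
marshalling code).  The helper marshal_ansi is hypothetical; it just
encodes a string as if a given encoding were the platform's "ANSI":

```python
# Illustration only: one string, several plausible meanings of "ANSI".
text = "caf\u00e9 \u20ac"  # "café €"


def marshal_ansi(s, platform_encoding):
    """Hypothetical helper: encode `s` as if `platform_encoding` were
    the platform's ANSI encoding, replacing unencodable characters."""
    return s.encode(platform_encoding, errors="replace")


cp1252_bytes = marshal_ansi(text, "cp1252")   # Windows-1252
latin1_bytes = marshal_ansi(text, "latin-1")  # ISO 8859-1, no euro sign
ascii_bytes = marshal_ansi(text, "ascii")     # 7-bit only
utf8_bytes = marshal_ansi(text, "utf-8")      # the "Linux as UTF-8" idea

for name, data in [
    ("cp1252", cp1252_bytes),
    ("latin-1", latin1_bytes),
    ("ascii", ascii_bytes),
    ("utf-8", utf8_bytes),
]:
    print(name, data)

# Bytes produced under one interpretation are not generally valid
# under another: the cp1252 bytes are not well-formed UTF-8.
try:
    cp1252_bytes.decode("utf-8")
    cp1252_is_valid_utf8 = True
except UnicodeDecodeError:
    cp1252_is_valid_utf8 = False
```

The euro sign survives in Windows-1252 and UTF-8 but is lost in Latin-1
and ASCII, so native code on the other side of the call sees different
bytes depending on which meaning of "ANSI" the platform picked.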

I hope things will be clarified in the next iteration of the spec.

Miguel