[Mono-list] Regression tool required and learn while you hack ;-)

A Rafael D Teixeira rafaelteixeirabr@hotmail.com
Mon, 20 Aug 2001 20:13:53 -0300

>Miguel de Icaza <miguel@ximian.com>

>It could be called `mc', but `mc' is already a command, so I suggest
>to keep the name, and mcs will mean `Mono Compiler Suite' ;-)

OK, it'll be a Suite then.

I couldn't keep my promise to have a new driver today, because while 
changing the GenericParser class I drifted into trying to figure out what 
was happening with the accented identifiers in my test .vb file, and we 
have a small but pesky problem there...

MS Visual Studio DEFAULTS to saving files in "Western European (Windows) - 
Codepage 1252", which is what people usually mean by ANSI, while 
StreamReader defaults to UTF-8.
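
To make the mismatch concrete, here is a minimal sketch. It is in Python 
rather than C# only so that it is easy to run, and the identifier below is 
a made-up example, not one taken from my test file. It shows that an 
accented identifier saved in codepage 1252 is not well-formed UTF-8, so a 
strict UTF-8 decoder rejects it and a lenient one can only substitute the 
offending bytes.

```python
# Hypothetical accented identifier, as it might appear in a .vb file.
identifier = "ação"

# The bytes an editor saving in codepage 1252 would write to disk.
cp1252_bytes = identifier.encode("cp1252")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (strict decoding)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lenient decode substitutes U+FFFD for each invalid byte sequence,
# so the accented letters are lost.
lossy = cp1252_bytes.decode("utf-8", errors="replace")

print(decodes_as_utf8(cp1252_bytes))
print(repr(lossy))
```

How a particular reader treats the invalid bytes (substitute, drop, or 
raise) depends on the implementation; the sketch only shows that the bytes 
cannot round-trip through a UTF-8 decoder.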

First, I thought we could just use a different constructor to specify which 
encoding StreamReader should use. But out of the box we only have Unicode 
(UCS-2, in truth), UTF-8 and UTF-7 'Encoding' classes, so we'll have to 
find or implement some other 'Encoding'-derived class for ANSI, and there's 
still the problem of how to find out which encoding to use.

We can't assume that files will come with a signature or byte-order mark and 
rely on the detectEncodingFromByteOrderMarks parameter, which probably can 
only work over the Unicode formats (Unicode, Big-Endian Unicode, UTF8, 

Any Ideas?

Just thinking aloud: UTF-8 uses fewer characters above 0x7F than ANSI, so if 
changing decoders, or decoder strategy, in midstream is a possibility, we 
can start with the default UTF-8 and switch to ANSI during the reading, 
though some characters may be lost before the switch. Either way, we can't 
figure out which of the ANSI codepages is being used (we can only guess it's 
the Windows default, 1252).
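
A rough sketch of that idea, again in Python just for illustration. Note 
that it is a variation on what I described: instead of switching decoders 
in midstream, it assumes the whole file fits in memory and picks the 
decoder before producing any text, so nothing is lost before the switch. 
The function name decode_source and the cp1252 fallback are assumptions of 
the sketch; the fallback is still only a guess at the real codepage.

```python
import codecs


def decode_source(data: bytes, fallback: str = "cp1252"):
    """Decode source bytes and return (text, encoding_used).

    Strategy: honor a UTF-8 byte-order mark if present, otherwise try
    strict UTF-8, and only if that fails fall back to the guessed ANSI
    codepage.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8-sig"
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # cp1252 leaves a few byte values undefined, so replace those
        # instead of failing on them.
        return data.decode(fallback, errors="replace"), fallback


text, used = decode_source("Dim ação As Integer".encode("cp1252"))
print(used, text)
```

This can guess wrong in one direction: a file in some other ANSI codepage 
will still decode "successfully" as 1252, only with the wrong characters. 
The opposite mistake is much less likely, since accented ANSI text rarely 
happens to form valid UTF-8 sequences.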

Hey people, I think I write a bit too much. If you don't like it, let 
me know and I'll try to restrain myself.

Rafael Teixeira
Brazilian Developer
