[Mono-list] Regression tool required and learn while you hack ;-)
Rafael D Teixeira
rafaelteixeirabr@hotmail.com
Mon, 20 Aug 2001 20:13:53 -0300
>Miguel de Icaza <miguel@ximian.com>
>It could be called `mc', but `mc' is already a command, so I suggest
>to keep the name, and mcs will mean `Mono Compiler Suite' ;-)
OK, it'll be a Suite then.
I couldn't keep my promise to have a new driver today because, while
changing the GenericParser class, I drifted into trying to figure out what
was happening with the accented identifiers in my test .vb file, and we have
a small but pesky problem there...
MS Visual Studio DEFAULTS to saving files in "Western European (Windows) -
Codepage 1252", which is what Windows calls ANSI, while StreamReader
defaults to UTF-8.
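To make the mismatch concrete, here is a small sketch. It uses Python purely for illustration, not the Mono classes, and the identifier `Ação` is a made-up example. When the identifier is saved as codepage 1252, each accented letter becomes a single byte above 0x7F. A strict UTF-8 decoder rejects those bytes, and a lenient one replaces them with U+FFFD.

```python
# An accented identifier as Visual Studio would save it by default (codepage 1252).
ident = "Ação"
raw = ident.encode("cp1252")
print(raw.hex(" "))  # one byte per character: 'ç' is 0xE7, 'ã' is 0xE3

# Reading those bytes back as UTF-8, which is what StreamReader assumes by default.
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print("strict UTF-8 fails at byte", err.start)

# A lenient UTF-8 decoder does not fail, but the accented letters are replaced.
lenient = raw.decode("utf-8", errors="replace")
print(lenient == ident, "\ufffd" in lenient)
```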
First, I thought we could just use a different constructor to specify which
encoding StreamReader should use. But besides the fact that we only have
Unicode (really UCS-2), UTF-8 and UTF-7 'Encoding' classes out of the box
(so we'll have to find or implement some other 'Encoding'-derived class for
ANSI), there's the problem of how to find out which encoding to use.
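For the "implement some other 'Encoding'-derived class" part, the decoding side of codepage 1252 is small. This is a hypothetical Python sketch of the table logic only, not Mono code, and the names `CP1252_HIGH` and `decode_cp1252` are made up. Bytes 0x00-0x7F and 0xA0-0xFF map to the same Unicode code point, so only the 0x80-0x9F range needs a lookup table.

```python
# Hypothetical decoding half of an 'ANSI' (codepage 1252) encoding class.
# Bytes 0x00-0x7F and 0xA0-0xFF map straight to the same code point (as in
# ISO-8859-1); only 0x80-0x9F need this table. 0x81, 0x8D, 0x8F, 0x90 and
# 0x9D are unassigned in codepage 1252 and are deliberately missing.
CP1252_HIGH = {
    0x80: "\u20ac", 0x82: "\u201a", 0x83: "\u0192", 0x84: "\u201e",
    0x85: "\u2026", 0x86: "\u2020", 0x87: "\u2021", 0x88: "\u02c6",
    0x89: "\u2030", 0x8A: "\u0160", 0x8B: "\u2039", 0x8C: "\u0152",
    0x8E: "\u017d", 0x91: "\u2018", 0x92: "\u2019", 0x93: "\u201c",
    0x94: "\u201d", 0x95: "\u2022", 0x96: "\u2013", 0x97: "\u2014",
    0x98: "\u02dc", 0x99: "\u2122", 0x9A: "\u0161", 0x9B: "\u203a",
    0x9C: "\u0153", 0x9E: "\u017e", 0x9F: "\u0178",
}


def decode_cp1252(data: bytes) -> str:
    """Decode codepage 1252 bytes; unassigned bytes become U+FFFD."""
    chars = []
    for b in data:
        if 0x80 <= b <= 0x9F:
            # Windows-specific range: look it up, or substitute U+FFFD.
            chars.append(CP1252_HIGH.get(b, "\ufffd"))
        else:
            # Everything else is identical to Latin-1.
            chars.append(chr(b))
    return "".join(chars)
```

For example, `decode_cp1252(b"A\xe7\xe3o")` gives `"Ação"`. In Mono, this logic would presumably end up in the GetChars override of an Encoding subclass, plus a matching encoder for GetBytes.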
We can't assume that files will come with a signature (byte order mark) and
rely on the detectEncodingFromByteOrderMarks parameter, which probably only
works for the Unicode formats (Unicode, Big-Endian Unicode, UTF-8, UTF-7).
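Byte-order-mark detection amounts to checking the first few bytes of the file against known signatures. This is a hypothetical Python sketch of that idea, not what StreamReader or Mono actually does, and `detect_bom` is a made-up name. UTF-7 is left out for simplicity. It also shows why detection cannot help with files saved as codepage 1252, because those files carry no mark at all.

```python
import codecs
from typing import Optional

# Signatures (byte order marks) for some of the Unicode formats mentioned above.
# None of these is a prefix of another, so the checking order does not matter.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def detect_bom(data: bytes) -> Optional[str]:
    """Return the codec named by a leading byte order mark, or None if absent."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

`detect_bom("Ação".encode("cp1252"))` returns `None`, so we are back to guessing.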
Any ideas?
Just thinking aloud: UTF-8 is much stricter than ANSI about bytes above 0x7F
(they are only valid inside well-formed multi-byte sequences), so if
changing decoders, or the decoding strategy, in midstream is possible, we
can start with the default UTF-8 and switch to ANSI when we hit the first
invalid sequence, though some characters may be lost or misread before the
switch. Anyway, we can't figure out which of the ANSI codepages is being
used (we can only guess it's the Windows default, 1252).
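The midstream switch could be sketched like this. It is a hypothetical whole-buffer Python version, and the name `decode_utf8_then_cp1252` is made up. It keeps the longest valid UTF-8 prefix, then decodes everything from the first invalid byte onward as codepage 1252. As noted above, 1252 is only a guess. A real streaming reader would have to do the same with an incremental decoder.

```python
def decode_utf8_then_cp1252(data: bytes) -> str:
    """Decode as UTF-8 until the first invalid sequence, then switch to cp1252."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Everything before err.start decoded cleanly as UTF-8, so keep it.
        head = data[:err.start].decode("utf-8")
        # From the first invalid byte on, assume the Windows default codepage;
        # bytes unassigned in cp1252 become U+FFFD instead of raising.
        tail = data[err.start:].decode("cp1252", errors="replace")
        return head + tail
```

The weakness the post mentions shows up with any cp1252 byte pair that happens to be valid UTF-8 before the switch. For example, 0xC3 0xA9 (`Ã©` in cp1252) is silently read as `é`.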
Hey people, I think I write a bit too much. If you don't like it, let me
know and I'll try to restrain myself.
Rafael Teixeira
Brazilian Developer