[Mono-devel-list] 7 regressions appeared for MS.VB.dll because of change in mcs

Rafael Teixeira monoman at gmail.com
Mon Nov 1 18:00:25 EST 2004


Hi Jonathan,

> In Visual Studio .NET, one of the save options is "Unicode (UTF-8 with
> signature)." Obviously, mono cannot arbitrarily detect what encoding a
> given byte sequence is in (though there are some good heuristics out
> there), but if an explicit signature is present, will mono treat the file
> as UTF-8?
> 
> Jonathan Gilbert

Well, first, in this thread we are talking about mcs, mono's C#
compiler, not mono itself. Nowadays mcs has this code in place:

try {
	encoding = Encoding.GetEncoding (28591);
} catch {
	Console.WriteLine ("Error: could not load encoding 28591, trying 1252");
	encoding = Encoding.GetEncoding (1252);
}

and then

SeekableStreamReader reader = new SeekableStreamReader (input,
encoding, using_default_encoder);

So, to answer your question: since using_default_encoder starts as
true, mcs will try to detect a byte order mark and recognize the
encoding from it, BUT if no byte order mark is present it falls back
to the ISO-8859-1 (28591) codepage, or to Windows-1252 if that one
cannot be loaded.

That was the change in mcs I was talking about. 
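
To illustrate the behaviour described above, here is a minimal sketch.
It is not mcs code: it uses the plain System.IO.StreamReader as a
stand-in for mcs's SeekableStreamReader (I am assuming the two treat
the signature the same way), and the class and method names are made
up for the example. It decodes the same UTF-8 bytes for "é" with and
without the EF BB BF signature, using ISO-8859-1 as the fallback.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class, not part of mcs.
public class BomFallbackDemo {
	// Honor a byte order mark if one is present; otherwise decode
	// with the supplied fallback encoding.
	public static string Decode (byte [] data, Encoding fallback)
	{
		using (MemoryStream input = new MemoryStream (data))
		// Third argument: detectEncodingFromByteOrderMarks
		using (StreamReader reader = new StreamReader (input, fallback, true))
			return reader.ReadToEnd ();
	}

	public static void Main ()
	{
		Encoding latin1 = Encoding.GetEncoding (28591);

		// "é" encoded as UTF-8, with and without the signature.
		byte [] with_bom = { 0xEF, 0xBB, 0xBF, 0xC3, 0xA9 };
		byte [] without_bom = { 0xC3, 0xA9 };

		// With the signature the two bytes are read as one UTF-8 character.
		Console.WriteLine (Decode (with_bom, latin1).Length);
		// Without it each byte becomes its own ISO-8859-1 character.
		Console.WriteLine (Decode (without_bom, latin1).Length);
	}
}
```

The second case is what bites people who save UTF-8 sources without a
signature: every accented letter turns into two wrong characters.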

mbas still uses this code:

//   We are here forcing StreamReader to assume current system codepage,
//   because normally it defaults to UTF-8
input = new StreamReader(fileName, System.Text.Encoding.Default); 


that simply uses the codepage/encoding set for the system.

We will probably have to add a /codepage option to mbas as well. Filed
#69004 in bugzilla to that end.

Fun,


On Sun, 31 Oct 2004 14:29:37 -0500, Jonathan Gilbert
<2a5gjx302 at sneakemail.com> wrote:
> At 12:08 PM 31/10/2004 -0500, Miguel de Icaza wrote:
> >Hello,
> >> <rant>
> >> As mcs no more defaults to the encoding set in the LANG environment
> >> variable (mine says LANG=en_US.UTF-8) one edits sources with, say,
> >> gedit like I do where you see every accented letter or another
> >> international character correctly represented in the source and then
> >> mcs compiles them all wrong.
> >
> >It never defaulted to it.  You just upgraded your OS and that is why you
> >get that behavior.
> >
> >If you want the VB tests to pass completely, you should instead encode
> >any non-7bit characters using the \uXXXX syntax.
> 


-- 
Rafael "Monoman" Teixeira
---------------------------------------
Just the 'crazy' me in a sane world, or would it be the reverse? I dunno...


