[Mono-dev] mcs patch for default encoding

Tue Aug 23 14:35:28 EDT 2005

I support your latest patch. It behaves the same as csc.exe, except that
csc.exe detects UTF-8 by parsing the entire file. Honoring the BOM regardless
of the /codepage argument is a good decision, and csc.exe does this as well.
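
To make the detection order concrete, here is a minimal sketch, not the
code from the patch and not how csc.exe is actually implemented. It assumes
the source bytes are already in memory; the class and method names
(SourceEncodingSketch, Detect) are hypothetical. A BOM wins over any
/codepage setting, a BOM-less buffer that decodes as strict UTF-8 is treated
as UTF-8, and anything else falls back to the requested encoding.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of mcs or csc.exe.
static class SourceEncodingSketch
{
    // Picks an encoding for a source buffer.
    // 'fallback' stands in for whatever /codepage (or the default) selected.
    public static Encoding Detect (byte [] data, Encoding fallback)
    {
        // 1. A BOM always wins, regardless of the requested code page.
        //    (UTF-32 BOMs are deliberately not handled in this sketch.)
        if (data.Length >= 3 && data [0] == 0xEF && data [1] == 0xBB && data [2] == 0xBF)
            return Encoding.UTF8;
        if (data.Length >= 2 && data [0] == 0xFF && data [1] == 0xFE)
            return Encoding.Unicode;           // UTF-16 little endian
        if (data.Length >= 2 && data [0] == 0xFE && data [1] == 0xFF)
            return Encoding.BigEndianUnicode;  // UTF-16 big endian

        // 2. No BOM: validate the whole buffer as strict UTF-8.
        //    The second constructor argument makes invalid bytes throw.
        try {
            new UTF8Encoding (false, true).GetCharCount (data);
            return Encoding.UTF8;
        } catch (ArgumentException) {
            // 3. Not valid UTF-8: use the requested or default encoding.
            return fallback;
        }
    }
}
```

Note that a pure-ASCII buffer is reported as UTF-8 here, which is harmless
as long as the fallback encoding is an ASCII superset. Step 2 is the
expensive part, since it has to scan the entire file.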

I think /codepage:reset should be named /codepage:default and
/codepage:utf8 should be /codepage:utf-8. csc.exe does not support named
code pages, but renaming them may cause incompatibility with older mcs
versions.

There is no need to use cp = Encoding.Default.CodePage;

I suggest using this instead:

switch (value)
{
 case "utf8":
  encoding = Encoding.UTF8;
  break;
 case "reset":
  encoding = Encoding.Default;
  break;
 default:
  try
  {
   encoding = Encoding.GetEncoding (int.Parse (value));
  }
  catch
  {
   Report.Error (2016, "Code page `{0}' is invalid or not installed", value);
  }
  break;
}
return true;

Also, there is no point in commenting out lines in UTF8Encoding. The old
version will remain available in SVN.

Note that none of these remarks affect behaviour; they only affect the actual
implementation.

Kornél

----- Original Message -----
From: "Atsushi Eno" <atsushi at ximian.com>
To: "Kornél Pál" <kornelpal at hotmail.com>
Cc: "mono-devel mailing list" <mono-devel-list at lists.ximian.com>; "Marek
Safar" <marek.safar at seznam.cz>
Sent: Tuesday, August 23, 2005 5:55 PM
Subject: Re: [Mono-dev] mcs patch for default encoding

> Hi,
>
> I still personally don't like such an almost-extraneous solution.
> I actually made a similar fix when I was trying to fix bug #75679
> ( http://bugzilla.ximian.com/show_bug.cgi?id=75679 ) but the code
> ran quite a bit slower than it does now.
>
> So, the matters to solve are:
>
> 1) Should we read the entire stream ahead of compilation, as
>    Kornél suggested?
> 2) Can we apply my patch except for the part that is
>    related to the default encoding in driver.cs?
> 3) Can we apply the driver.cs patch that is related to
>    the default encoding?
>
> Atsushi Eno
>
> Kornél Pál wrote:
>> I tried to compile a 2 GB file using csc.exe and got an out-of-memory
>> error. Then I reduced the size to 500 MB but still ran out of memory.
>> Finally I was able to compile a 200 MB file.
>>
>> I got error CS1034: Compiler limit exceeded: Line cannot exceed 2046
>> characters
>>
>> So I added line breaks as well, and added // to the beginning of each
>> line to add some non-whitespace chars, just for fun and to test the
>> compiler. :)
>>
>> The first non-ASCII character is very near the end of the file. csc.exe
>> compiled it correctly, both as UTF-8 and as ACP. DétectEncoding was
>> compiled correctly in both cases. I attached the test cases (about 200 MB
>> each).
>>
>> So I think csc.exe parses the whole file to detect UTF-8, and has poor
>> memory management in addition. :) Maybe it caches the source file in its
>> own allocated memory.
>>
>> Kornél