[Mono-dev] mcs patch for default encoding

Atsushi Eno atsushi at ximian.com
Tue Aug 23 03:55:42 EDT 2005


Oh, actually I have.

I even have a case that fails with mcs but works with csc,
i.e. the case where csc detects UTF-8 regardless of the BOM.
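
For illustration only, here is a minimal Python sketch of one way a
reader could detect UTF-8 whether or not a BOM is present: check for
the BOM first, then try a strict UTF-8 decode, and fall back otherwise.
This is an assumption about the general kind of heuristic involved, not
a description of what csc actually does; the name guess_encoding and
the "latin-1" fallback are hypothetical.

```python
import codecs

def guess_encoding(data, fallback="latin-1"):
    """Guess a text encoding for raw source bytes (hypothetical helper)."""
    # An explicit UTF-8 BOM settles the question immediately.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        # No BOM: accept UTF-8 only if every byte sequence is well formed.
        data.decode("utf-8", errors="strict")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8, so use the caller's default encoding instead.
        return fallback
```

Note that pure ASCII input is reported as UTF-8 here, which is harmless
because ASCII decodes identically either way.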


I forgot one thing - with regard to that remaining problem, we need
to fix the WinForms build, because KeyboardLayouts.cs seems to
contain raw non-ASCII characters:

syntax error, got token `IDENTIFIER'
System.Windows.Forms\KeyboardLayouts.cs(93,51): error CS1526: A new expression requires () or [] after type
System.Windows.Forms\KeyboardLayouts.cs(97,62): error CS8025: Parsing error
Compilation failed: 2 error(s), 0 warnings

They should be replaced by \uXXXX escapes, but I have no idea what
those characters actually are :|
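
One way to find out what those characters are is to scan the file and
print each non-ASCII character with its position and escape. Below is
a minimal Python sketch; the function name find_non_ascii is
hypothetical, and it works on already-decoded text, so the caller has
to know (or guess) the file's real encoding first.

```python
def find_non_ascii(text):
    """Return (line, column, char, escape) for each non-ASCII character."""
    hits = []
    # Split on "\n" only, so unusual Unicode line separators are not lost.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            code = ord(ch)
            if code > 0x7F:
                # C# spells BMP characters as \uXXXX and others as \UXXXXXXXX.
                if code <= 0xFFFF:
                    escape = "\\u%04X" % code
                else:
                    escape = "\\U%08X" % code
                hits.append((line_no, col_no, ch, escape))
    return hits
```

For example, assuming the file really is UTF-8, something like
find_non_ascii(open("KeyboardLayouts.cs", encoding="utf-8").read())
would list the candidates; if the file is in a legacy code page, the
encoding argument has to be changed accordingly.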

Atsushi Eno


Marek Safar wrote:
> Hello Eno,
> 
> Could you write some tests to cover this functionality? I mean, e.g.,
> a simple test file with a UTF header.
> 
> Thanks,
> Marek
> 
>> Hi again,
>>
>>> Agreed. In fact, I was also fixing bug #75065, possibly a duplicate.
>>> I have a fix for UTF8Encoding, but it uncovered another mcs bug:
>>> mcs does not handle files with BOM with specific encoding.
>>> To summarize the situation:
>>>
>>>     - Currently driver.cs does not process source files with
>>>       default encoding.
>>>     - UTF8Encoding.cs does not handle U+FEFF correctly.
>>>     - When we fix UTF8Encoding.cs to handle U+FEFF, it starts
>>>       to reject some source files which have a BOM
>>>       (CS8025: Parsing error).
>>>     - Even if we fix driver.cs to let StreamReader consider the
>>>       BOM (currently we disable it), some files still fail.
>>>
>>> I am digging into this bug in depth. Hopefully I'll post a set
>>> of fixes later.
>>
>>
>> ... and now I have finished the fixes, as done in the attached patch:
>>
>>     - driver.cs :
>>       a) Use Encoding.Default for the default input.
>>       b) Always pass true for BOM detection.
>>     - support.cs : Handle preamble_size precisely.
>>     - UTF8Encoding.cs : It should not skip U+FEFF. This fixes
>>       bugs #73086 and #75065.
>>
>> They should be applied together, except for a).
>>
>> Atsushi Eno
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: autodetect-encoding-bom.cs
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20050823/a8f2f2f0/attachment.pl 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: autodetect-encoding-notworking.cs
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20050823/a8f2f2f0/attachment-0001.pl 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mwf-build.patch
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20050823/a8f2f2f0/attachment-0002.pl 

