[Mono-dev] mcs patch for default encoding
Atsushi Eno
atsushi at ximian.com
Tue Aug 23 03:55:42 EDT 2005
Oh, actually I have.
I even have a case that does not work with mcs but works with csc,
i.e. the case where csc detects UTF-8 regardless of whether a BOM is present.
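For illustration only, here is a minimal Python sketch of the kind of detection described above: honor a BOM if there is one, otherwise accept the bytes as UTF-8 when they decode strictly, and only then fall back to a default encoding. This is not the mcs or csc implementation; the function names `detect_encoding` and `read_source` and the strict-decode heuristic are assumptions made for the example.

```python
import codecs
import locale
from typing import Optional, Tuple

# BOMs to recognize. UTF-32 LE is listed before UTF-16 LE because its BOM
# (FF FE 00 00) starts with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_encoding(data: bytes, default: Optional[str] = None) -> Tuple[str, int]:
    """Return (encoding name, preamble size in bytes) for raw source bytes.

    Hypothetical helper: a heuristic sketch, not what any compiler does.
    """
    # 1. An explicit BOM wins, and its length is the preamble to skip.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # 2. No BOM: treat the input as UTF-8 if it decodes without errors.
    try:
        data.decode("utf-8", errors="strict")
        return "utf-8", 0
    except UnicodeDecodeError:
        pass
    # 3. Otherwise fall back to the caller's default or the locale encoding.
    return default or locale.getpreferredencoding(False), 0


def read_source(data: bytes, default: Optional[str] = None) -> str:
    """Decode source bytes, skipping exactly the detected preamble."""
    name, preamble = detect_encoding(data, default)
    return data[preamble:].decode(name)
```

Returning the preamble size alongside the encoding keeps the two in sync, so the decoder never sees U+FEFF as a leading character and never skips bytes that are not a BOM.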
I forgot one thing: with regard to that remaining problem, we need
to fix the WinForms build, because KeyboardLayouts.cs seems to contain
raw non-ASCII characters:
syntax error, got token `IDENTIFIER'
System.Windows.Forms\KeyboardLayouts.cs(93,51): error CS1526: A new
expression requires () or [] after type
System.Windows.Forms\KeyboardLayouts.cs(97,62): error CS8025: Parsing error
Compilation failed: 2 error(s), 0 warnings
They should be replaced by \uXXXX escapes, but I have no idea what those
characters actually are :|
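As a rough aid for that kind of cleanup, the following Python sketch lists every non-ASCII character in an already decoded source text together with the C# escape that could replace it. The helper name `non_ascii_report` is hypothetical, and the text must first be decoded with the encoding the file was really saved in, which is exactly the unknown here, so treat the output as a hint rather than an answer.

```python
import unicodedata


def non_ascii_report(text):
    """Yield (line, column, char, escape, name) for each non-ASCII character.

    Lines and columns are 1-based and counted in characters, so columns may
    differ from what a compiler reports for the same file.
    """
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            cp = ord(ch)
            if cp <= 0x7F:
                continue
            # C# accepts \uXXXX for BMP code points and \UXXXXXXXX beyond it.
            if cp <= 0xFFFF:
                escape = "\\u%04X" % cp
            else:
                escape = "\\U%08X" % cp
            yield line_no, col, ch, escape, unicodedata.name(ch, "<unnamed>")


if __name__ == "__main__":
    sample = 'char a = \'\u00e9\';\nchar b = \'\u20ac\';'
    for line_no, col, ch, escape, name in non_ascii_report(sample):
        print("%d:%d %s %s" % (line_no, col, escape, name))
```

If the reported names look like nonsense (for example, pairs of accented Latin letters where one symbol was expected), the text was probably decoded with the wrong encoding.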
Atsushi Eno
Marek Safar wrote:
> Hello Eno,
>
> Could you write some tests to cover this functionality? I mean, e.g.,
> a simple test file with a UTF header.
>
> Thanks,
> Marek
>
>> Hi again,
>>
>>> Agreed. In fact, I was also fixing bug #75065, which may be a duplicate.
>>> I have a fix for UTF8Encoding, but it uncovered another mcs bug:
>>> mcs does not handle files with a BOM in specific encodings.
>>> To summarize the situation:
>>>
>>> - Currently driver.cs does not process source files with
>>> default encoding.
>>> - UTF8Encoding.cs does not handle U+FEFF correctly.
>>> - When we fix UTF8Encoding.cs to handle U+FEFF, it starts
>>> to reject some source files that have a BOM
>>> (CS8025: Parsing error).
>>> - Even if we fix driver.cs to let StreamReader consider the BOM
>>> (currently we disable it), some files still
>>> break.
>>>
>>> Am digging into this bug in depth. Hopefully I'll post a set of
>>> fixes later.
>>
>>
>> ... and now I have finished the fixes, as done in the attached patch:
>>
>> - driver.cs :
>> a) Use Encoding.Default as the default input encoding.
>> b) Always pass true for BOM detection.
>> - support.cs : Handle preamble_size precisely.
>> - UTF8Encoding.cs : It should not skip U+FEFF. This fixes
>> bugs #73086 and #75065.
>>
>> They should be applied at the same time, except for a).
>>
>> Atsushi Eno
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: autodetect-encoding-bom.cs
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20050823/a8f2f2f0/attachment.pl
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: autodetect-encoding-notworking.cs
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20050823/a8f2f2f0/attachment-0001.pl
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: mwf-build.patch
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20050823/a8f2f2f0/attachment-0002.pl