[Mono-dev] mcs patch for default encoding

Atsushi Eno atsushi at ximian.com
Tue Aug 23 05:50:58 EDT 2005


I don't think this is acceptable because of its significant
performance loss (reading the entire stream)...

Atsushi Eno

Kornél Pál wrote:
> Hi,
> 
> Character set detection.
> 
> This code uses a UTF8Encoding with throwOnInvalidBytes enabled. StreamReader
> detects the BOM (UTF-8, Unicode, Unicode (Big-Endian)). UTF-8 is easy to
> validate because it has strict rules for the byte representation of
> characters, so it is safe to assume that a text is UTF-8 if it can be parsed
> as UTF-8. UTF8Encoding (with throwOnInvalidBytes) throws ArgumentException
> when the input is not valid UTF-8; in that case the code falls back to
> Encoding.Default.
> 
> Unicode (16-bit) is not detected by csc.exe without a BOM, so I think we
> shouldn't deal with it.
> 
> Kornél
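
For readers following the thread, the detection order described in the quoted message can be sketched roughly as below. This is only an illustration in Python, not the mcs patch itself: the function name `decode_source` and its `fallback` parameter are hypothetical, Python's strict UTF-8 decoder stands in for UTF8Encoding with throwOnInvalidBytes, and the locale's preferred encoding stands in for Encoding.Default. Note that the strict decode in step 2 has to examine every byte before it can succeed, which is the cost the reply objects to.

```python
import codecs
import locale


def decode_source(data, fallback=None):
    """Decode raw source bytes and return (text, encoding_name).

    Hypothetical helper mirroring the order described in the thread:
    BOM first, then strict UTF-8, then the platform default encoding.
    """
    # 1. An explicit BOM decides the encoding (UTF-8, UTF-16 LE, UTF-16 BE).
    if data.startswith(codecs.BOM_UTF8):
        body = data[len(codecs.BOM_UTF8):]
        return body.decode("utf-8", errors="replace"), "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        body = data[len(codecs.BOM_UTF16_LE):]
        return body.decode("utf-16-le", errors="replace"), "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        body = data[len(codecs.BOM_UTF16_BE):]
        return body.decode("utf-16-be", errors="replace"), "utf-16-be"

    # 2. No BOM: try strict UTF-8. The decoder must scan the whole input
    #    before it can report success, so the entire stream is read here.
    try:
        return data.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        pass

    # 3. Not valid UTF-8: fall back to the default (assumed: locale) encoding.
    name = fallback or locale.getpreferredencoding(False)
    return data.decode(name, errors="replace"), name
```

For example, `decode_source(b"caf\xe9", fallback="latin-1")` takes the fallback path, because a lone 0xE9 byte is not a complete UTF-8 sequence. UTF-16 without a BOM is deliberately not handled, matching the suggestion in the quoted message.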



