[Mono-dev] mcs patch for default encoding
Atsushi Eno
atsushi at ximian.com
Tue Aug 23 05:50:58 EDT 2005
I don't think this is acceptable because of its significant
performance loss (reading the entire stream)...
Atsushi Eno
Kornél Pál wrote:
> Hi,
>
> Character set detection.
>
> This code uses a UTF8Encoding with throwOnInvalidBytes enabled. StreamReader
> detects the BOM (UTF-8, Unicode, Unicode (Big-Endian)). UTF-8 is easy to
> validate because it has strict rules for the byte representation of
> characters, so it is safe to assume that a text is UTF-8 if it can be parsed
> as UTF-8. With throwOnInvalidBytes, UTF8Encoding throws ArgumentException
> when the input is not valid UTF-8. In that case the code falls back to
> Encoding.Default.
>
> Unicode (16-bit) is not detected by csc.exe without a BOM, so I think we
> shouldn't deal with it.
>
> Kornél
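
For readers following the thread, the detection order described in the quoted
message (BOM first, then strict UTF-8 validation, then the system default) can
be sketched roughly as below. This is only an illustration in Python, not the
mcs patch itself. The function name detect_and_decode and its fallback
parameter are hypothetical, and locale.getpreferredencoding() is assumed here
as a stand-in for Encoding.Default.

```python
import codecs
import locale
from typing import Optional, Tuple

# BOMs checked first, mirroring the three the quoted message lists
# (UTF-8, Unicode, Unicode (Big-Endian)).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_and_decode(data: bytes, fallback: Optional[str] = None) -> Tuple[str, str]:
    """Return (text, encoding_name) for raw source bytes.

    Hypothetical helper: BOM check, then strict UTF-8, then a default encoding.
    """
    # 1. An explicit BOM decides the encoding outright.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(name), name

    # 2. No BOM: accept the input as UTF-8 only if every byte sequence is valid.
    try:
        return data.decode("utf-8", errors="strict"), "utf-8"
    except UnicodeDecodeError:
        # 3. Not valid UTF-8: fall back to the platform default
        #    (assumed analogue of Encoding.Default).
        name = fallback or locale.getpreferredencoding(False)
        return data.decode(name, errors="replace"), name
```

Note that this sketch validates the whole buffer before it settles on an
encoding, which is the cost raised in the reply above. BOM-less 16-bit Unicode
is deliberately not handled, in line with the quoted message.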