[Mono-devel-list] [XSP] encoding bug reloaded

Artur Brodowski bzdurqa at wp.pl
Fri Dec 19 16:52:44 EST 2003


Hello again,

This bug
http://bugzilla.ximian.com/show_bug.cgi?id=51988
was closed because IIS produces the same results.
But after some research I found that the results are not exactly
the same, and also that it's kind of a .NET 'this-is-not-a-bug,
this-is-a-feature' ;-D

The Unicode standard defines the BOM (byte order mark), a signature at the
beginning of a data stream that helps to recognize "whether they [files] are in big or
little endian format — it can also serve as a hint indicating that 
the file is in Unicode" [1]. 
This signature is optional, though; even MSDN states: "It used
to be thought that placing a UTF-8 BOM at the beginning of a file 
was undesirable, but should be respected if it's present. However,
this has been challenged recently [...] So, UTF-8 BOMs are acceptable, 
but don't indicate byte-ordering."[2]
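For illustration, detecting a BOM comes down to comparing the first few
bytes of the stream against the known Unicode signatures. Below is a
minimal Python sketch (not Mono or XSP code; the function name
`sniff_bom` is hypothetical):

```python
import codecs

# Known Unicode signatures (BOMs). The UTF-32 LE signature (FF FE 00 00)
# is checked before UTF-16 LE (FF FE), because the latter is its prefix.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),        # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

Note that for UTF-8 the signature carries no byte-order information; it
only marks the data as UTF-8, which matches the MSDN remark quoted above.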

Now back to the case:

- when an .aspx file contains some national characters inside its HTML
part, or a UTF string is generated by the Response.Write method, the XSP
output is invalid: national chars are shown as if they were UTF-16(?),
even though the file is recognized as proper UTF-8.
Yes, I've tried setting Response.Charset, the HTTP content header and
the globalization settings ;)

- IIS acts the same way, but after you put a UTF-8 BOM at the beginning
of the file (for UTF-8 it's the three-byte sequence EF BB BF), IIS sends
the right chars to the browser. This does not work with XSP.
Another thing I found out (but did not verify) is that IIS treats
files without signatures as being in the default encoding. The same
goes for Mono/XSP, but the default config settings:
<globalization  requestEncoding="utf-8"
                responseEncoding="utf-8"
                fileEncoding="utf-8"/>
seem to be ignored in this case.

- since the BOM is 'acceptable' (not mandatory), I think XSP should not
require it. Mono already works this way in other cases, e.g. when parsing
UTF-8 XML files without a BOM, national characters are displayed properly.
The BOM should still be recognized, though; does the IBM ICU library cover this?
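The behaviour proposed above (honor a BOM when present, otherwise fall
back to the configured file encoding instead of requiring a signature)
can be sketched as follows. This is a minimal Python illustration, not
how XSP actually reads pages; `decode_page` and its `file_encoding`
parameter are hypothetical, the latter assumed to mirror
fileEncoding="utf-8" from the <globalization> settings:

```python
import codecs

def decode_page(data: bytes, file_encoding: str = "utf-8") -> str:
    """Decode page source: use the BOM if there is one, else file_encoding."""
    # UTF-32 signatures are checked before UTF-16, since FF FE is a prefix of FF FE 00 00.
    for bom, name in (
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ):
        if data.startswith(bom):
            # Strip the signature so it does not leak into the output as U+FEFF.
            return data[len(bom):].decode(name)
    # No signature: trust the configured default rather than guessing.
    return data.decode(file_encoding)
```

With this policy a BOM-less UTF-8 page decodes correctly through the
configured default, while a BOM-marked file still works as it does under IIS.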

Sources:
[1] http://www.unicode.org/unicode/faq/utf_bom.html#BOM
[2] http://msdn.microsoft.com/msdnmag/issues/03/06/WebQA/default.aspx

Gonzalo, should I reopen the bug, or report a new one?

best regards,
Artur Brodowski.
-- 
http://go-mono.pl/
