[Mono-list] UTF-16 and XmlTextReader questions

Atsushi Eno atsushi at ximian.com
Fri Jul 29 12:37:53 EDT 2005


Hello,

> Hi
> 
> I've been feeding some UTF-16 documents to an XmlTextReader lately¹, and
> I've encountered some behavior I have trouble understanding. 
> 
> I'm working on the basis of a UTF-16-encoded file ("test.xml" in the
> following) containing just the character U+00E1 LATIN SMALL LETTER A
> WITH ACUTE between the opening and the closing of a "foo" tag.
> 
> - If this file has no BOM² and no XML text declaration, the
> XmlTextReader chokes on the U+00E1 character (System.ArgumentException:
> Arg_InvalidUTF8), which is logical since it expects UTF-8³. However:

Yes, it is correct.
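
To see at the byte level why this fails, here is a minimal sketch in Python (not the .NET API; the file content `<foo>á</foo>` and the helper name `decodes_as_utf8` are assumptions for illustration). In UTF-16 without a BOM, U+00E1 becomes the byte 0xE1 next to a 0x00 byte, and a strict UTF-8 decoder rejects 0xE1 unless valid continuation bytes follow it.

```python
# Assumed content of test.xml: a "foo" element containing U+00E1.
text = "<foo>\u00e1</foo>"

# UTF-16 without a BOM, in both byte orders.
raw_be = text.encode("utf-16-be")
raw_le = text.encode("utf-16-le")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is a valid UTF-8 byte sequence (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xE1 is a UTF-8 lead byte that needs continuation bytes; in UTF-16 it
# sits next to 0x00 instead, so strict UTF-8 decoding fails.
print(decodes_as_utf8(raw_be))
print(decodes_as_utf8(raw_le))
```

Without the U+00E1 character the same bytes pass as UTF-8, because they are only ASCII and NUL bytes; the failure is tied specifically to the non-ASCII character.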

> - If this file has no BOM, but an erroneous XML text declaration telling
> it's UTF-8, the XmlTextReader processes the file, simply discarding the
> offending U+00E1. Shouldn't it produce an error in the exact same way as
> the previous case?

This does not happen on my box. It raises an exception at U+E1: at that
point XmlTextReader is reading a name, and since U+E1 is not a valid
name character there, it expects '>' instead.

> - If the file has a BOM (hex FE FF), but no XML text declaration, the
> XmlTextReader chokes on the BOM, outputting:
> 
>  Unhandled Exception: System.Xml.XmlException: Text node cannot appear
>  in this state.
>   file://test.xml Line 1, position 1.

This does not happen either. Can you please post the exact XML files
that raise the errors?

I tried such a document, whose binary dump is:
FE FF 00 3C 00 74 00 65 00 73 00 74 00 2F 00 3E  (BOM + <test/> in UTF-16)
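
To recreate that test file, a minimal sketch in Python (an illustration only, not Mono code; the variable names are assumptions) rebuilds the bytes from the dump and checks that a BOM-aware UTF-16 decoder yields `<test/>`:

```python
# Hex dump from the message: big-endian BOM followed by "<test/>".
dump = "FE FF 00 3C 00 74 00 65 00 73 00 74 00 2F 00 3E"

# bytes.fromhex skips the spaces between byte pairs.
raw = bytes.fromhex(dump)

# The first two bytes are the big-endian byte order mark.
has_be_bom = raw[:2] == b"\xfe\xff"

# The generic "utf-16" codec consumes the BOM and uses it to pick
# the byte order, so the decoded text contains no U+FEFF character.
text = raw.decode("utf-16")

print(has_be_bom, text)
```

Writing `raw` to a file in binary mode, for example with `open("test.xml", "wb")`, should give a document equivalent to the one tested here.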

Atsushi Eno



