[Mono-list] UTF-16 and XmlTextReader questions
Atsushi Eno
atsushi at ximian.com
Fri Jul 29 12:37:53 EDT 2005
Hello,
> Hi
>
> I've been feeding some UTF-16 documents to an XmlTextReader lately¹, and
> I've encountered some behavior I have trouble understanding.
>
> I'm working on the basis of a UTF-16-encoded file ("test.xml" in the
> following) containing just the character U+00E1 LATIN SMALL LETTER A
> WITH ACUTE between the opening and the closing of a "foo" tag.
>
> - If this file has no BOM² and no XML text declaration, the
> XmlTextReader chokes on the U+00E1 character (System.ArgumentException:
> Arg_InvalidUTF8), which is logical since it expects UTF-8³. However:
Yes, it is correct.
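For the first case, a small Python sketch shows the byte-level reason. It uses Python's UTF-8 codec, not Mono's decoder. The helper `first_utf8_error` is hypothetical, and big-endian UTF-16 is assumed because the post does not state the byte order of the BOM-less file. U+00E1 is encoded as 00 E1. E1 is the lead byte of a three-byte UTF-8 sequence, and the following 00 is not a continuation byte, so decoding fails there.

```python
# Sketch: why UTF-16 text is rejected when read as UTF-8.
# Assumption: big-endian UTF-16 with no BOM (byte order not stated in the post).

def first_utf8_error(data):
    """Return (byte offset, reason) of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start, err.reason
    return None

# <foo>á</foo> encoded as UTF-16BE, like the BOM-less test file in the question
doc = "<foo>\u00e1</foo>".encode("utf-16-be")

print(doc[10:12].hex())        # 00e1: U+00E1 in UTF-16BE
print(first_utf8_error(doc))   # (11, 'invalid continuation byte')
```

The 00 bytes before E1 are themselves valid UTF-8 (U+0000). That is why the first failure lands on E1 at offset 11. With little-endian UTF-16 the pair becomes E1 00, which fails for the same reason.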
> - If this file has no BOM but has an erroneous XML text declaration
> claiming it is UTF-8, the XmlTextReader processes the file, simply
> discarding the offending U+00E1. Shouldn't it produce an error in exactly
> the same way as in the previous case?
This does not happen on my box. It raises an exception at U+E1: at that
point XmlTextReader expects a name character, and since U+E1 is not a
valid name character, it expects '>' instead.
> - If the file has a BOM (hex FE FF) but no XML text declaration, the
> XmlTextReader chokes on the BOM, outputting:
>
> Unhandled Exception: System.Xml.XmlException: Text node cannot appear
> in this state.
> file://test.xml Line 1, position 1.
This does not happen either. Can you please post the exact XML files
that raise the errors?
I tried such a document; its binary dump is:
FE FF 00 3C 00 74 00 65 00 73 00 74 00 2F 00 3E (BOM + <test/> in UTF-16)
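To reproduce this document and to show that a BOM alone is enough to pick the encoding, here is a minimal Python sketch of the UTF-8 and UTF-16 rows of the autodetection table in Appendix F of the XML 1.0 specification (the UCS-4 and EBCDIC rows are omitted). `sniff_encoding` is a hypothetical helper for illustration, not part of Mono's XmlTextReader or of the specification.

```python
# Sketch of the UTF-8/UTF-16 rows of XML 1.0 Appendix F (encoding autodetection).
# UCS-4 and EBCDIC rows are omitted; this is not Mono's XmlTextReader code.

def sniff_encoding(data):
    """Return (codec name, BOM length) guessed from the first bytes of an XML entity."""
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be", 2          # UTF-16 BOM, big-endian
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le", 2          # UTF-16 BOM, little-endian
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8", 3              # UTF-8 BOM
    if data.startswith(b"\x00\x3c\x00\x3f"):
        return "utf-16-be", 0          # "<?" in UTF-16BE without a BOM
    if data.startswith(b"\x3c\x00\x3f\x00"):
        return "utf-16-le", 0          # "<?" in UTF-16LE without a BOM
    # Otherwise assume an ASCII-compatible encoding; a real parser would now
    # read the encoding declaration, defaulting to UTF-8 if there is none.
    return "utf-8", 0

# The test document from the dump: BOM + <test/> in UTF-16 (big-endian).
dump = "FE FF 00 3C 00 74 00 65 00 73 00 74 00 2F 00 3E"
data = bytes.fromhex(dump)

encoding, bom_len = sniff_encoding(data)
text = data[bom_len:].decode(encoding)   # decode after skipping the BOM
print(encoding, text)                     # utf-16-be <test/>
```

A BOM-less UTF-16 file without a declaration, like the one in the first case, starts with 00 3C 00 66 and so falls through to UTF-8, which matches the behavior confirmed as correct above.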
Atsushi Eno