[Mono-list] XmlTextReader: MS compatibility, or W3C conformance?

Ian MacLean ianm@ActiveState.com
Fri, 09 Jul 2004 13:26:26 +0900

Atsushi Eno wrote:

> Ian MacLean wrote:
>> Atsushi Eno wrote:
>>> Hello,
>>> On bugzilla we (Ian and I) were discussing XmlTextReader conformance
>>> to the XML specification. MS XmlTextReader is buggy since it accepts
>>> an XML declaration as element content (which violates the W3C XML
>>> specification, section 3 "Logical Structures").
>>> http://bugzilla.ximian.com/show_bug.cgi?id=61274
>>> However, there is another argument: that it is useful for new
>>> XmlTextReader (xmlText, XmlNodeType.Element, null) to accept an XML
>>> declaration.
>>> Well, I agree that
>>>     - that error-prone XmlTextReader might be useful (especially
>>>       for people who already depend on that behavior).
>>>     - we have not always rejected Microsoft badness; for example
>>>       we are copying System.Xml.XmlCDataSection, which violates
>>>       the W3C DOM interface hierarchy (!)
>>> So it is case by case. I believe we should not allow such use of
>>> XmlTextReader, but I understand what Ian wants me to do. The "fix"
>>> can be done very easily.
>>> I don't think it is a major problem. Users can easily work around it
>>> by calling MoveToContent(), or by skipping the XmlDeclaration node
>>> with the Read() method (well, to call Read() safely, users have to
>>> check whether the reader state is Initial or not).
>> It's not a major problem, but your workaround above only works if every
>> fragment you want to parse follows Document constraints, e.g. a single
>> root node. What I have done now is check the incoming xml fragment
>> for an xml decl and, if present, use XmlNodeType.Document; otherwise I
>> use XmlNodeType.Element.
> If XmlTextReader created with XmlNodeType.Element does not accept
> multiple top-level elements, that is a bug (if so, please create
> another bugzilla entry). If you want xml that has "XML
> declaration and multiple top-level elements", that sounds like a
> curious need, and I wonder what kind of use case would benefit from
> such a fix :-?

You misunderstand. XmlNodeType.Element accepts multiple top-level
elements fine, and I have no need for documents with an xml decl and
multiple top-level elements.
What I have is some code that parses xml text fragments and those 
fragments can be either:
- a complete document ( with or without xml decl )
- an element or nodelist ( of course without xml decl ).

Since my parsing method just takes a string argument, I need to determine
which of the above it is so that I know the appropriate XmlNodeType
argument to pass to XmlValidatingReader.
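
For anyone following along, the check I describe could look roughly like
the sketch below. It is not my actual code: the FragmentKind class and its
Detect and CreateReader methods are made-up names for illustration, and it
assumes that a leading "<?xml" followed by whitespace is a good enough test
for an xml decl. It uses the same XmlTextReader (string, XmlNodeType,
XmlParserContext) constructor discussed above; the same choice of
XmlNodeType would apply to XmlValidatingReader.

```csharp
using System;
using System.Xml;

// Hypothetical helper, for illustration only.
static class FragmentKind
{
    // Pick the XmlNodeType to hand to the reader: Document when the text
    // starts with an XML declaration, Element otherwise.
    public static XmlNodeType Detect(string xml)
    {
        // An XML declaration has to be the very first thing in the text:
        // "<?xml" followed by whitespace. Leading whitespace or any other
        // content before it means this is not treated as a declaration.
        bool hasDecl = xml != null
            && xml.Length > 5
            && xml.StartsWith("<?xml", StringComparison.Ordinal)
            && char.IsWhiteSpace(xml[5]);

        return hasDecl ? XmlNodeType.Document : XmlNodeType.Element;
    }

    // Build a reader over the text using the detected fragment type.
    public static XmlTextReader CreateReader(string xml)
    {
        return new XmlTextReader(xml, Detect(xml), null);
    }
}
```

A complete document without an xml decl falls into the Element case here,
which should be fine since XmlNodeType.Element accepts one or more
top-level elements. Note that a processing instruction such as
"<?xml-stylesheet ...?>" is deliberately not treated as an xml decl.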