[Mono-list] Problem with XmlTextReader

Atsushi Eno atsushi@ximian.com
Thu, 22 Jul 2004 03:14:37 +0900


Assuming you also wanted to reply to the list...

David Waite wrote:

> I might be misunderstanding the problem, but...
> It seems like the stream could simply be Read() from; if there is no
> data, the read will block, if there is data it can go into an internal
> buffer. That buffer gets examined to see if there is enough
> information to identify and represent the next event in the pull
> parser; if not, you call Read() again, which may block again. MoveTo*
> and Read* methods will just keep reading until the conditions
> specified by the method are satisfied.

What I wrote in the previous post is "how Read() events do not
match the text stream consumption *required* for one Read() call".
The TextReader (stream) may well be empty (no contents set yet),
and then the call easily freezes. In such a case, are those stream
developers aware of how much of the stream XmlTextReader consumes
at any given time?
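To illustrate (a rough Python sketch; xml.etree's feed-based
XMLPullParser stands in for XmlTextReader, since the pull model is the
same -- the point where it has no event to report is where a blocking
Read() on a stream would freeze):

```python
# A minimal sketch: until enough bytes have arrived, a pull parser
# simply has no event to report.  With a blocking stream, this is the
# point where Read() would hang waiting for more input.
from xml.etree.ElementTree import XMLPullParser

parser = XMLPullParser(events=("start", "end"))

parser.feed("<root><chi")          # an incomplete element name
early = [ev for ev, _ in parser.read_events()]
print(early)                       # ['start'] -- only <root> is complete

parser.feed("ld/></root>")         # the rest of the document
late = [ev for ev, _ in parser.read_events()]
print(late)                        # ['start', 'end', 'end']
```

Nothing about the "<chi" fragment is reportable until the rest of the
tag arrives, regardless of how the reader buffers internally.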

(BTW, MoveTo*() never consumes the stream; those methods have to
remain available for iterating over attributes.)

> I don't think the original poster's problem was that it read a little
> more than needed, but that it seems to read a lot more than needed,
> i.e. after enough data about the event has been received by the local
> stream, more data is still needed in order to exit Read()

I could not see how we can share the notion of "a lot". Is it just
the buffer size that matters? (If my monitoring and my memory are
correct,) the Microsoft implementation always reads 4096 bytes from
the stream. Our buffer size is smaller.
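You can watch this for yourself by wrapping the stream (a hedged
sketch; the exact chunk size is an implementation detail and will
differ from both Microsoft's 4096 bytes and ours):

```python
# Wrap a file-like object to record how much the parser requests per
# read() call.  The point is only that a pull parser reads in blocks,
# not byte by byte; the actual block size varies by implementation.
import io
from xml.etree.ElementTree import iterparse

class LoggingReader:
    """File-like wrapper that records the size of every read() request."""
    def __init__(self, text):
        self._inner = io.StringIO(text)
        self.requests = []

    def read(self, size=-1):
        self.requests.append(size)
        return self._inner.read(size)

doc = "<root>" + "<item/>" * 500 + "</root>"
reader = LoggingReader(doc)
events = list(iterparse(reader))   # drive the parser to completion
print(reader.requests[0])          # far more than one byte per call
```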

What would be the benefit of changing XmlTextReader in such a way?

	- As discussed above, developers would still be unaware of how
	  XmlTextReader consumes the stream, contrary to their expectations.
	- At the least, it would require 1024 calls to Read()
	  instead of one call to Read(char[], int, int).
	- There is a GetRemainder() method, which already implies that
	  XmlTextReader consumes more of the stream than the size they
	  think is "required". Developers can be aware of such stream
	  consumption through it.
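The cost in the second point can be sketched like this (Python's
io.StringIO stands in for the user's TextReader; 1024 is just the
buffer size from the example above):

```python
# One character per call vs. one block read: both deliver the same
# data, but the first costs one call per character.
import io

data = "x" * 1024
one_by_one = io.StringIO(data)
chars = []
calls = 0
while True:
    c = one_by_one.read(1)        # Read() analogue: one char per call
    if not c:
        break
    chars.append(c)
    calls += 1
print(calls)                      # 1024 calls

block = io.StringIO(data)
buf = block.read(1024)            # Read(char[], int, int) analogue
print(len(buf))                   # 1024 chars, in a single call
assert "".join(chars) == buf
```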

I think we would just lose the performance improvements there, in vain.

Atsushi Eno

> On Wed, 21 Jul 2004 13:09:14 +0900, Atsushi Eno <atsushi@ximian.com> wrote:
>> > Thanks for answering my question, but the problem is that I
>> > feel the XmlTextReader just reads to the end of the stream
>> > when it is instantiated (that's why it takes so long to instantiate)
>> > and keeps this in memory. Then when the client asks for a Read(),
>> > it just gives the next element or whatever. Shouldn't the parser
>> > be reading off the stream when the client issues a Read()?
>>In short, no. It is impossible.
>>One easy answer: for a TextReader whose CanSeek is
>>false (one that cannot Peek()), we have to cache the peeked
>>character anyway. The reader has to continue reading until it
>>encounters '<', but you won't have sent the next '<foo>' element
>>yet. Thus, such a "wait & see" approach won't work anyway.
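That "wait & see" problem shows up with any feed-based pull parser (a
rough Python sketch; XMLPullParser is only a stand-in for
XmlTextReader here):

```python
# Even after the parser has received "hello", it cannot emit the text
# node: the next character might extend it (or, with '&', change its
# meaning entirely).  The text is delivered only once '<' has arrived.
from xml.etree.ElementTree import XMLPullParser

parser = XMLPullParser(events=("start", "end"))
parser.feed("<root>hello")
pending = [(ev, el.tag) for ev, el in parser.read_events()]
print(pending)   # [('start', 'root')] -- the text is held back

parser.feed("</root>")
done = [(ev, el.text) for ev, el in parser.read_events() if ev == "end"]
print(done)      # [('end', 'hello')] -- delivered once '<' was seen
```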
>>Another complicated case: Suppose you are going to read such XML
>>document like:
>>        <!DOCTYPE foo SYSTEM "foo.dtd">
>>        <root>
>>        &amp; &quot; &apos; &lt; &gt; are character entity.
>>        external &ent; &amp; &not; are defined in foo.dtd.
>>        </root>
>>Here the general entities "ent" and "not" are defined in foo.dtd.
>>When you call the third Read(), XmlTextReader tries to
>>read the text node inside the root element. Read() will return
>>true, and it then represents a Text node that ends with "external "
>>(immediately before &ent;).
>>When XmlTextReader finds '&', it MUST NOT stop parsing, since
>>it cannot identify whether the following markup represents a character
>>entity (&amp;, &quot;, &apos;, &lt; or &gt;) or a general entity
>>without reading the following text stream.
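That ambiguity at '&' can be sketched like this (classify_reference is
a hypothetical illustrative helper, not anything in XmlTextReader):

```python
# Hypothetical sketch of the lookahead problem at '&': the reference
# cannot be classified until the terminating ';' has been read, so the
# parser must keep consuming the stream past the '&'.
PREDEFINED = {"amp", "quot", "apos", "lt", "gt"}   # built-in entities

def classify_reference(buf, pos):
    """Classify the entity reference starting at buf[pos] (an '&').

    Returns 'char-entity', 'general-entity', or None when the buffer
    ends before ';' -- i.e. the parser must keep reading the stream.
    """
    end = buf.find(";", pos + 1)
    if end == -1:
        return None               # incomplete: must not stop parsing here
    name = buf[pos + 1:end]
    return "char-entity" if name in PREDEFINED else "general-entity"

print(classify_reference("external &amp; text", 9))   # char-entity
print(classify_reference("external &ent; text", 9))   # general-entity
print(classify_reference("external &am", 9))          # None: need more
```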
>>Atsushi Eno
>>Mono-list maillist  -  Mono-list@lists.ximian.com