[Mono-devel-list] System.Data.DataSet.ReadXml reimplementation
Konstantin Triger
kostat at mainsoft.com
Mon Jun 20 05:39:23 EDT 2005
Hi Eno,
The main problem was that sometimes the schema was not a root element,
but its child. This is a common case when DataSet is serialized in .Net.
The existing ReadXml() failed in this case and did read the schema as a
content.
Possibly, what needs to be done is when a schema is found at some level,
it should be inferred and right after it the data should be consumed
without "caching" into XmlDocument. If you agree with this assumption I
can make the needed changes, so the performance beauty would be restored.
Regards,
Konstantin Triger
Atsushi Eno wrote:
> When schema inference is required, existing ReadXml() actually
> did read the content into XmlDocument, inferred schema structures,
> and then continued instance consumption using XmlNodeReader.
> The patch just broke the performance beauty.
>
> Anyways you committed the patch without further approval that I
> am not aware of, maybe you must have been approved by anyone else.
> So I have no comments anymore.
>
> Atsushi Eno
>
> Konstantin Triger wrote:
>
>> Hello Eno,
>>
>> Some answers:
>>
>> 1. I need to collect the data into the XmlDocument to store the data
>> while infering schema. Maybe it can be done only with specific values
>> of XmlReadMode.
>>
>> 2. The WhitespaceHandling part is for optimization, as we ignore
>> whitespace anyway.
>>
>> 3. Are you sure that XML Schema inside data content is handled to
>> define data structure definition, while it already started filling
>> the data?
>>
>> Yes, this is a solution to the problems we encountered while sending
>> typed datasets over WebServices. There we had diffgrams and I made a
>> generalization for schema too.
>>
>> 4. protected virtual void ReadXmlSerializable (XmlReader reader)
>> {
>> - ReadXml (reader);
>> + ReadXml (reader, XmlReadMode.DiffGram);
>> }
>>
>> This is the only mode .Net is ready to eat. Considering that the
>> dataset is serialized to a diffgram, it seems to be true.
>>
>> Regards,
>> Konstantin Triger
>>
>>
>>
>> Atsushi Eno wrote:
>>
>>> Hi Boris,
>>>
>>>>>> Hello all
>>>>>> We're currently working on reimplementation of DataSet.ReadXml.
>>>>>> The need for this fist raised as a result of xml serialization
>>>>>> bugs in our System.WebServices testsiute, and supported by the
>>>>>> will to provide
>>>>>> more "clean" implementation.
>>>>>
>>>>>
>>>
>>> Having clean implementation is a nice idea, yes.
>>>
>>>>>> Attached is a diff with current svn version.
>>>>>> The current implementation deals with the task by inspecting some
>>>>>> pieces of xml while reading it and invoking the corresponding
>>>>>> actions (read xml, infer schema etc.) if needed. It looks that
>>>>>> there is a collection of solutions for a lot of private cases
>>>>>> (each eliminated by some test) threated.
>>>>>
>>>>>
>>>
>>> Sorry but I cannot understand this line, maybe especially
>>> "private cases" treated.
>>>
>>>>>> The main idea of new implementation is to loop through the xml
>>>>>> reader (until we're on the same depth level), collect its
>>>>>> attributes and nodes into root of xml document, and after _all_
>>>>>> the data is collected - act accordingly. The diffgram and schema
>>>>>> at first element are threated in special manner.
>>>>>
>>>>>
>>>
>>>>>> What is your opinion about the new implementation?
>>>>>
>>>>>
>>>
>>> Well, I don't really understand why you needed to read all the data
>>> into XmlDocument before filling the data into DataSet. It will harm
>>> performance so significantly, in case that it does not invoke
>>> InferXmlSchema() internally. That complexity is not to read all
>>> the content up into DOM unnecessarily.
>>>
>>> Besides that idea, there seems some reorganization of switching
>>> (if () {...} ... and switch () {...}) which looks nice to me.
>>>
>>>>>> Once again : this is not a ready patch, so do not apply it on
>>>>>> your working copy, but on the "standalone" one.
>>>>>
>>>>>
>>>
>>> Ok...so I wonder how it makes sense but I put some comments inline:
>>>
>>>
>>>> - // FIXME: We need more decent code here, but for now
>>>> - // I don't know the precise MS.NET behavior, I just
>>>> - // delegate to specific read process.
>>>> - switch (mode) {
>>>> - case XmlReadMode.IgnoreSchema:
>>>> - return ReadXmlIgnoreSchema (input, mode, true);
>>>> - case XmlReadMode.ReadSchema:
>>>> - return ReadXmlReadSchema (input, mode, true);
>>>> + if (reader is XmlTextReader) {
>>>> + ((XmlTextReader) reader).WhitespaceHandling =
>>>> WhitespaceHandling.None;
>>>
>>>
>>>
>>> Why did you add that part?
>>>
>>>> + // If diffgram, then read the first element as
>>>> diffgram + if (reader.LocalName == "diffgram" &&
>>>> reader.NamespaceURI == XmlConstants.DiffgrNamespace) {
>>>> + switch (mode) {
>>>> + case XmlReadMode.Auto:
>>>> + case XmlReadMode.DiffGram:
>>>> + if (DiffLoader == null)
>>>> + DiffLoader = new XmlDiffLoader (this);
>>>> + DiffLoader.Load (reader);
>>>> + // (and leave rest of the reader as is)
>>>> + return XmlReadMode.DiffGram;
>>>> + case XmlReadMode.Fragment:
>>>> + reader.Skip ();
>>>> + // (and continue to read)
>>>> + break;
>>>> + default:
>>>> + reader.Skip ();
>>>> + // (and leave rest of the reader as is)
>>>> + return mode;
>>>
>>>
>>>
>>> Pull one indent level down here (and every other places like that).
>>>
>>>> + if (reader.LocalName == "schema" &&
>>>> reader.NamespaceURI == XmlSchema.Namespace) {
>>>> + switch (mode) {
>>>> + case XmlReadMode.IgnoreSchema:
>>>> + case XmlReadMode.InferSchema:
>>>> + reader.Skip ();
>>>> + break;
>>>> + + default:
>>>> + ReadXmlSchema (reader);
>>>> + retMode = XmlReadMode.ReadSchema;
>>>> + schemaLoaded = true;
>>>> + // (and leave rest of the reader as is)
>>>> + break;
>>>> + }
>>>
>>>
>>>
>>> Are you sure that XML Schema inside data content is handled to
>>> define data structure definition, while it already started filling
>>> the data?
>>>
>>>> + if ((reader.LocalName == "diffgram") &&
>>>> (reader.NamespaceURI == XmlConstants.DiffgrNamespace)) {
>>>> + if ((mode == XmlReadMode.DiffGram) || (mode ==
>>>> XmlReadMode.IgnoreSchema)
>>>> + || mode == XmlReadMode.Auto) {
>>>> + if (DiffLoader == null)
>>>> + DiffLoader = new XmlDiffLoader (this);
>>>> + DiffLoader.Load (reader);
>>>> + // (and leave rest of the reader as is)
>>>> + retMode = XmlReadMode.DiffGram;
>>>> + }
>>>> + else
>>>> + reader.Skip();
>>>
>>>
>>>
>>> And the same discussion applies to diffgram.
>>>
>>>> @@ -1166,7 +1118,7 @@
>>>> protected virtual void ReadXmlSerializable
>>>> (XmlReader reader)
>>>> {
>>>> - ReadXml (reader);
>>>> + ReadXml (reader, XmlReadMode.DiffGram);
>>>> }
>>>
>>>
>>>
>>> Is this correct?
>>>
>>> Atsushi Eno
>>> _______________________________________________
>>> Mono-devel-list mailing list
>>> Mono-devel-list at lists.ximian.com
>>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>>
>>
>>
>>
>
More information about the Mono-devel-list
mailing list