[Mono-devel-list] System.Data.DataSet.ReadXml reimplementation

Mon Jun 20 05:39:23 EDT 2005

Hi Eno,

The main problem was that sometimes the schema was not a root element, 
but its child. This is a common case when DataSet is serialized in .Net. 
The existing ReadXml() failed in this case and did read the schema as a 
content.

Possibly, what needs to be done is when a schema is found at some level, 
it should be inferred and right after it the data should be consumed 
without "caching" into XmlDocument. If you agree with this assumption I 
can make the needed changes, so the performance beauty would be restored.

Regards,
Konstantin Triger

Atsushi Eno wrote:

> When schema inference is required, existing ReadXml() actually
> did read the content into XmlDocument, inferred schema structures,
> and then continued instance consumption using XmlNodeReader.
> The patch just broke the performance beauty.
>
> Anyways you committed the patch without further approval that I
> am not aware of, maybe you must have been approved by anyone else.
> So I have no comments anymore.
>
> Atsushi Eno
>
> Konstantin Triger wrote:
>
>> Hello Eno,
>>
>> Some answers:
>>
>> 1. I need to collect the data into the XmlDocument to store the data 
>> while infering schema. Maybe it can be done only with specific values 
>> of XmlReadMode.
>>
>> 2. The WhitespaceHandling part is for optimization, as we ignore 
>> whitespace anyway.
>>
>> 3. Are you sure that XML Schema inside data content is handled to
>> define data structure definition, while it already started filling
>> the data?
>>
>> Yes, this is a solution to the problems we encountered while sending 
>> typed datasets over WebServices. There we had diffgrams and I made a 
>> generalization for schema too.
>>
>> 4. protected virtual void ReadXmlSerializable (XmlReader reader)
>>         {
>> -            ReadXml (reader);
>> +            ReadXml (reader, XmlReadMode.DiffGram);
>>         }
>>
>> This is the only mode .Net is ready to eat. Considering that the 
>> dataset is serialized to a diffgram, it seems to be true.
>>
>> Regards,
>> Konstantin Triger
>>
>>
>>
>> Atsushi Eno wrote:
>>
>>> Hi Boris,
>>>
>>>>>> Hello all
>>>>>> We're currently working on reimplementation of DataSet.ReadXml.
>>>>>> The need for this fist raised as a result of xml serialization 
>>>>>> bugs in our System.WebServices testsiute, and supported by the 
>>>>>> will to provide
>>>>>> more "clean" implementation.
>>>>>
>>>>>
>>>
>>> Having clean implementation is a nice idea, yes.
>>>
>>>>>> Attached is a diff with current svn version.
>>>>>> The current implementation deals with the task by inspecting some 
>>>>>> pieces of xml while reading it and invoking the corresponding 
>>>>>> actions (read xml, infer schema etc.) if needed. It looks that 
>>>>>> there is a collection of solutions for a lot of private cases 
>>>>>> (each eliminated by some test) threated.
>>>>>
>>>>>
>>>
>>> Sorry but I cannot understand this line, maybe especially
>>> "private cases" treated.
>>>
>>>>>> The main idea of new implementation is to loop through the xml 
>>>>>> reader (until we're on the same depth level), collect its 
>>>>>> attributes and nodes into root of xml document, and after _all_ 
>>>>>> the data is collected - act accordingly. The diffgram and schema 
>>>>>> at first element are threated in special manner.
>>>>>
>>>>>
>>>
>>>>>> What is your opinion about the new implementation?
>>>>>
>>>>>
>>>
>>> Well, I don't really understand why you needed to read all the data
>>> into XmlDocument before filling the data into DataSet. It will harm
>>> performance so significantly, in case that it does not invoke
>>> InferXmlSchema() internally. That complexity is not to read all
>>> the content up into DOM unnecessarily.
>>>
>>> Besides that idea, there seems some reorganization of switching
>>> (if () {...} ... and switch () {...}) which looks nice to me.
>>>
>>>>>> Once again : this is not a ready patch, so do not apply it on 
>>>>>> your working copy, but on the "standalone" one.
>>>>>
>>>>>
>>>
>>> Ok...so I wonder how it makes sense but I put some comments inline:
>>>
>>>
>>>> -            // FIXME: We need more decent code here, but for now
>>>> -            // I don't know the precise MS.NET behavior, I just
>>>> -            // delegate to specific read process.
>>>> -            switch (mode) {
>>>> -            case XmlReadMode.IgnoreSchema:
>>>> -                return ReadXmlIgnoreSchema (input, mode, true);
>>>> -            case XmlReadMode.ReadSchema:
>>>> -                return ReadXmlReadSchema (input, mode, true);
>>>> +            if (reader is XmlTextReader) {
>>>> +                ((XmlTextReader) reader).WhitespaceHandling = 
>>>> WhitespaceHandling.None;
>>>
>>>
>>>
>>> Why did you add that part?
>>>
>>>> +            // If diffgram, then read the first element as 
>>>> diffgram +            if (reader.LocalName == "diffgram" && 
>>>> reader.NamespaceURI == XmlConstants.DiffgrNamespace) {
>>>> +                switch (mode) {
>>>> +                    case XmlReadMode.Auto:
>>>> +                    case XmlReadMode.DiffGram:
>>>> +                        if (DiffLoader == null)
>>>> +                            DiffLoader = new XmlDiffLoader (this);
>>>> +                        DiffLoader.Load (reader);
>>>> +                        // (and leave rest of the reader as is)
>>>> +                        return  XmlReadMode.DiffGram;
>>>> +                    case XmlReadMode.Fragment:
>>>> +                        reader.Skip ();
>>>> +                        // (and continue to read)
>>>> +                        break;
>>>> +                    default:
>>>> +                        reader.Skip ();
>>>> +                        // (and leave rest of the reader as is)
>>>> +                        return mode;
>>>
>>>
>>>
>>> Pull one indent level down here (and every other places like that).
>>>
>>>> +                if (reader.LocalName == "schema" && 
>>>> reader.NamespaceURI == XmlSchema.Namespace) {
>>>> +                    switch (mode) {
>>>> +                        case XmlReadMode.IgnoreSchema:
>>>> +                        case XmlReadMode.InferSchema:
>>>> +                            reader.Skip ();
>>>> +                            break;
>>>> +                       +                        default:
>>>> +                            ReadXmlSchema (reader);
>>>> +                            retMode = XmlReadMode.ReadSchema;
>>>> +                            schemaLoaded = true;
>>>> +                            // (and leave rest of the reader as is)
>>>> +                            break;
>>>> +                    }
>>>
>>>
>>>
>>> Are you sure that XML Schema inside data content is handled to
>>> define data structure definition, while it already started filling
>>> the data?
>>>
>>>> +                if ((reader.LocalName == "diffgram") && 
>>>> (reader.NamespaceURI == XmlConstants.DiffgrNamespace)) {
>>>> +                    if ((mode == XmlReadMode.DiffGram) || (mode == 
>>>> XmlReadMode.IgnoreSchema)
>>>> +                        || mode == XmlReadMode.Auto) {
>>>> +                        if (DiffLoader == null)
>>>> +                            DiffLoader = new XmlDiffLoader (this);
>>>> +                        DiffLoader.Load (reader);
>>>> +                        // (and leave rest of the reader as is)
>>>> +                        retMode = XmlReadMode.DiffGram;
>>>> +                    }
>>>> +                    else
>>>> +                        reader.Skip();
>>>
>>>
>>>
>>> And the same discussion applies to diffgram.
>>>
>>>> @@ -1166,7 +1118,7 @@
>>>>                  protected virtual void ReadXmlSerializable 
>>>> (XmlReader reader)
>>>>          {
>>>> -            ReadXml (reader);
>>>> +            ReadXml (reader, XmlReadMode.DiffGram);
>>>>          }
>>>
>>>
>>>
>>> Is this correct?
>>>
>>> Atsushi Eno
>>> _______________________________________________
>>> Mono-devel-list mailing list
>>> Mono-devel-list at lists.ximian.com
>>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>>
>>
>>
>>
>