[Mono-dev] Performance issue with DataTable.Load on "large" data sets

Nicklas Overgaard nicklas at isharp.dk
Thu Apr 21 02:25:28 EDT 2011


That's true!

However, I'm currently tied up with finishing the client's project,
but once I have finished that, I will have some spare time to dig into
this issue.

Thanks for the guidance so far :)

Happy Easter to everyone!

/Nicklas

On Wed, 2011-04-13 at 13:07 +0100, Alan wrote:
> Hey,
> 
> On Tue, Apr 12, 2011 at 11:09 AM, Nicklas Overgaard <nicklas at isharp.dk> wrote:
> > Hey Alan,
> >
> > Thanks for picking it up :)
> >
> >> Firstly the simple change of moving the BeginLoad/EndLoad out of the
> >> loop could easily be committed as a separate patch. If it's possible
> >> to verify this change with an additional unit test, all the better! It
> >> means it can never regress again.
> >
> > Well, the thing is that simply moving Begin/End load out of the loop
> > actually breaks four of the tests. However, after reviewing the test
> > code, I seriously doubt that the tests are correct, hence the question
> > about having verified them on Windows :)
> 
> In that case running those 4 tests on the Microsoft implementation
> would be the way forward. If they pass there then you know the change
> requires further modifications to be correct. If they fail, then you'd
> just have to update them so that they pass. Note that in that case
> you'll have to run the tests under the 2.0, 3.0 and 4.0 frameworks in
> case it was a behavioural change between newer and older runtimes. The
> perf improvement is definitely worth the time this will take :)
> 
> Alan
> 
> >
> > The patch along with a little graph showing the performance improvement
> > has been attached.
> >
> > I hope that someone with more insight into System.Data can shed some light
> > on the now-broken unit tests.
> >
> > I will get back when I have "fixed" the remaining issues, which also
> > gives more performance.
> >
> > And thanks for the tips about testing it on Windows. I will figure
> > something out.
> >
> > Best regards,
> >
> > Nicklas
> >
> > On Tue, 2011-04-12 at 10:38 +0100, Alan wrote:
> >> Hey,
> >>
> >> Firstly the simple change of moving the BeginLoad/EndLoad out of the
> >> loop could easily be committed as a separate patch. If it's possible
> >> to verify this change with an additional unit test, all the better! It
> >> means it can never regress again.
> >>
> >> As for the failing tests, the simplest thing to do would be to
> >> copy/paste the test assembly from Linux to Windows and execute it
> >> there to see if all the tests pass. If that doesn't work you could try
> >> copying/pasting the individual tests you want to verify, compiling
> >> them on Windows and executing that. The complicated way of testing
> >> would be to check out mono from git, build it on Windows and then run
> >> the tests. Either way, a commit which regresses tests can't be
> >> accepted unless those tests can be proven to be incorrect (i.e. they
> >> fail under MS .NET). It's also possible that these are behavioural
> >> differences between .NET 3 and .NET 4, in which case the modifications
> >> would have to be conditionally built.
> >>
> >> Alan
> >>
> >> On Tue, Apr 12, 2011 at 9:41 AM, Nicklas Overgaard <nicklas at isharp.dk> wrote:
> >> > Hi again,
> >> >
> >> > I have now made further optimizations, which bring the Load method up
> >> > to speed with the .NET implementation. However, 5 of the
> >> > regression tests are now failing.
> >> >
> >> > Have all these System.Data regression tests been verified on a Windows
> >> > machine with .NET? I just don't want to chase bugs / regressions that
> >> > do not exist or are not valid :)
> >> >
> >> > Best regards,
> >> >
> >> > Nicklas
> >> >
> >> > On Thu, 2011-04-07 at 20:13 +0200, Nicklas Overgaard wrote:
> >> >> Hi again,
> >> >>
> >> >> Sorry for the spamming.
> >> >>
> >> >> Moving out the "Begin" and "End" load methods reduced DataTable.Load
> >> >> time to 1.7 seconds on my test machine, so we are getting there!
> >> >>
> >> >> /Nicklas
> >> >>
> >> >> On Thu, 2011-04-07 at 19:29 +0200, Nicklas Overgaard wrote:
> >> >> > Hi again,
> >> >> >
> >> >> > I now have a profile log, created with the new mono profiler. It shows
> >> >> > that the method "EndLoadData" is using up almost all of the time (16
> >> >> > minutes of the 17 minutes it took to create the dump).
> >> >> >
> >> >> > When looking in the file "DbDataAdapter.cs" line 355 in current GIT
> >> >> > head, the "BeginLoadData" and "EndLoadData" methods are called for each
> >> >> > iteration over the DataReader's data.
> >> >> >
> >> >> > This means that for each row we add to the DataTable, the DataSet is
> >> >> > being asked to enforce constraints and other stuff in the DataTable.
> >> >> >
> >> >> > According to MSDN:
> >> >> > http://msdn.microsoft.com/en-us/library/system.data.datatable.beginloaddata.aspx
> >> >> >
> >> >> > "BeginLoadData Turns off notifications, index maintenance, and
> >> >> > constraints while loading data."
> >> >> >
> >> >> > So wouldn't it make sense to move "BeginLoad.." and "EndLoad.." out of
> >> >> > the loop?
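> >> >> >
> >> >> > As a rough sketch of that idea (this is not the actual
> >> >> > DbDataAdapter.cs code; the Fill helper, the two-column table and the
> >> >> > in-memory reader are made up for illustration), the Begin/End pair
> >> >> > would wrap the whole read loop instead of each single row:
> >> >> >
> >> >> > ```csharp
> >> >> > using System;
> >> >> > using System.Data;
> >> >> >
> >> >> > public class LoadSketch
> >> >> > {
> >> >> >     // Hypothetical helper, not the real Mono implementation.
> >> >> >     public static void Fill(DataTable table, IDataReader reader)
> >> >> >     {
> >> >> >         object[] values = new object[reader.FieldCount];
> >> >> >
> >> >> >         // Turn off notifications, index maintenance and constraints once...
> >> >> >         table.BeginLoadData();
> >> >> >         try
> >> >> >         {
> >> >> >             while (reader.Read())
> >> >> >             {
> >> >> >                 reader.GetValues(values);
> >> >> >                 // Adds a new row with these values and accepts it.
> >> >> >                 table.LoadDataRow(values, true);
> >> >> >             }
> >> >> >         }
> >> >> >         finally
> >> >> >         {
> >> >> >             // ...and turn them back on once, after the last row.
> >> >> >             table.EndLoadData();
> >> >> >         }
> >> >> >     }
> >> >> >
> >> >> >     public static void Main()
> >> >> >     {
> >> >> >         // Made-up source data standing in for a database result set.
> >> >> >         DataTable source = new DataTable("source");
> >> >> >         source.Columns.Add("id", typeof(int));
> >> >> >         source.Columns.Add("name", typeof(string));
> >> >> >         for (int i = 0; i < 5000; i++)
> >> >> >             source.Rows.Add(i, "row " + i);
> >> >> >
> >> >> >         DataTable target = source.Clone(); // same schema, no rows
> >> >> >         using (IDataReader reader = source.CreateDataReader())
> >> >> >         {
> >> >> >             Fill(target, reader);
> >> >> >         }
> >> >> >         Console.WriteLine(target.Rows.Count); // prints 5000
> >> >> >     }
> >> >> > }
> >> >> > ```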
> >> >> >
> >> >> > Well, I'm trying it out :)
> >> >> >
> >> >> > Best regards,
> >> >> >
> >> >> > Nicklas Overgaard
> >> >> >
> >> >> > On Thu, 2011-04-07 at 14:58 +0200, Nicklas Overgaard wrote:
> >> >> > > Hi mono-devers!
> >> >> > >
> >> >> > > I'm currently working on a rather large web project, where we are using a
> >> >> > > combination of mono 2.10.1 and MySQL.
> >> >> > >
> >> >> > > Over the past week, I have observed that loading "large" datasets (5000+
> >> >> > > rows) from MySQL into a DataTable takes a very long time.
> >> >> > >
> >> >> > > It's done somewhat like this:
> >> >> > > <code>
> >> >> > >
> >> >> > > comm.CommandText = query;
> >> >> > > comm.CommandTimeout = MySQLConnection.timeout;
> >> >> > > DataTable dt = new DataTable();
> >> >> > > // "using" closes the reader even if Load throws.
> >> >> > > using (MySqlDataReader reader = (MySqlDataReader)comm.ExecuteReader())
> >> >> > > {
> >> >> > >     dt.Load(reader); // <- this is killing mono
> >> >> > > }
> >> >> > >
> >> >> > > </code>
> >> >> > >
> >> >> > > I have created a small test program, compiled it on my Linux machine and
> >> >> > > executed it.
> >> >> > >
> >> >> > > It takes 15 seconds to do such an operation under mono, but on Windows it
> >> >> > > takes only 0.4 seconds (with the same executable, fetching the same
> >> >> > > data). I have profiled the application on Windows, and it seems that
> >> >> > > the .NET framework is using specialized methods for loading data from a
> >> >> > > DataReader.
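> >> >> > >
> >> >> > > For anyone who wants to time DataTable.Load without a MySQL server, a
> >> >> > > minimal sketch could look like the following. Note the assumptions: it
> >> >> > > feeds Load from an in-memory DataTableReader rather than a
> >> >> > > MySqlDataReader, and the MakeSource helper, the column names and the
> >> >> > > row count are made up, so the numbers it prints are not comparable to
> >> >> > > the figures above:
> >> >> > >
> >> >> > > ```csharp
> >> >> > > using System;
> >> >> > > using System.Data;
> >> >> > > using System.Diagnostics;
> >> >> > >
> >> >> > > public class LoadTiming
> >> >> > > {
> >> >> > >     // Hypothetical helper: builds an in-memory table to read from.
> >> >> > >     public static DataTable MakeSource(int rows)
> >> >> > >     {
> >> >> > >         DataTable t = new DataTable("source");
> >> >> > >         t.Columns.Add("id", typeof(int));
> >> >> > >         t.Columns.Add("name", typeof(string));
> >> >> > >         for (int i = 0; i < rows; i++)
> >> >> > >             t.Rows.Add(i, "row " + i);
> >> >> > >         return t;
> >> >> > >     }
> >> >> > >
> >> >> > >     public static void Main()
> >> >> > >     {
> >> >> > >         DataTable source = MakeSource(5000);
> >> >> > >         DataTable dt = new DataTable();
> >> >> > >
> >> >> > >         // Time only the Load call, not the setup of the source data.
> >> >> > >         Stopwatch sw = Stopwatch.StartNew();
> >> >> > >         using (DataTableReader reader = source.CreateDataReader())
> >> >> > >         {
> >> >> > >             dt.Load(reader);
> >> >> > >         }
> >> >> > >         sw.Stop();
> >> >> > >
> >> >> > >         Console.WriteLine("{0} rows in {1} ms",
> >> >> > >                           dt.Rows.Count, sw.ElapsedMilliseconds);
> >> >> > >     }
> >> >> > > }
> >> >> > > ```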
> >> >> > >
> >> >> > > I have been looking through the implementation in mono, in regard to
> >> >> > > DataTable.Load, and I can see that a lot of validation and other stuff
> >> >> > > is going on, which could explain the huge difference. I'm also working
> >> >> > > on a mono log profile trace, to dig a little deeper.
> >> >> > >
> >> >> > > Would it be OK if I tried to patch the current mono implementation to
> >> >> > > gain the same speeds as .NET? The reason for asking is that I know that
> >> >> > > I cannot contribute to Mono if I have seen the actual code in .NET (but
> >> >> > > does a profile result count as "seeing the code"?)
> >> >> > >
> >> >> > > Best regards,
> >> >> > >
> >> >> > > Nicklas Overgaard
> >> >> > >
> >> >> > > _______________________________________________
> >> >> > > Mono-devel-list mailing list
> >> >> > > Mono-devel-list at lists.ximian.com
> >> >> > > http://lists.ximian.com/mailman/listinfo/mono-devel-list