[Mono-dev] [Mono-devel-list] System.Data : DataTable.Select performance

Boris Kirzner borisk at mainsoft.com
Tue Sep 13 09:29:42 EDT 2005

Hello all,

The example attached demonstrates the performance problem in
DataTable.Select - we run 30-40 times slower than .Net.
The cause for this problem is creating a new index for each select
executed without actually adding the newly created index into the table
indexes list.

Attached is the (not final yet) proposed patch for solving this problem.
The solution can be divided into the following main points :
- The indexes for select is achieved using GetIndex() so if the new
index is created it is also added into table indexes list to be reused
in the future
- All the classes implementing IExpression got an implementation of
Equals() method, so each index once created for the particular select
statement can be easily found using Index.Equals() (propagates to
Key.Equals() and, among others, to IExpression.Equals()). Note a bit
tricky implementation of RelatedDataView.Equals().
- Some bugs in index updating were solved (was previously hidden since
the indexes were never reused).

This patch improves a performance to be "only" 2 times slower than .Net.
Note that there is no performance improvement on the commented out
section since indexes are created per each different select expression.

There is two additional problems :
- Not that overall number of the rows returned differs from .Net. It is
a separate bug (it happens before the update also).
- As you can see,  MonoTests_System.Data.DataTableTest2.Select_ByFilter
fails fails with this fix. The failure is caused by the reason that the
previous test one throws an exception while executing Select() (as
supposed), thus leaving the index in the inconsistent state, so
additional Select() that reuses this index returns wrong result. The
problem is a bit more general - there are some more cases we suppose
index update or rebuild procedure to fail with exception, leaving index
in inconsistent state. 

The question is what side we want to recognize the problem and to be
responsible for removing the broken index : the index or the callee?
Each of the solutions have both good and bad sides : solution on the
index side looks bad because index needs to intervent into its parent
object data structures and the callee side solution requires changes in
multiple places plus integration of the solution in each of the future
index uses.

Any critics/ideas on the issue?


Boris Kirzner
Mono R&D team, Mainsoft Corporation.
Blogging at http://boriskirzner.blogspot.com/ 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: system.data.indexes.patch
Type: application/octet-stream
Size: 15080 bytes
Desc: system.data.indexes.patch
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20050913/a0ff6a1e/attachment.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Class1.cs
Type: application/octet-stream
Size: 1716 bytes
Desc: Class1.cs
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20050913/a0ff6a1e/attachment-0001.obj 

More information about the Mono-devel-list mailing list