[Mono-devel-list] System.Data indexes redesign

Wed Jul 21 06:57:07 EDT 2004

Hey Boris,

Couple of questions/suggestions. (some of them may be due to my lack of
understanding of your redesign)

1.  How are record numbers generated? This is going to be the key to
the performance of the every class using the indexes.
2.  how is it going to take care of the versions of the row? Are you
going to generate a new number for a version? And how long is the new
record going to be stored.
3. The Index is going to be a array ? of recordnumbers.. Why an array?
Why not use something like a hashtable where we would get a faster
lookup..
Also deletion/additions to positions are going to pose us problems.
When does an record number change? Does it change at all??
4.The node object will have to remain to hold the datarow. Or are you
thinking of something new?  
5. Another suggestion for implementation is that we leave the tree of
indexes the way it is currently and  keep the recordnumbers to access it
in hashtable along with the position of the node.
Can also have rowversion arrays that we take care of the different
versions if they exist..
6. Multiple dataviews will again impose an additional problem of
rowviews in different versions.. How are we going to take care of it in
the approach mentioned below..

Also I have just started working on ADO.NET 2.0, but for some of the
batch processing, paging etc, we will reuse lots of this logic. So we
may want to keep that in mind when we design  the cache.

Regards
Uma

>>> Boris Kirzner <borisk at mainsoft.com> 7/20/2004 3:44:32 PM >>>
Hello all
In the last  week we started thinking about redesign of System.Data 
indexes towards Uma's  DataView implementation  and the latest 
System.Data changes.
Following is the brief description  of redesign  guidelines :

Motivation:
Currently index implemented as balanced tree of nodes, while each node

points to a specific row. Tree is build based on row compare
(sequential 
comparing of row key values in natural order of table columns). Each 
time we want to access row, a lookup performed in the tree based on key

values.
One of the major disadvantages of this structure is that if we insert 
row based on   particular version of its key values, it is impossible
to 
search the row based on another version of its key values. To solve the

problem, all the actions on tree always based on current row version.
Additional weakness of this approach (in the meaning of performance) is

that huge amount of objects are allocated and deallocated while working

with index.
The last main drawback is that it is impossible to build index to hold

only rows of some particular state. This makes indexes hardly
applicable 
for serving data views.

The main goals of the indexes redesign are:
- Simplify management of indexes and take advantage of the changes in 
the field of data storage recently made.
- Avoid massive objects allocation and deallocation to speed up the 
performance.
- Prepare indexes towards the full implementation of data views.

The main classes affected by redesign are:

Key:
Key is a source of the following basic operations: row/record filtering

(returns number of record for the current row or -1 if the row does not

pass state filter) and record comparing (including columns sort
order).
Each key defined on set of columns, columns sorting order and row state

filter.

Index:
Index implemented as array of record numbers and provides the following

services: rebuild index, update (replace one record of row by another),

delete (remove row record) and find.
Index rebuild is managed by key, as well as record comparing used to 
form sorted index.
Index can hold non-unique values, but it manages an up-to-date boolean

indication about this.
Notice: since the record retrieval for row (managed by key) based on
row 
state, any operations of index update should be performed prior to 
changing row state.

Tables, views and constraints:
In the past table used to hold the list of its indexes, but most of the

index actions performed in the spirit of "one index per one
constraint".
Since we want to extend usage of indexes to view also, this approach 
have to be changed.
By the new design table used not only to hold, but also to manage all
of 
its indexes. Both constraints and views use indexes services to perform

some particular tasks.
For example, assert of unique constraint does not update index anymore,

but just uses index duplicates indication to decide whether the 
assertion succeeded or not.
This implementation will enable easy integration of indexes with data
views.

Record cache:
New design requires fast reverse record-to-row mapping service. This 
task carried out by record cache, which holds a reverse record-to-index

mapping.

Row:
To support record cache reverse mapping, the original, current and 
proposed records numbers become data row internal properties (thus 
providing an easy way to keep the mapping updated).

Any comments/ideas ?

Thanks
Boris

-- 
Boris Kirzner <borisk at mainsoft.com>

_______________________________________________
Mono-devel-list mailing list
Mono-devel-list at lists.ximian.com 
http://lists.ximian.com/mailman/listinfo/mono-devel-list