[Mono-devel-list] System.Data indexes redesign

Tue Jul 20 06:14:32 EDT 2004

Hello all
In the last  week we started thinking about redesign of System.Data 
indexes towards Uma's  DataView implementation  and the latest 
System.Data changes.
Following is the brief description  of redesign  guidelines :

Motivation:
Currently index implemented as balanced tree of nodes, while each node 
points to a specific row. Tree is build based on row compare (sequential 
comparing of row key values in natural order of table columns). Each 
time we want to access row, a lookup performed in the tree based on key 
values.
One of the major disadvantages of this structure is that if we insert 
row based on   particular version of its key values, it is impossible to 
search the row based on another version of its key values. To solve the 
problem, all the actions on tree always based on current row version.
Additional weakness of this approach (in the meaning of performance) is 
that huge amount of objects are allocated and deallocated while working 
with index.
The last main drawback is that it is impossible to build index to hold 
only rows of some particular state. This makes indexes hardly applicable 
for serving data views.

The main goals of the indexes redesign are:
- Simplify management of indexes and take advantage of the changes in 
the field of data storage recently made.
- Avoid massive objects allocation and deallocation to speed up the 
performance.
- Prepare indexes towards the full implementation of data views.

The main classes affected by redesign are:

Key:
Key is a source of the following basic operations: row/record filtering 
(returns number of record for the current row or -1 if the row does not 
pass state filter) and record comparing (including columns sort order).
Each key defined on set of columns, columns sorting order and row state 
filter.

Index:
Index implemented as array of record numbers and provides the following 
services: rebuild index, update (replace one record of row by another), 
delete (remove row record) and find.
Index rebuild is managed by key, as well as record comparing used to 
form sorted index.
Index can hold non-unique values, but it manages an up-to-date boolean 
indication about this.
Notice: since the record retrieval for row (managed by key) based on row 
state, any operations of index update should be performed prior to 
changing row state.

Tables, views and constraints:
In the past table used to hold the list of its indexes, but most of the 
index actions performed in the spirit of "one index per one constraint".
Since we want to extend usage of indexes to view also, this approach 
have to be changed.
By the new design table used not only to hold, but also to manage all of 
its indexes. Both constraints and views use indexes services to perform 
some particular tasks.
For example, assert of unique constraint does not update index anymore, 
but just uses index duplicates indication to decide whether the 
assertion succeeded or not.
This implementation will enable easy integration of indexes with data views.

Record cache:
New design requires fast reverse record-to-row mapping service. This 
task carried out by record cache, which holds a reverse record-to-index 
mapping.

Row:
To support record cache reverse mapping, the original, current and 
proposed records numbers become data row internal properties (thus 
providing an easy way to keep the mapping updated).

Any comments/ideas ?

Thanks
Boris

-- 
Boris Kirzner <borisk at mainsoft.com>