[Mono-devel-list] System.Data indexes redesign
Boris Kirzner
borisk at mainsoft.com
Tue Jul 20 06:14:32 EDT 2004
Hello all
In the last week we started thinking about redesign of System.Data
indexes towards Uma's DataView implementation and the latest
System.Data changes.
Following is the brief description of redesign guidelines :
Motivation:
Currently index implemented as balanced tree of nodes, while each node
points to a specific row. Tree is build based on row compare (sequential
comparing of row key values in natural order of table columns). Each
time we want to access row, a lookup performed in the tree based on key
values.
One of the major disadvantages of this structure is that if we insert
row based on particular version of its key values, it is impossible to
search the row based on another version of its key values. To solve the
problem, all the actions on tree always based on current row version.
Additional weakness of this approach (in the meaning of performance) is
that huge amount of objects are allocated and deallocated while working
with index.
The last main drawback is that it is impossible to build index to hold
only rows of some particular state. This makes indexes hardly applicable
for serving data views.
The main goals of the indexes redesign are:
- Simplify management of indexes and take advantage of the changes in
the field of data storage recently made.
- Avoid massive objects allocation and deallocation to speed up the
performance.
- Prepare indexes towards the full implementation of data views.
The main classes affected by redesign are:
Key:
Key is a source of the following basic operations: row/record filtering
(returns number of record for the current row or -1 if the row does not
pass state filter) and record comparing (including columns sort order).
Each key defined on set of columns, columns sorting order and row state
filter.
Index:
Index implemented as array of record numbers and provides the following
services: rebuild index, update (replace one record of row by another),
delete (remove row record) and find.
Index rebuild is managed by key, as well as record comparing used to
form sorted index.
Index can hold non-unique values, but it manages an up-to-date boolean
indication about this.
Notice: since the record retrieval for row (managed by key) based on row
state, any operations of index update should be performed prior to
changing row state.
Tables, views and constraints:
In the past table used to hold the list of its indexes, but most of the
index actions performed in the spirit of "one index per one constraint".
Since we want to extend usage of indexes to view also, this approach
have to be changed.
By the new design table used not only to hold, but also to manage all of
its indexes. Both constraints and views use indexes services to perform
some particular tasks.
For example, assert of unique constraint does not update index anymore,
but just uses index duplicates indication to decide whether the
assertion succeeded or not.
This implementation will enable easy integration of indexes with data views.
Record cache:
New design requires fast reverse record-to-row mapping service. This
task carried out by record cache, which holds a reverse record-to-index
mapping.
Row:
To support record cache reverse mapping, the original, current and
proposed records numbers become data row internal properties (thus
providing an easy way to keep the mapping updated).
Any comments/ideas ?
Thanks
Boris
--
Boris Kirzner <borisk at mainsoft.com>
More information about the Mono-devel-list
mailing list