[Mono-devel-list] System.Data indexes redesign
S Umadevi
sUmadevi at novell.com
Wed Jul 21 06:57:07 EDT 2004
Hey Boris,
Couple of questions/suggestions. (some of them may be due to my lack of
understanding of your redesign)
1. How are record numbers generated? This is going to be the key to
the performance of the every class using the indexes.
2. how is it going to take care of the versions of the row? Are you
going to generate a new number for a version? And how long is the new
record going to be stored.
3. The Index is going to be a array ? of recordnumbers.. Why an array?
Why not use something like a hashtable where we would get a faster
lookup..
Also deletion/additions to positions are going to pose us problems.
When does an record number change? Does it change at all??
4.The node object will have to remain to hold the datarow. Or are you
thinking of something new?
5. Another suggestion for implementation is that we leave the tree of
indexes the way it is currently and keep the recordnumbers to access it
in hashtable along with the position of the node.
Can also have rowversion arrays that we take care of the different
versions if they exist..
6. Multiple dataviews will again impose an additional problem of
rowviews in different versions.. How are we going to take care of it in
the approach mentioned below..
Also I have just started working on ADO.NET 2.0, but for some of the
batch processing, paging etc, we will reuse lots of this logic. So we
may want to keep that in mind when we design the cache.
Regards
Uma
>>> Boris Kirzner <borisk at mainsoft.com> 7/20/2004 3:44:32 PM >>>
Hello all
In the last week we started thinking about redesign of System.Data
indexes towards Uma's DataView implementation and the latest
System.Data changes.
Following is the brief description of redesign guidelines :
Motivation:
Currently index implemented as balanced tree of nodes, while each node
points to a specific row. Tree is build based on row compare
(sequential
comparing of row key values in natural order of table columns). Each
time we want to access row, a lookup performed in the tree based on key
values.
One of the major disadvantages of this structure is that if we insert
row based on particular version of its key values, it is impossible
to
search the row based on another version of its key values. To solve the
problem, all the actions on tree always based on current row version.
Additional weakness of this approach (in the meaning of performance) is
that huge amount of objects are allocated and deallocated while working
with index.
The last main drawback is that it is impossible to build index to hold
only rows of some particular state. This makes indexes hardly
applicable
for serving data views.
The main goals of the indexes redesign are:
- Simplify management of indexes and take advantage of the changes in
the field of data storage recently made.
- Avoid massive objects allocation and deallocation to speed up the
performance.
- Prepare indexes towards the full implementation of data views.
The main classes affected by redesign are:
Key:
Key is a source of the following basic operations: row/record filtering
(returns number of record for the current row or -1 if the row does not
pass state filter) and record comparing (including columns sort
order).
Each key defined on set of columns, columns sorting order and row state
filter.
Index:
Index implemented as array of record numbers and provides the following
services: rebuild index, update (replace one record of row by another),
delete (remove row record) and find.
Index rebuild is managed by key, as well as record comparing used to
form sorted index.
Index can hold non-unique values, but it manages an up-to-date boolean
indication about this.
Notice: since the record retrieval for row (managed by key) based on
row
state, any operations of index update should be performed prior to
changing row state.
Tables, views and constraints:
In the past table used to hold the list of its indexes, but most of the
index actions performed in the spirit of "one index per one
constraint".
Since we want to extend usage of indexes to view also, this approach
have to be changed.
By the new design table used not only to hold, but also to manage all
of
its indexes. Both constraints and views use indexes services to perform
some particular tasks.
For example, assert of unique constraint does not update index anymore,
but just uses index duplicates indication to decide whether the
assertion succeeded or not.
This implementation will enable easy integration of indexes with data
views.
Record cache:
New design requires fast reverse record-to-row mapping service. This
task carried out by record cache, which holds a reverse record-to-index
mapping.
Row:
To support record cache reverse mapping, the original, current and
proposed records numbers become data row internal properties (thus
providing an easy way to keep the mapping updated).
Any comments/ideas ?
Thanks
Boris
--
Boris Kirzner <borisk at mainsoft.com>
_______________________________________________
Mono-devel-list mailing list
Mono-devel-list at lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list
More information about the Mono-devel-list
mailing list