[Mono-devel-list] System.Data indexes redesign

Boris Kirzner borisk at mainsoft.com
Thu Jul 22 03:06:12 EDT 2004


Hello Uma

S Umadevi wrote:

>Hey Boris,
>  
>
>Couple of questions/suggestions. (some of them may be due to my lack of
>understanding of your redesign)
>
>1.  How are record numbers generated? This is going to be the key to
>the performance of the every class using the indexes.
>  
>
Record numbers are not generated, but we use the actual record numbers 
from record manager

>2.  how is it going to take care of the versions of the row? Are you
>going to generate a new number for a version? And how long is the new
>record going to be stored.
>  
>
When updating an index, one of row's record is choosed depending on 
record state and key row state filter

>3. The Index is going to be a array ? of recordnumbers.. Why an array?
>Why not use something like a hashtable where we would get a faster
>lookup..
>  
>
I don't think that hash is applicable here, since there is no both 
simple and effective way to generate unique key values for rows.
We can compare two rows to determine which is "bigger" in the meaning of 
some key, but can't create row key value itself.

>Also deletion/additions to positions are going to pose us problems.
>  
>
We keep index sorted, so update will take O(log(n)).

>When does an record number change? Does it change at all??
>  
>
Record number does not change, but it can be replaced with another 
record number when row state changes (EndEdit() etc.).

>4.The node object will have to remain to hold the datarow. Or are you
>thinking of something new?  
>  
>
We'll manage reverse mapping array of records to rows in record manager, 
so having record number we can get its row by O(1)

>5. Another suggestion for implementation is that we leave the tree of
>indexes the way it is currently and  keep the recordnumbers to access it
>in hashtable along with the position of the node.
>Can also have rowversion arrays that we take care of the different
>versions if they exist..
>  
>
Reverse mapping solve this problem

>6. Multiple dataviews will again impose an additional problem of
>rowviews in different versions.. How are we going to take care of it in
>the approach mentioned below..
>  
>
Each time we want to serve data view with unique version requirement, 
new index should be built with corresponding row state filter

>Also I have just started working on ADO.NET 2.0, but for some of the
>batch processing, paging etc, we will reuse lots of this logic. So we
>may want to keep that in mind when we design  the cache.
>
>
>Regards
>Uma
>
>
>
>
>  
>
Thanks
Boris

>>>>Boris Kirzner <borisk at mainsoft.com> 7/20/2004 3:44:32 PM >>>
>>>>        
>>>>
>Hello all
>In the last  week we started thinking about redesign of System.Data 
>indexes towards Uma's  DataView implementation  and the latest 
>System.Data changes.
>Following is the brief description  of redesign  guidelines :
>
>Motivation:
>Currently index implemented as balanced tree of nodes, while each node
>
>points to a specific row. Tree is build based on row compare
>(sequential 
>comparing of row key values in natural order of table columns). Each 
>time we want to access row, a lookup performed in the tree based on key
>
>values.
>One of the major disadvantages of this structure is that if we insert 
>row based on   particular version of its key values, it is impossible
>to 
>search the row based on another version of its key values. To solve the
>
>problem, all the actions on tree always based on current row version.
>Additional weakness of this approach (in the meaning of performance) is
>
>that huge amount of objects are allocated and deallocated while working
>
>with index.
>The last main drawback is that it is impossible to build index to hold
>
>only rows of some particular state. This makes indexes hardly
>applicable 
>for serving data views.
>
>The main goals of the indexes redesign are:
>- Simplify management of indexes and take advantage of the changes in 
>the field of data storage recently made.
>- Avoid massive objects allocation and deallocation to speed up the 
>performance.
>- Prepare indexes towards the full implementation of data views.
>
>The main classes affected by redesign are:
>
>Key:
>Key is a source of the following basic operations: row/record filtering
>
>(returns number of record for the current row or -1 if the row does not
>
>pass state filter) and record comparing (including columns sort
>order).
>Each key defined on set of columns, columns sorting order and row state
>
>filter.
>
>Index:
>Index implemented as array of record numbers and provides the following
>
>services: rebuild index, update (replace one record of row by another),
>
>delete (remove row record) and find.
>Index rebuild is managed by key, as well as record comparing used to 
>form sorted index.
>Index can hold non-unique values, but it manages an up-to-date boolean
>
>indication about this.
>Notice: since the record retrieval for row (managed by key) based on
>row 
>state, any operations of index update should be performed prior to 
>changing row state.
>
>Tables, views and constraints:
>In the past table used to hold the list of its indexes, but most of the
>
>index actions performed in the spirit of "one index per one
>constraint".
>Since we want to extend usage of indexes to view also, this approach 
>have to be changed.
>By the new design table used not only to hold, but also to manage all
>of 
>its indexes. Both constraints and views use indexes services to perform
>
>some particular tasks.
>For example, assert of unique constraint does not update index anymore,
>
>but just uses index duplicates indication to decide whether the 
>assertion succeeded or not.
>This implementation will enable easy integration of indexes with data
>views.
>
>Record cache:
>New design requires fast reverse record-to-row mapping service. This 
>task carried out by record cache, which holds a reverse record-to-index
>
>mapping.
>
>Row:
>To support record cache reverse mapping, the original, current and 
>proposed records numbers become data row internal properties (thus 
>providing an easy way to keep the mapping updated).
>
>Any comments/ideas ?
>
>Thanks
>Boris
>
>  
>

-- 
Boris Kirzner <borisk at mainsoft.com>




More information about the Mono-devel-list mailing list