[Mono-dev] DataContext Implementation Advice

Fri Sep 21 06:11:31 EDT 2007

Ok, so I've been able to create a pretty generic (if highly adaptable)
IQueryProvider which implements very basic Linq -> Sql support(More on
that later). This project was to better understand the roll played by
different components in the linq->sql pipeline. While the more
complicated and optimized we try to make the Linq-SQL conversion, the
harder it becomes, the core of implementing a Provider is pretty easy.
We could write custom _query_ providers all day long and have linq to
the world. The problem is that we want updates, deletes, sprocs etc.
That is where the DataContext comes in, and my headaches start.

The principal of the DataContext is easy enough, its like the ADO.NET
2.0 DataSets in that it can be generated as a strongly-typed extension
of the core DataContex, or as a generic and adaptable class. The
DataContext sits in System.Data.Linq, and is the gatekeeper between
the query language and a provider. In addition to just being in the
way, the DataContext provides metadata about a database, has an
understanding of tables and relationships, as well as stored
procedures. It is also responsible for object caches.

This is all fun and good, with some decent mapping code we can do all
that without breaking a sweat. The problem is
DataContext.SubmitChanges(). As tables are manipulated etc. these
changes are tracked in-memory (or some other awesometastic way) and
eventually either reverted or submitted. More importantly, to anyone
using the DataContext's tables gets the modified information. Again,
things don't sound too bad, and there are already implementations
(DataTable, DataSet etc.). The difference is they have all the data,
and execute a modification upon call, then store the old value and the
current state of the data. Linq is all deferred execution, we don't
have an in memory representation of the data on the database, we just
have a list of instructions, each of which will return a result.

So, basically I've determined that out DataContext will have to track
everything happening in each of its tables. This got me thinking, an
event driven model would be perfect, we would only act when actual
action was being taken.  Now, I basically have both ends of the
spectrum, we get every object associated with an expression utilizing
a DataContext, and flag it in our cool DiffTracker, using a Dictionary
or Hashtable for fast lookups based upon the entity itself, then we
process each event as it happens (which is pretty much the last thing
an expression will do). I think we can get away with only the current
state of each noteworthy object and the most recent past.I figure if
the state is Insert, who cares what the object was before, if the
state is Update, we just update all values to their current value,
selecting on the old object, and if its delete.... yeah. There is a
States.cs Enum in the System.Data.Linq directory with all the states a
DataContext  is aware of.

The problem is the massive middleman, I've gotten absolutely worlds
better with some of C#'s more advanced language structures in the few
days I've been working on this, but I am not 100% sure of hows the
best way to go about this. As a result, I would really love some help.
I can get the DataContext to respond intelligently to Linq queries,
and continue to process the subtree into a table. Once we are at that
point, its all on the individual Sql implementation (which isn't
important yet, and isn't really on the immeidate radar.).

If someone could help me out a bit with the actual construction of the
DiffTracker and DiffWrapper classes, it would be greatly appreciated.
You aren't obligated to use my design, or even help at all. But I
would _greatly_ appreciate some feedback or any ideas anyone has about
this. Specifically with regard too keeping this light and simple, as I
feel like it could quickly become a massive buggy mess.

-- 
Cheers,
Kevin Kubasik
http://kubasik.net/blog