[Monodevelop-devel] Using SQLite as parser database

Mon Jul 28 21:36:06 EDT 2008

El dl 28 de 07 de 2008 a les 23:44 +0200, en/na Mike Krüger va escriure:
> Hi
> 
> > This discussion should have been done before committing anything to
> > trunk, but here it is anyway.
> > 
> > Migrating to SQLite only makes sense if it provides noticeable
> > improvements in performance and memory use. Guessing that it will be
> > better is not enough. We need real numbers before taking the decision
> > to
> > switch, and only do it if the numbers are so much better that pay off
> > the burden of having a dependency on SQLite.
> > 
> > I might be wrong, but I don't believe that SQLite will be better than
> > the ad-hoc database we are using in MD 1.0. I spent a lot of time
> > tuning
> > up the parser database, and I'm quite happy about how is it
> > performing.
> > 
> 
> Some more benefits for using a real database over an own implementation:
> 
> - It's reliable. Atomic transactions, threading - all solved.

Transactions and threading are not a problem for the parser database, so
that's not a real benefit.

> - It's easy to look into the data using a command line client and SQL
> statements

This is cool, but it has never been a requirement.

> 
> Its not just pure performance. Using a database will allow us for
> example to switch the database software to a new implementation. And
> databases and SQL are very easy to understood and to change.
> 
> I had to think about it too (I implemented the #develop database some
> years ago which monodevelop inherited (but optimized I admit ^^)) -

That's not correct, MD's parser database was written from scratch. The
inherited implementation would load everything in memory and would
collapse the system trying to open the MD solution.

> after thinking about it the decision was easy - A database makes it
> easier to change the model and to make complicated querys more
> efficient.

But curiously they are especially bad at querying hierarchies of objects
(such as namespace hierarchies or subclass hierarchies), unless you
denormalize the tables.

Yes, databases are powerful. They can do amazing things with little
effort. But generic databases don't fit everywhere. A well designed
ad-hoc database may be better than a generic RDBMS, even if less
powerful. Everything depends on the requirements and constraints of the
application. For the specific case of the parser database, here are some
requirements:

      * Type and namespace lookup must be lightning fast. Type and
        namespace queries are used by the parser when resolving types,
        and by code completion when showing completion entries. I don't
        think we can afford hitting the disk for that kind of queries.
      * Most of operations are read operations, so the database must be
        optimized for reading. Write operations are much less common
        than reads.
      * The data is the database is discardable. It is always possible
        to regenerate a database by re-parsing a project or an assembly.
      * Memory usage must be contained. Project and assembly databases
        may contain a huge amount of information.
      * The set of queries that it has to support is known and limited.
        Those are basically:
              * Get all types.
              * Get all types implemented in a file, given the file
                name.
              * Get all subclasses of a class, given class name.
              * Get a type, given the type name.
              * Get all types and sub-namespaces of a namespace.
              * Get all sub-namespaces of a namespace.

A designer has to find the solution that better fulfills the constraints
and requirements of an application. In the parser case, given the
constraints we have, I think an ad-hoc database may perform better. Of
course I might be wrong, since I haven't done any tests with SQLite, but
I yet have to see a proof of the contrary.

> 
> There are many more reasons using database software instead of own data
> storage solutions - otherwise the whole database software would be
> obsolete.

There are also many more reasons for not using databases, otherwise all
applications would be using them.

Lluis.