[Monodevelop-devel] Using SQLite as parser database

Michael Hutchinson m.j.hutchinson at gmail.com
Mon Jul 28 23:40:21 EDT 2008


On Mon, Jul 28, 2008 at 9:36 PM, Lluis Sanchez Gual <lluis at novell.com> wrote:
> El dl 28 de 07 de 2008 a les 23:44 +0200, en/na Mike Krüger va escriure:
>> Hi
>>
>> > This discussion should have been done before committing anything to
>> > trunk, but here it is anyway.
>> >
>> > Migrating to SQLite only makes sense if it provides noticeable
>> > improvements in performance and memory use. Guessing that it will be
>> > better is not enough. We need real numbers before taking the decision
>> > to
>> > switch, and only do it if the numbers are so much better that pay off
>> > the burden of having a dependency on SQLite.
>> >
>> > I might be wrong, but I don't believe that SQLite will be better than
>> > the ad-hoc database we are using in MD 1.0. I spent a lot of time
>> > tuning
>> > up the parser database, and I'm quite happy about how is it
>> > performing.
>> >
>>
>> Some more benefits for using a real database over an own implementation:
>>
>> - It's reliable. Atomic transactions, threading - all solved.
>
> Transactions and threading are not a problem for the parser database, so
> that's not a real benefit.

Multithreaded *writes* could be useful if we're parsing on multiple
cores, but those aren't going to happen with SQLite just yet anyway...

>> - It's easy to look into the data using a command line client and SQL
>> statements
>
> This is cool, but it has never been a requirement.

I found this appealing :-)

>> Its not just pure performance. Using a database will allow us for
>> example to switch the database software to a new implementation. And
>> databases and SQL are very easy to understood and to change.
>>
>> I had to think about it too (I implemented the #develop database some
>> years ago which monodevelop inherited (but optimized I admit ^^)) -
>
> That's not correct, MD's parser database was written from scratch. The
> inherited implementation would load everything in memory and would
> collapse the system trying to open the MD solution.
>
>> after thinking about it the decision was easy - A database makes it
>> easier to change the model and to make complicated querys more
>> efficient.
>
> But curiously they are especially bad at querying hierarchies of objects
> (such as namespace hierarchies or subclass hierarchies), unless you
> denormalize the tables.

Isn't that true of an ad-hoc DB too?

> Yes, databases are powerful. They can do amazing things with little
> effort. But generic databases don't fit everywhere. A well designed
> ad-hoc database may be better than a generic RDBMS, even if less
> powerful. Everything depends on the requirements and constraints of the
> application. For the specific case of the parser database, here are some
> requirements:
>
>      * Type and namespace lookup must be lightning fast. Type and
>        namespace queries are used by the parser when resolving types,
>        and by code completion when showing completion entries. I don't
>        think we can afford hitting the disk for that kind of queries.

AFAIK, SQLite maintains an in-memory cache of configurable size, so as
long as the databases are properly indexed, this shouldn't be an
issue.

>      * Most of operations are read operations, so the database must be
>        optimized for reading. Write operations are much less common
>        than reads.

This is true of SQLite.

>      * The data is the database is discardable. It is always possible
>        to regenerate a database by re-parsing a project or an assembly.
>      * Memory usage must be contained. Project and assembly databases
>        may contain a huge amount of information.
>      * The set of queries that it has to support is known and limited.
>        Those are basically:
>              * Get all types.
>              * Get all types implemented in a file, given the file
>                name.
>              * Get all subclasses of a class, given class name.
>              * Get a type, given the type name.
>              * Get all types and sub-namespaces of a namespace.
>              * Get all sub-namespaces of a namespace.

I would also like to be able to extend the parser DB to store the
C/C++ symbols. This should be relatively easy with SQLite.

It would also be nice to store the amount of times a particular item
has been selected from the code completion list, so that if there are
a number of completion entries that match the currently typed prefix,
the completion window can select the most likely one. Visual Studio
does this and it's *very* useful. Things like this would be pretty
easy to implement with SQLite. We could also maintain various other
tables and indices to accelerate things like "find references", which
would help with speeding up renaming and other such refactorings.

> A designer has to find the solution that better fulfills the constraints
> and requirements of an application. In the parser case, given the
> constraints we have, I think an ad-hoc database may perform better. Of
> course I might be wrong, since I haven't done any tests with SQLite, but
> I yet have to see a proof of the contrary.

This is a fair point, and we should definitely perform benchmarks
before coming to a conclusion.

>> There are many more reasons using database software instead of own data
>> storage solutions - otherwise the whole database software would be
>> obsolete.
>
> There are also many more reasons for not using databases, otherwise all
> applications would be using them.

The main reason I prefer SQLite would be maintainability; I found the
old parser database to be somewhat hard to understand when I was
trying to fix problems.

However, I fully agree that if the performance tests show that the
ad-hoc database is significantly faster and/or less memory hungry,
it's probably worth the effort of maintaining it.

-- 
Michael Hutchinson
http://mjhutchinson.com


More information about the Monodevelop-devel-list mailing list