[Monodevelop-devel] Using SQLite as parser database

Tue Jul 29 05:31:57 EDT 2008

El dl 28 de 07 de 2008 a les 23:40 -0400, en/na Michael Hutchinson va escriure:
> On Mon, Jul 28, 2008 at 9:36 PM, Lluis Sanchez Gual <lluis at novell.com> wrote:
> > El dl 28 de 07 de 2008 a les 23:44 +0200, en/na Mike Krüger va escriure:
> >> Hi
> >>
> >> > This discussion should have been done before committing anything to
> >> > trunk, but here it is anyway.
> >> >
> >> > Migrating to SQLite only makes sense if it provides noticeable
> >> > improvements in performance and memory use. Guessing that it will be
> >> > better is not enough. We need real numbers before taking the decision
> >> > to
> >> > switch, and only do it if the numbers are so much better that pay off
> >> > the burden of having a dependency on SQLite.
> >> >
> >> > I might be wrong, but I don't believe that SQLite will be better than
> >> > the ad-hoc database we are using in MD 1.0. I spent a lot of time
> >> > tuning
> >> > up the parser database, and I'm quite happy about how is it
> >> > performing.
> >> >
> >>
> >> Some more benefits for using a real database over an own implementation:
> >>
> >> - It's reliable. Atomic transactions, threading - all solved.
> >
> > Transactions and threading are not a problem for the parser database, so
> > that's not a real benefit.
> 
> Multithreaded *writes* could be useful if we're parsing on multiple
> cores, but those aren't going to happen with SQLite just yet anyway...

It isn't a real benefit, given the low rate of writes we require.

> 
> >> - It's easy to look into the data using a command line client and SQL
> >> statements
> >
> > This is cool, but it has never been a requirement.
> 
> I found this appealing :-)

Me too, but software design can't be driven by appealingness.

> 
> >> Its not just pure performance. Using a database will allow us for
> >> example to switch the database software to a new implementation. And
> >> databases and SQL are very easy to understood and to change.
> >>
> >> I had to think about it too (I implemented the #develop database some
> >> years ago which monodevelop inherited (but optimized I admit ^^)) -
> >
> > That's not correct, MD's parser database was written from scratch. The
> > inherited implementation would load everything in memory and would
> > collapse the system trying to open the MD solution.
> >
> >> after thinking about it the decision was easy - A database makes it
> >> easier to change the model and to make complicated querys more
> >> efficient.
> >
> > But curiously they are especially bad at querying hierarchies of objects
> > (such as namespace hierarchies or subclass hierarchies), unless you
> > denormalize the tables.
> 
> Isn't that true of an ad-hoc DB too?

Not always. A hierarchy of objects can be efficiently scanned in memory
without 'denormalizing' it.

> 
> > Yes, databases are powerful. They can do amazing things with little
> > effort. But generic databases don't fit everywhere. A well designed
> > ad-hoc database may be better than a generic RDBMS, even if less
> > powerful. Everything depends on the requirements and constraints of the
> > application. For the specific case of the parser database, here are some
> > requirements:
> >
> >      * Type and namespace lookup must be lightning fast. Type and
> >        namespace queries are used by the parser when resolving types,
> >        and by code completion when showing completion entries. I don't
> >        think we can afford hitting the disk for that kind of queries.
> 
> AFAIK, SQLite maintains an in-memory cache of configurable size, so as
> long as the databases are properly indexed, this shouldn't be an
> issue.

Shouldn't, but we don't know.

> 
> >      * Most of operations are read operations, so the database must be
> >        optimized for reading. Write operations are much less common
> >        than reads.
> 
> This is true of SQLite.

This is a design constraint to take into account to choose the right
solution. I'm not saying that this is a problem in SQLite.

> 
> >      * The data is the database is discardable. It is always possible
> >        to regenerate a database by re-parsing a project or an assembly.
> >      * Memory usage must be contained. Project and assembly databases
> >        may contain a huge amount of information.
> >      * The set of queries that it has to support is known and limited.
> >        Those are basically:
> >              * Get all types.
> >              * Get all types implemented in a file, given the file
> >                name.
> >              * Get all subclasses of a class, given class name.
> >              * Get a type, given the type name.
> >              * Get all types and sub-namespaces of a namespace.
> >              * Get all sub-namespaces of a namespace.
> 
> I would also like to be able to extend the parser DB to store the
> C/C++ symbols. This should be relatively easy with SQLite.
> 
> It would also be nice to store the amount of times a particular item
> has been selected from the code completion list, so that if there are
> a number of completion entries that match the currently typed prefix,
> the completion window can select the most likely one. Visual Studio
> does this and it's *very* useful. Things like this would be pretty
> easy to implement with SQLite.

And with an ad-hoc database even easier. I don't think VS needs a
relational database to implement this.

> We could also maintain various other
> tables and indices to accelerate things like "find references", which
> would help with speeding up renaming and other such refactorings.

Again, you don't have to convince me about how cool having all that info
in a database is. I already know it.

> 
> > A designer has to find the solution that better fulfills the constraints
> > and requirements of an application. In the parser case, given the
> > constraints we have, I think an ad-hoc database may perform better. Of
> > course I might be wrong, since I haven't done any tests with SQLite, but
> > I yet have to see a proof of the contrary.
> 
> This is a fair point, and we should definitely perform benchmarks
> before coming to a conclusion.
> 
> >> There are many more reasons using database software instead of own data
> >> storage solutions - otherwise the whole database software would be
> >> obsolete.
> >
> > There are also many more reasons for not using databases, otherwise all
> > applications would be using them.
> 
> The main reason I prefer SQLite would be maintainability; I found the
> old parser database to be somewhat hard to understand when I was
> trying to fix problems.

Yes, it is hard to understand because it is complex, and it is complex
because it is solving a complex problem. We should refrain the urge of
rewriting all complex software we find, since there may be a good reason
for it being complex.

> 
> However, I fully agree that if the performance tests show that the
> ad-hoc database is significantly faster and/or less memory hungry,
> it's probably worth the effort of maintaining it.

And if tests show that SQLite is better than the ad-hoc database, I also
agree on using it.

Lluis.