[Monodevelop-devel] Using SQLite as parser database

Mon Jul 28 16:59:08 EDT 2008

Hi,

This discussion should have taken place before anything was committed
to trunk, but here it is anyway.

Migrating to SQLite only makes sense if it provides noticeable
improvements in performance and memory use. Guessing that it will be
better is not enough. We need real numbers before deciding to switch,
and we should only do it if the numbers are so much better that they
outweigh the burden of having a dependency on SQLite.

I might be wrong, but I don't believe that SQLite will be better than
the ad-hoc database we are using in MD 1.0. I spent a lot of time tuning
the parser database, and I'm quite happy with how it is performing.

The old .pidb files are split into two sections. The first section is an
index of the database contents. It has the names of all types in the
project/assembly, sub-class relations, the hierarchy of namespaces, the
source code files, and all the relations between them. This index is
fully loaded in memory and has enough information for the most common
queries: type lookup, getting types in a namespace or file, querying
subclasses, etc.
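
As a rough illustration of how such an in-memory index can answer these
queries without touching the disk, here is a minimal Python sketch. It
is not the MonoDevelop code: IndexEntry, TypeIndex and all field names
are hypothetical, and the real index stores more relations than shown.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical index record: just enough data for common queries."""
    name: str
    namespace: str
    base: Optional[str]   # full name of the base class, if any
    source_file: str
    offset: int           # where the full class data starts in section two


class TypeIndex:
    """Keeps every index entry in memory, grouped for fast lookups."""

    def __init__(self, entries):
        self.by_full_name = {}
        self.by_namespace = defaultdict(list)
        self.by_file = defaultdict(list)
        self.subclasses = defaultdict(list)
        for e in entries:
            self.by_full_name[f"{e.namespace}.{e.name}"] = e
            self.by_namespace[e.namespace].append(e)
            self.by_file[e.source_file].append(e)
            if e.base is not None:
                self.subclasses[e.base].append(e)

    def lookup(self, full_name):
        return self.by_full_name.get(full_name)

    def types_in_namespace(self, namespace):
        return list(self.by_namespace.get(namespace, []))

    def types_in_file(self, source_file):
        return list(self.by_file.get(source_file, []))

    def get_subclasses(self, full_name):
        return list(self.subclasses.get(full_name, []))
```

Every query here is a dictionary lookup, which is the reason a fully
loaded index can serve the common cases quickly.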

The second section contains the full information for every class. This
section is never fully loaded in memory. Every class entry in the index
contains the file offset of the full information of the class in the
second section. When needed, that class information is loaded using an
ad-hoc binary serializer, which is as fast as it can be. The number of
classes that are fully loaded in memory is limited to 100: only the
100 most recently used classes are kept fully in memory. This limit
prevents the parser database from using too much memory while still
keeping good performance.
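
The offset-plus-cache scheme described above can be sketched roughly as
follows. This is only an illustration under assumptions: the record
layout (a 4-byte length followed by UTF-8 text), the ClassStore name and
the write_records helper are invented here and do not reflect the actual
.pidb format or serializer.

```python
import io
import struct
from collections import OrderedDict


def write_records(stream, records):
    """Write length-prefixed records; return a name -> file offset map."""
    offsets = {}
    for name, payload in records.items():
        offsets[name] = stream.tell()
        data = payload.encode("utf-8")
        stream.write(struct.pack("<I", len(data)))
        stream.write(data)
    return offsets


class ClassStore:
    """Loads full class data on demand and keeps the most recent ones."""

    def __init__(self, stream, offsets, capacity=100):
        self.stream = stream
        self.offsets = offsets        # would come from the in-memory index
        self.capacity = capacity
        self.cache = OrderedDict()    # least recently used entry first
        self.disk_reads = 0

    def get(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)   # mark as most recently used
            return self.cache[name]
        # Cache miss: seek straight to the record and deserialize it.
        self.stream.seek(self.offsets[name])
        (length,) = struct.unpack("<I", self.stream.read(4))
        payload = self.stream.read(length).decode("utf-8")
        self.disk_reads += 1
        self.cache[name] = payload
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return payload
```

With capacity=100 the store never holds more than 100 full records, no
matter how many classes the second section contains.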

However, all this infrastructure is transparent to the database user.
For example, the method GetSubclasses will return a list of IClass
objects, but those IClass objects are not fully loaded; they are just
'proxies'. They only contain information from the index, that is, the
class name, namespace and visibility flags. This is all the information
needed to fill the code completion window for the 'new' keyword, for
example. If more information is requested, such as the class
documentation or the list of members, all that information is lazily
loaded from the database.
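
The proxy idea can be sketched like this. It is a simplified Python
illustration rather than the real IClass implementation: ClassProxy, the
loader callback and the shape of the loaded data are all assumptions
made for the example.

```python
class ClassProxy:
    """Carries index data only; loads the rest on first access."""

    def __init__(self, name, namespace, flags, loader):
        # These three fields come from the in-memory index.
        self.name = name
        self.namespace = namespace
        self.flags = flags
        self._loader = loader   # hypothetical callback that reads the database
        self._full = None       # full class data, filled in lazily

    def _load(self):
        if self._full is None:
            self._full = self._loader(f"{self.namespace}.{self.name}")
        return self._full

    @property
    def documentation(self):
        return self._load()["documentation"]

    @property
    def members(self):
        return self._load()["members"]
```

Code that only reads name, namespace or flags (for example to fill a
completion list) never triggers a load. The first access to members or
documentation calls the loader once, and later accesses reuse the
result.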

The result is that queries on the database are fast (because in many
cases the required information is already in memory) and use little
memory (because the biggest section of the database is never fully
loaded). Could it be better? Maybe, but I think it is good enough
(especially regarding memory use, where it is much better than Visual
Studio and #develop).

Lluis.