R: [Mono-devel-list] Analyzing Subversion logs

Paolo Marini lists at paolomarini.it
Wed Jan 12 10:36:53 EST 2005


Hi Ben,
thanks a lot for your really interesting and quick response.
The core of my thesis consists of extracting and analyzing process
information from Subversion repositories. I have defined several metrics and
I am trying to use them against Open Source projects hosted on Subversion.
The goal is to show that the enormous amount of information available in the
repositories can be used for software engineering studies. For example, it
would be interesting to find out how widely Agile Methodologies are used in
the Open Source community.

Here is what happened.
First of all, I grabbed the repository log (in XML format). It contains
37194 commits (from rev. 1 to rev. 38225) and 193237 operations. There
are 202 different authors.
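
As an illustration, counts like these can be obtained from the XML log with a
short script. This is only a sketch: it assumes the log was produced with
"svn log --xml -v" (so that each logentry lists its changed paths) and that an
"operation" means one changed path. The name summarize_log is chosen just for
this example.

```python
import xml.etree.ElementTree as ET


def summarize_log(source):
    """Return (commits, operations, authors) for an `svn log --xml -v` dump.

    `source` is a file name or a file object.
    Assumption: one "operation" is one <path> element inside a <logentry>.
    """
    root = ET.parse(source).getroot()
    entries = root.findall("logentry")

    commits = len(entries)
    # Each changed path of a revision is counted as one operation.
    operations = sum(len(entry.findall("paths/path")) for entry in entries)
    # Some revisions may carry no <author> element; those are skipped here.
    authors = {
        entry.findtext("author")
        for entry in entries
        if entry.findtext("author")
    }
    return commits, operations, len(authors)
```

Calling summarize_log on the saved log file returns the three totals as a
tuple.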
Browsing the log I have extracted a number of interesting statistics
together with something that I found a bit strange.
If we take a look at the three most important authors (in terms of number of
commits) and the distribution of the "distance between commits", we find
that more than 90% of the commits were made within 24 hours of another
commit by the same author. Nothing strange, it's just a sign of really
high activity in the project.
Moving to finer resolutions (within an hour, within a minute, within a
second), we find that the more you go to the left on the time axis, the more
events you find.
Apart from being an interesting result, this becomes strange when you notice
that we have (for miguel, for example):
 -  2 occurrences of commits less than 1 second apart;
 -  1 occurrence of commits between 1 and 2 seconds apart;
 -  2 occurrences of commits between 2 and 3 seconds apart;
 -  3 occurrences of commits between 3 and 4 seconds apart;
 - 57 occurrences of commits between 4 and 5 seconds apart;
 - 15 occurrences of commits between 5 and 6 seconds apart;
 - 13 occurrences of commits between 6 and 7 seconds apart;
the distribution then tends slowly to zero.
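
To make clear how such a distribution can be computed, here is a rough sketch.
It assumes that the "distance" is the gap between consecutive commits by the
same author, and that the dates are in the format written by "svn log --xml"
(for example 2005-01-12T15:36:53.000000Z). The function names are only
illustrative.

```python
from collections import Counter, defaultdict
from datetime import datetime


def parse_svn_date(text):
    # Assumed format: 2005-01-12T15:36:53.000000Z (UTC, with microseconds).
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")


def gap_histogram(commits, bin_seconds=1.0):
    """Bucket the gaps between consecutive commits of each author.

    `commits` is an iterable of (author, date_string) pairs.
    Returns {author: Counter({bin_index: occurrences})}, where bin k covers
    gaps from k * bin_seconds up to (but not including) (k + 1) * bin_seconds.
    """
    dates_by_author = defaultdict(list)
    for author, date_text in commits:
        dates_by_author[author].append(parse_svn_date(date_text))

    histogram = {}
    for author, dates in dates_by_author.items():
        dates.sort()
        bins = Counter()
        # Compare each commit with the next one by the same author.
        for earlier, later in zip(dates, dates[1:]):
            gap = (later - earlier).total_seconds()
            bins[int(gap // bin_seconds)] += 1
        histogram[author] = bins
    return histogram
```

With bin_seconds=1.0, the count stored under bin 4 for an author corresponds to
the "between 4 and 5 seconds" line in the list above.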

Essentially, I need to understand how commits can end up so close
together.
Is there a tool that lets you prepare a series of commits
(each one with its own message) and then run all the operations
automatically?

Thanks a lot, again.

Paolo



