[MonoDevelop] Code analysis SoC project

Thu Mar 25 03:48:38 EDT 2010

On Wed, Mar 24, 2010 at 5:40 PM, nikhil sarda <diff.operator at gmail.com> wrote:
> What I thought was that the code analysis addin could be run in a
> separate thread. It will periodically check the state of the project
> and, if it finds that documents have been changed, it will reconstruct
> the AST using the parser service and feed it to the analysis engine.
> The engine will then run all the valid rules on the parse units on
> another thread and then update the editor accordingly. A time lag is
> inevitable, and obviously an initial implementation will lag for very
> large code bases.

The parser service already detects changes and parses documents on the
fly - that's how we get code completion and on-the-fly syntax error
underlining. You just need to add an event to the parser service so
that the analysis service can subscribe to new parsed documents, then
hand those off to a queue for the analysis thread to process.
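To make the event-plus-queue idea concrete, here is a minimal Python sketch. It is not MD code: ParserService, subscribe_parsed, AnalysisService and the dict standing in for a parsed document are all hypothetical names, and the "rules" are plain callables. The parser raises an event for each parsed document, the handler only enqueues it, and a single background thread drains the queue and runs the rules, so the parser (and the UI) never waits on analysis.

```python
import queue
import threading


class ParserService:
    """Hypothetical stand-in for MD's parser service (not the real API)."""

    def __init__(self):
        self._parsed_handlers = []

    def subscribe_parsed(self, handler):
        # The proposed new event: handlers receive every freshly parsed document.
        self._parsed_handlers.append(handler)

    def parse(self, file_name, text):
        parsed = {"file": file_name, "text": text}  # placeholder for a real AST
        for handler in self._parsed_handlers:
            handler(parsed)


class AnalysisService:
    """Queues parsed documents and runs the rules on a background thread."""

    def __init__(self, parser_service, rules):
        self._queue = queue.Queue()
        self._rules = rules
        self.results = {}
        # The event handler only enqueues, so the parser is never blocked.
        parser_service.subscribe_parsed(self._queue.put)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            doc = self._queue.get()
            if doc is None:  # sentinel posted by stop()
                break
            # Run every rule; a real service would then update the editor.
            self.results[doc["file"]] = [rule(doc) for rule in self._rules]

    def stop(self):
        """Finish the work already queued, then shut the worker down."""
        self._queue.put(None)
        self._worker.join()


if __name__ == "__main__":
    parser = ParserService()
    analysis = AnalysisService(parser, [lambda doc: len(doc["text"])])
    parser.parse("Foo.cs", "class Foo {}")
    analysis.stop()
    print(analysis.results)  # {'Foo.cs': [12]}
```

One design point worth considering: if the same file is re-parsed several times while the worker is busy, a real implementation would probably want to drop the stale entries and analyse only the newest parse.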

> I have become quite familiar with the new DOM. I have partially
> implemented a rule for CS5001 (source file must have Main) and
> checking functions for invalid returns  http://pastebin.com/zPnNwvs3
> Again, these are POCs; I had to do an ugly hack to work around the
> ReturnType of a method not being available. There are similar
> problems getting method arguments as well.
>
>>> Beyond that, there are analysis rules, that scan the parsed documents
>>> and report errors, warnings and suggestions. These rules can range
>>> from trivial to very complex, and some are much more useful than
>>> others.
>
> I was thinking of implementing some of the rules as defined here
> http://msdn.microsoft.com/en-us/library/ms228296%28VS.80%29.aspx

Well, bear in mind that the C# compiler already reports all those, and
when we have background compilation, that would duplicate such rules.
And IMHO there's not much point in running most of the rules on all
files in the project - the value of on-the-fly analysis is that it
comes up in the current file, so you can correct code as you write it
without a mental context switch. If you wanted to look at all the
solution's warnings and errors, you could just run gmcs and gendarme,
but that's a different workflow.

IMHO the coolest thing that on-the-fly analysis in the MD DOM can do
is that it can hook into MD's refactoring operations to offer
automatic fixes. So you could initially focus on rules that would
detect things that the existing refactorings can fix,
e.g.
invalid identifier, or naming convention problems -> rename
missing interface members -> implement interface
no such member -> create member
no such class -> resolve, or create class
magic number -> introduce constant
public field -> encapsulate field with property
trivial property -> convert to auto property
huge method -> extract method

and it would be quite simple to make some other quick fixes based on
the token info in the new C# DOM,
e.g.
duplicated type names in local declaration -> replace with 'var'
ambiguously resolved type name -> fully qualify
obvious infinitely recursive property -> find similarly named field to return
dereference 'as Foo' without null check -> change to (Foo) cast
if (foo == true) -> if (foo)
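As an illustration of how small a token-based quick fix can be, here is a hedged Python sketch of the if (foo == true) -> if (foo) fix. It assumes a hypothetical flat list of token strings rather than the real C# DOM token API, and only handles a single identifier token on either side of the comparison. It also assumes the caller has already checked that the identifier is a plain bool: for a bool? the comparison is not redundant, and if (foo) would not compile.

```python
def simplify_bool_comparisons(tokens):
    """Rewrite `x == true` and `true == x` to `x` in a flat list of tokens.

    Assumes `tokens` comes from a hypothetical C# tokenizer (one string per
    token) and that the caller has already checked `x` is a plain bool.
    Python's str.isidentifier() is used as a rough approximation of a C#
    identifier.
    """
    out = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 3]
        if len(window) == 3 and window[1] == "==":
            left, _, right = window
            if right == "true" and left.isidentifier():
                out.append(left)  # "foo == true" -> "foo"
                i += 3
                continue
            if left == "true" and right.isidentifier():
                out.append(right)  # "true == foo" -> "foo"
                i += 3
                continue
        out.append(tokens[i])
        i += 1
    return out


print(simplify_bool_comparisons(["if", "(", "foo", "==", "true", ")"]))
# ['if', '(', 'foo', ')']
```

Calls such as f() == true are deliberately left alone here, since the left operand is not a single identifier token.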

I'm not saying it's not worth duplicating gmcs and gendarme rules in
some cases - especially the easy ones, or when we can also offer an
automatic quick fix - but that it's best to focus on things that are
especially useful on-the-fly:
- fixing common typos
- exposing MD's refactoring commands
- fixing common mistakes made by new users
and also on implementing and polishing the infrastructure so that
it's easy for people to contribute new rules.
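On the infrastructure side, one possible shape is a rule base class whose check method returns diagnostics, each optionally carrying a quick fix, plus a registry so that contributing a rule means writing one small class. It is sketched in Python purely for illustration; Rule, Diagnostic, QuickFix, register, rename_fix and analyze are hypothetical names, not MD APIs. The example rule flags lowercase class names and offers a rename. Its regex-based rename is only a stand-in: a real fix would go through MD's rename refactoring, since the regex also hits strings and comments.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class QuickFix:
    title: str
    apply: Callable[[str], str]  # takes the document text, returns the fixed text


@dataclass
class Diagnostic:
    rule_id: str
    message: str
    fix: Optional[QuickFix] = None  # rules without an automatic fix leave this empty


class Rule:
    """Base class for a single analysis rule (hypothetical API)."""
    rule_id = "base"

    def check(self, text: str) -> List[Diagnostic]:
        raise NotImplementedError


RULES: List[Rule] = []


def register(rule_cls):
    """Class decorator: instantiate the rule and add it to the registry."""
    RULES.append(rule_cls())
    return rule_cls


def rename_fix(old: str, new: str) -> QuickFix:
    # Whole-word textual rename; a stand-in for MD's real rename refactoring.
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    return QuickFix(f"Rename '{old}' to '{new}'", lambda text: pattern.sub(new, text))


@register
class TypeNamePascalCaseRule(Rule):
    rule_id = "naming.type-pascal-case"

    def check(self, text):
        diagnostics = []
        for name in re.findall(r"\bclass\s+([A-Za-z_]\w*)", text):
            if name[0].islower():
                fixed = name[0].upper() + name[1:]
                diagnostics.append(Diagnostic(
                    self.rule_id,
                    f"Type name '{name}' should be PascalCase",
                    rename_fix(name, fixed)))
        return diagnostics


def analyze(text: str) -> List[Diagnostic]:
    """Run every registered rule over one document."""
    return [d for rule in RULES for d in rule.check(text)]
```

Keeping the fix attached to the diagnostic means the editor can present problems and fixes uniformly, whether a rule offers a fix or not.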

> Won't this be a bit difficult (impossible?) to implement? I mean, how
> can I tokenize Correct lySpel ledWords ? Will some sort of a
> dictionary need to be incorporated?

We'll need a spellchecker + dictionary to feed the words to, yeah, but
there are several OSS ones around that could be used, and it's about
time we had a spellchecker in MD core for various things to use :)

Tokenizing camelCase, PascalCase and underscored_identifiers is pretty
easy - just split on whitespace and punctuation (including underscores),
between lowercase-uppercase pairs (camel|Case), and before
uppercase-lowercase pairs (XML|Parser). Then
you can take the suggestions from the spellchecker, recombine them,
and offer as a rename refactoring.
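The splitting and recombining steps could look roughly like this Python sketch. The correct parameter is a hypothetical stand-in for whichever spellchecker ends up being used (it maps a lowercase word to a suggested spelling), and the recombination heuristic (copy each original word's casing, keep underscores only if the identifier had them) is an assumption, not an established convention.

```python
import re


def split_identifier(name):
    """Split camelCase, PascalCase and underscored_names into words."""
    words = []
    # Underscores, whitespace and other punctuation always separate words.
    for chunk in re.split(r"[^A-Za-z0-9]+", name):
        # Between a lowercase letter (or digit) and an uppercase one: camel|Case
        chunk = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", chunk)
        # Before an uppercase-lowercase pair that follows a capital: XML|Parser
        chunk = re.sub(r"([A-Z])([A-Z][a-z])", r"\1 \2", chunk)
        words.extend(w for w in chunk.split(" ") if w)
    return words


def _match_case(template, word):
    """Give the suggested word the same casing as the word it replaces."""
    if len(template) > 1 and template.isupper():
        return word.upper()
    if template[:1].isupper():
        return word[:1].upper() + word[1:].lower()
    return word.lower()


def suggest_rename(name, correct):
    """Rebuild `name` from spellchecker suggestions.

    `correct` is a hypothetical stand-in for a real spellchecker: it maps a
    lowercase word to its suggested spelling.
    """
    core = name.lstrip("_")
    prefix = name[:len(name) - len(core)]  # keep leading underscores, e.g. _field
    separator = "_" if "_" in core else ""
    fixed = [_match_case(w, correct(w.lower())) for w in split_identifier(core)]
    return prefix + separator.join(fixed)
```

For example, with a correction table mapping "recieve" to "receive" and "adress" to "address", suggest_rename turns recieveAdress into receiveAddress and MAX_ADRESS_LEN into MAX_ADDRESS_LEN, and the result can be handed to the rename refactoring.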

> Many of these rules can be implemented quite straightforwardly if I
> have the corresponding parse units.
> For unresolved types and unused variables in classes and methods, I
> will probably have to save it in some sort of a dictionary to store
> and verify at each ExpressionStatement, VariableDeclaration, etc
> whether or not they have been used. Ditto for methods, I will have to
> create a dictionary first and then verify if they appear anywhere in
> various InvocationExpressions. Also note that this is a solution-wide

Finding unused public/protected/internal items would probably only be
viable if the code completion DBs had an index for "find all
references". Without that, it will be really slow, and probably not
worth doing.

It's much easier for private members, as you only need to consider one
class (plus other non-auto-generated partial classes), so probably
worth doing. It would be easy to implement a quick fix too - just
remove the member.
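For the private-member case, the check itself is a set difference once all the non-generated parts of the class have been collected. This Python sketch assumes a hypothetical ClassPart model that already lists each part's member names with their accessibility (normalized, so a C# member with no modifier is recorded as private) and the identifiers referenced anywhere in that part; building that from the DOM is the real work. Matching by name alone is a simplification: overloads are lumped together, and a private method that only calls itself counts as used.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ClassPart:
    """One non-generated part of a (possibly partial) class.

    Hypothetical model: `members` maps each member name declared in this part
    to its accessibility, and `references` lists every identifier referenced
    anywhere in this part's code.
    """
    members: Dict[str, str]
    references: List[str] = field(default_factory=list)


def unused_private_members(parts: List[ClassPart]) -> List[str]:
    """Names of private members that no part of the class refers to."""
    declared_private = set()
    referenced = set()
    for part in parts:
        declared_private.update(
            name for name, access in part.members.items() if access == "private")
        referenced.update(part.references)
    # Each result is a candidate for the "remove member" quick fix.
    return sorted(declared_private - referenced)
```

Because every part of the class is scanned before the difference is taken, a private member declared in one partial file and used only from another is correctly treated as used.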

> rule. MHutch pointed out that I should focus on rules valid for
> individual files for now, which is what I plan on doing.

Yup, let's focus on the infrastructure and the easy wins. Also, don't
underestimate the GUI work for the configuration panels and for
presenting the results and fixes in the text editor.

For some ideas, take a look at ReSharper:
http://www.jetbrains.com/resharper/features/code_analysis.html#On-the-fly_Error_Detection
http://www.jetbrains.com/resharper/features/code_analysis.html#Quick-Fixes
http://www.jetbrains.com/resharper/features/coding_assistance.html#Context_Actions
and bear in mind that ReSharper only introduced solution-wide analysis
in v4.5 :)

Also, take a look at what VS does with "smart tags" for offering
refactorings and fixes - rename symbol if you edit its name in the
source editor, implement interface, resolve type. It's MUCH more
limited than R#, but still very useful.

-- 
Michael Hutchinson
http://mjhutchinson.com