[MonoDevelop] Code completion matching - input needed

Lluis Sanchez Gual slluis.devel at gmail.com
Wed Aug 11 06:17:34 EDT 2010


> Hi
> 
> Ok, we've a problem with the matching algorithm for the code completion. 
> I think this is one of the most important parts of MonoDevelop and I 
> think we need opinions on that topic.

I wish this discussion had taken place before the behavior of the
Navigate dialog was changed two weeks ago. It would have saved a lot of
time.

> 
> We've two places where we match with abbreviations:
> 
> 1) The navigate to dialog
> 2) The code completion in the text editor
> 
> I know that both features are used very often (one by force - one by 
> need :)). We had 2 algorithms for the matching.  They yield different 
> results for the same input. We came to the conclusion that this >may< 
> not be the best thing to have. But even that has not been fully decided.
> 
> Therefore the 1st question is: Would you expect that both filters are 
> the same - or not ?

No, they don't need to be the same. A search window is not the same as a
code completion window, so the search algorithms don't need to be the
same.

If you think that using two different algorithms may lead to usability
issues, I'd like to see a concrete use case.

> 
> Now to the algorithms - how they work. First the navigate to dialog 
> algorithm. It matches a word if the filter is a subsequence of it. For 
> example:
> 
> The filter 'strm' would match 'Stream', 'stringMatch', 'StrongTypeMatch' 
> but also 'FirstStorm'.
> It currently completely ignores case - 'DBO' == 'dbo' and would match 
> 'dboField', 'DataBaseObject', 'DBOStuff' or 'OddNumberContainer'.

That's wrong. The algorithm does not ignore case. It gives the correct
result for that last case.
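
To make the subsequence idea concrete, here is a rough Python sketch. It
is not the actual MonoDevelop code: the function name and the
ignore_case flag are made up for illustration, and the flag is there only
because the case handling is disputed above.

```python
def is_subsequence_match(filter_text, candidate, ignore_case=True):
    """Return True if every character of filter_text occurs in candidate, in order.

    Hypothetical helper for illustration; ignore_case is an assumed switch.
    """
    if ignore_case:
        filter_text = filter_text.lower()
        candidate = candidate.lower()
    chars = iter(candidate)
    # `ch in chars` consumes the iterator up to the first hit, so each
    # filter character has to be found after the previous one.
    return all(ch in chars for ch in filter_text)
```

With ignore_case=True, 'dbo' matches 'OddNumberContainer'; with
ignore_case=False, 'DBO' no longer does, but it still matches
'DataBaseObject'.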

> 
> This makes sense for example when you have a term 'Autotools' and you 
> search for 'tools' this is matched - the other algorithm won't match this.
> The plus side of this approach is that it gives you >many< items in the 
> case you're not really sure what to look for.

Getting too many items is not a problem. If you do a search in Google
you get thousands or millions of results. Is that a problem? No, because
you know that Google is good at ranking, so in 99% of cases what you are
looking for will be on the first results page. And in case you don't
find it, you can keep browsing other pages. The Navigate To dialog works
in the same way. It doesn't matter if it returns a lot of items as long
as the best ones are shown at the top.
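
As a sketch of what "many results, best ones first" can look like in
Python: keep every subsequence match, then sort by a score. The scoring
below is purely an assumption for illustration (it penalizes gaps between
matched characters and a late first match); it is not the ranking the
Navigate To dialog actually uses, and both function names are invented.

```python
def subsequence_positions(filter_text, candidate):
    """Indices of the leftmost case-insensitive subsequence match, or None."""
    positions = []
    start = 0
    lowered = candidate.lower()
    for ch in filter_text.lower():
        idx = lowered.find(ch, start)
        if idx == -1:
            return None
        positions.append(idx)
        start = idx + 1
    return positions


def rank(filter_text, candidates):
    """Keep every subsequence match and return them best score first."""
    scored = []
    for name in candidates:
        positions = subsequence_positions(filter_text, name)
        if positions is None:
            continue  # not a match at all, drop it
        # Hypothetical penalty, lower is better: characters skipped inside
        # the match plus the offset of the first matched character.
        gaps = positions[-1] - positions[0] - (len(positions) - 1) if positions else 0
        first = positions[0] if positions else 0
        # Ties are broken by shorter name, then alphabetically.
        scored.append((gaps + first, len(name), name))
    return [name for _, _, name in sorted(scored)]
```

Under this assumed scoring, searching for 'tools' still returns
'TurboBooleans', but below 'Tools' and 'Autotools'.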

> 
> Ok, now to the other one, the code completion filter. This one does >not< 
> do a full substring match; instead I would call it a subword prefix 
> matcher. It breaks a word that is camel or pascal cased into subwords 
> and it matches the filter at the word starts - it makes a difference if 
> the filter is lower or upper case.
> 
> For example 'strm' would NOT match 'Stream' or 'FirstStorm', but it 
> would match 'StringMatch' or 'StrongTypeMatch'.
> In the 'DBO' case it won't match 'dboField' but it will match 
> 'DataBaseObject' or 'DBOStuff' - because an upper case letter enforces a 
> word jump. 'dbo' would match the same, and 'dboField' as well - but it 
> would >never< match 'OddNumberContainer'.
> 
> This one would miss 'Autotools' because it doesn't recognize the 'tools' 
> part - but it would match 'AutoTools' on 'tools' ... but on the plus 
> side it will never give back 'TurboBooleans' if you search for 'tools'. 
> Generally this approach will yield a subset of the results of the 1st 
> approach.
> The plus side of this approach is that you get to the result you're 
> looking for very fast - for example if you hit 'g' you could end up with 
> 'string' in the other one - that would never happen with this approach. 
> And because it's just giving a subset this approach is faster.

I'm OK with using this algorithm in code completion because the
completion window is small and it is not sorted by rank, so unlike the
Navigate dialog, getting fewer results is better.
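
For comparison, the subword prefix matcher described in the quoted text
could be sketched in Python roughly like this. It is only my reading of
that description, not the real implementation: I assume a subword starts
at index 0 or at any upper-case letter, that an upper-case filter letter
must match a subword start exactly, and that a lower-case filter letter
matches case-insensitively. The function name is invented.

```python
def is_subword_prefix_match(filter_text, candidate):
    """Match filter_text against prefixes of the camel/Pascal-case subwords."""
    # Assumed rule: a subword starts at index 0 or at any upper-case letter.
    starts = {i for i, ch in enumerate(candidate) if i == 0 or ch.isupper()}

    def char_matches(f, c):
        # Upper-case filter letters are case sensitive, lower-case ones are not.
        return f == c if f.isupper() else f == c.lower()

    def match(fi, ci):
        # fi: next filter index; ci: candidate index right after the last match.
        if fi == len(filter_text):
            return True
        f = filter_text[fi]
        for pos in range(ci, len(candidate)):
            continuing = fi > 0 and pos == ci  # stay inside the current subword
            if pos not in starts and not continuing:
                continue
            if f.isupper() and pos not in starts:
                continue  # an upper-case letter forces a jump to a subword start
            if char_matches(f, candidate[pos]) and match(fi + 1, pos + 1):
                return True
        return False

    return match(0, 0)
```

Under these assumptions it reproduces the examples given: 'strm'
matches 'StringMatch' but not 'Stream', 'DBO' matches 'DataBaseObject'
but not 'dboField', and 'tools' matches 'AutoTools' but neither
'Autotools' nor 'TurboBooleans'.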

> 
> Ok, now we've the question: which algorithm to take. Do we need both ? 
> When we take one for all occasions which one ? Are we complete idiots 
> and another abbreviation approach would be better ?

We are going to use the first algorithm for the Navigate dialog, and the
second for code completion.

Lluis.



