[Mono-dev] VBNC uses too much CPU and RAM on Mono

Wed Nov 1 11:30:48 EST 2006

Hello,

> The compiler is effectively keeping a linked list of all the tokens, and it  
> keeps them until the compiler finishes (tokens are kept since they contain  
> the source location for the token and would be necessary for any error  
> messages.) I'm quite sure I can remove the entire list pretty easily  
> though, so I'll try to fix this as soon as possible. However I don't think  
> this is the real problem, after parsing the source the list is never  
> walked, and then the only bottle-neck I can see would be the gc to take  
> too long to walk the list in order to decide that it cannot be disposed  
> of, but since Kornél's added gc collections and it worked better this does  
> not really seem logical.

MCS works like this: the tokenizer, as I mentioned, is called by the
parser whenever it needs the next token. Each time it's called, it
returns one token or "end of file" (which is just another kind of token).

For each token returned, the tokenizer keeps a bit of state, called the
"Location", which happens to be an Int32. In that Int32 we encode the
filename, row and column (see mcs/location.cs for details on the
encoding).

This is important for a few reasons: locations are structs, so they are
very lightweight, and they fit nicely into one word, which is very
efficient as well.
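
As a rough illustration of packing a location into a single 32-bit
integer, here is a minimal Python sketch. It encodes a file, a row and a
position within the row (treated here as a column). The bit widths, the
file table and the function names are all assumptions made for this
example; the actual layout is whatever mcs/location.cs defines.

```python
# Hypothetical bit layout for illustration only; not the mcs layout.
COLUMN_BITS = 8
FILE_BITS = 8
ROW_BITS = 16  # 8 + 8 + 16 = 32 bits in total

COLUMN_MASK = (1 << COLUMN_BITS) - 1
FILE_MASK = (1 << FILE_BITS) - 1
ROW_MASK = (1 << ROW_BITS) - 1

file_names = []    # file index -> file name
file_indexes = {}  # file name -> file index


def file_index(name):
    """Return a small integer id for a file name, registering it if new."""
    index = file_indexes.get(name)
    if index is None:
        index = len(file_names)
        if index > FILE_MASK:
            raise ValueError("too many source files for this layout")
        file_names.append(name)
        file_indexes[name] = index
    return index


def pack_location(name, row, column):
    """Encode (file, row, column) into one 32-bit integer."""
    row = min(row, ROW_MASK)           # saturate rather than overflow
    column = min(column, COLUMN_MASK)  # saturate rather than overflow
    index = file_index(name)
    return (row << (FILE_BITS + COLUMN_BITS)) | (index << COLUMN_BITS) | column


def unpack_location(value):
    """Decode a packed location back into (file name, row, column)."""
    column = value & COLUMN_MASK
    index = (value >> COLUMN_BITS) & FILE_MASK
    row = (value >> (FILE_BITS + COLUMN_BITS)) & ROW_MASK
    return file_names[index], row, column
```

The point of the design is that a node only has to carry one small
integer, while the file names themselves live once in a shared table.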

Now, during parsing we call the tokenizer to identify the next token,
and the tokenizer returns this 32-bit value. The parser makes a
decision based on that, and if the parser needs to construct some element
that needs to track the location, it checks a public property in the
tokenizer, "Location", which reports the location where the current
token was found.

This means that for each parsed component we only track one location (we
might have a few cases where we store more than one location, but they
are rare).

So tokens are effectively discarded as soon as the parser has made a
decision, and locations are tracked in each internal node that we
create, and only when we actually need them (our base "Statement" and
"Expression" classes have a Location field, so we use this to track the
location).
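
The pull model described above can be sketched roughly as follows. This
is a toy Python example, not mcs code: the Tokenizer, Number and Add
classes, the tiny "sum of numbers" grammar and the (row, column) tuple
used as a location are all hypothetical. What it tries to show is that
no token list is kept; the parser asks for one token at a time and copies
the tokenizer's current location only into the nodes it builds.

```python
EOF = "EOF"


class Tokenizer:
    """Hands out one token per call and remembers only the latest one."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.row = 1
        self.col = 1
        self.location = (1, 1)  # location of the most recent token
        self.value = None       # text of the most recent token

    def next_token(self):
        # Skip whitespace, keeping the row and column counters up to date.
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            if self.text[self.pos] == "\n":
                self.row += 1
                self.col = 1
            else:
                self.col += 1
            self.pos += 1
        self.location = (self.row, self.col)
        if self.pos >= len(self.text):
            self.value = None
            return EOF
        start = self.pos
        if self.text[self.pos].isdigit():
            while self.pos < len(self.text) and self.text[self.pos].isdigit():
                self.pos += 1
            kind = "NUMBER"
        else:
            self.pos += 1
            kind = "SYMBOL"
        self.value = self.text[start:self.pos]
        self.col += self.pos - start
        return kind


class Number:
    def __init__(self, value, location):
        self.value = value
        self.location = location


class Add:
    def __init__(self, left, right, location):
        self.left = left
        self.right = right
        self.location = location  # location of the '+' operator


def parse_sum(tokenizer):
    """Parse NUMBER ('+' NUMBER)* and return the resulting tree."""

    def parse_number():
        if tokenizer.next_token() != "NUMBER":
            raise SyntaxError("expected a number at %s" % (tokenizer.location,))
        return Number(int(tokenizer.value), tokenizer.location)

    node = parse_number()
    while True:
        kind = tokenizer.next_token()
        if kind == EOF:
            return node
        if kind != "SYMBOL" or tokenizer.value != "+":
            raise SyntaxError("expected '+' at %s" % (tokenizer.location,))
        plus_location = tokenizer.location  # copy before reading further
        node = Add(node, parse_number(), plus_location)
```

For example, parse_sum(Tokenizer("1 + 2")) returns an Add node whose
location is that of the "+" sign, and once parsing is done the only
locations still alive are the ones stored in the tree.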

When I wrote mcs, I thought that creating many objects would be a source
of problems (I was in particular worried about all the implicit and
explicit cast objects that we created). But it turned out that those
objects never showed up in a profile. The major issues were all the
lists that we created in FindMembers, the overload method resolution (we
create lots of arrays that we never reuse) and, most importantly, lots of
string operations that concatenated namespaces and type names, of the
form ns + "." + type.
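
The message does not say how mcs dealt with the string concatenations,
so the following is only one common way to reduce them, shown as a small
Python sketch with hypothetical names (TypeName, full_name, TypeTable):
build the qualified name at most once per type and key lookups on the
(namespace, name) pair so that a lookup does not need a new string.

```python
class TypeName:
    """Hypothetical holder for a namespace and a simple type name."""

    def __init__(self, namespace, name):
        self.namespace = namespace
        self.name = name
        self._full_name = None  # filled in on first use

    @property
    def full_name(self):
        # Concatenate once and reuse the resulting string afterwards.
        if self._full_name is None:
            if self.namespace:
                self._full_name = self.namespace + "." + self.name
            else:
                self._full_name = self.name
        return self._full_name


class TypeTable:
    """Looks types up by (namespace, name) without building a string."""

    def __init__(self):
        self._types = {}

    def add(self, type_name):
        self._types[(type_name.namespace, type_name.name)] = type_name

    def find(self, namespace, name):
        return self._types.get((namespace, name))
```

Whether this pays off depends on where a profile says the time goes,
which is the lesson of the paragraph above.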

> - When a member lookup is done on a type, the compiler loads all the  
> members of the type in question and all the ascendent types in order to  
> create a flattened view of the type, and then the flattened type is  
> cached. It might be better to load only the required member on the type  
> and its ascendent types and cache that. This is a somewhat bigger change  
> though.

Some inspiration could probably be taken from the MCS MemberCache
routine; it was designed precisely because of this issue. I am not sure
if it solves it in a way that is compatible with vbnc, and I am afraid it
might not, but it's worth a look.
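
To make the idea from the quoted suggestion concrete (look up and cache
only the requested member name instead of flattening the whole type),
here is a minimal Python sketch. It is not the MCS MemberCache and makes
no claim about how that class works; TypeInfo, LazyMemberCache and the
miss counter are hypothetical names used only for this example.

```python
class TypeInfo:
    """Hypothetical type description holding only its declared members."""

    def __init__(self, name, base=None, members=None):
        self.name = name
        self.base = base
        self.members = members or {}  # member name -> list of members


class LazyMemberCache:
    """Caches lookups per (type, member name) instead of per whole type."""

    def __init__(self):
        self._cache = {}
        self.misses = 0  # number of times the type chain was walked

    def lookup(self, type_info, member_name):
        key = (type_info.name, member_name)
        found = self._cache.get(key)
        if found is None:
            self.misses += 1
            found = []
            current = type_info
            # Walk the type and its base types, picking up only this name.
            while current is not None:
                found.extend(current.members.get(member_name, ()))
                current = current.base
            self._cache[key] = found  # empty results are cached as well
        return found
```

With this shape, a type with thousands of members costs nothing until
one of its names is actually requested, at the price of one cache entry
per distinct (type, name) pair.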

Also, this is a bit of an advanced optimization, so as you point out,
this probably can wait.

Miguel