[Mono-list] mcs compiles on linux. Now what?

Paolo Molaro lupus@ximian.com
Fri, 8 Mar 2002 16:06:07 +0100


On 03/08/02 Dan Lewis wrote:
> Is there any way to find out how much of this is spent in the lexer? MCS uses a
> custom lexer, and in particular uses a hashtable lookup to recognize keywords.
> 
> String.GetHashCode() is computed in C# at the moment. It should definitely have
> an icall (btw I'm not saying that icalls are the way to make things faster --
> but it's such a fundamental operation). Also it is not cached, although strings
> are supposed to be immutable, right? Perhaps change it to:
> 
>   public override int GetHashCode () {
>           if (!is_hashed) {
>                   // compute hash_code
>                   is_hashed = true;
>           }
> 
>           return hash_code;
>   }
> 
> This may/may not make any difference. As ever, profiling's your best weapon :)

String.GetHashCode() accounts for 1.3% of the total time spent compiling,
so it's not an obvious candidate for optimization :-)
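For reference, Dan's caching idea is the standard lazy-hash idiom. Here is a minimal, self-contained sketch in Java (the class and field names are made up for illustration; the 31-based polynomial hash used here happens to be the same one java.lang.String uses, which makes the result easy to check):

```java
// Sketch of the lazy hash-caching idiom: compute the hash once on first
// request and reuse it afterwards. This is safe only because the
// underlying character data never changes (strings are immutable).
final class CachedString {
    private final char[] value;
    private int hash;          // 0 means "not yet computed"

    CachedString(String s) { this.value = s.toCharArray(); }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            for (char c : value)
                h = 31 * h + c;   // classic polynomial string hash
            hash = h;             // cache for subsequent calls
        }
        return h;
    }
}
```

One subtlety the simple flag-free version above shares with Dan's sketch: a string whose hash legitimately computes to 0 (or the empty string) gets recomputed on every call, which is harmless but worth knowing.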

Here is some relevant data:
Method name                                           Total (ms)  Calls
Mono.CSharp.Driver::ProcessFile(1)                    214055      28
Mono.CSharp.Driver::parse(1)                          214051      28
Mono.CSharp.CSharpParser::parse(0)                    214008      28
Mono.CSharp.CSharpParser::yyparse(1)                  214007      28
Mono.CSharp.Tokenizer::token(0)                       163886  161657
Mono.CSharp.Tokenizer::xtoken(0)                      163273  161657
Mono.CSharp.Tokenizer::peekChar(0)                     25279  888884
Mono.CSharp.Tokenizer::is_number(1)                    24166   19362
Mono.CSharp.Tokenizer::getChar(0)                      17076  888825
Mono.CSharp.Tokenizer::decimal_digits(1)               13687   19335
Mono.CSharp.Tokenizer::is_punct(2)                      7934  290123
Mono.CSharp.Tokenizer::advance(0)                       4676  161685
Mono.CSharp.Tokenizer::is_keyword(1)                    4247   56199
Mono.CSharp.Tokenizer::handle_preprocessing_directive(0)    2216     410
Mono.CSharp.Tokenizer::get_cmd_arg(2)                   2081     410
Mono.CSharp.Tokenizer::is_identifier_part_character(1)    1960  355818
Mono.CSharp.Tokenizer::escape(1)                        1544   49420
Mono.CSharp.Tokenizer::adjust_int(1)                    1430   19327
Mono.CSharp.Tokenizer::GetKeyword(1)                     980   16811

System.Text.StringBuilder::Append(1)                   77733  453475
System.Char::IsLetter(1)                                1400  734049
System.Char::IsDigit(1)                                  731  444015

So it looks like StringBuilder::Append() takes a huge chunk of the time,
followed by the IO functions and many small functions that add up. I'd
need call-graph info for more precise data, but this should give an
idea.
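Given that Append() is called 453475 times, a large part of that cost is plausibly per-call overhead from appending one character at a time. A hypothetical scanner fragment (not MCS code) showing the usual mitigation, sketched in Java: scan to the end of a run first, then append the whole run in one bulk call.

```java
// Hypothetical identifier scanner: instead of one append() per
// character, find the end of the identifier first and issue a single
// bulk append for the whole run, cutting the call count drastically.
class Scan {
    static String scanIdentifier(String src, int start) {
        int i = start;
        while (i < src.length()
               && (Character.isLetterOrDigit(src.charAt(i))
                   || src.charAt(i) == '_'))
            i++;
        StringBuilder sb = new StringBuilder();
        sb.append(src, start, i);   // one call for the whole run
        return sb.toString();
    }
}
```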

> In general custom lexers are slower than machine generated ones. I did some
> work a long time ago on porting a fast lexer generator to C# -- I could dig it
> up if there's need for it.

This is miguel's call.
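For what it's worth, one trick generated lexers use to avoid a per-token hashtable lookup (like is_keyword/GetKeyword above) is to switch on the identifier's length and compare directly. A hypothetical sketch in Java, with only a few C# keywords shown:

```java
// Hypothetical keyword test: branch on length first, then compare
// against the few keywords of that length. A generated lexer typically
// emits an exhaustive version of this automatically.
class Kw {
    static boolean isKeyword(String id) {
        switch (id.length()) {
            case 2: return id.equals("if") || id.equals("in")
                         || id.equals("is") || id.equals("do");
            case 3: return id.equals("for") || id.equals("int")
                         || id.equals("new") || id.equals("out");
            case 4: return id.equals("else") || id.equals("void")
                         || id.equals("bool") || id.equals("enum");
            default: return false;
        }
    }
}
```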

lupus

-- 
-----------------------------------------------------------------
lupus@debian.org                                     debian/rules
lupus@ximian.com                             Monkeys do it better