[Mono-list] mcs compiles on linux. Now what?

Mark Lewis lewism@businesslogic.com
Fri, 8 Mar 2002 09:29:18 -0600


It's not that String.GetHashCode() itself is slow; it's that hashtable
overhead is high because the current hashcode implementation hashes poorly.
But I don't see any hashtable calls in your profile report, so maybe it's
still not a big deal?
-- Mark Lewis
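
To make the point about poor hashing concrete, here is a minimal C# sketch.
It does not reproduce the hash function Mono's String class used at the time
(this thread does not show it). WeakHash is a deliberately bad stand-in that
just sums the characters, and BetterHash is a common multiply-and-add scheme.
The class name, both function names, and the bucket count are illustrative
assumptions.

```csharp
using System;
using System.Collections.Generic;

static class HashSpread
{
    // Hypothetical weak hash: sums character codes, so anagrams always collide.
    public static int WeakHash(string s)
    {
        int h = 0;
        foreach (char c in s)
            h += c;
        return h;
    }

    // Hypothetical better hash: multiply-and-add, so character order matters.
    public static int BetterHash(string s)
    {
        int h = 0;
        foreach (char c in s)
            h = unchecked(h * 31 + c);
        return h;
    }

    // Counts how many distinct buckets the keys occupy in a table of the given size.
    public static int DistinctBuckets(IEnumerable<string> keys, Func<string, int> hash, int buckets)
    {
        var used = new HashSet<int>();
        foreach (string key in keys)
            used.Add((hash(key) & 0x7fffffff) % buckets);
        return used.Count;
    }
}
```

With these definitions, WeakHash gives "ab" and "ba" the same value while
BetterHash does not. The more keys share a bucket, the more string
comparisons each hashtable lookup has to perform, which is where the
overhead would come from.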

-----Original Message-----
From: Paolo Molaro [mailto:lupus@ximian.com]
Sent: Friday, March 08, 2002 9:06 AM
To: mono-list@ximian.com
Subject: Re: [Mono-list] mcs compiles on linux. Now what?
Importance: Low


On 03/08/02 Dan Lewis wrote:
> Is there any way to find out how much of this is spent in the lexer? MCS uses a
> custom lexer, and in particular uses a hashtable lookup to recognize keywords.
> 
> String.GetHashCode() is computed in C# at the moment. It should definitely have
> an icall (btw I'm not saying that icalls are the way to make things faster --
> but it's such a fundamental operation). Also it is not cached, although strings
> are supposed to be immutable, right? Perhaps change it to:
> 
>   public override int GetHashCode () {
>           if (!is_hashed) {
>                   // compute hash_code
>                   is_hashed = true;
>           }
> 
>           return hash_code;
>   }
> 
> This may/may not make any difference. As ever, profiling's your best weapon :)
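
For reference, the caching idea quoted above can be sketched as a runnable
wrapper class. This is not Mono's System.String: CachedHashString, the
multiply-and-add hash, and the ComputeCount counter are all hypothetical and
exist only to show that the hash is computed once.

```csharp
using System;

// Hypothetical wrapper around an immutable string that caches its hash code.
sealed class CachedHashString
{
    private readonly string value;
    private int hash_code;
    private bool is_hashed;

    // Demo-only counter: how many times the hash was actually computed.
    public int ComputeCount { get; private set; }

    public CachedHashString(string value)
    {
        this.value = value ?? throw new ArgumentNullException(nameof(value));
    }

    public override int GetHashCode()
    {
        if (!is_hashed)
        {
            // Stand-in hash; the real String implementation is not shown in the thread.
            int h = 0;
            foreach (char c in value)
                h = unchecked(h * 31 + c);
            hash_code = h;
            is_hashed = true;
            ComputeCount++;
        }
        return hash_code;
    }

    public override bool Equals(object obj)
    {
        return obj is CachedHashString other && other.value == value;
    }
}
```

Calling GetHashCode() twice on the same instance returns the same value and
leaves ComputeCount at 1. As written the sketch is not thread-safe, and the
cache costs two extra fields per string, so whether it pays off depends on
how often the same string is hashed.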

String.GetHashCode() accounts for 1.3% of the total time spent compiling,
so it's not an obvious candidate for optimization :-)

Here is some relevant data:
Method name                                               Total (ms)   Calls
Mono.CSharp.Driver::ProcessFile(1)                            214055      28
Mono.CSharp.Driver::parse(1)                                  214051      28
Mono.CSharp.CSharpParser::parse(0)                            214008      28
Mono.CSharp.CSharpParser::yyparse(1)                          214007      28
Mono.CSharp.Tokenizer::token(0)                               163886  161657
Mono.CSharp.Tokenizer::xtoken(0)                              163273  161657
Mono.CSharp.Tokenizer::peekChar(0)                             25279  888884
Mono.CSharp.Tokenizer::is_number(1)                            24166   19362
Mono.CSharp.Tokenizer::getChar(0)                              17076  888825
Mono.CSharp.Tokenizer::decimal_digits(1)                       13687   19335
Mono.CSharp.Tokenizer::is_punct(2)                              7934  290123
Mono.CSharp.Tokenizer::advance(0)                               4676  161685
Mono.CSharp.Tokenizer::is_keyword(1)                            4247   56199
Mono.CSharp.Tokenizer::handle_preprocessing_directive(0)        2216     410
Mono.CSharp.Tokenizer::get_cmd_arg(2)                           2081     410
Mono.CSharp.Tokenizer::is_identifier_part_character(1)          1960  355818
Mono.CSharp.Tokenizer::escape(1)                                1544   49420
Mono.CSharp.Tokenizer::adjust_int(1)                            1430   19327
Mono.CSharp.Tokenizer::GetKeyword(1)                             980   16811

System.Text.StringBuilder::Append(1)                           77733  453475
System.Char::IsLetter(1)                                        1400  734049
System.Char::IsDigit(1)                                          731  444015

So it looks like StringBuilder::Append() takes a huge chunk of the time,
and next to it are the IO functions and many small functions that add up.
I'd need to add call graph info to get more precise data, but this should
give an idea.
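
To put the table in proportion, the shares can be worked out with a couple
of helper functions. This is only a sketch: ProfileShare and its method
names are made up for illustration, and it assumes the "Total (ms)" column
of each row can be compared directly against the ProcessFile total, which
the report itself does not state.

```csharp
using System;

static class ProfileShare
{
    // Percentage of the overall time that one table entry accounts for.
    public static double Percent(double partMs, double totalMs)
    {
        return 100.0 * partMs / totalMs;
    }

    // Average time per call, in milliseconds, for one table entry.
    public static double MsPerCall(double totalMs, long calls)
    {
        return totalMs / calls;
    }
}
```

With the figures above, Percent(77733, 214055) is about 36.3 for
StringBuilder::Append(), Percent(163886, 214055) is about 76.6 for
Tokenizer::token(), and MsPerCall(77733, 453475) is about 0.171 ms per
Append() call while running under the profiler.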

> In general custom lexers are slower than machine generated ones. I did some
> work a long time ago on porting a fast lexer generator to C# -- I could dig it
> up if there's need for it.

This is Miguel's call.

lupus

-- 
-----------------------------------------------------------------
lupus@debian.org                                     debian/rules
lupus@ximian.com                             Monkeys do it better
