[Mono-list] mcs compiles on linux. Now what?
Mark Lewis
lewism@businesslogic.com
Fri, 8 Mar 2002 09:29:18 -0600
It's not that String.GetHashCode() itself is slow; it's that hashtable
lookups are slow because the current hash code implementation distributes
keys poorly. But I don't see any hashtable calls in your profile report,
so maybe it's still not a big deal?
-- Mark Lewis
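Mark's point is that a hash function which maps many keys to the same value turns every hashtable lookup into a chain of string comparisons, even if the hash itself is cheap. A minimal sketch of the effect, using two hypothetical hash functions (neither is Mono's actual String.GetHashCode()): one that looks only at the string length, and a polynomial hash that mixes in every character. It counts how many distinct hash values each produces for a sample of C# keywords; fewer distinct values means more keys sharing a bucket.

```csharp
using System;
using System.Collections.Generic;

public static class HashQuality
{
    // Sample of C# keywords, chosen only for illustration.
    public static readonly string[] Keywords = {
        "abstract", "as", "base", "bool", "break", "byte", "case", "catch",
        "char", "class", "const", "continue", "do", "double", "else", "enum",
        "for", "if", "int", "while"
    };

    // Deliberately poor hash (hypothetical): depends only on the length,
    // so every keyword of the same length collides.
    public static int LengthHash(string s)
    {
        return s.Length;
    }

    // Polynomial hash over every character (h = h * 31 + c);
    // unchecked lets the multiplication wrap instead of throwing.
    public static int PolynomialHash(string s)
    {
        int h = 0;
        foreach (char c in s)
            h = unchecked(h * 31 + c);
        return h;
    }

    // Number of distinct hash values produced for the given keys.
    public static int DistinctHashes(IEnumerable<string> keys, Func<string, int> hash)
    {
        var seen = new HashSet<int>();
        foreach (string key in keys)
            seen.Add(hash(key));
        return seen.Count;
    }
}
```

For these 20 keywords the length-only hash yields just 6 distinct values (lengths 2, 3, 4, 5, 6 and 8), so most lookups would have to compare against several candidates in the same bucket.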
-----Original Message-----
From: Paolo Molaro [mailto:lupus@ximian.com]
Sent: Friday, March 08, 2002 9:06 AM
To: mono-list@ximian.com
Subject: Re: [Mono-list] mcs compiles on linux. Now what?
Importance: Low
On 03/08/02 Dan Lewis wrote:
> Is there any way to find out how much of this is spent in the lexer? MCS uses a
> custom lexer, and in particular uses a hashtable lookup to recognize keywords.
>
> String.GetHashCode() is computed in C# at the moment. It should definitely have
> an icall (btw I'm not saying that icalls are the way to make things faster --
> but it's such a fundamental operation). Also it is not cached, although strings
> are supposed to be immutable, right? Perhaps change it to:
>
>     public override int GetHashCode () {
>         if (!is_hashed) {
>             // compute hash_code
>             is_hashed = true;
>         }
>
>         return hash_code;
>     }
>
> This may/may not make any difference. As ever, profiling's your best weapon :)
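System.String is sealed, so Dan's change would have to be made inside the class library itself. The caching pattern can still be tried outside it: the sketch below applies it to a hypothetical immutable wrapper, `CachedHashString`, with a polynomial hash standing in for the `// compute hash_code` step (an assumption, not Mono's algorithm). A counter shows that the hash is computed only once per instance. The fields are not synchronized, so the sketch assumes single-threaded use.

```csharp
using System;

// Hypothetical immutable string wrapper illustrating the cached-hash idea.
public sealed class CachedHashString
{
    private readonly string value;
    private bool isHashed;  // true once hashCode holds a valid result
    private int hashCode;   // cached hash, meaningful only when isHashed is true

    public CachedHashString(string value)
    {
        if (value == null) throw new ArgumentNullException("value");
        this.value = value;
    }

    public string Value { get { return value; } }

    // How many times the hash was actually computed (for demonstration only).
    public int ComputeCount { get; private set; }

    public override int GetHashCode()
    {
        if (!isHashed)
        {
            int h = 0;
            foreach (char c in value)
                h = unchecked(h * 31 + c);  // assumed hash function
            hashCode = h;
            isHashed = true;
            ComputeCount++;
        }
        return hashCode;
    }

    // Equal contents must produce equal hashes, which holds because the
    // hash depends only on the (immutable) characters.
    public override bool Equals(object obj)
    {
        CachedHashString other = obj as CachedHashString;
        return other != null && other.value == value;
    }
}
```

The trade-off is an extra bool and int in every instance; whether that pays off depends on how often the same string object is hashed repeatedly, which is exactly what profiling would have to show.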
String.GetHashCode() accounts for 1.3% of the total time spent compiling,
so it's not an obvious candidate for optimization :-)
Here is some relevant data:
Method name                                               Total (ms)    Calls
Mono.CSharp.Driver::ProcessFile(1)                            214055       28
Mono.CSharp.Driver::parse(1)                                  214051       28
Mono.CSharp.CSharpParser::parse(0)                            214008       28
Mono.CSharp.CSharpParser::yyparse(1)                          214007       28
Mono.CSharp.Tokenizer::token(0)                               163886   161657
Mono.CSharp.Tokenizer::xtoken(0)                              163273   161657
Mono.CSharp.Tokenizer::peekChar(0)                             25279   888884
Mono.CSharp.Tokenizer::is_number(1)                            24166    19362
Mono.CSharp.Tokenizer::getChar(0)                              17076   888825
Mono.CSharp.Tokenizer::decimal_digits(1)                       13687    19335
Mono.CSharp.Tokenizer::is_punct(2)                              7934   290123
Mono.CSharp.Tokenizer::advance(0)                               4676   161685
Mono.CSharp.Tokenizer::is_keyword(1)                            4247    56199
Mono.CSharp.Tokenizer::handle_preprocessing_directive(0)        2216      410
Mono.CSharp.Tokenizer::get_cmd_arg(2)                           2081      410
Mono.CSharp.Tokenizer::is_identifier_part_character(1)          1960   355818
Mono.CSharp.Tokenizer::escape(1)                                1544    49420
Mono.CSharp.Tokenizer::adjust_int(1)                            1430    19327
Mono.CSharp.Tokenizer::GetKeyword(1)                             980    16811
System.Text.StringBuilder::Append(1)                           77733   453475
System.Char::IsLetter(1)                                        1400   734049
System.Char::IsDigit(1)                                          731   444015
So it looks like StringBuilder::Append() takes a huge chunk of the time,
and after it come IO functions and many small functions that add up. I'd
need to add call graph info to get more precise data, but this should give
an idea.
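To put "a huge chunk" in numbers, a minimal sketch that divides rows of the profile table by the ProcessFile total (214055 ms) and by their call counts. It assumes the Total column is inclusive time measured under the profiler, so the profiler's own overhead inflates every figure; the helper names are hypothetical.

```csharp
using System;
using System.Globalization;

public static class ProfileShare
{
    // Total for Mono.CSharp.Driver::ProcessFile(1), taken from the table.
    public const double TotalMs = 214055;

    // Percentage of the whole compile attributed to one row.
    public static double Percent(double rowMs)
    {
        return 100.0 * rowMs / TotalMs;
    }

    // Average time per call, in microseconds.
    public static double MicrosPerCall(double rowMs, long calls)
    {
        return 1000.0 * rowMs / calls;
    }

    // One formatted report line; invariant culture keeps '.' as the decimal point.
    public static string Row(string name, double rowMs, long calls)
    {
        return string.Format(CultureInfo.InvariantCulture,
            "{0,-38} {1,6:F1}% {2,9:F2} us/call",
            name, Percent(rowMs), MicrosPerCall(rowMs, calls));
    }
}
```

For StringBuilder::Append this gives about 36.3% of the total and roughly 171 µs per call, against roughly 1.9 µs per call for Char::IsLetter, while Tokenizer::token covers about 76.6% of the total. If the 1.3% quoted for String.GetHashCode() is measured against the same total, it corresponds to roughly 2800 ms.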
> In general custom lexers are slower than machine generated ones. I did some
> work a long time ago on porting a fast lexer generator to C# -- I could dig it
> up if there's need for it.
This is Miguel's call.
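Dan's original question was about the hashtable lookup used to recognize keywords. One cheap refinement, independent of how good GetHashCode() is, is to reject identifiers that cannot be keywords with O(1) checks before hashing them at all. The sketch below is hypothetical and is not how mcs's Tokenizer::is_keyword or GetKeyword work: the keyword list is a small sample, the token ids are made up, and it assumes every keyword is lowercase ASCII so a first-letter table can be built.

```csharp
using System;
using System.Collections.Generic;

public static class KeywordTable
{
    // Keyword -> hypothetical token id (a real tokenizer would use parser constants).
    private static readonly Dictionary<string, int> keywords = new Dictionary<string, int>();
    // startsKeyword[c - 'a'] is true if some keyword begins with letter c.
    private static readonly bool[] startsKeyword = new bool[26];
    private static readonly int minLength = int.MaxValue;
    private static readonly int maxLength = 0;

    static KeywordTable()
    {
        // Sample keywords only; all lowercase ASCII by assumption.
        string[] words = { "abstract", "as", "base", "bool", "break", "class",
                           "if", "int", "namespace", "return", "stackalloc", "while" };
        for (int i = 0; i < words.Length; i++)
        {
            keywords.Add(words[i], 1000 + i);
            startsKeyword[words[i][0] - 'a'] = true;
            minLength = Math.Min(minLength, words[i].Length);
            maxLength = Math.Max(maxLength, words[i].Length);
        }
    }

    // Returns the token id for a keyword, or -1 for an ordinary identifier.
    // identifier must not be null.
    public static int Lookup(string identifier)
    {
        // Constant-time filters: wrong length or a first character that no
        // keyword starts with means no hash needs to be computed.
        int len = identifier.Length;
        if (len < minLength || len > maxLength)
            return -1;
        char first = identifier[0];
        if (first < 'a' || first > 'z' || !startsKeyword[first - 'a'])
            return -1;

        int id;
        return keywords.TryGetValue(identifier, out id) ? id : -1;
    }
}
```

Identifiers such as `Console` or `x` are rejected before any hashing; only plausible candidates reach the dictionary. A generated lexer goes further by folding keyword recognition into its state machine, which this sketch does not attempt.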
lupus
--
-----------------------------------------------------------------
lupus@debian.org debian/rules
lupus@ximian.com Monkeys do it better
_______________________________________________
Mono-list maillist - Mono-list@ximian.com
http://lists.ximian.com/mailman/listinfo/mono-list