[Mono-dev] Question about JIT performance

Thu Aug 14 01:05:44 EDT 2008

I've been playing around and have written some code for mathematical
operations. Among those operations is something called a box
approximation for the integral of a function
( http://en.wikipedia.org/wiki/Rectangle_Rule ). 

At the behest of mhutch in #mono, I tried a few things to improve the
speed of the method. The original speed was approximately 385 ms for
10,000,000 iterations of the midpoint box approximation over the
function x^3. This included 4 multiplies, 3 adds, and 1 function call
for each run of the loop, plus a one-time cost of a divide.
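
For reference, here is a minimal sketch of what such a midpoint loop could
look like. It is not the attached Functions.cs; the names BoxSketch, Cube
and Midpoint are made up for illustration, and the use of a
Func<double, double> delegate for the function call is an assumption. The
loop body roughly matches the operation count above: four multiplies, three
floating-point adds, one call, and a single divide outside the loop.

```csharp
using System;

static class BoxSketch
{
    // Integrand used in the post: f(x) = x^3 (two multiplies).
    static double Cube(double x) { return x * x * x; }

    // Midpoint box approximation of the integral of f over [a, b]
    // using n boxes of equal width. Hypothetical helper, not the
    // method from the attachment.
    public static double Midpoint(Func<double, double> f, double a, double b, int n)
    {
        double width = (b - a) / n;   // the one-time divide
        double sum = 0.0;
        for (int i = 0; i < n; i++)
        {
            double x = a + (i + 0.5) * width;  // centre of box i
            sum += f(x) * width;               // one call per iteration
        }
        return sum;
    }

    static void Main()
    {
        // Same interval and iteration count as in the post.
        Console.WriteLine(Midpoint(Cube, 0.0, 5.0, 10000000));
    }
}
```

The exact value of the integral of x^3 over [0, 5] is 5^4 / 4 = 156.25, so
the printed result should be close to that.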

The things attempted to improve speed were: inlining the function, and
removing a switch() that determined if the approximation was to be
midpoint, left, or right.
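
To make the two changes concrete, here is a rough sketch of an "inlined, w/
branch" variant next to an "inlined, no branch" one. It is an illustration
only, not the attached code: the BoxKind enum, the method names, and the
placement of the switch() inside the loop are all assumptions.

```csharp
using System;
using System.Diagnostics;

// Hypothetical enum selecting which edge of each box is sampled.
enum BoxKind { Left, Midpoint, Right }

static class VariantSketch
{
    // "Inlined, w/ branch": x^3 is written out in the loop body and a
    // switch() picks the sample point on every iteration (assumed placement).
    public static double InlinedWithSwitch(double a, double b, int n, BoxKind kind)
    {
        double width = (b - a) / n;
        double sum = 0.0;
        for (int i = 0; i < n; i++)
        {
            double x;
            switch (kind)
            {
                case BoxKind.Left:  x = a + i * width;         break;
                case BoxKind.Right: x = a + (i + 1) * width;   break;
                default:            x = a + (i + 0.5) * width; break;
            }
            sum += x * x * x * width;
        }
        return sum;
    }

    // "Inlined, no branch": same loop, hard-wired to the midpoint.
    public static double InlinedMidpointOnly(double a, double b, int n)
    {
        double width = (b - a) / n;
        double sum = 0.0;
        for (int i = 0; i < n; i++)
        {
            double x = a + (i + 0.5) * width;
            sum += x * x * x * width;
        }
        return sum;
    }

    static void Main()
    {
        const int n = 10000000;

        Stopwatch sw = Stopwatch.StartNew();
        double r1 = InlinedWithSwitch(0.0, 5.0, n, BoxKind.Midpoint);
        sw.Stop();
        Console.WriteLine("inlined, w/ branch: {0} ({1} ms)", r1, sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        double r2 = InlinedMidpointOnly(0.0, 5.0, n);
        sw.Stop();
        Console.WriteLine("inlined, no branch: {0} ({1} ms)", r2, sw.ElapsedMilliseconds);
    }
}
```

Timings from a sketch like this will depend on the runtime version and
machine, so they are not expected to reproduce the numbers in this post.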

Here are the results from those tests:

Iteration 1
	Integral of x^3 over [0, 5]: 156.250062500004
	Box approximation, no inline, w/ branch: 469
	Integral of x^3 over [0, 5]: 156.250062500004
	Box approximation, inlined, w/ branch: 132
	Integral of x^3 over [0, 5]: 156.250062500004
	Box approximation, no inline, no branch: 235
	Integral of x^3 over [0, 5]: 156.250062500004
	Box approximation, inlined, no branch: 250
....(Much of the same here)....
Iteration 10
	Integral of x^3 over [0, 5]: 156.250062500004
	Box approximation, no inline, w/ branch: 456
	Integral of x^3 over [0, 5]: 156.250062500004
	Box approximation, inlined, w/ branch: 130
	Integral of x^3 over [0, 5]: 156.250062500004
	Box approximation, no inline, no branch: 228
	Integral of x^3 over [0, 5]: 156.250062500004
	Box approximation, inlined, no branch: 245

The thing that immediately struck me as odd was: how is the inlined
BoxApproximation that HAS a switch() to check whether it's doing left,
right or midpoint faster than everything else? It's not just a
little faster, it's a lot faster. The machine running all of these tests
is running Mono trunk as of a few days ago, x86 processor with a 2.0 GHz
clock speed and 512 MB of RAM (it's a VMware server virtual machine).
Running the same thing on Mono 1.9.1 outside of VMware yields similar
results.

My question is: why is that method the fastest? Attached are compilable
versions of the Main() method and the class that holds the function
definitions.

Thanks,
Bojan Rajkovic
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Functions.cs
Type: text/x-csharp
Size: 4937 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20080814/167e95db/attachment-0002.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MathPlay.cs
Type: text/x-csharp
Size: 2502 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20080814/167e95db/attachment-0003.bin