[Mono-dev] [PATCH] A "fastpath" dead code elimination

Massimiliano Mantione massi at ximian.com
Tue Nov 15 11:29:35 EST 2005


The attached patch implements deadce, but without using SSA.
Instead, a simple alias analysis pass takes care of understanding
when local variables [may] change value, and liveness computation
has been modified to use this aliasing info if present.

The alias analysis pass has O(n) complexity (n = code size), it is
just a linear sweep on the list of instructions.
Then, deadce operates one BB at a time, scanning the code linearly
and using the liveness bits as start/end conditions (so it is O(n)
as well).
I positioned the deadce pass just between the liveness computation
and the linear scan register allocator, so that:
- it can reuse the liveness computation already computed for the
  linear regalloc, without requiring a new pass, and
- it is "downstream" in the compilation pipeline, so it has the
  possibility to catch as many dead instructions as possible.

The name "fastpath deadce" comes from Patrik Torstensson :-)
He called it like this on IRC because it is on the fast compilation
path (the one without SSA).

But now, without other technical stuff, some numbers...
First of all, the JIT overhead: the time to JIT mscorlib.dll with
various compilation options, in seconds (time mono --compile-all):

Options      -all  -all,ssa
JIT time     1.05  1.35

Here we see that just computing SSA has a 30% overhead with respect
to a compilation with no optimizations at all.

Options       default  ssa   deadce  ssa,deadce
JIT time      1.32     1.55  1.45    1.6

And here we see that the overhead of fastpath deadce on a "default"
compilation is almost 10%, while ssa deadce has 21% (and note that
in practice, since ssa does not work with aliasing, the new fastpath
deadce is applied to more methods).

So, the JIT overhead is reasonably small.

And finally, some lovely benchmark...
I tried SciMark, and here are the results:
(MFlops, just the composite score)

default                            205.92
-O=ssa,deadce                      210.61
-O=deadce                          207.50
-O=consprop,copyprop,deadce        207.72
-O=consprop,copyprop,deadce,inline 209.82
-O=inline                          213.63
-O=deadce,inline                   214.52

So, the idea is that fastpath deadce has some effect :-)
It is not as effective as ssa deadce, but it is useful anyway and
most of all it does not incur the same JIT time overhead.

BTW, I spent a *lot* of time trying to be sure that those numbers
are accurate. I had to kill evolution, gaim, beagle, the wireless
card, and carefully monitor system load otherwise the results were
erratic (just of a few percent, but this is what we're looking at).
I then executed the benchmark several (>10) times for every case
and took the *best* score for each (with the idea that it is the
one where other things interfered less), and anyway all the best
scores were almost identical.

A "funny" thing: these are the results I have on a SciMark.exe
compiled with MS .NET 2.0:

default:                     136.65
-O=consprop,copyprop,deadce: 137.52

As you can see, they are *low*!!!
I still have to understand what the 2.0 csc does exactly, but it
looks a real disaster for us :-(
Watch out this quirk when doing/examining benchmarks!

OK, that's all... please approve the patch :-)
We can always refine it later...

Ciao,
  Massi

-------------- next part --------------
A non-text attachment was scrubbed...
Name: fastpath-deadce.patch
Type: text/x-patch
Size: 58443 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20051115/a66e3e20/attachment.bin 


More information about the Mono-devel-list mailing list