[Mono-list] Little surprise.
Miguel de Icaza
miguel@ximian.com
09 May 2002 22:12:23 -0400
Hello,
We have built a JIT engine for Mono that is pretty capable. But
early on Dietmar had identified a few problems with the code
generator: the register allocator was too simple, and implementing
register allocation seemed hard with the clean and simple model we
had. We even considered re-writing the code generator to use a
different internal representaiton.
The little surprise is that we have overcome many problems and that
we now have deployed a shiny and cool register allocator: a linear scan
register allocator. This allocator is very close optimally to graph
coloring, but is extremely simple to compute (which makes it good for
a JIT engine).
The tale is a bit more interesting though, and people tracking the
various mailing lists will know this already.
Here is a quick overview of the JIT pipeline as of a couple of
weeks ago:
The JIT engine pipeline is something like this:
* Flow analysis
A first pass is done over the CIL byte codes to compute
the basic block boundaries.
* Basic block to forest/tree.
Each basic block CIL stream is "reconstituted" into a
group of trees (think of the stack-based CIL instruction
stream as a "flat" representation of a tree).
These trees are the Mono JIT internal representation.
* Instruction selection.
The monoburg instruction selector works on the trees that
are provided and uses a cost function to drive the matching
process (matching parts of the tree to a rule).
* Register allocation.
Registers were allcoated to the various nodes in the tree,
but there was no framework for spills and reloads, and it
was hard to "flag" the registers used.
A couple of breakthroughs happened recently:
* Dietmar first implemented inlining (which was easy, not only
because he is a great hacker, or because Paolo provided
great input, our in-house guiding light, but also because
the internal representation made it easy).
* Paolo pointed out that inlining would benefit a lot more
if coupled with constant propagation (he was right), an
extreme example is mono/mono/benchmark/inline3.cs, where
a nested group of calls is turned into a single instruction.
* Dietmar then implemented constant folding in the JITer, this
was really simple: a new non-terminal was introduced to
represent "integer constants" (coni4), and a set of rules
were added, they look like this:
coni4: ADD (coni4, coni4) {
tree->data.i = tree->left->data.i + tree->right->data.i;
}
* Paolo noticed also that we could simplify our internal
representation if we made all conditional share the same
code, so a new terminal was introduced: CBRANCH, this would
allow us to centralize all the branch emision optimizations.
* Sergey contributed a CPU detection routine that could be
used to deploy a new floating-point compare routine that
Paolo discovered that would give us 17% performance increase
on floating point compares.
* Sergey also contributed multiplicatio optimizations for the
JIT engine and a new implementation for conv.i4. All of
these thing together were making the JITer better.
Most importantly:
* Dietmar out of the blue looked into a problem we had for
large expressions: since the JITer did not do have any
support for spill/reloads. Once this was implemented,
he freed up two registers that could be used for general
allocation.
Performance did not decrease noticeably (the old allocator
was not doing a good job anyways). This happened on
Friday, and Dietmar said he would look into implementing
Linear Register Allocation afterwards.
* On Monday Dietmar found out that he needed to add data
flow analysis to deploy the new allocator which was
implemented.
* By wednesday everything was ready to go, but a few tests
were failing.
* By thursday (today), Dietmar made the new linear allocator
the default in Mono. You notice its impact mostly on
computational intensive applications.
Part of the beauty, is that the changes to the core JIT engine were
relatively simple (if you track the mono-patches mailing list you
would have noticed that the code changes are very small for what they
achieve: inlining, constant folding, register spilling and linear scan
implementation).
Many small optimizations are on the queue, and have not yet been
applied to CVS. Keep your eyes open.
* The JIT engine
I believe that Mono has one of the cleanest and easiest code
generators out there. It is a fertile ground to learn about code
generation, best practices and JITing. Feel free to get your latest
bits from CVS.
Miguel.