[Mono-list] Little surprise.

Miguel de Icaza miguel@ximian.com
09 May 2002 22:12:23 -0400


Hello,

   We have built a JIT engine for Mono that is pretty capable.  But
early on Dietmar had identified a few problems with the code
generator: the register allocator was too simple, and implementing
register allocation seemed hard with the clean and simple model we
had.  We even considered re-writing the code generator to use a
different internal representaiton.

   The little surprise is that we have overcome many problems and that
we now have deployed a shiny and cool register allocator: a linear scan
register allocator.  This allocator is very close optimally to graph
coloring, but is extremely simple to compute (which makes it good for
a JIT engine).

   The tale is a bit more interesting though, and people tracking the
various mailing lists will know this already.

   Here is a quick overview of the JIT pipeline as of a couple of
weeks ago:

   The JIT engine pipeline is something like this:

	* Flow analysis

	  A first pass is done over the CIL byte codes to compute
	  the basic block boundaries.  

	* Basic block to forest/tree.

	  Each basic block CIL stream is "reconstituted" into a
	  group of trees (think of the stack-based CIL instruction
	  stream as a "flat" representation of a tree).

	  These trees are the Mono JIT internal representation. 

	* Instruction selection.

	  The monoburg instruction selector works on the trees that
	  are provided and uses a cost function to drive the matching
	  process (matching parts of the tree to a rule).

	* Register allocation.

	  Registers were allcoated to the various nodes in the tree,
	  but there was no framework for spills and reloads, and it
	  was hard to "flag" the registers used.

   A couple of breakthroughs happened recently:

	* Dietmar first implemented inlining (which was easy, not only
	  because he is a great hacker, or because Paolo provided
	  great input, our in-house guiding light, but also because
	  the internal representation made it easy).

	* Paolo pointed out that inlining would benefit a lot more
	  if coupled with constant propagation (he was right), an
	  extreme example is mono/mono/benchmark/inline3.cs, where
	  a nested group of calls is turned into a single instruction.

	* Dietmar then implemented constant folding in the JITer, this
  	  was really simple: a new non-terminal was introduced to 
	  represent "integer constants" (coni4), and a set of rules
	  were added, they look like this:

		coni4: ADD (coni4, coni4) {
			tree->data.i = tree->left->data.i + tree->right->data.i;
		}

	* Paolo noticed also that we could simplify our internal
	  representation if we made all conditional share the same
	  code, so a new terminal was introduced: CBRANCH, this would
	  allow us to centralize all the branch emision optimizations.

	* Sergey contributed a CPU detection routine that could be
	  used to deploy a new floating-point compare routine that
	  Paolo discovered that would give us 17% performance increase
	  on floating point compares. 

	* Sergey also contributed multiplicatio optimizations for the 
	  JIT engine and a new implementation for conv.i4.  All of
	  these thing together were making the JITer better.

   Most importantly:

	* Dietmar out of the blue looked into a problem we had for
	  large expressions: since the JITer did not do have any
	  support for spill/reloads.  Once this was implemented,
	  he freed up two registers that could be used for general
	  allocation.

	  Performance did not decrease noticeably (the old allocator
	  was not doing a good job anyways).   This happened on
	  Friday, and Dietmar said he would look into implementing
	  Linear Register Allocation afterwards. 
	  
	* On Monday Dietmar found out that he needed to add data
	  flow analysis to deploy the new allocator which was
	  implemented.

	* By wednesday everything was ready to go, but a few tests
	  were failing.

	* By thursday (today), Dietmar made the new linear allocator
	  the default in Mono.  You notice its impact mostly on
	  computational intensive applications.
	 
   Part of the beauty, is that the changes to the core JIT engine were
relatively simple (if you track the mono-patches mailing list you
would have noticed that the code changes are very small for what they
achieve: inlining, constant folding, register spilling and linear scan
implementation).

   Many small optimizations are on the queue, and have not yet been
applied to CVS.  Keep your eyes open. 

* The JIT engine

    I believe that Mono has one of the cleanest and easiest code
generators out there.  It is a fertile ground to learn about code
generation, best practices and JITing.  Feel free to get your latest
bits from CVS.

Miguel.