[Mono-dev] CIL to CIL optimizer

Mon Aug 14 17:30:08 EDT 2006

Again, first thing first: I am replying just to share knowledge,
as you said, both a CCO and an optimizing JIT have their place
in the world...

On Mon, 2006-08-14 at 19:39 +0200, Bjarke Hammersholt Roune wrote:
> 3. A CCO can decrease the size of an assembly by such optimizations as 
> dead code/function/class elimination (the last two only in the absence 
> of reflection), better use of the stack, better instruction selection, 
> redundancy elimination, better ordering of basic blocks and so on. This 
> decreases start up time.

I'd like to understand better what you mean by "class elimination"
(I guess "function elimination" menas "inline"...)?
And also, what does "instruction selection" mean at CIL level?

> 5. Some optimizations, like dead code elimination, can be done in a CCO 
> and then the JIT does not have to do them (or does not have to do them 
> as much). The CCO could embed some metadata in the assembly to indicate 
> which optimizations have already been performed. The same trick applies 
> to the results of expensive analyzes, though relying on that information 
> without checking it could be unsafe.

The problem is that most optimizations should be run anyway, see
below...

> 6. A CCO can output data that enables the JIT to start up faster and 
> optimize better. An example is SafeTSA which is a Java-equivalent 
> bytecode format where the code is already in SSA form and where some 
> array bounds checks and some null checks can be safely eliminated. On 
> top of this the SafeTSA format actually takes up less space then the 
> normal Java bytecode format. Another example is a paper I read that 
> explained how to annotate Java bytecode so that the JIT could do fast 
> and high quality register allocation in little time. This scheme is, 
> surprisingly, independent of the number of registers on the target machine.

Now *this* is interesting, and has been in my wild dreams for a while...
but it'll have to wait until what is *really* needed in the JIT is done.
I was especially thinking of doing some kind of global, interprocedural
analysis with a sort of CCO which would not really change the code, just
write some custom attributes that tell the JIT interesting things like
"this method is free of side effects", or "this method only modifies the
'this' object", or "const" annotations, or similar things.

> I cannot foresee how problematical the natural barrier to optimization 
> in a CCO that Miguel mentions will be. It is certainly there in the 
> sense that a JIT can do more optimization than a CCO if it had all the 
> time in the world. I guess I will have to write a CCO and see what 
> happens to find out.

This is probably the most interesting topic.
I give you some examples of things that a CCO cannot do as effectively
as a JIT, so the JIT must (well, should...) do them anyway.
[1] Dead code elimination: some code is generated internally by the
    JIT, so the CCO cannot even see it. This is particularly true with
    inlining (unless you also do inlining in the CCO itself, which is
    tricky because it is likely to break things like tracing, debugging
    and AOP frameworks).
[2] Redundancy elimination: again, some code sequences are totally
    invisible in CIL (array element and field address computations, or
    locals initializations, and, if we get smart, vtable accesses...).
    Moreover, redundancy elimination is really tricky to get right, and
    it can easily turn into an arch dependent issue.
    I have been badly burnt by the SSAPRE experience in the JIT about
    this: on paper, PRE is always a win, and my SSAPRE implementation
    does nothing else than removing redundancies, yet it can *easily*
    make the code *worse* :-(
    The problem is that sometimes storage (especially registers!) is
    more valuable than computational CPU cycles, so PRE is a tradeoff,
    and you must get it right on each architecture...
[3] Well, this is really minor, but there are cases when something is
    constant at JIT time, but it was not before, so also SCCP must be
    done again in the JIT.
I'm sure that thinking long enough one could find more examples, but
these should be enough to give you an idea of the issues.
And of course advanced things like register promotion require that you
have a strong optimization framework already, and they can be only in
the JIT...

> > Then, about the CFG shape and issues when try[/catch]/finally clauses
> > are present, this discussion already reached mono-devel-list:
> > http://marc.theaimsgroup.com/?l=mono-devel-list&m=111336503228737&w=2
> > I'd really like you to comment on that thread.
> > 
> I think that discussion is a good example of how the different 
> constraints on JITs and CCOs play out.

Yep!
Now I see why I had so many problems "seeing" your example :-)
BTW, it's hard for me to comment on it specifically because is seems
to show a JIT bug :-(

Anyway in general I'm skeptical about "omnicomprehensive" approaches
like the one you describe, not because they are not feasible, but
because I have the impression that they are not worth the effort.
I mean, when the control flow reaches the catch/finally, you do not
know the specific path it took to get there, so even in the perfect
case you will have very "large" phi functions where you cannot know
which of the arguments is the "right" one (because statically none
of them is "right"), so you cannot make inferences on the phi value.
In the end, having a dummy store statement instead of the phi is the
same thing IMHO, but cheaper.

> As a practical matter, I also think that it will take less effort to
> implement optimizations in ML than in C. On a personal note I have been
> programming a Groebner Basis computation program for quite a while in 
> C++, and I am looking forward to programming in a functional-style 
> language again.

Yep, but, as a practical matter, the runtime is written in C for
good reasons, and the JIT is part of it... even if I perfectly see
that other languages are *much* more productive and elegant!

Ciao,
  Massi