[Mono-dev] CIL to CIL optimizer

Tue Aug 15 04:23:14 EDT 2006

> On Mon, 2006-08-14 at 19:39 +0200, Bjarke Hammersholt Roune wrote:
>> 3. A CCO can decrease the size of an assembly by such optimizations as 
>> dead code/function/class elimination (the last two only in the absence 
>> of reflection), better use of the stack, better instruction selection, 
>> redundancy elimination, better ordering of basic blocks and so on. This 
>> decreases start up time.
> 
> I'd like to understand better what you mean by "class elimination"
> (I guess "function elimination" menas "inline"...)?
> And also, what does "instruction selection" mean at CIL level?
> 
By function elimination i mean removing functions that never get called
and that are not externally visible. This situation could come about as
a result of inlining, yes. By dead class I mean a class that does not
get referenced anywhere and that is not externally visible. I don't know
how prevalent those things are. It would be reasonable to have a command 
line switch to inform the CCO that the only externally visible thing 
that needs to be preserved is the main function.

Instruction selection is choosing which CIL instructions to emit. This 
can have an impact
on code size, such as by choosing the short version of some
instructions. It could also mean constructing the float 1.0 by taking an
integer 1 and casting it to float, which is shorter than loading it
directly and which the JIT should be able to optimize away. Choosing
local variable numbers in an instruction selection aware way can also 
help. E.g. loading and storing local variables is only available in 
short form instructions for the first 4, so it is better to let those 4 
be the most used ones.

> I give you some examples of things that a CCO cannot do as effectively
> as a JIT, so the JIT must (well, should...) do them anyway.
> [1] Dead code elimination: some code is generated internally by the
>     JIT, so the CCO cannot even see it. This is particularly true with
>     inlining (unless you also do inlining in the CCO itself, which is
>     tricky because it is likely to break things like tracing, debugging
>     and AOP frameworks).
 >
It would be possible to do some inlining, but I agree that the JIT can 
do a much, much better job here. For one thing, a CCO has no chance to 
inline methods from referenced assemblies that it doesn't have access to 
or that might be different on the machines where the code is to be run.

I can see how inlining might generate dead code. Btw, does Mono 
internally generate dead code?

> [2] Redundancy elimination: again, some code sequences are totally
>     invisible in CIL (array element and field address computations, or
>     locals initializations, and, if we get smart, vtable accesses...).
>     Moreover, redundancy elimination is really tricky to get right, and
>     it can easily turn into an arch dependent issue.
>     I have been badly burnt by the SSAPRE experience in the JIT about
>     this: on paper, PRE is always a win, and my SSAPRE implementation
>     does nothing else than removing redundancies, yet it can *easily*
>     make the code *worse* :-(
>     The problem is that sometimes storage (especially registers!) is
>     more valuable than computational CPU cycles, so PRE is a tradeoff,
>     and you must get it right on each architecture...
 >
Does SSAPRE have other issues than increased register pressure? Do you 
think support for rematerialization in the register allocator would 
solve most of the issues with SSAPRE?

> Anyway in general I'm skeptical about "omnicomprehensive" approaches
> like the one you describe, not because they are not feasible, but
> because I have the impression that they are not worth the effort.
> I mean, when the control flow reaches the catch/finally, you do not
> know the specific path it took to get there, so even in the perfect
> case you will have very "large" phi functions where you cannot know
> which of the arguments is the "right" one (because statically none
> of them is "right"), so you cannot make inferences on the phi value.
> In the end, having a dummy store statement instead of the phi is the
> same thing IMHO, but cheaper.
> 
I have no experience with this issue, so I don't know how much 
difference it makes. It is easy to conjure up examples where it does 
make a difference, but that does not really prove anything. 
Path-sensitive analyzes do need the CFG to express all the possible 
paths, though.

My rationale for making the CFG as expressive as possible is that it 
reduces the number of special cases in optimizations, that I suspect it 
will make a difference in some situations, and that I think it is 
necessary to do very precise analysis in order to do something like 
findbugs for Java. Also, a CCO needs to be quite good at tracking 
exceptions as they can really inhibit the opportunities for optimization.

Another nice thing with an expressive CFG is that the only barrier to 
code motion within a basic block is that data needs to be computed 
before it can be used.

>> As a practical matter, I also think that it will take less effort to
>> implement optimizations in ML than in C. On a personal note I have been
>> programming a Groebner Basis computation program for quite a while in 
>> C++, and I am looking forward to programming in a functional-style 
>> language again.
> 
> Yep, but, as a practical matter, the runtime is written in C for
> good reasons, and the JIT is part of it... even if I perfectly see
> that other languages are *much* more productive and elegant!
> 
Exactly - the JIT needs to be in something like C while a CCO does not. 
So we agree :-)

/Bjarke Roune