[Mono-list] JVM performance: JVM as a basis for CLR

Jay Freeman (saurik) saurik@saurik.com
Wed, 25 Jul 2001 19:59:15 -0500


Fergus:

First off, I would like to make the point that I wasn't talking about .NET
being usable to construct faster code by giving end users more language
features.  I agree that .NET is better there; sorry about the oversight.  I
was working under the impression that this thread was trying to show that
there is something that, given two programs with the same logic, allows a
compiler and a JIT for the CLR to operate faster than a compiler and a JIT
for a JVM.  I am not looking at things like value types, as those are things
that the user of the system has to deal with, not the compiler, and they
lead to a program with different basic logic.  Same with function pointer
types: that is something the end user can use, but it isn't something about
the file format that the compiler can use to make the JIT work better
without the developer explicitly asking for it; rather, it lets you
drastically change how the program works.  It _is_ a good point, however: it
allows you to directly support more languages, and it happens to be one of
the myriad reasons why I am using .NET (including unverifiable code, which I
think is a great idea, as one of the few... hell, one of the five or six
MC++ developers).

OK, talking Java here, as the CLR does _need_ tail. for the reasons you
mentioned:

Tail call optimization, however, _is_ something that the compiler can easily
ascertain from the incoming source code and place into the binary: emit a
tail call whenever you directly return the result of a function.  I'm sorry,
but I fail to see the complex issue you are trying to expound here about
analyzing this when it isn't explicitly marked...  I am not the most
knowledgeable person about these issues (or anywhere near it), so I am
likely wrong and am going to defer to you (from this point on, anyone
reading can feel free to assume "Jay is wrong"), but I'm going to give my
arguments anyway.
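
To make the rule concrete, here is the kind of source pattern I mean,
sketched in Java (the code and names are purely illustrative, not taken
from any real compiler): each function's last action is to return the
result of a call to a *different* function, so every one of those calls is
a candidate for a tail call.

    class SiblingCallExample {
        static boolean isEven(int n) {
            if (n == 0) return true;
            return isOdd(n - 1);   // result returned directly: tail call candidate
        }

        static boolean isOdd(int n) {
            if (n == 0) return false;
            return isEven(n - 1);  // likewise a tail call candidate
        }
    }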

Here's how I see it: you need a location in a register or on a stack
somewhere _anyway_ just to be able to return (it isn't as if the CPU keeps
track of this for you).  You normally get this from your jump and link.
Instead of using jump and link, you clean your stack, store the return
location of the current function in the right register (the same one jump
and link was going to put it in), and then you use a plain jump.  Assuming a
calling convention where the callee cleans the stack (which I understand to
be the case here), you should be able to do this all of the time without any
issue of safety (under the assumption that the code you are jumping to
doesn't have more trust, of course; in that case you can't do this, but that
is a case you notice immediately while doing code generation) and without
any complex detection work: if you see a "call", "calli", or "callvirt"
instruction immediately followed by a "ret" (which is a requirement of .NET
for verifiable code anyway, so it isn't as if you are increasing the output
size), turn it into a tail call.  Doing this on _all_ such calls is going to
use a little more code space (as the stack cleanup gets duplicated a lot),
but it's going to lower runtime stack usage.
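
Just to show how simple I think that detection is, here is a toy sketch of
it in Java; the class, the string opcodes, and the "emitted" text are all
made up for illustration and are not drawn from any real JIT:

    import java.util.Arrays;
    import java.util.List;

    class TailCallPeephole {
        // When a call opcode is immediately followed by "ret", emit
        // "clean stack + plain jump" instead of an ordinary jump and link.
        static void emit(List<String> opcodes) {
            List<String> calls = Arrays.asList("call", "calli", "callvirt");
            for (int i = 0; i < opcodes.size(); i++) {
                String op = opcodes.get(i);
                boolean isCall = calls.contains(op);
                boolean retNext = i + 1 < opcodes.size()
                               && opcodes.get(i + 1).equals("ret");
                if (isCall && retNext) {
                    System.out.println("clean stack; set return reg; jmp");
                    i++;  // the following "ret" is consumed by the tail call
                } else if (isCall) {
                    System.out.println("jump and link");
                } else {
                    System.out.println(op);
                }
            }
        }

        public static void main(String[] args) {
            emit(Arrays.asList("ldarg.0", "call", "ret"));
        }
    }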

Sure, you get the further optimization of just being able to branch to the
beginning of the function in the special case of tail recursion, but that
wasn't the fun one :-).
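
For completeness, that special case looks roughly like this (again purely
illustrative Java; gcd is just my own example): the directly recursive tail
call is the same thing as resetting the arguments and branching back to the
top of the function.

    class TailRecursionExample {
        // Directly tail-recursive form: the callee is the caller itself.
        static int gcd(int a, int b) {
            if (b == 0) return a;
            return gcd(b, a % b);  // tail call back into gcd
        }

        // The same function with the tail call turned into a branch back to
        // the top of the function, i.e. a plain loop with no new stack frame.
        static int gcdLoop(int a, int b) {
            while (b != 0) {
                int t = a % b;
                a = b;
                b = t;
            }
            return a;
        }
    }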

Is there some limitation imposed by garbage collected runtime systems that
I'm missing here?  Is there some property other than "clean up my stack"
that tail calls have that I don't know about?  Does the runtime use a
calling convention where the callee has to clean up the stack for most
functions (and if so, why)?  Am I thinking too low level?
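
Also, just to check that I understand the driver-loop workaround you
described, here is a rough sketch of the shape I have in mind (the Java,
the Step type, and the names are all mine, purely for illustration), using
the isEven/isOdd pair from earlier: without a tail call instruction, each
function returns the next thing to do instead of calling it, and a loop
drives the chain, paying a heap allocation and an indirect call per step.

    import java.util.function.Supplier;

    class DriverLoopSketch {
        // Each step either carries the final answer or the next step to run.
        static class Step {
            final Boolean result;       // non-null once the computation is done
            final Supplier<Step> next;  // non-null while there is more to do
            Step(Boolean result, Supplier<Step> next) {
                this.result = result;
                this.next = next;
            }
        }

        static Step isEven(int n) {
            if (n == 0) return new Step(Boolean.TRUE, null);
            return new Step(null, () -> isOdd(n - 1));  // the tail call, deferred
        }

        static Step isOdd(int n) {
            if (n == 0) return new Step(Boolean.FALSE, null);
            return new Step(null, () -> isEven(n - 1));
        }

        // The driver loop keeps the stack flat, but every step costs a heap
        // allocation and an indirect call; that is the efficiency problem.
        static boolean run(Step step) {
            while (step.result == null) step = step.next.get();
            return step.result;
        }

        public static void main(String[] args) {
            System.out.println(run(isEven(100001)));  // false, constant stack space
        }
    }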

Sincerely,
Jay Freeman (saurik)
saurik@saurik.com

* _Please_ do not reply _both_ to me and the mailing list.  I'm on the
mailing list to be on the mailing list and have no use for multiple copies
of messages that manage to get by my mail filtration software and into my
inbox where they don't belong.  --  Footer is a suggestion of Micheal Poole
to deal with Reply-To: being considered harmful and the Reply-All that
people use to side-step the issue. *

----- Original Message -----
From: "Fergus Henderson" <fjh@cs.mu.oz.au>
To: "Jay Freeman (saurik)" <saurik@saurik.com>
Cc: "mono-list" <mono-list@ximian.com>
Sent: Wednesday, July 25, 2001 2:26 AM
Subject: Re: [Mono-list] JVM performance: JVM as a basis for CLR


> On 22-Jul-2001, "Jay Freeman (saurik)" <saurik@saurik.com> wrote:
> >
> > The standard basis of the argument for the CLR's fundamental speed
> > improvements over Java is... well, now isn't that interesting.  There
> > isn't one; at least not one that I know of (so if anyone knows one,
> > speak up).
>
> There are several reasons that I can think of as to why it should be
> possible to make the CLR faster than the JVM:
>
> - support for tail calls
> - less heap allocation, due to
>   - value types
>   - function pointer types (rather than heap-allocated closures)
>   - byref arguments (rather than returning multiple values
>     in heap-allocated objects or arrays)
> - support for unverifiable code (which can avoid the need for some
>   runtime checks)
>
> > To me
> > a lack of argument is a pretty good indication of something being
> > seriously wrong with the assumption.  Some might claim that "tail." is
> > an important feature for speed of functional languages, but that isn't
> > an advantage in the slightest.
>
> You're wrong.
>
> Some functional languages, e.g. ML, need a *guarantee* that the
> implementation will use constant stack space for tail-recursive calls.
> If the VM doesn't have an explicit "tail call" instruction, then the
> functional language compiler has to resort to techniques such as having
> every function return the address of the next function to call to a
> driver loop.
> These techniques have very significant effects on both efficiency and
> ease of interoperability.
>
> > It is important to note that many JVMs do tail call
> > optimization, detecting functions that would benefit and _automatically_
> > emitting the right code, even if the original byte-code didn't
> > explicitly specify it (and AFAIK C# never does anyway).  ORP, obviously
> > my favorite JIT, happens to be one of the ones that does this.
>
> There are two kinds of tail calls.  One is directly recursive tail calls,
> where the callee is the very same function as the caller.  These are the
> easy ones.  For this kind, there is no real need for a VM instruction,
> or even for the JIT to optimize them, since they are easy to optimize
> in the front-end, by compiling them to loops.
>
> The other kind is when the tail call is to a different function.
> These are known in GCC as "sibling calls".
> I'm pretty sure that ORP does *not* optimize these ones.
> This kind is much harder to optimize, both for the front-end
> and for the JIT.  To optimize these in the front-end, you need to resort
> to techniques such as driver loops that slow down the code dramatically.
> They can be optimized in the JIT, but I don't know if any existing
> JITs perform this optimization -- from what I've heard, they don't.
>
> (There is also another CLR-specific reason why a "tail call" instruction
> is needed.  In the presence of unverifiable code and stack allocation,
> it is very difficult for the VM to optimize tail calls which the front-end
> may know are safe, because the VM may not be sure that the stack allocated
> variables aren't still live (e.g. pointed to by unmanaged pointers that
> have been stored in static variables).  This doesn't apply to the JVM,
> because the JVM doesn't support stack allocation or unverifiable code.
> But the lack of a tail call instruction in the JVM is still a problem
> because of the issues with sibling call optimization mentioned earlier.)
>
> --
> Fergus Henderson <fjh@cs.mu.oz.au>  |  "I have always known that the pursuit
> The University of Melbourne         |  of excellence is a lethal habit"
> WWW: <http://www.cs.mu.oz.au/~fjh>  |     -- the last words of T. S. Garp.