[Mono-list] Making a ruby.net compiler

Paolo Molaro lupus@ximian.com
Sat, 10 May 2003 13:48:52 +0200

Just wanted to elaborate on two bits.

On 05/09/03 Miguel de Icaza wrote:
> But I am biased against Parrot, because I think that many of the core
> premises they started with are incorrect ('register machines are better
> than stack ones, because there is register allocation research',
> `opcodes are 32-bits, because we want to have many opcodes'). 

There's a paper that has been around for quite some time showing that,
for interpreters, the more work a single opcode does, the faster the
VM is (intuitively, more native code compiled from C is executed per
bytecode instruction, and there is less opcode dispatch overhead).
Anyone who follows the development of parrot can check how much effort
has gone into reducing opcode dispatch, for example.
So they figured out two things:

1) that they need lots of opcodes so that they can add as many
specialized opcodes as they want
2) that a register VM uses fewer opcodes than a stack machine
and as such the opcode dispatch overhead is reduced
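To make the dispatch-overhead point concrete, here is a toy Python
sketch (all opcode names are invented for illustration): every opcode
executed pays one trip through the dispatch machinery, so a single
coarse opcode dispatches less often than the same computation split
across fine-grained ones.

```python
def run(code, stack):
    """Interpret a list of (opcode, arg) pairs against a value stack."""
    pc = 0
    dispatches = 0
    while pc < len(code):
        op, arg = code[pc]
        dispatches += 1          # dispatch overhead, paid per opcode
        if op == "push":
            stack.append(arg)
        elif op == "add":
            b = stack.pop()
            stack.append(stack.pop() + b)
        elif op == "add_const":  # a "coarse" opcode folding push+add
            stack.append(stack.pop() + arg)
        pc += 1
    return stack, dispatches

# Computing x + 1 with fine-grained opcodes costs two dispatches...
fine, d1 = run([("push", 1), ("add", None)], [41])
# ...while one specialized opcode does the same work in one dispatch.
coarse, d2 = run([("add_const", 1)], [41])
```

The two runs compute the same result; only the number of trips through
the dispatch loop differs, which is exactly what a specialized opcode
buys an interpreter.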

Note that both of these things make sense per se. The real issue is
partly in the implementation and partly in what you really want to build.
Implementation-wise, for example, to add two integers, the bytecode
looks like:

	"32-bit integer add opcode" "32-bit src1/dest register" "32-bit src2 register"

so it's 12 bytes. On the CLR this is:

	*) a single 1-byte add opcode if the operands are already on the
	eval stack
	*) or the sequence:
		ldloc/ldarg/etc 1
		ldloc/ldarg/etc 2
		add
		stloc/starg/etc 1
	this can be from as little as 2 bytes to 13 bytes (but it will
	usually be 7 bytes or less)
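The size difference is easy to check mechanically. Here is a sketch of
the two encodings of `l1 = l1 + l2`; the opcode values are made up for
illustration (the stack-side bytes mimic one-byte short-form CIL
opcodes, but don't rely on the exact numbers):

```python
import struct

# Register VM, Parrot-style: a single instruction made of three
# 32-bit words (opcode, src1/dest register, src2 register) = 12 bytes.
register_insn = struct.pack("<III", 0x01, 1, 2)

# Stack VM, CLR-style: ldloc 1 / ldloc 2 / add / stloc 1, each a
# one-byte opcode when the locals fit the short forms.
stack_insns = bytes([0x07, 0x08, 0x58, 0x0B])

sizes = (len(register_insn), len(stack_insns))
```

For this operation the register encoding takes 12 bytes against 4 for
the short-form stack sequence, which is where the cache-pressure
argument below comes from.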

So, you see that while the register machine design reduces the dispatch
overhead vs a stack machine _interpreter_, the actual implementation
comes with quite a big tradeoff: the memory required to store the
bytecode is a lot larger. This won't show up in microbenchmarks, but
the cost of this will be many cache misses, which are going to get more
expensive as time goes by.
There is another implementation issue with having the ability to load
arbitrary libraries of opcodes, whose overhead has not been studied yet,
AFAIK: to be practical opcodes will need to have their values remapped
on load (to avoid clashes) and this may have a huge impact on dispatch.
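A sketch of what that remapping might look like (all names here are
invented for illustration): if two independently numbered opcode
libraries both claim the same opcode value, the loader has to remap one
of them, and then every dispatch pays an extra table lookup instead of
indexing a fixed handler array directly.

```python
def make_dispatcher(handlers, remap):
    """Build a dispatch function for one loaded opcode module.

    handlers: the VM's global table of opcode implementations.
    remap:    module-local opcode value -> global handler index,
              assigned at load time to avoid clashes.
    """
    def dispatch(local_op, state):
        # This remap lookup happens on every instruction dispatched.
        return handlers[remap[local_op]](state)
    return dispatch

handlers = [lambda s: s + 1, lambda s: s * 2]

# Two modules that both numbered their first opcode 0 get different maps.
mod_a = make_dispatcher(handlers, {0: 0})  # module A's op 0: increment
mod_b = make_dispatcher(handlers, {0: 1})  # module B's op 0: double
```

The extra indirection is tiny in isolation, but it sits on the hottest
path of the whole interpreter.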
There's more: having a large set of opcodes also means that you need an
instruction selector that makes programs take advantage of them, or the
compiler writer will have to use them explicitly. Of course, the
instruction selector will have a hard time finding the best instruction
to use if the actual instructions available can change at runtime:-)
Note, both of these issues can be fixed in the parrot design, by using
a different, more memory-conscious encoding for opcodes and by defining
a fixed (even if large) set of opcodes that can't change at runtime.

So, if you want to build an interpreter, avoiding getting too creative
about the bytecode format, a register machine may make perfect sense.
There are two other issues to consider, though, that are not strictly
implementation details, but that can explain why the CLR and parrot
designs are so different (and why some of us, if we had to design the
CLR, wouldn't have chosen the register machine model).

The first issue is the complexity of implementation: it's very easy to
generate code for a stack machine, while generating good code for a
register machine requires hard work both in the VM and in the compiler
that generates code for the VM. Yes, imcc makes this somewhat simpler,
but it doesn't solve the other issues.
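The "very easy to generate code for a stack machine" claim fits in a
few lines: a plain postorder walk of an expression tree already emits
correct stack code, with no register assignment or tracking anywhere.
The tree shape and opcode names below are made up for illustration.

```python
def compile_expr(node):
    """node is a number (literal) or a tuple (op, left, right)."""
    if isinstance(node, (int, float)):
        return [("push", node)]
    op, left, right = node
    # Children first: their results land on the eval stack in order,
    # so the operator opcode needs no operand bookkeeping at all.
    return compile_expr(left) + compile_expr(right) + [(op, None)]

# (1 + 2) * 3 compiles in a single recursive pass:
code = compile_expr(("mul", ("add", 1, 2), 3))
```

A register target would need this walk plus a register allocator (or
an unbounded-temporaries scheme and a later allocation pass), which is
exactly the extra work the paragraph above refers to.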
The second issue is related to the call convention.
A stack machine has a very simple call convention: just push the
arguments on the stack. In a register design, the call conventions
can be quite varied and complex (especially with 4 different register
files:-) and you need to explicitly deal with argument overflow. This
adds complexity and requires that the call convention be set in stone
to avoid breaking compatibility (so later the call convention can't be
changed, even if it turns out that it was not ideal). When you mix in
different compilers for different languages that target the same
register-based VM, it can become a real interoperability and
maintainability issue.
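The asymmetry between the two call conventions can be sketched like
this (the register-file size and the spill mechanism are invented; a
real ABI has many more details to pin down):

```python
def stack_call(stack, args):
    """Stack VM convention: one rule for any arity - push the args."""
    stack.extend(args)     # the callee finds them on top of the stack
    return stack

NUM_ARG_REGS = 4           # assumed size of the argument register file

def register_call(regs, overflow, args):
    """Register VM convention: the first N args go in fixed registers
    (a choice that becomes part of the ABI forever); the rest must be
    spilled to an explicit overflow area."""
    for i, a in enumerate(args):
        if i < NUM_ARG_REGS:
            regs[i] = a
        else:
            overflow.append(a)
    return regs, overflow

frame = stack_call([], [1, 2, 3])
regs, spilled = register_call([None] * NUM_ARG_REGS, [], [1, 2, 3, 4, 5])
```

Every compiler targeting the register VM has to agree on all of these
choices (how many registers, which file, where overflow goes), which
is where the interoperability concern comes from.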

The last issue and the root of it all is more fundamental. You see that
all the reasons given above for using a register VM are tied to having
an interpreter. The CLR, and mono, instead, don't exist to be yet
another VM for some language. The goal is to be a complete system that
provides good performance and allows interoperability between different
languages and a better development environment. Parrot aims to address
the latter two objectives (at least for a large part; many languages
like C# won't run well, if at all, on parrot), while we want to address
the first point as well. And to address it, we have a JIT and an AOT
compiler, and when you have a JIT, all the reasons for using a register
machine go away. In a JIT there is no opcode dispatch overhead, so
designing to reduce it doesn't make sense. The need for coarse opcodes
is vastly reduced, because an optimizing JIT can produce native code
roughly equivalent to the C compiler's and, as a plus, it can take
advantage of the local environment where the opcode is inserted.
There's more: the issues with dealing with a call convention go away,
since the JIT can use whatever call convention it wants and it can
choose the best one at runtime.
The only downside is that porting a JIT is harder than porting an
interpreter, but we hope to address this issue with the new mono JIT.
Anyway, tradeoffs, tradeoffs, there are always tradeoffs:-)

Hope this helps people understand better some of the comments made on
this list re parrot (well, I don't know if miguel really agrees with
what I wrote, but I guess this time he might be:-).

All of these comments, though, don't mean that Parrot doesn't have its
place and appeal: it's important that parrot raised the issue of
providing a VM that could run different dynamic languages. Whether it
will succeed in the end at its goals is mostly up to the hackers who
work on it and to the support they'll get from the communities that
surround the various dynamic language efforts, and that's not really
on-topic for this list:-)

Happy hacking to them (and us); after all, one of the important things
is to have fun doing it:-)


lupus@debian.org                                     debian/rules
lupus@ximian.com                             Monkeys do it better