[Mono-devel-list] Compiler thoughts, 2
Miguel de Icaza
miguel at ximian.com
Thu Mar 17 11:07:29 EST 2005
Hello Jb,
> There is a big loss of effort in maintaining multiple compilers:
> gmcs, bmcs, jscript.net, etc. Can we not imagine some kind of library
> that would be the kernel for each Mono compiler? In my mind, this
> library would rely on the Cecil library to emit assemblies. There is
> a plan to write optimizers for the CIL stream using Cecil. So we
> could have one code base producing an optimized CIL stream (i.e. like
> C++/CLI does) that compiler writers could use easily.
Cecil today is similar to Reflection.Emit; hopefully Cecil will avoid
some of the most annoying limitations of Reflection.Emit.
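
To give a flavor of the difference: Reflection.Emit forces you to
create types in dependency order inside a live runtime, while Cecil
manipulates a plain object model of the assembly.  Here is a minimal
sketch of emitting an empty class with Cecil; the exact names used
(AssemblyDefinition.CreateAssembly, ModuleKind, TypeSystem) are from
the Mono.Cecil API and should be treated as illustrative:

    using System;
    using Mono.Cecil;

    class Emit {
        static void Main ()
        {
            // Build an assembly definition purely in memory; unlike
            // Reflection.Emit, nothing is loaded into the runtime.
            AssemblyDefinition asm = AssemblyDefinition.CreateAssembly (
                new AssemblyNameDefinition ("Demo", new Version (1, 0)),
                "Demo", ModuleKind.Dll);

            // Add a public class Hello deriving from System.Object.
            TypeDefinition hello = new TypeDefinition (
                "", "Hello",
                TypeAttributes.Public | TypeAttributes.Class,
                asm.MainModule.TypeSystem.Object);
            asm.MainModule.Types.Add (hello);

            // Definitions stay mutable until the assembly is written.
            asm.Write ("Demo.dll");
        }
    }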
This topic is a recurrent one: why not write a library that all
compilers can reuse to minimize the compiler size?
I have had this discussion a number of times, so I will try to summarize
some of the aspects of it:
    * Compilers in the .NET world are only half-compilers: they take
      an input language and produce CIL code, which is not directly
      native code.  The translation of CIL to native code happens
      elsewhere (in the JIT engine), a problem that traditional
      compilers had to address themselves.
    * What remains of a .NET compiler is fairly minimal: a
      tokenizer, a parser, an internal AST that matches the source
      language, and a translation phase from the AST to CIL (a
      small sketch of that last phase follows this list).

      The parser, the tokenizer and the AST are likely to be
      intimately tied to the language being implemented, so the
      opportunity for sharing there is minimal.
    * People have suggested introducing a new intermediate
      representation between the AST and CIL, but there is no reason
      to do so: nobody has shown that it would bring any benefits or
      reduce the complexity of the compilers.

      In addition, a new intermediate layer between the AST and CIL
      would have to be prototyped and tested with a number of
      compilers before we could consider it complete and ready for
      production use.  Keep in mind that such an intermediate
      representation is of dubious use to begin with.
    * If such a library existed, all the compilers consuming it
      during the development process would have to stay synchronized
      with it, and making changes to the library would probably
      break more than one compiler.

      It is of course possible to do this, but with a vastly
      under-staffed effort (we still do not have production
      compilers for JScript, VB.NET or C++, and none of those
      compilers would benefit from this library) it is unclear why
      we would take on another task.
    * A library with a solid, stable API will take a long time to
      develop and maintain.

      Not only that, but the resources needed to develop a stable
      API are fairly large, and the burden imposed on the
      implementors of both the compilers and the library would take
      a large toll.
And the most important point, I think, is:
    * Many compiler authors will want to write their compiler in
      their own language, as a test of its completeness and as
      their own way of dogfooding the compiler, which makes sharing
      any of the components above (parser, tokenizer) unlikely.
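
As promised above, here is what the translation phase can look like
in miniature.  The class names here are hypothetical, but the shape
(AST nodes that know how to emit their own CIL) is the usual one,
shown against Reflection.Emit's ILGenerator:

    using System.Reflection.Emit;

    // A hypothetical, minimal AST: each node knows how to emit the
    // CIL that computes its value.  A real compiler adds symbol
    // tables, type checking and error reporting around this core.
    abstract class Expr {
        public abstract void Emit (ILGenerator ig);
    }

    class IntLiteral : Expr {
        readonly int value;
        public IntLiteral (int value) { this.value = value; }

        public override void Emit (ILGenerator ig)
        {
            ig.Emit (OpCodes.Ldc_I4, value);
        }
    }

    class Add : Expr {
        readonly Expr left, right;
        public Add (Expr left, Expr right)
        {
            this.left = left;
            this.right = right;
        }

        public override void Emit (ILGenerator ig)
        {
            // Evaluate the operands left to right, then add them.
            left.Emit (ig);
            right.Emit (ig);
            ig.Emit (OpCodes.Add);
        }
    }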
Now, I believe that it would be nice to build a higher-level
abstraction than CIL on top of Cecil, for the purpose of doing the
same kind of decoding that the JIT engine does and recovering the
higher-level structure of a program from the CIL stream: basic
blocks, trees of statements, the def-use/use-def (DU/UD) chains and
other bits that an optimizer might need.

Such a representation could be used by an optimizer to improve the
output produced by standard CIL compilers, relieving individual
compilers of the burden of implementing very complicated
optimizations.
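
To make that concrete, the first step such a layer would take is to
partition a method's CIL into basic blocks.  Here is a sketch of the
leader-finding pass, written against Mono.Cecil's object model (the
switch opcode, whose operand is an array of targets, is omitted for
brevity):

    using System.Collections.Generic;
    using Mono.Cecil;
    using Mono.Cecil.Cil;

    static class BasicBlocks {
        // Returns the instructions that start a basic block: the
        // method entry, every branch target, and every instruction
        // that follows a branch.
        public static HashSet<Instruction> FindLeaders (MethodDefinition method)
        {
            var leaders = new HashSet<Instruction> ();
            var il = method.Body.Instructions;

            if (il.Count > 0)
                leaders.Add (il [0]);

            foreach (Instruction ins in il) {
                FlowControl fc = ins.OpCode.FlowControl;
                if (fc != FlowControl.Branch && fc != FlowControl.Cond_Branch)
                    continue;

                // The branch target starts a new block ...
                Instruction target = ins.Operand as Instruction;
                if (target != null)
                    leaders.Add (target);

                // ... and so does the fall-through instruction.
                if (ins.Next != null)
                    leaders.Add (ins.Next);
            }
            return leaders;
        }
    }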
> We can go even further. Thinking about lexing/parsing, I would love
> to write a fully object-oriented lexer/parser. The compiler writer
> could feed the library, telling it what the primitives of his
> language are, and giving it his implementation of statements, using
> a pure object model. This would obviously be slower than a jay
> approach, but it could simplify lots of things.
>
> What do you think about that?
As for parsing and lexing: the options here are endless, and the
standard practice of using parser generators or hand-tuned parsers
and lexers is not only well established, it has been refined
extensively.
Adding an abstraction like the one you describe and trying to evolve
it to match all the possible languages in the world is, in my
opinion, not going to produce any results.  The worst problem with
this approach is that you would have to cater to new compiler
writers, because existing compilers are unlikely to switch, and you
do not know what these new compilers might want to do.
Not only that, but a language designer wants flexibility, flexibility
that he can get by writing a manual parser/tokenizer or by using a
code generator, some of which are very advanced and embody many
man-years of research and development.  Making people learn and use a
new abstraction would just get in the way of innovation.
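
To illustrate what that flexibility means, consider how little code a
hand-written tokenizer needs, and how much control it leaves you.  A
toy sketch (the token set is made up for the example):

    using System;

    enum TokenKind { Number, Ident, Plus, Eof }

    struct Token {
        public TokenKind Kind;
        public string Text;
    }

    class Tokenizer {
        readonly string src;
        int pos;

        public Tokenizer (string src) { this.src = src; }

        // Each branch below is plain code: adding lookahead,
        // context-sensitive keywords or custom error recovery is
        // easy, with no generator formalism in the way.
        public Token Next ()
        {
            while (pos < src.Length && Char.IsWhiteSpace (src [pos]))
                pos++;

            if (pos == src.Length)
                return new Token { Kind = TokenKind.Eof, Text = "" };

            char c = src [pos];
            if (Char.IsDigit (c))
                return Scan (TokenKind.Number, Char.IsDigit);
            if (Char.IsLetter (c))
                return Scan (TokenKind.Ident, Char.IsLetterOrDigit);
            if (c == '+') {
                pos++;
                return new Token { Kind = TokenKind.Plus, Text = "+" };
            }
            throw new Exception ("unexpected character: " + c);
        }

        Token Scan (TokenKind kind, Predicate<char> cont)
        {
            int start = pos;
            while (pos < src.Length && cont (src [pos]))
                pos++;
            return new Token {
                Kind = kind, Text = src.Substring (start, pos - start)
            };
        }
    }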
My opinion is that the compiler problem is very well understood
today: compilers are not hard to write, but they are not small
projects; they are like a marathon.
Miguel.