[Mono-list] C++ to CIL compilers?

Tyson Dowd trd@cs.mu.OZ.AU
Wed, 30 Jan 2002 13:45:47 +1100


On 21-Jan-2002, Miguel de Icaza <miguel@ximian.com> wrote:
> > > It took me only 8-months of part-time work on the compiler to write a C#
> > > compiler, so it is something within the scope of a one semester class to
> > > create other languages.
> > 
> > Sure, if these languages are designed to target CIL and are (therefore)
> > pretty much just an alternative syntax for C#.
> > 
> > If you want your language to do anything that CIL doesn't directly
> > support, or you need to interoperate with CIL but your language doesn't
> > have all the features of CIL (or at least the CLS) then you are going to
> > have a much tougher time writing a compiler.
> 
> There are a number of reasons why I believe that adapting g++ to support
> "Managed C++" is not that hard:
> 
> 	* Supporting the C-like constructs is already possible by the
> 	  CLR.  Pointer arithmetic is there, value types are there and
> 	  roll-your-own vtable support is there as well.

This helps a LOT, but it only solves a couple of problems -- it does
give you the tools to solve all the remaining problems, which is why it
is even *possible* to integrate C++ code with .NET.

But there are a lot of other changes you need to make to g++ other than
just some code gen -- for starters you have to implement 
	#using <mscorlib.dll> 
in g++, which means that g++ has to either be (partially) .NET hosted
(to use System.Reflection) or be able to parse the metadata.  Once you
do that you have to convert the metadata into the appropriate C++
conventions, represent that in g++ internal formats (as if it was a
#include) and handle error messages and so on.  I guess the work done
recently to support precompiled interfaces might help somewhat.
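
As a rough illustration of the "convert the metadata into the appropriate
C++ conventions" step, here is a minimal sketch of just the naming part.
The function name is hypothetical, and it assumes the type names arrive in
the reflection style (dots between namespaces, '+' before a nested type).
A real importer would also have to map members, signatures and
accessibility.

```cpp
#include <string>

// Hypothetical helper (not part of g++ or MC++): turn a metadata type name
// such as "System.Collections.ArrayList" into C++ scope syntax.
// Assumption: '.' separates namespaces and '+' introduces a nested type.
std::string to_cpp_qualified_name(const std::string& metadata_name) {
    std::string out;
    out.reserve(metadata_name.size() * 2);
    for (char c : metadata_name) {
        if (c == '.' || c == '+') {
            out += "::";   // both separators become the C++ scope operator
        } else {
            out += c;
        }
    }
    return out;
}
```

With this, to_cpp_qualified_name("System.Collections.ArrayList") gives
"System::Collections::ArrayList", which is the form MC++ source uses.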

Next you have to modify the parser to accept all the new keywords and so
on, and modify the g++ internal data structures to hold the appropriate
information for later code generation.  Hopefully not too hard.

Then you need to define a whole new architecture to output CIL code.
Fortunately you can probably reuse a lot of the Java work.
You also have to output the appropriate custom attributes so that you
can round trip MC++ signatures in the metadata.

Up to this stage I estimate you have to do pretty much as much work as
went into GCJ, but hopefully their work will help you do this more
quickly.

You have to then deal with the strangeness that is assemblies.  C++
doesn't work in assemblies, it works with modules that get compiled into
.o files and put into .a (or .so) files.  The entire linking process is
different, so you may need to do what MC++ does and generate a special kind of
.o files that contain PEs with partial metadata that needs to be filled
in later with assembly names (when you actually create DLLs or EXEs).
This pretty much means you have to implement a little linker.
If you don't do this you somehow need to arrange for individual .cpp
files that are compiled to know which assembly they are going to be part
of later on.
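
To make the "little linker" idea concrete, here is a minimal sketch of the
fill-in step.  All the types and names are hypothetical and the data model
is far simpler than real PE metadata: each object file lists the types it
defines and the types it references, and references with no assembly name
are the ones left to be patched at link time.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a type reference in a partially linked object file.
struct TypeRef {
    std::string type_name;  // e.g. "Circle"
    std::string assembly;   // empty until the link step fills it in
};

// Hypothetical model of one compiled .o file.
struct ObjectFile {
    std::string path;
    std::vector<std::string> defined_types;
    std::vector<TypeRef> refs;
};

// Link step: every type defined by the inputs ends up in `assembly_name`,
// so unbound references to those types are patched to that name.
// Returns the number of references that are still unresolved.
std::size_t link_into_assembly(std::vector<ObjectFile>& objects,
                               const std::string& assembly_name) {
    std::set<std::string> defined_here;
    for (const auto& obj : objects)
        for (const auto& type : obj.defined_types)
            defined_here.insert(type);

    std::size_t unresolved = 0;
    for (auto& obj : objects) {
        for (auto& ref : obj.refs) {
            if (!ref.assembly.empty()) continue;  // already bound externally
            if (defined_here.count(ref.type_name)) {
                ref.assembly = assembly_name;
            } else {
                ++unresolved;  // a real linker would report an error here
            }
        }
    }
    return unresolved;
}
```

The point of the sketch is only that the assembly name is not known when
each .cpp file is compiled, so something has to go back over the object
files once it is.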

Finally you have to deal with all the interactions -- I'm assuming that
like MC++ you want to be able to interface with all the existing g++
compiled code by generating managed wrapper classes.  This means every
possible interaction between g++ generated code and runtime stuff must map to
.NET and vice versa.  Exceptions, multiple inheritance, threads,
destructors, streams, etc, all have to be considered.  I would expect
the folk at Microsoft could write a small book on all the cases that
needed to be considered when they implemented MC++.  Even if you don't
support something, you need to document and give error messages.
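
Taking exceptions as one example of such an interaction, here is a sketch
of what a wrapper boundary has to do.  It is plain C++ with a hypothetical
ManagedError struct standing in for a real managed exception object, and
the particular native-to-.NET pairings are my own assumption rather than
what MC++ actually does.

```cpp
#include <exception>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a managed exception; in a real wrapper this
// would be an object of some System.Exception subclass.
struct ManagedError {
    std::string managed_type;
    std::string message;
};

// Hypothetical wrapper boundary: run native code and translate anything it
// throws, so that no native exception escapes into managed callers.
// Returns true if the native code completed without throwing.
template <typename F>
bool call_native(F&& native, ManagedError& err) {
    try {
        native();
        return true;
    } catch (const std::bad_alloc& e) {
        err = {"System.OutOfMemoryException", e.what()};
    } catch (const std::out_of_range& e) {
        err = {"System.ArgumentOutOfRangeException", e.what()};
    } catch (const std::exception& e) {
        err = {"System.Exception", e.what()};
    } catch (...) {
        // Thrown ints, char* and so on carry no message to translate.
        err = {"System.Exception", "unknown native exception"};
    }
    return false;
}
```

Every other item in the list above (destructors, threads, streams, ...)
needs an equivalent decision about what the mapping is, or an error
message saying there isn't one.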

> 	* An existing "reference" implementation exists, it should be
> 	  relatively simple to learn from the Microsoft Managed C++
> 	  compiler what needs to be done, and how the language maps to
> 	  CIL.

It will be relatively easy to see what MC++ does in any particular case,
that is agreed.

Whether that is useful in the case of g++ is another matter.
Sometimes it will help, sometimes it will be useless.  

Certainly it will be faster than having no reference implementation at
all!

> I agree that starting from scratch would be a much harder task. 
> 
> > C# is going to be the easiest case, as it is practically the same
> > feature set as CIL.  The more different your language is from C#, the
> > more work you are going to have to do.
> 
> I kind of disagree with this general statement.  If your language
> implements features that are hard to express with the current CIL you
> will require to spend a significant amount of time researching the
> mapping, but if your language can be properly expressed in CIL terms
> then the work is less.

I think this is what I was trying to say.  Perhaps we are in violent
agreement?

> > I would certainly think modifying gcc to be like the Managed C++
> > compiler would be way way more work than 1 semester.
> 
> You might be right.  I am known for never guessing correctly schedule
> times. 

I wouldn't try to guess until someone writes a document outlining all
the things that have to be modified.
And of course it helps if you know who is planning on doing the work
before trying to guess how long it will take them ;-)

-- 
       Tyson Dowd           # 
                            #  Surreal humour isn't everyone's cup of fur.
     trd@cs.mu.oz.au        # 
http://www.cs.mu.oz.au/~trd #