[Mono-list] C++ to CIL compilers?

Miguel de Icaza miguel@ximian.com
30 Jan 2002 15:35:32 -0500


Tyson said:

> This helps a LOT, but it only solves a couple of problems -- it does
> give you the tools to solve all the remaining problems, which is why it
> is even *possible* to integrate C++ code with .NET.
> 
> But there are a lot of other changes you need to make to g++ other than
> just some code gen -- for starters you have to implement 
> 	#using <mscorlib.dll> 

You are absolutely right, I had forgotten about this.  This adds to the
time frame to implement such a beast.

The good news is that reading the metadata could be done with one of the
existing metadata loading libraries: Mono has one, Portable.NET has
another, and the Microsoft SDK ships a third as part of their sample
"SMC" compiler (which compiles some kind of C++ subset; I am not sure
yet what exactly it is).

> in g++, which means that g++ has to either be (partially) .NET hosted
> (to use System.Reflection) or be able to parse the metadata.  Once you
> do that you have to convert the metadata into the appropriate C++
> conventions, represent that in g++ internal formats (as if it was a
> #include) and handle error messages and so on.  I guess the work done
> recently to support precompiled interfaces might help somewhat.

Spot on again.  The advantage though is that the metadata
already contains the "pre-compiled" information, so instead of talking
to the parser (which is what the precompiled headers would do for
example), you can talk directly to the backend engine and speak to the
semantic analysis piece of the compiler.
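
As a rough illustration of the "convert the metadata into the appropriate
C++ conventions" step, here is a minimal sketch of the name mapping only.
The function name cli_name_to_cpp is hypothetical (it is not part of g++
or of any metadata library), and the sketch assumes the reflection-style
spelling of type names, where '.' separates namespaces and '+' separates
a nested type from its enclosing type:

```cpp
#include <string>

// Hypothetical helper, for illustration only: map a CLI type name as
// reported by reflection onto a C++ qualified name.
// Assumption: '.' separates namespaces and '+' marks a nested type;
// C++ spells both with "::".
std::string cli_name_to_cpp(const std::string& cli_name) {
    std::string out;
    out.reserve(cli_name.size() + 8);
    for (char c : cli_name) {
        if (c == '.' || c == '+') {
            out += "::";   // namespace or nested-type separator
        } else {
            out += c;      // ordinary identifier character
        }
    }
    return out;
}
```

For example, cli_name_to_cpp("System.Collections.ArrayList") yields
"System::Collections::ArrayList".  A real implementation would also have
to deal with name clashes, C++ keywords and generating declarations, none
of which is attempted here.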

> Then you need to define a whole new architecture to output CIL code.
> Fortunately you can probably reuse a lot of the Java work.

Yep.  The description I have read about egcj mentions that you actually
get two compilers in one:

	.class and .java to native code.

	.java to .class 

The .java to .class compiler will actually not use the internal GCC code
generator, but will "escape" the gcc pipeline shortly before reaching it.
I am not sure how well documented this is.
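
The "two compilers in one" shape can be sketched roughly as below.  This
is a toy model under stated assumptions, not how GCC is actually
organized: the stage names and the stages_for function are made up for
illustration, and a real compiler has many more passes.  The point is
only that the bytecode path shares the front end and then leaves the
pipeline before the native code generator, which is presumably what a
CIL target would do as well:

```cpp
#include <string>
#include <vector>

// Which kind of output the driver was asked for (hypothetical names).
enum class Target { NativeCode, Bytecode };

// Return the (heavily simplified, invented) list of stages for a target.
std::vector<std::string> stages_for(Target target) {
    // Both paths share the front end.
    std::vector<std::string> stages = {"parse", "semantic-analysis"};
    if (target == Target::Bytecode) {
        // "Escape" the pipeline: emit directly from the front end's
        // representation, skipping the native code generator.
        stages.push_back("emit-bytecode");
    } else {
        stages.push_back("gcc-code-generator");
        stages.push_back("emit-native");
    }
    return stages;
}
```

With this model, stages_for(Target::Bytecode) never contains the
"gcc-code-generator" stage, while stages_for(Target::NativeCode) does.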

> You have to then deal with the strangeness that is assemblies.  C++
> doesn't work in assemblies, it works with modules that get compiled into
> .o files and put into .a (or .so) files.  The entire linking process is
> different, so you may need to do what MC++ does and generate a special
> kind of .o file that contains PEs with partial metadata that needs to be filled
> in later with assembly names (when you actually create DLLs or EXEs).
> This pretty much means you have to implement a little linker.

Oh yes, this sounds very non-trivial.
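
To make the "fill in the assembly names later" idea a bit more concrete,
here is a minimal sketch of what such a little linker might do for type
references.  Everything in it is an assumption made for illustration: the
TypeRef struct, the resolve_assemblies function and the use of an empty
string to mean "not yet known" are invented here and do not describe the
actual format MC++ emits:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record from a partially filled-in object file: a type
// reference whose owning assembly may not be known at compile time.
struct TypeRef {
    std::string type_name;
    std::string assembly;  // empty until the link step fills it in
};

// Fill in missing assembly names from a table built at link time
// (type name -> assembly name).  Returns how many references are
// still unresolved afterwards.
std::size_t resolve_assemblies(
        std::vector<TypeRef>& refs,
        const std::map<std::string, std::string>& owner_of) {
    std::size_t unresolved = 0;
    for (TypeRef& ref : refs) {
        if (!ref.assembly.empty()) {
            continue;  // already bound by the compiler
        }
        auto it = owner_of.find(ref.type_name);
        if (it != owner_of.end()) {
            ref.assembly = it->second;  // patch in the assembly name
        } else {
            ++unresolved;               // would be a link error
        }
    }
    return unresolved;
}
```

A non-zero return value here plays the role of an "undefined symbol"
error in a conventional linker.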

> Certainly it will be faster than having no reference implementation at
> all!

Heh ;-)

I guess my main worry is that C++ is a large language with some large
features that are not well understood.   Undertaking a C++ compiler from
scratch is significantly harder than butchering an existing and pretty
advanced compiler. 

> I think this is what I was trying to say.  Perhaps we are in violent
> agreement?

Yes ;-)

Miguel.