[Mono-list] Might Have Found A Way To Get GCC .NET !

Zaphod j0k3rin@yahoo.co.in
Tue, 27 Aug 2002 19:09:56 +0530

On Mon, Aug 26, 2002 at 08:13:16PM +0000, Cyber Wiz wrote:
> Guarav had a good idea to use the GCJ sources to build a GCIL 
> looked into i saw that the GCJ sources are heavily confusing to browse and 

GCJ cannot compile C/C++/ADA or any other of the GCC languages ... so it's
typically a waste of time trying to hack GCJ unless you want a Java to IL
compiler ... (duh !)

> GCJ and Jilc 

AFAIK, JILC does not convert JVM to IL fully ... I had a peek ... The
jdasm code works (except for my AWT code , coz the internal anonymous
'$0' class fail miserably)... Gaurav, could you fix that ?

> any ideas, suggestions, assistance, etc., would be welcome 

Read the below passage from Portable.Net FAQ .. for your answer ...
IMO, I'm incapable of hacking GCC for such a major change, But if you
can , good for you ! (and all gcc users)


Why don't you use gcc as the basis for your C# compiler?

A common question that arises is why we aren't using gcc to compile
C# code to IL.  Strategically, we would like to be able to reuse all
of the good work that has gone into gcc.  The DotGNU Project currently
has an open request for someone to volunteer to modify gcc to generate
IL bytecode.

However it isn't quite as easy as it looks.  The following script of
a hypothetical discussion provides a blow by blow account of why this
is so hard.  This script is based in part on e-mails we have exchanged
with users in the past.

Why don't you add C# to the list of languages gcc supports?

Because it won't solve the problem that we need to solve.

Initially we need a C# compiler that can generate IL bytecode for
the .NET platform.  Later, we may need a C# compiler that can
generate native code as well, but that is optional.

Putting a C# parser on the front of gcc would give us a native
compiler, but it won't give us an IL bytecode compiler.

So what?  Add an IL bytecode backend to gcc, and you'll solve your
problem, and also be able to compile C, C++, Fortran, etc, to .NET.

This is not as easy as it looks.  Gcc is divided into a number of
phases: parsing, semantic analysis, tree-to-RTL conversion, RTL
handling (including optimization), and final native code generation.

The hard part is RTL (Register Transfer Language).  This part of
gcc is hard-wired to generate code for register-based CPU's such
as i386, PPC, Sparc, etc.  RTL is not designed for generating code
for stack-based abstract machines such as IL.

Also, RTL loses a lot of the type and code structure information
that IL needs in the final output file.  By the time RTL gets the
code, information about whether a value is an integer or an object
reference is mostly lost.  Information about the class structure
of the code is lost.  This information is critical for correct
compilation of C# to IL.

But hang on a second!  Gcj, the Java back-end for gcc, does stack
machines!  Why not do something like that?

Err ... no it doesn't.  The Java bytecode stuff in gcj is not
organised as an RTL back-end.

When gcj compiles Java, it performs parsing and semantic analysis
in the front-end, like the other supported languages.  Then the
parse tree is sent in one of two different directions.

If gcj is compiling to native, the parse tree is handed to the RTL
core of the compiler, and it takes over.

If gcj is compiling to bytecode, the parse tree is handed to a
completely separate code generator that knows about Java bytecode.

Because gcj does NOT implement a bytecode RTL back-end for gcc, it
cannot compile C, C++, etc down to bytecode.  Java bytecode is a
special case that only works for the Java front-end.

But what about egcs-jvm?  Doesn't it compile C to Java bytecode?

It's a hack.  The code that it generates is horrible, and does not
conform to the usual conventions that the JVM requires.  If one
compiled Java code using this back-end, it wouldn't work with
normal Java code due to the differences in calling conventions
and what-not.

The biggest problem that the author of egcs-jvm he had was
trying to work around the register machine assumptions in the code.
The result wasn't pretty.  He has said that it would be easier to
throw the JVM away and invent a register-based abstract machine
than try to make gcc generate efficient stack machine code

Isn't there a gcc port to the Transputer, which is stack-based?

Yes there is, for an older version of gcc (2.7.2).  The source can
be found at http://wotug.ukc.ac.uk/parallel/transputer/software/

It appears to compile the code to a pseudo-register machine, and then
fixes up the code to be stack based afterwards.  It takes advantage of
some register stack features in gcc that egcs-jvm didn't use.

The Transputer is still a traditional CPU despite being stack-based.
The gcc port puts pointer values into integer pseudo-registers, which would
violate the security requirements of IL.

The i386 gcc port uses a regular set of registers for integer/pointer
values, and a register stack for floating point values.  The Transputer
port uses two register stacks: one for integer/pointer values, and
the other for floating point values.  It may be possible to use three
register stacks for IL: one for integer values, another for pointer values,
and a third for floating point values.

However, this still may not give a useful result.  This fixes the security
problems for the pseudo-registers, but it doesn't fix the security problems
for memory.  RTL assumes that main memory is a flat, untyped, address space,
where any kind of value can be stored in any word.  Partitioning main memory
into separate types may not be possible without a rewrite of RTL.

OK, so do something similar to gcj for C#.  Use two code generators.
That would work right?

Yes it would, except for one small catch.

Because there are so many people who don't understand how gcc works,
they will assume that they can compile C and C++ to IL bytecode after
we release the C# patches.

Then they will discover that this isn't the case and will get
extremely angry that we didn't build what they thought we were
building.  *sigh*

Now matter how we attack the problem, we will end up having to
write an IL bytecode backend for RTL, which is extremely difficult
because of the various assumptions in the code.

Realistically, someone with a great deal of gcc knowledge needs to
go into the gcc core, rip RTL completely out, throw it away, and
replace it with something that knows about both register machines
and stack machines.

Alternatively, someone could create a STL (Stack Transfer Language),
that passes all languages through a separate code generator that
knows about stack machines.  Then we can write STL back-ends for
IL and JVM bytecode.  Both gcj and DotGNU would benefit from this.

We're not buying it.  It's not as hard as you think.?

Fine.  Prove us wrong.  Download the gcc sources and have at it.
The Transputer port may be a good place to start to get ideas,
or it may not.


JILC does not work and gcj does not compile C++ ... otherwise your 
idea seems ok to me >:-)...