No subject


Fri Feb 8 08:55:55 EST 2008


compiler.  Please understand that this is mostly my OPINION, not necessarily the truth or 
even the best way to do things.  Feel free to disagree (but please do so constructively!).

COMPILER STAGES:

The compiler consists of several stages of operation.  In describing a compiler, it is 
convenient to state that the stages occur sequentially, and that is not much of an 
oversimplification.  In actual operation, however, the various stages are tied together and 
operate in an interleaved fashion; the compiler driver controls this interoperation.

THE DRIVER:
 
The compiler driver is a simple construct that creates the user-interface of the compiler.  
It is responsible for parsing the command line parameters and for printing status and error 
messages.

THE SCANNER:

The scanner is actually a multi-function component.  It is responsible for reading a 
stream of text characters from storage and providing a stream of tokens to the parser.

The initial stream of text may come from multiple files by use of the {$I <filename>} 
directive.

Much like the preprocessor directives of the C language, Object Pascal uses compiler 
directives to modify the behavior of the compiler.  The OP6 language does not provide 
any method of text replacement or macros, but it does provide conditional compilation.  
The scanner needs to be able to enable or disable blocks of text based on compiler 
directives.
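
To make the enable/disable idea concrete, here is a rough sketch (in Python for brevity, 
not the C# we would actually write).  It assumes only the {$DEFINE}, {$UNDEF}, {$IFDEF}, 
{$IFNDEF}, {$ELSE} and {$ENDIF} directives, each alone on its own line.  The name 
filter_conditionals is made up for illustration, and include files are not handled.

```python
import re

# Matches a one-word directive that occupies a whole line, e.g. {$IFDEF DEBUG}.
DIRECTIVE = re.compile(r"^\s*\{\$(\w+)\s*(\w*)\s*\}\s*$")


def filter_conditionals(lines, defined=()):
    """Return only the lines that the conditional directives leave enabled."""
    symbols = {s.upper() for s in defined}
    stack = []      # one entry per open {$IFDEF}: (parent_active, condition)
    active = True   # is the current block of text enabled?
    out = []
    for line in lines:
        m = DIRECTIVE.match(line)
        if not m:
            if active:
                out.append(line)
            continue
        name, arg = m.group(1).upper(), m.group(2).upper()
        if name in ("IFDEF", "IFNDEF"):
            cond = (arg in symbols) == (name == "IFDEF")
            stack.append((active, cond))
            active = active and cond
        elif name == "ELSE":
            if not stack:
                raise SyntaxError("{$ELSE} without {$IFDEF}")
            parent_active, cond = stack[-1]
            active = parent_active and not cond
        elif name == "ENDIF":
            if not stack:
                raise SyntaxError("{$ENDIF} without {$IFDEF}")
            active, _ = stack.pop()
        elif name == "DEFINE" and active:
            symbols.add(arg)
        elif name == "UNDEF" and active:
            symbols.discard(arg)
        # Other one-word directives are simply dropped in this sketch.
    if stack:
        raise SyntaxError("missing {$ENDIF}")
    return out
```

The stack is what lets conditional blocks nest: a block is enabled only if its own 
condition holds and every enclosing block is enabled as well.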

In addition, the compiler directives can be used to override the command line options.

For error reporting, it is necessary that the tokens returned by the scanner have the file 
name, line number and character position of the original text that generated the token.

Tokens should also contain the text of the token, a value and a token type.
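
As a rough illustration only (Python here, though the real thing would be C#), a token 
carrying all of the above might look like the sketch below.  The field names, the 
TokenType members and the "file(line,column)" message format are my assumptions, not 
anything decided.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TokenType(Enum):
    # A deliberately tiny, hypothetical set of token types.
    IDENTIFIER = auto()
    NUMBER = auto()
    SYMBOL = auto()


@dataclass(frozen=True)
class Token:
    type: TokenType   # what kind of token this is
    text: str         # the source text exactly as written
    value: object     # the converted value, e.g. 42 for the text "42"
    file_name: str    # file the token came from (may be an include file)
    line: int         # line number within that file
    column: int       # character position within that line

    def location(self):
        """Format the origin of the token for an error message."""
        return f"{self.file_name}({self.line},{self.column})"
```

For example, Token(TokenType.NUMBER, "42", 42, "main.pas", 3, 10).location() gives 
"main.pas(3,10)", which the driver could prepend to an error message.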

THE PARSER:

The parser definition (I'm assuming that it will be a Jay-generated parser!) needs to be 
kept as slim and readable as possible.

Because C# has namespaces and does not care about the order in which things are 
declared, it is fairly easy to build a set of helper routines and use only simple calls to 
those routines in the definition file.

These helper routines should ideally be a part of the parse-tree component.

THE PARSE-TREE:

The parse-tree is a simple data structure that encapsulates the form of the text being 
compiled in a syntactically and (mostly) semantically correct fashion.  

The parse-tree is built from individual node elements.  Each class of element is defined to 
encapsulate one specific semantic concept of the Object Pascal language and is used to 
contain only structures that match.  When the parser tries to build a structure in the parse-
tree that is incorrect, an error can be flagged.

During the first portion of the compile process, the scanner and parser work to fill the 
parse-tree.

Once the tree is built, the driver can (optionally) cause an optimization pass to improve 
the performance of the structure.  (Not my area!  I've studied it, but it still doesn't make 
much sense to me!)

Following the optimization pass, the parse-tree is traversed in order to generate code.  
Because each node of the tree embodies a specific semantic concept, the node can 
generate appropriate code to implement that concept.  Any contained nodes are simply 
called to generate their own code in the appropriate order.
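
A toy sketch of what I mean, in Python rather than C#.  The node classes and the 
PUSH/ADD stack-machine instructions are invented for the example; they are not a 
proposal for the real node set or the real output.

```python
class Node:
    """Base class: every node knows how to generate its own code."""

    def generate(self, out):
        raise NotImplementedError


class Number(Node):
    def __init__(self, value):
        self.value = value

    def generate(self, out):
        out.append(f"PUSH {self.value}")


class Add(Node):
    def __init__(self, left, right):
        # Refuse to build a structure that does not match: both operands
        # must themselves be parse-tree nodes.
        for child in (left, right):
            if not isinstance(child, Node):
                raise TypeError(f"operand must be a Node, got {child!r}")
        self.left, self.right = left, right

    def generate(self, out):
        # Contained nodes generate their own code first, in order,
        # and then this node adds the code for its own concept.
        self.left.generate(out)
        self.right.generate(out)
        out.append("ADD")
```

Calling generate on Add(Number(1), Add(Number(2), Number(3))) with an empty list fills 
it with PUSH 1, PUSH 2, PUSH 3, ADD, ADD.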

NOTE:

It looks like I've chopped off the sequence of the compiler in a kind of non-standard way.  
That's just me thinking ahead and forgetting to write what I'm thinking…

OBJECT PASCAL NOTES:

Unlike the C family of languages, Object Pascal enforces a rigid structure on compilation 
units.  A Pascal UNIT is a file that can be compiled to an object file.

A UNIT consists of an interface definition and an implementation of that interface.  This 
is NOT the concept of classes and interfaces from object oriented programming, although 
it is quite similar.

To compile a group of Pascal UNIT files, it is often necessary to know the interface of 
one UNIT before another unit can be compiled (mutual dependencies).  This is entirely 
legal according to the language specification.  

Compiling these units requires that the interface of a unit can be compiled separately 
from the implementation.  Some implementations do this by compiling the unit interface 
to a symbol file and then producing a second object file.  Other implementations provide 
both symbol and object information in the same file.

The practical result of this issue is that the compiler must be able to compile an interface 
of a unit and create a symbol file for that unit, even if the remainder of the unit cannot be 
compiled.

A second feature of the Pascal UNIT is that its dependencies are explicit.  Pascal is 
strict about declaring elements before they are used.  This means that the compiler 
must know what an element is before it can compile code to use that element.

What this means is that when the compiler reaches the USES statement in a file, it will 
attempt to load the symbol information for the referenced UNIT.  A USES statement is 
how Object Pascal imports UNIT files.

When attempting to load a unit's symbol file, a process similar to the make program is 
used.  If a symbol file exists, but the source file does not, the symbol file is loaded.  If a 
source file exists and is newer than the symbol file, then it is re-compiled.  If a source file 
exists and no symbol file exists, then it is compiled.
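
Written out as a sketch (Python, with made-up names), the decision looks something like 
this.  The last case, where both files exist and the symbol file is up to date, is not spelled 
out above; I am assuming the symbol file is simply loaded, as make would do.

```python
import os


def mtime_or_none(path):
    """Modification time of a file, or None if the file does not exist."""
    try:
        return os.path.getmtime(path)
    except OSError:
        return None


def choose_action(source_mtime, symbol_mtime):
    """Decide what to do for a unit named in a USES statement.

    Each argument is a modification time, or None if that file is missing.
    """
    if source_mtime is None and symbol_mtime is None:
        raise FileNotFoundError("neither source nor symbol file exists")
    if source_mtime is None:
        return "load"        # symbol file only: load it
    if symbol_mtime is None:
        return "compile"     # source only: compile it
    if source_mtime > symbol_mtime:
        return "recompile"   # source is newer than the symbol file
    return "load"            # assumption: an up-to-date symbol file is reused
```

A caller would use it as choose_action(mtime_or_none(source_path), 
mtime_or_none(symbol_path)), where the two paths are whatever naming scheme we settle on.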

What does this mean to us?  Our compiler MUST BE ABLE to suspend compiling one 
source file, compile another, and then resume the first file.  This must be able to happen 
to an arbitrary depth of recursion.  Some older commercial Pascal compilers limited this 
to a depth of 16 files.  In addition, due to the mutual dependence requirements mentioned 
above, it is necessary for the compiler to be able to compile an interface of a UNIT and 
then abort without error to resume the first file.  When this happens, it is necessary for the 
aborted unit to be re-compiled completely at some later time when the prerequisite units 
have all been compiled.
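
Here is a very simplified model of that behaviour, in Python, just to show the control 
flow.  Everything in it is an assumption of mine: units are reduced to two lists of names 
(the units used by the interface and by the implementation), "compiling" only records an 
event, and the later full re-compile is modelled as finishing the deferred unit.

```python
class Suspended(Exception):
    """A needed unit is itself suspended further up the compile stack."""


def compile_all(units, root):
    """units maps a unit name to (interface_uses, implementation_uses)."""
    in_progress = []   # stack of units currently being compiled
    symbols = set()    # units whose interface (symbol info) is available
    deferred = []      # units aborted after their interface was compiled
    log = []

    def compile_unit(name):
        if name in symbols:
            return                      # symbol information already loaded
        if name in in_progress:
            raise Suspended(name)       # its interface is not ready yet
        interface_uses, implementation_uses = units[name]
        in_progress.append(name)
        try:
            for dep in interface_uses:
                compile_unit(dep)       # suspend, compile dep, then resume
            symbols.add(name)
            log.append(f"interface {name}")
            try:
                for dep in implementation_uses:
                    compile_unit(dep)
            except Suspended:
                # Abort without error; the caller resumes with our symbols.
                deferred.append(name)
                log.append(f"defer {name}")
                return
            log.append(f"finish {name}")
        finally:
            in_progress.pop()

    # A Suspended that escapes here means the interfaces themselves are circular.
    compile_unit(root)
    while deferred:
        # Stand-in for the later full re-compile of an aborted unit.
        name = deferred.pop(0)
        for dep in units[name][1]:
            compile_unit(dep)
        log.append(f"finish {name}")
    return log
```

With units = {"B": (["A"], []), "A": ([], ["B"])} and root "B", the log comes out as 
interface A, defer A, interface B, finish B, finish A: unit A is compiled as far as its 
interface, aborted so that B can resume, and completed once B is done.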

It is a complex issue, but is part of the standard so we need to take care of it.



OK,

That's enough blather for now.

Comments to the mono-devel mailing list please (mono-devel-list at lists.ximian.com).
 
Thanks,
 
--Grant
