No subject
Fri Feb 8 08:55:55 EST 2008
compiler. Please understand that this is mostly my OPINION, not necessarily the truth or
even the best way to do things. Feel free to disagree (but please do so constructively!).
COMPILER STAGES:
The compiler consists of several stages of operation. It is possible in describing a
compiler to state that the stages occur sequentially without over simplification. In actual
operation, the various stages are tied together and they operate in an interleaved fashion;
the compiler driver controls this interoperation.
THE DRIVER
The compiler driver is a simple construct that creates the user-interface of the compiler.
It is responsible for parsing the command line parameters and for printing status and error
messages.
THE SCANNER:
The scanner is actually a multiple function component. It is responsible for reading a
stream of text characters from storage and providing a stream of tokens to the parser.
The initial stream of text may come from multiple files by use of the {$I <filename>}
directive.
Much like the preprocessor directives of the C language, Object Pascal uses compiler
directives to modify the behavior of the compiler. The capability of the OP6 language
does not provide any method of text replacement or macros, but does provide conditional
compilation. The scanner needs to be able to enable or disable blocks of text based on
compiler directives.
In addition, the compiler directives can be used to override the command line options.
For error reporting, it is necessary that the tokens returned by the scanner have the file
name, line number and character position of the original text that generated the token.
Tokens should also contain the text of the token, a value and a token type.
THE PARSER:
The parser definition (I'm assuming that it will be a Jay generated parser!) needs to be
kept as slim and readable as possible.
Due to the namespace feature and out-of-order compile nature of C#, it is possible to
build a set of helper routines fairly easily and just use simple calls to the helper routines
in the definition file.
These helper routines should ideally be a part of the parse-tree component.
THE PARSE-TREE:
The parse-tree is a simple data structure that encapsulates the form of the text being
compiled in a syntactically and (mostly) semantically correct fashion.
The parse-tree is built from individual node elements. Each class of element is defined to
encapsulate one specific semantic concept of the Object Pascal language and is used to
contain only structures that match. When the parser tries to build a structure in the parse-
tree that is incorrect, an error can be flagged.
During the first portion of the compile process, the scanner and parser work to fill the
parse-tree.
Once the tree is built, the driver can (optionally) cause an optimization pass to improve
the performance of the structure. (Not my area! I've studied it, but it still doesn't make
much sense to me!
Following the optimization pass, the parse-tree is traversed in order to generate code.
Because each node of the tree embodies a specific semantic concept, the node can
generate appropriate code to implement that concept. Any contained nodes are simply
called to generate their own code in the appropriate order.
NOTE:
It looks like I've chopped off the sequence of the compiler in kind of non-standard way.
That's just me thinking ahead and forgetting to write what I'm thinking
?
OBJECT PASCAL NOTES:
Unlike the C family of languages, Object Pascal enforces a rigid structure on compilation
units. A Pascal UNIT is a file that can be compiled to an object file,
The structure of a UNIT is that is an interface definition and an implementation of that
interface. This is NOT the concept of classes and interfaces from object oriented
programming, although it is quite similar.
To compile a group of Pascal UNIT files, it is often necessary to know the interface of
one UNIT before that unit can be compiled (mutual dependencies). This is entirely legal
according the language specification.
To enable compilation of these units requires that the interface of a unit can be compiled
separately from the implementation. Some implementations do this by compiling the unit
interface to a symbol file and then producing a second object file. Other implementations
provide both symbol and object information in the same file.
The practical result of this issue is that the compiler must be able to compile an interface
of a unit and create a symbol file for that unit, even if the remainder of the unit cannot be
compiled.
A second part of the Pascal Unit is that the dependencies for the unit are explicit. Pascal
is strict about declaring elements before they are used. This means that the compiler
must know what an element is before it can compile code to use that element.
What this means is that when the compiler reaches the USES statement in a file, it will
attempt to load the symbol information for the referenced UNIT. A USES statement is
how Object Pascal imports UNIT files.
When attempting to load a unit's symbol file, a process similar to the make program is
used. If a symbol file exists, but the source file does not, the symbol file is loaded. If a
source file exists and is newer than the symbol file, then it is re-compiled. If a source file
exists and no symbol file exists, then it is compiled.
What does this mean to us? Our compiler MUST BE ABLE to suspend compiling one
source file, compile another, and then resume the first file. This must be able to happen
to an arbitrary depth of recursion. Some older commercial Pascal compilers limited this
to a depth of 16 files. In addition, due to the mutual dependence requirements mentioned
above, it is necessary for the compiler to be able to compile an interface of a UNIT and
then abort without error to resume the first file. When this happens, it is necessary for the
aborted unit to be re-compiled completely at some later time when the prerequisite units
have all been compiled.
It is a complex issue, but is part of the standard so we need to take care of it.
OK,
That's enough blather for now.
Comments to the mono-devel mailing list please (mono-devel-list at lists.ximian.com).
Thanks,
--Grant
--------------090909040302030503050700--
More information about the Mono-devel-list
mailing list