[Mono-dev] misc: C# request info
BGB
cr88192 at hotmail.com
Mon Feb 23 17:35:02 EST 2009
----- Original Message -----
From: "Marek Safar" <marek.safar at seznam.cz>
To: "BGB" <cr88192 at hotmail.com>
Cc: <mono-devel-list at lists.ximian.com>
Sent: Monday, February 23, 2009 7:56 PM
Subject: Re: [Mono-dev] misc: C# request info
> Hi,
>> well, I was looking into C# some, and admittedly I have much less
>> familiarity with the language than with others...
>>
>> (I have started looking at ECMA-334 some, but it is long and a little
>> awkward to answer specific questions from without some digging...).
>>
>>
>> so, firstly, it is my guess that in order to compile C# properly, it is
>> required to load a whole group of files at once (I am uncertain whether
>> the term 'assembly' also applies to the collection of input source files,
>> or only to a produced DLL or EXE).
>>
> You can load C# source code files as well as assemblies or modules.
>
yes, ok.
>> my guess is that it works like this:
>> the group of files is loaded;
>> each file is preprocessed and parsed (it is looking like C# uses a
>> context-independent syntax?...);
>>
> Incorrect, C# uses context dependent keywords.
I meant context independent in the sense that the parser can parse things
without knowing about previous declarations.
in C, it is necessary to know about declarations (typedefs, structs, ...) in
order to be able to parse at all (otherwise the syntax is ambiguous...).
sadly, I am not all that familiar with the details of C#'s syntax.
in C#, the analogous requirement would mean having to know about all of the
classes in all of the visible namespaces before being able to parse, which
does not seem to be the case.
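to make the C half concrete (a minimal made-up illustration, hypothetical names):

/* the same token sequence means different things in C, depending on
   declarations the parser must already have seen */
typedef int Foo;
Foo *bar;        /* with the typedef in scope: declares 'bar' as a pointer */

/* had 'Foo' and 'bar' instead been declared as, say, 'int Foo, bar;',
   then inside a function body 'Foo * bar;' would parse as a
   multiplication whose result is discarded -- the parser cannot tell
   which reading is meant without the earlier declarations. */

in C#, by contrast, 'Foo bar;' inside a class body is a field declaration no
matter what 'Foo' later turns out to name.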
actually, it is looking to me (just my guess from looking at ECMA-334) like,
rather than processing code a few tokens at a time with a syntax where the
current parse depends on prior declarations, C# uses a syntax where it is
possible to parse everything without knowing the types at parse time (I will
presume this is assumed/required by the way the language is structured).
the syntax is not always disambiguated by the next 1 or 2 tokens, though, so
it may be necessary to try to parse each syntactic form that could work and
use the first one that does.
this leads to a difference in parsing strategy: rather than parsing along
token-by-token and reporting a parse error the first time an unexpected token
is seen, one descends into a parsing branch, tries to parse it, and if a
parse error would occur, returns gracefully, allowing the next higher level
to try the next possible interpretation.
for example:
'(Foo)' is ambiguous, and may be assumed to be a reference in parentheses;
'(Foo)bar' is recognized as a cast, on the grounds that otherwise it would
not parse correctly.
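to illustrate the strategy, here is a minimal standalone sketch (this is not
my actual parser; the token handling, grammar, and names are all made up
purely for the example):

#include <stdio.h>
#include <string.h>

/* a minimal sketch of "descend into a branch, back up on failure":
   try the cast reading first, and if it cannot work, restore the
   token position and fall back to "parenthesized reference".      */

static const char *toks[16];
static int ntok, pos;

static const char *peek(int k)
{
    return (pos + k < ntok) ? toks[pos + k] : "";
}

static int parse_primary(char *out);

/* cast := '(' ident ')' primary
   (the "followed by something" check is a crude stand-in for the real
   disambiguation rule; on failure the caller restores 'pos' itself)   */
static int try_parse_cast(char *out)
{
    char inner[128];
    const char *type;
    if (strcmp(peek(0), "(") || strcmp(peek(2), ")") || !peek(3)[0])
        return 0;                      /* this reading cannot apply */
    type = peek(1);
    pos += 3;                          /* consume '(' ident ')' */
    if (!parse_primary(inner))
        return 0;
    sprintf(out, "cast<%s>(%s)", type, inner);
    return 1;
}

/* primary := cast | '(' primary ')' | ident */
static int parse_primary(char *out)
{
    int mark = pos;                    /* remember where this branch began */
    char inner[128];
    if (try_parse_cast(out))
        return 1;
    pos = mark;                        /* back up, try the next interpretation */
    if (!strcmp(peek(0), "(")) {
        pos++;
        if (!parse_primary(inner) || strcmp(peek(0), ")")) {
            pos = mark;
            return 0;
        }
        pos++;
        sprintf(out, "(%s)", inner);
        return 1;
    }
    if (!peek(0)[0])
        return 0;
    sprintf(out, "%s", peek(0));
    pos++;
    return 1;
}

static void demo(const char **t, int n)
{
    char out[128];
    int i;
    ntok = n;
    pos = 0;
    for (i = 0; i < n; i++)
        toks[i] = t[i];
    printf("%s\n", parse_primary(out) && !peek(0)[0] ? out : "parse error");
}

int main(void)
{
    const char *a[] = { "(", "Foo", ")" };        /* stays a parenthesized reference */
    const char *b[] = { "(", "Foo", ")", "bar" }; /* recognized as a cast */
    demo(a, 3);
    demo(b, 4);
    return 0;
}

the real thing would obviously need the full expression grammar and better
error recovery, but the mark/restore pattern is the point.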
well, I am not strictly following the syntax structure given in the spec,
mostly because doing so would require completely rewriting my existing
parser; rather, I am figuring out how to make it work (AFAICT, the grammar
descriptions in ECMA-334 are LR or LALR, but my parsers are typically
hand-written recursive descent, and in fact I am using a C parser as the
basis for a C# parser, so I am having to deal with many subtle but
fundamental differences between the languages...).
basically, one of the earlier forms of my C compiler is being used as a
template (I am using an earlier form which uses S-Exps as the basis, rather
than a later form which used DOM instead, as S-Exps are faster and less
memory-consuming than DOM nodes). not all is free though, as I had to
redirect (via search-and-replace) the old typesystem API to the new
typesystem API.
I decided to leave out a big chunk about my uncertainty over the letter-case
conventions in that API's naming (which differ some from my newer naming
rules, AKA: from 'alllower' to 'camelCase').
note: technically, I am doing all this for a few specific reasons:
a new compiler frontend will be less effort for the time being than getting
either my JBC or CIL frontend working (JBC and CIL still require some
work...);
the prior efforts have already led to me adding much of the needed machinery
to the compiler core (I am still deciding on the details of how I will do
exception handling, and am generally leaning against SEH, but there seems to
be little standardization here, so I may do my own thing and possibly hook
it into the others using VEH, ...);
it would be much less effort (and ugliness) to modify a C frontend into a C#
frontend than to hack OO and namespaces onto C while still preserving its
"C-ness" (I batted around some ideas; they were ugly...).
as noted, this C# compiler would take the same basic compilation route as my
existing C compiler (no intermediate CIL), and will use the same shared
backend as my other compilers (since the backend and basic machinery are
shared, it doesn't matter too much which frontend uses it, so no real lost
effort here...).
I have opted against adding CPS to my backend for now (it creates too many
issues and would be too much effort, and for now generalized tail-calls and
continuations are not worth a potentially notable cost in performance).
another reason for preferring a C# route rather than a mock-up C++ route is
compilation time:
I am thinking C# should be able to compile much faster than C++ could hope
to, due primarily to the lack of source-level inclusion (ignoring the
possibility of a hack to compile headers separately and use 'include' as a
module-import feature, which I had considered in my C-hackery musings...).
misc bit of trivia:
I gather there exists a CIL frontend and backend for GCC, which I discovered
recently (I guess the people here probably already knew).
well, oh well, maybe slight competition against GCC; though unless GCC gains
dynamic compilation/JIT abilities and becomes much easier to build and use
on Windows, my current efforts continue to have at least "some" point (well,
never mind that no one else has interest, but oh well, for my own uses it
works at least, even if it doesn't amount to much...).
I don't know, but last I looked (maybe some-odd years ago) GCC's code was
filled with terror, and for all I know maybe they have cleaned it up with
all the activity (however, the descriptions of how they have accomplished
some things give me doubt...).
sadly, my code is ugly enough... keeping code clean and modular is never as
easy as it seems.
there are invariably messes, and invariably dependencies (for example, one
ends up with a dependency between the linker and the threading code just to
implement the '__thread' keyword, ...).
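(for reference, a tiny made-up example of what '__thread' does, and why
thread-local storage ends up touching both the linker and the threading
runtime:)

#include <stdio.h>

/* GCC spelling of thread-local storage: each thread gets its own copy of
   'counter', which is why the linker (TLS sections/offsets) and the
   threading runtime both get involved */
static __thread int counter = 0;

int main(void)
{
    counter++;                 /* touches only this thread's copy */
    printf("%d\n", counter);   /* prints 1 */
    return 0;
}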
or such...
>> all of the namespaces and declarations are "lifted out" of the parse
>> trees;
>> each file's parse tree can then be compiled.
>>
>> from what I can tell, types are like this:
>> type = <qualifiers>* <type-name>
>>
>> so, I can type:
>> static Foo bar;
>>
>>
>> and the parser will know that 'Foo' is the type, even if the type for Foo
>> is not visible at the time of parsing (in C, this can't be done since
>> there is no clear distinction or ordering between types and qualifiers,
>> and so one would not know if 'Foo' is the type, or an intended variable
>> name with the type being assumed to be 'int').
>>
> No, parser does not yet know the type of 'Foo'.
>
I am not certain which way you mean this:
'no', as in it does not know the type, but still parses it;
'no', as in it can't yet be parsed.
>> so, in C we can have:
>> unsigned int i;
>> int unsigned i;
>> int volatile i;
>> _Complex float f;
>> double _Complex g;
>>
>> unsigned i;
>> short int j;
>> int long k;
>> ..
>>
>> so, my guess then is that C# code is "just parsed", with no need to
>> lookup, for example, is Foo a "struct or class or other typedef'ed type?"
>> ...
>>
>> as far as the parser is concerned 'int' or 'byte' is syntactically not
>> different from 'Foo' or 'Bar'?...
>>
> Correct.
>
C also has certain ugly syntactic forms which I will just assume that C#
does not, such as:
int (*foo(int x, int y))(int z);
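(that one declares 'foo' as a function taking two ints and returning a
pointer to a function taking an int and returning an int; in C it would
usually get tamed with a typedef, e.g.:)

typedef int (*handler)(int z);   /* 'handler' is just a made-up name */
handler foo(int x, int y);       /* same declaration as above, now readable */

the C# counterpart would presumably go through a delegate type instead, so
the declarator never wraps around the name like that.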
> Marek
>