[Mono-devel-list] RAPGO Proposal

Mon Nov 29 17:06:14 EST 2004

I've done a second version of my proposal (more text, more motivation, 
more pictures, more spelling mistakes - in short, more of everything):

http://www.wpkrenn.net/pmwiki/pmwiki.php/Willi/DPGOProposal

I'll keep updating this document in the future.

Paolo Molaro wrote:
> Described in the summit notes: the code is not in cvs yet.

Any idea when this code will hit cvs? (I was aware of the notes, but 
they don't show me how it is done, etc.)

>>When do you think stack walking will be slower than having counters?
> 
> Only in degenerate cases (many threads each many call frames into the stack,
> where many is in the thousands). Using counters would be a nightmare to 
> maintain.

I don't think so (a wrapper around the method that does the counting 
should be sufficient, along with some modifications in ...throw), but I 
agree that counting decreases runtime performance (although I have no 
clue by what order of magnitude).
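
A minimal sketch of what such a counting wrapper could look like, in
plain C. All names here (MethodCounters, counted_invoke,
counters_on_unwind) are hypothetical and not part of the Mono runtime;
the "active" counter and the unwind hook are only an assumption about
why the throw path would need to be touched.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-method record; not an actual Mono data structure. */
typedef struct {
    const char   *name;       /* for reporting only */
    unsigned long calls;      /* total number of invocations */
    unsigned long active;     /* invocations currently in progress */
    int         (*body)(int); /* the real method */
} MethodCounters;

/* Wrapper called in place of the method: count, forward, uncount. */
static int counted_invoke(MethodCounters *m, int arg)
{
    int result;

    assert(m != NULL && m->body != NULL);
    m->calls++;
    m->active++;
    result = m->body(arg);
    /* Not reached if body() leaves through a non-local exit such as a
     * managed exception; that case needs the unwind hook below. */
    m->active--;
    return result;
}

/* What a modified throw path would have to do per unwound frame. */
static void counters_on_unwind(MethodCounters *m)
{
    assert(m != NULL && m->active > 0);
    m->active--;
}

/* Example method to wrap. */
static int square(int x)
{
    return x * x;
}
```

The cost per call is two increments and one decrement, which is where
the runtime overhead mentioned above would come from.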

>>Well, a sampling profiler will slow down execution too, but not that 
> 
> 
> A lot less and after the hot methods are recompiled it could be
> shut down or the sampling frequency could be reduced.

I guess the best approach would be to enable it periodically for some 
time, see if anything exciting happened (program phase change) and then 
disable it again.
Some sort of Sampled Sample Profiling :-)
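
A rough sketch of this duty-cycle idea, assuming the profiler keeps one
sample histogram per "on" window. The names, the fixed number of
methods and the distance measure are all made up for illustration; a
real phase detector could use something quite different.

```c
#include <assert.h>
#include <stddef.h>

/* True while the profiler is inside its "on" window: the first
 * `window` ticks of every `period` ticks. */
static int sampling_enabled(unsigned long tick, unsigned long period,
                            unsigned long window)
{
    assert(period > 0 && window <= period);
    return (tick % period) < window;
}

/* Distance between two sample histograms, compared as relative shares
 * and expressed in permille: 0 means identical shares, 2000 means
 * completely disjoint. Returns 0 if either histogram is empty. */
static unsigned long profile_distance(const unsigned long *a,
                                      const unsigned long *b, size_t n)
{
    unsigned long total_a = 0, total_b = 0, sum = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        total_a += a[i];
        total_b += b[i];
    }
    if (total_a == 0 || total_b == 0)
        return 0;
    for (i = 0; i < n; i++) {
        /* Cross-multiply to compare shares without floating point. */
        unsigned long x = a[i] * total_b;
        unsigned long y = b[i] * total_a;
        sum += (x > y) ? (x - y) : (y - x);
    }
    return sum * 1000 / (total_a * total_b);
}

/* A phase change is assumed when the hot-method mix moved a lot. */
static int phase_changed(const unsigned long *prev,
                         const unsigned long *cur, size_t n,
                         unsigned long threshold)
{
    return profile_distance(prev, cur, n) > threshold;
}
```

With a period of 100 ticks and a window of 10, the profiler would run
about 10% of the time; only when phase_changed() fires would there be a
reason to look at recompiling anything.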

>>much. However, you won't get all information you need for certain 
>>optimizations by using a sampling profiler AFAIK, so IMO it's still 
> 
> Sampling the call stack and not just the IP address should provide
> most of the interesting info.

Depends on what you want to do. More expensive optimizations may also 
need information other than the call graph and invocation percentages.

I'm thinking of branches (taken, not taken), memory access patterns and 
some other stuff.
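
As an illustration of the branch information meant here, a small sketch
of a taken / not-taken counter that instrumented code could update.
BranchProfile and the helper names are hypothetical; stack sampling
alone would not provide this kind of per-branch data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-branch record. */
typedef struct {
    unsigned long taken;
    unsigned long not_taken;
} BranchProfile;

/* Record the outcome of a condition and pass it through unchanged, so
 * it can be dropped into an existing `if (...)`. */
static int profile_branch(BranchProfile *b, int cond)
{
    assert(b != NULL);
    if (cond)
        b->taken++;
    else
        b->not_taken++;
    return cond;
}

/* Percentage of executions in which the branch was taken, or -1 if the
 * branch has not been executed yet. */
static int taken_percent(const BranchProfile *b)
{
    unsigned long total;

    assert(b != NULL);
    total = b->taken + b->not_taken;
    if (total == 0)
        return -1;
    return (int)(b->taken * 100 / total);
}
```

A heavily biased branch (say, taken 99% of the time) is the kind of
result that could justify laying out the likely path as fall-through
code when a hot method is recompiled.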

> Note that we are interested in portable solutions: using arch-specific
> stuff is of course welcome, but the solution should be implementable in 
> other architectures, too.

Of course the framework and the basic algorithms are portable. But I'll 
do the fine-tuning for AMD64 Novell (SuSE) Linux first (that's what I 
get my mark for and have to present).
If you like what I've done and include it, I'll continue to fine-tune 
for other architectures (well, as much as I can, that is).

So there might be some goodies that work first on x86 / AMD64 Linux. And 
some things might never work on other architectures at all, or not in 
the same way as on x86.

>>Code that runs endlessly in a loop probably isn't a good candidate for 
>>runtime replacement at all. Probably 'transfer-points' - icalls p/i 
> 
> 
> Eh, yeah, if we could remove issues by just saying they don't exist:-)

Jokes about mathematicians come to mind! :-)

Do you know this one (bad translation from German)?

A mathematician, a physicist and an engineer are given the task of 
fencing in a herd of sheep.
Materials are supplied: wire mesh, posts etc., and naturally the herd of 
sheep.

The engineer takes the posts, hammers them into the ground around the 
herd of sheep and fastens the wire mesh.

The physicist does some computations, makes an error estimate and 
says: "These are the places to set up the posts, then the wire has to be 
fastened and after that the task is solved."

The mathematician takes the wire mesh, wraps himself in it and 
defines himself as being outside.

>>Ok, point taken. (Although if you'd move inssel into the backend and let
>>the inssel generate the final code into the target buffer, things should 
>>be faster. However, it's questionable how much.)
> 
> 
> That would prevent a cheap but effective peephole pass, so your proposal
> would actually slow down mono.

I meant that the initial codegen would be faster. Later on, the emitted 
code (if hot) is recompiled with stronger optimizations.

AFAIK the peephole optimizer works on a small set of opcodes. So adding 
a peephole pass would still be possible. But of course you end up 
having a small memory buffer for it and touching each opcode twice again.
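
To make the "touching each opcode twice" point concrete, a toy
peephole pass over an already generated instruction buffer. The opcode
set and the Ins layout are invented for this sketch and have nothing to
do with Mono's real intermediate representation or its peephole pass.

```c
#include <assert.h>
#include <stddef.h>

/* Invented toy opcodes. */
typedef enum { OP_MOVE, OP_ADD_IMM, OP_OTHER } Opcode;

typedef struct {
    Opcode op;
    int    dst;  /* destination register */
    int    src;  /* source register (OP_MOVE only) */
    int    imm;  /* immediate operand (OP_ADD_IMM only) */
} Ins;

/* Second pass over the buffer, rewriting it in place with a window of
 * two instructions. Returns the new instruction count. */
static size_t peephole(Ins *code, size_t n)
{
    size_t out = 0;
    size_t i;

    assert(code != NULL || n == 0);
    for (i = 0; i < n; i++) {
        Ins cur = code[i];

        /* "move r, r" does nothing: drop it. */
        if (cur.op == OP_MOVE && cur.dst == cur.src)
            continue;

        /* Two adjacent add-immediates on the same register: fold the
         * second one into the first. */
        if (cur.op == OP_ADD_IMM && out > 0 &&
            code[out - 1].op == OP_ADD_IMM &&
            code[out - 1].dst == cur.dst) {
            code[out - 1].imm += cur.imm;
            continue;
        }

        code[out++] = cur;
    }
    return out;
}
```

Every instruction is read once here after having been written once by
the code generator, which is the extra pass over the buffer that
emitting straight into the target buffer would try to avoid.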

> I didn't see any reference to C++ in your mail: you talked about objects,
> but we're fairly good at doing objects in C, too (see the linux kernel, 
> Gtk+, etc:-)

8-|

Ok, I can do objects, ah - structs, in C, but it's - and this is only my 
opinion - brain damaged to use C if there is a complete language for 
doing objects and there is no one who'll link to the code you write. Of 
course I wasn't talking about bells-and-whistles C++, just the basic 
set: objects and possibly exceptions. And of course none of this in 
places where speed matters. (The kernel is another matter, as 
everything's a bit different there.)
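
For reference, the "objects in C" style under discussion usually means
structs with a table of function pointers. A minimal sketch with
made-up types (Shape, Rect, Square), not taken from Mono, Gtk+ or the
kernel:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Shape Shape;

/* The "virtual method table": one function pointer per method. */
typedef struct {
    int (*area)(const Shape *self);
} ShapeVTable;

/* The "base class" only carries a pointer to its method table. */
struct Shape {
    const ShapeVTable *vtable;
};

/* "Derived classes" embed the base as their first member, so a pointer
 * to the object is also a valid pointer to its base. */
typedef struct { Shape base; int w, h; } Rect;
typedef struct { Shape base; int side; } Square;

static int rect_area(const Shape *self)
{
    const Rect *r = (const Rect *)self;
    return r->w * r->h;
}

static int square_area(const Shape *self)
{
    const Square *s = (const Square *)self;
    return s->side * s->side;
}

static const ShapeVTable rect_vtable   = { rect_area };
static const ShapeVTable square_vtable = { square_area };

/* "Constructors". */
static void rect_init(Rect *r, int w, int h)
{
    r->base.vtable = &rect_vtable;
    r->w = w;
    r->h = h;
}

static void square_init(Square *s, int side)
{
    s->base.vtable = &square_vtable;
    s->side = side;
}

/* Dynamic dispatch: the caller only needs a Shape pointer. */
static int shape_area(const Shape *s)
{
    assert(s != NULL && s->vtable != NULL);
    return s->vtable->area(s);
}
```

This gives dynamic dispatch, but constructors, casts and the method
table all have to be written and kept consistent by hand.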

The other thing is that you can't catch C++ exceptions thrown in P/I 
code by using a C-only approach. Yes, I know that this is not supported 
by mono, but somehow I thought adding this feature would be interesting. 
Currently mono just gets terminated by the C++ runtime if a C++ exception 
is thrown in native P/I code because no exception handler can be found.

> Anyway, no C++ code in the runtime.
> Thanks.

I don't like that, but I'll stick to it.

Thanks for your response!
Willi