[Mono-devel-list] [PATCH] AMD64 Fast TLS, COF, Inline UnboxTramp, mov instead of push for virt. m.

Willibald Krenn Willibald.Krenn at gmx.at
Mon Mar 14 13:02:35 EST 2005


Hi!

Initially, I sent this message last friday, but due to the attached patch it was 
held back for administrator approval. I resend it now without the patch (that 
can be downloaded from the link inside).


 >>>>>>>>
Before explaining the attached patch and how I think it should be
handled (due to the size this is not easy), let me say a few words about
my work for the mono community in general..

The last few months were some very interesting and teaching time: I've
never worked in such a big open source project before and I've never
done that much C programming (not to speak of C programming under Linux)
as during this time. The Mono JIT has quite a steep learning curve - if
you consider the developer's side of things - which I honestly
underestimated. Mono also has some quite unique svn module/file layout,
if you are used to object oriented programming.
However, I learned over the last few months that the lead programmers
behind mono (and the JIT in particular) do have a very deep
understanding of the source and all problems surrounding it:
Paolo, Zoltan (I hope you talk more than you write ;-) ), Massi and Ben - you
have my deepest respect! Not to forget Miguel who was so brave to give
me a writable svn account...

If you think this sounds like 'good-bye', then you are partly correct: I
did this work as master thesis and as my personal dead-line is in a few
weeks from now (and I still have to write the thesis), I probably won't
be as much involved in the development of Mono as now. (Even if I wished
to be..)
Regardeless, I'll help spreading the mono spirit whereever I can :-)

Back to the attached patch: Mainly it's the foundations of the
continuous optimization framework plus the fast TLS patch for AMD64 I
already published earlier on this list. The unbox trampoline patch was
kinda necessary for the changes made for COF. (AOT is working here..)

Mods to the default runtime behaviour:
   - inlined unbox trampolines on AMD64
	the add $0x10, %rdi/rsi gets emitted right before the normal
	method start; (The JitInfo->code_start does _not_ point to the
	unbox operation!)
   - codeman.c on AMD64:
	Align code to 16 byte windows. (Matches the Athlon64 / Opteron
	code fetch window.)
   - changes to virtual method prolog/epilog on AMD64:
	Instead of push/pop this patch emits move operations that are
	up to 4 times faster (push push -> mov mov). Unfortunately
	(Sorry Ben!), one move is 5 bytes... However, I think speeding
	up virtual methods is not so bad, since the call is costlier.
	(And AMD64 has plenty of address space.)
   - MonoMethod structure
	I need a bit there to indicate non-movable methods; The current
	solution is a bad hack - maybe I can get a bit in some other
	bitfield? (So the overall alignment is preserved..)

Mods to the runtime behaviour if continuous optimization is enabled (all
AMD64 only):
   - Pretty much all calls go over a function pointer table (FPT):
	This means additionally allocated memory (2 sizeof(gpointer) per
	method) but has the pro of almost constant time MonoJitInfo
	lookup if a valid stack frame and the domain is known.
	Additionally to the FPT, there is a hash table that translates
	VMT slots to offsets. (So calls over VMT are not slowed down!)
   - MonoJitInfo is extended by 64 bits (carrying sampling profiler
     	information)
   - Additionally running threads (3)
   - Hot methods get recompiled with all possible optimizations turned on:
	The new version of the method replaces the old one. Additionally
	the old method gets invalidated.
   - SIGUSR2 taken
	This signal is used to highjack running threads and do stack
	walks inside the thread before invalidating an old method.
   - Debug output on screen:
	Prints out the recompiled and patched methods. Additionally
	it also prints out if an old method could be invalidated.
   - Virtual methods store 'this' pointer on stack (one below rbp):
	Due to the transition to moves (instead of pushes), this should
	not cause any slowdown. Note that this change enables for fast
	non-locked MonoJitInfo lookups. (Only a hash table lookup is
	necessary for methods called over VMT..)
   - TODO:
	Add tons of optimizations! The framework is basically there, so
	you can add all sorts of optimizations. (Even very costly ones.)


If you want to apply the attached patch, uncompress it and copy the
mono-funcptrtable files to the utils directory (where codeman etc is
located), then decompress the contopt tar gz inside the mini directory
and finally apply the diff to the mono (top level) module. On AMD64 you
then can enable continuous optimization with --with-contopt=yes on
configure run. Of course this is still kinda experimental code, but it
should work nevertheless. (Dynamically compiled methods could be a problem; If 
it is, simply deactivate them for COF)

If this patch is allowed to go into svn, I'll be happy to update and
improve the code in future.

Thanks for all your patience and understanding,
    Willibald Krenn
 >>>>>>>>

The link to the download is
http://www.wpkrenn.net/pmwiki/uploads/Willi.MiniAMD64/patch_contopt_11_02_2005_mono_41705_2.tar.gz

(In the mean time I've got a version that does not emit gcc warnings.)





More information about the Mono-devel-list mailing list