[Mono-devel-list] [PATCH] AMD64 Fast TLS; AMD64 COF; AMD64 Inline UnboxTramp; AMD64 mov instead of push for virt. m.

Willibald Krenn Willibald.Krenn at gmx.at
Fri Mar 11 13:38:10 EST 2005


Before explaining the attached patch and how I think it should be 
handled (due to the size this is not easy), let me say a few words about 
my work for the mono community in general..

The last few months were some very interesting and teaching time: I've 
never worked in such a big open source project before and I've never 
done that much C programming (not to speak of C programming under Linux) 
as during this time. The Mono JIT has quite a steep learning curve - if 
you consider the developer's side of things - which I honestly 
underestimated. Mono also has some quite unique svn module/file layout, 
if you are used to object oriented programming.
However, I learned over the last few months that the lead programmers 
behind mono (and the JIT in particular) do have a very deep 
understanding of the source and all problems surrounding it:
Paolo, Zoltan (I hope you talk more than you write), Massi and Ben - you 
have my deepest respect! Not to forget Miguel who was so brave to give 
me a writable svn account...

If you think this sounds like 'good-bye', then you are partly correct: I 
did this work as master thesis and as my personal dead-line is in a few 
weeks from now (and I still have to write the thesis), I probably won't 
be as much involved in the development of Mono as now. (Even if I wished 
to be..)
Regardeless, I'll help spreading the mono spirit whereever I can :-)

Back to the attached patch: Mainly it's the foundations of the 
continuous optimization framework plus the fast TLS patch for AMD64 I 
already published earlier on this list. The unbox trampoline patch was 
kinda necessary for the changes made for COF. (AOT is working here..)

Mods to the default runtime behaviour:
  - inlined unbox trampolines on AMD64
	the add $0x10, %rdi/rsi gets emitted right before the normal
	method start; (The JitInfo->code_start does _not_ point to the
	unbox operation!)
  - codeman.c on AMD64:
	Align code to 16 byte windows. (Matches the Athlon64 / Opteron
	code fetch window.)
  - changes to virtual method prolog/epilog on AMD64:
	Instead of push/pop this patch emits move operations that are
	up to 4 times faster (push push -> mov mov). Unfortunately
	(Sorry Ben!), one move is 5 bytes... However, I think speeding
	up virtual methods is not so bad, since the call is costlier.
	(And AMD64 has plenty of address space.)
  - MonoMethod structure
	I need a bit there to indicate non-movable methods; The current
	solution is a bad hack - maybe I can get a bit in some other
	bitfield? (So the overall alignment is preserved..)

Mods to the runtime behaviour if continuous optimization is enabled (all 
AMD64 only):
  - Pretty much all calls go over a function pointer table (FPT):
	This means additionally allocated memory (2 sizeof(gpointer) per
	method) but has the pro of almost constant time MonoJitInfo
	lookup if a valid stack frame and the domain is known.
	Additionally to the FPT, there is a hash table that translates
	VMT slots to offsets. (So calls over VMT are not slowed down!)
  - MonoJitInfo is extended by 64 bits (carrying sampling profiler
  - Additionally running threads (3)
  - Hot methods get recompiled with all possible optimizations turned on:
	The new version of the method replaces the old one. Additionally
	the old method gets invalidated.
  - SIGUSR2 taken
	This signal is used to highjack running threads and do stack
	walks inside the thread before invalidating an old method.
  - Debug output on screen:
	Prints out the recompiled and patched methods. Additionally
	it also prints out if an old method could be invalidated.
  - Virtual methods store 'this' pointer on stack (one below rbp):
	Due to the transition to moves (instead of pushes), this should
	not cause any slowdown. Note that this change enables for fast
	non-locked MonoJitInfo lookups. (Only a hash table lookup is
	necessary for methods called over VMT..)
  - TODO:
	Add tons of optimizations! The framework is basically there, so
	you can add all sorts of optimizations. (Even very costly ones.)

One feature I probably will implement in the remaining time is caching 
of recompiled methods. (Not the same as in AOT.C, as this code 
precompiles whole assemblies.)

If you want to apply the attached patch, uncompress it and copy the 
mono-funcptrtable files to the utils directory (where codeman etc is 
located), then decompress the contopt tar gz inside the mini directory 
and finally apply the diff to the mono (top level) module. On AMD64 you 
then can enable continuous optimization with --with-contopt=yes on 
configure run. Of course this is still kinda experimental code, but it 
should work nevertheless.

If this patch is allowed to go into svn, I'll be happy to update and 
improve the code in future.

Thanks for all your patience and understanding,
   Willibald Krenn
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch_contopt_11_02_2005_mono_41705_2.tar.gz
Type: application/x-gzip
Size: 87521 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20050311/3af0188e/attachment.gz 

More information about the Mono-devel-list mailing list