[Mono-devel-list] [PATCH] AMD64 Fast TLS; AMD64 COF; AMD64 Inline UnboxTramp; AMD64 mov instead of push for virt. m.
Willibald Krenn
Willibald.Krenn at gmx.at
Fri Mar 11 13:38:10 EST 2005
Hi!
Before explaining the attached patch and how I think it should be
handled (given its size, this is not easy), let me say a few words about
my work for the mono community in general.
The last few months have been a very interesting and instructive time: I've
never worked on such a big open source project before, and I've never
done as much C programming (not to speak of C programming under Linux)
as during this time. On the developer's side of things, the Mono JIT has
quite a steep learning curve, which I honestly underestimated. Mono also
has a rather unusual svn module/file layout if you are used to
object-oriented programming.
However, I learned over the last few months that the lead programmers
behind mono (and the JIT in particular) have a very deep
understanding of the source and all the problems surrounding it:
Paolo, Zoltan (I hope you talk more than you write), Massi and Ben, you
have my deepest respect! Not to forget Miguel, who was brave enough to give
me a writable svn account...
If you think this sounds like 'good-bye', you are partly correct: I
did this work as my master's thesis, and as my personal deadline is a few
weeks from now (and I still have to write the thesis), I probably won't
be as involved in the development of Mono as I am now. (Even though I wish
I could be.)
Regardless, I'll help spread the mono spirit wherever I can :-)
Back to the attached patch: it mainly contains the foundations of the
continuous optimization framework (COF) plus the fast TLS patch for AMD64
that I already published on this list. The unbox trampoline patch was
more or less necessary for the changes made for COF. (AOT is working here.)
Mods to the default runtime behaviour:
- inlined unbox trampolines on AMD64
The add $0x10, %rdi/%rsi is emitted right before the normal
method start. (JitInfo->code_start does _not_ point to the
unbox operation!)
- codeman.c on AMD64:
Code is aligned to 16-byte boundaries. (This matches the Athlon64/Opteron
code fetch window.)
- changes to the virtual method prolog/epilog on AMD64:
Instead of push/pop, this patch emits mov instructions, which are
up to 4 times faster (push push -> mov mov). Unfortunately
(sorry Ben!), a single mov takes 5 bytes... However, I think speeding
up virtual methods is a reasonable trade-off, since the call itself is costlier.
(And AMD64 has plenty of address space.)
- MonoMethod structure:
I need a bit there to indicate non-movable methods. The current
solution is a bad hack; maybe I can get a bit in some other
bitfield? (So that the overall alignment is preserved.)
Mods to the runtime behaviour if continuous optimization is enabled (all
AMD64 only):
- Almost all calls go through a function pointer table (FPT):
This costs additional memory (2 * sizeof(gpointer) per
method) but gives almost constant-time MonoJitInfo
lookup if a valid stack frame and the domain are known.
In addition to the FPT, there is a hash table that translates
VMT slots to offsets. (So calls through the VMT are not slowed down!)
- MonoJitInfo is extended by 64 bits (carrying sampling profiler
information)
- Three additional running threads
- Hot methods are recompiled with all possible optimizations turned on:
The new version of the method replaces the old one, and
the old method is then invalidated.
- SIGUSR2 taken:
This signal is used to hijack running threads and do stack
walks inside each thread before an old method is invalidated.
- Debug output on screen:
Prints the recompiled and patched methods. It also reports
whether an old method could be invalidated.
- Virtual methods store the 'this' pointer on the stack (one slot below rbp):
Due to the switch from pushes to moves, this should
not cause any slowdown. Note that this change enables fast
MonoJitInfo lookups without locking. (Only a hash table lookup is
necessary for methods called through the VMT.)
- TODO:
Add tons of optimizations! The framework is basically there, so
you can add all sorts of optimizations. (Even very costly ones.)
One feature I will probably implement in the remaining time is caching
of recompiled methods. (This is not the same as in AOT.C, since that code
precompiles whole assemblies.)
If you want to apply the attached patch, uncompress it and copy the
mono-funcptrtable files to the utils directory (where codeman etc. are
located), then decompress the contopt tar.gz inside the mini directory
and finally apply the diff to the mono (top-level) module. On AMD64 you
can then enable continuous optimization by passing --with-contopt=yes to
configure. Of course this is still somewhat experimental code, but it
should work nevertheless.
If this patch is allowed into svn, I'll be happy to update and
improve the code in the future.
Thanks for all your patience and understanding,
Willibald Krenn
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch_contopt_11_02_2005_mono_41705_2.tar.gz
Type: application/x-gzip
Size: 87521 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20050311/3af0188e/attachment.gz