[Mono-dev] JIT register binding
"Konrad M. Kruczyński"
konrad.kruczynski at gmail.com
Tue May 17 17:02:42 EDT 2011
Hello all,

I recently looked at the n-body test on the shootout
(http://shootout.alioth.debian.org/u64q/performance.php?test=nbody)
in the context of Mono. The program is very simple, so it is a good
piece of code for analyzing the sources of performance problems. I have
also contributed a SIMD version, but it matters little, as it was not
significantly faster than the regular one.
I also compared the performance of this test with .NET on Windows.
.NET took about 70% of the time that Mono needed, which is of little
interest by itself as well. The reasons behind it, however, may be
interesting. I made a small analysis of the code emitted by the JIT in
both cases. Let's look at one piece of the code, specifically the
statements that compute the squared length of the vector that is the
difference between the positions of two bodies:
double dx = bi.x - bj.x, dy = bi.y - bj.y, dz = bi.z - bj.z;
double d2 = dx * dx + dy * dy + dz * dz;
On .NET, the code emitted by the JIT is:
fld qword ptr [edx+4]
fsub qword ptr [eax+4]
fld qword ptr [edx+0Ch]
fsub qword ptr [eax+0Ch]
fld qword ptr [edx+14h]
fsub qword ptr [eax+14h]
// here the differences are in st through st(2): dz in st, dy in st(1), dx in st(2)
fld st(2)
fmul st,st(3)
fld st(2)
fmul st,st(3)
faddp st(1),st
fld st(1)
fmul st,st(2)
faddp st(1),st
Here is the code generated by Mono's JIT engine (it is in AT&T syntax,
but that shouldn't be a problem when reading it):
fldl 0x8(%ebx)
fldl 0x8(%edi)
fsubrp %st,%st(1)
fstpl -0x18(%ebp)
fldl 0x10(%ebx)
fldl 0x10(%edi)
fsubrp %st,%st(1)
fstpl -0x20(%ebp)
fldl 0x18(%ebx)
fldl 0x18(%edi)
fsubrp %st,%st(1)
fstpl -0x28(%ebp)
// the three differences are now stored at three offsets from ebp (local variables)
fldl -0x18(%ebp)
fldl -0x18(%ebp)
fmulp %st,%st(1)
fldl -0x20(%ebp)
fldl -0x20(%ebp)
fmulp %st,%st(1)
faddp %st,%st(1)
fldl -0x28(%ebp)
fldl -0x28(%ebp)
fmulp %st,%st(1)
faddp %st,%st(1)
fstpl -0x30(%ebp)
Like .NET, Mono holds the pointers to the objects bi and bj in
registers (edi and ebx in Mono's case, as one can see). There is,
however, an important difference: .NET treats registers (floating-point
stack positions) as local variables, while Mono always stores results
from registers to stack-allocated memory. In this case the stack
variables for dx etc. can be eliminated completely, and this is what
the .NET JIT does. The optimization that should be adopted here is to
bind local variables to registers where possible. On the amd64
architecture the problem is very similar, but with respect to the xmm
registers.
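
To make the difference concrete, here is a minimal C sketch. It is not
taken from the benchmark or from either JIT. The Body struct and both
function names are hypothetical, introduced only for illustration. The
first function uses volatile locals, which forces the compiler to store
each difference to its stack slot and reload it on every use, roughly
mimicking the Mono output above. The second uses plain locals, which an
optimizing compiler is free to keep in registers, as the .NET JIT does.

```c
/* Hypothetical body layout; only the position fields matter here. */
typedef struct {
    double x, y, z;
} Body;

/* Mimics the spilling pattern: each volatile local lives in memory,
   so every write is a store and every read is a separate load. */
static double dist2_spilled(const Body *bi, const Body *bj)
{
    volatile double dx = bi->x - bj->x;
    volatile double dy = bi->y - bj->y;
    volatile double dz = bi->z - bj->z;
    return dx * dx + dy * dy + dz * dz;
}

/* Plain locals: the compiler may bind dx, dy and dz to registers and
   never touch the stack for them. */
static double dist2_in_registers(const Body *bi, const Body *bj)
{
    double dx = bi->x - bj->x;
    double dy = bi->y - bj->y;
    double dz = bi->z - bj->z;
    return dx * dx + dy * dy + dz * dz;
}
```

Both functions return the same value; comparing the assembly a C
compiler produces for them with optimizations enabled should show the
same kind of extra loads and stores as in the two listings above.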
This problem is common to all programs that do a lot of computation.
Agreed, this may be a minority of the software run on Mono, but still.
Has anyone investigated this kind of optimization? Is it too hard to
achieve due to the nature of Mini, or because of other problems? If I
were interested in providing such an optimization, what would be a good
introduction to Mini's code?
Thanks in advance for answers,
regards,
Konrad