[Mono-dev] JIT register binding

Tue May 17 17:02:42 EDT 2011

Hello all,
  Lately I have been looking at the n-body test on the shootout
  (http://shootout.alioth.debian.org/u64q/performance.php?test=nbody)
  in the context of Mono. The program is very simple, so it is a nice
  piece of code for analyzing sources of performance problems. I have
  also contributed a SIMD version, but it is of minor importance, as it
  was not significantly faster than the ordinary one.

  I also compared the performance of this test on Windows using .NET,
  which took about 70% of the time Mono needed. That is somewhat
  interesting in itself, but the reasons behind it are more interesting.
  I did a small analysis of the code emitted by the JIT in both cases.
  Let's look at one part of it, specifically the statements that compute
  the squared length of the vector difference between the positions of
  two bodies:
      double dx = bi.x - bj.x, dy = bi.y - bj.y, dz = bi.z - bj.z;
      double d2 = dx * dx + dy * dy + dz * dz;
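  To look at the same kernel outside of any JIT, it can be written as a
  plain C function. This is only a sketch: the Body struct and the
  distance_squared name are made up for illustration and are not taken
  from the shootout sources. Compiling it with an optimizing C compiler
  and reading the assembly (for instance with gcc -O2 -S) gives a third
  point of comparison for the two listings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical body type: only the position fields are modelled. */
typedef struct {
    double x, y, z;
} Body;

/* Squared distance between two bodies, written like the C# source:
   dx, dy and dz are each defined once and read twice, so nothing in
   the computation requires them to live in the stack frame. */
static double distance_squared(const Body *bi, const Body *bj)
{
    assert(bi != NULL && bj != NULL);
    double dx = bi->x - bj->x, dy = bi->y - bj->y, dz = bi->z - bj->z;
    return dx * dx + dy * dy + dz * dz;
}
```

  For example, bodies at (1, 2, 3) and (4, 6, 3) give 9 + 16 + 0 = 25.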
  On .NET, the JIT emits:
   fld         qword ptr [edx+4]
   fsub        qword ptr [eax+4]
   fld         qword ptr [edx+0Ch]
   fsub        qword ptr [eax+0Ch]
   fld         qword ptr [edx+14h]
   fsub        qword ptr [eax+14h]
   // the three differences are now in st(0) through st(2)
   fld         st(2)
   fmul        st,st(3)
   fld         st(2)
   fmul        st,st(3)
   faddp       st(1),st
   fld         st(1)
   fmul        st,st(2)
   faddp       st(1),st
  Here is the code generated by Mono's JIT (in AT&T syntax, which
  should not get in the way of reading it):
   fldl   0x8(%ebx)
   fldl   0x8(%edi)
   fsubrp %st,%st(1)
   fstpl  -0x18(%ebp)
   fldl   0x10(%ebx)
   fldl   0x10(%edi)
   fsubrp %st,%st(1)
   fstpl  -0x20(%ebp)
   fldl   0x18(%ebx)
   fldl   0x18(%edi)
   fsubrp %st,%st(1)
   fstpl  -0x28(%ebp)
   // the differences are now stored at three offsets from %ebp (local variables)
   fldl   -0x18(%ebp)
   fldl   -0x18(%ebp)
   fmulp  %st,%st(1)
   fldl   -0x20(%ebp)
   fldl   -0x20(%ebp)
   fmulp  %st,%st(1)
   faddp  %st,%st(1)
   fldl   -0x28(%ebp)
   fldl   -0x28(%ebp)
   fmulp  %st,%st(1)
   faddp  %st,%st(1)
   fstpl  -0x30(%ebp)

  Like .NET, Mono keeps the pointers to the objects bi and bj in
  registers (edi and ebx in Mono's case, as can be seen). There is,
  however, an important difference: .NET treats registers (floating-point
  stack positions) as local variables, while Mono always stores results
  from registers to stack-allocated memory and loads them back again. In
  this case the stack variables for dx, dy and dz can be eliminated
  entirely, and that is exactly what the .NET JIT does. The optimization
  that should be adopted here is to bind local variables to registers
  where possible. On amd64 the problem is very similar, but with respect
  to the xmm registers.
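  One well-known way to bind locals to registers is linear scan register
  allocation (Poletto and Sarkar, 1999). The sketch below is not Mini's
  allocator and rests on simplifying assumptions: each local has a single
  live interval given as instruction indices, registers form a flat pool
  (which fits the xmm registers on amd64 better than the x87 stack, where
  an allocator must also track stack positions and insert fxch), and a
  local that gets no register is just marked as spilled. The Interval
  type, linear_scan function and interval numbering are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGS 8  /* assumed pool limit, e.g. xmm0-xmm7 */

/* Live interval of one local variable, in instruction indices. */
typedef struct {
    const char *name;
    int start;  /* index of the defining instruction */
    int end;    /* index of the last use */
    int reg;    /* assigned register number, or -1 if spilled to the stack frame */
} Interval;

/* Insert iv into active[], keeping it sorted by increasing end point. */
static void insert_active(Interval **active, size_t *nactive, Interval *iv)
{
    size_t j = *nactive;
    while (j > 0 && active[j - 1]->end > iv->end) {
        active[j] = active[j - 1];
        j--;
    }
    active[j] = iv;
    (*nactive)++;
}

/* Linear scan allocation; iv[] must be sorted by increasing start. */
static void linear_scan(Interval *iv, size_t n, int num_regs)
{
    assert(num_regs > 0 && num_regs <= MAX_REGS);
    Interval *active[MAX_REGS];  /* intervals holding a register, sorted by end */
    size_t nactive = 0;
    int free_regs[MAX_REGS];     /* stack of free register numbers */
    size_t nfree = (size_t)num_regs;
    for (int r = 0; r < num_regs; r++)
        free_regs[r] = num_regs - 1 - r;  /* register 0 is handed out first */

    for (size_t i = 0; i < n; i++) {
        Interval *cur = &iv[i];

        /* Expire intervals that ended before cur starts; free their registers. */
        size_t expired = 0;
        while (expired < nactive && active[expired]->end < cur->start) {
            free_regs[nfree++] = active[expired]->reg;
            expired++;
        }
        for (size_t j = expired; j < nactive; j++)
            active[j - expired] = active[j];
        nactive -= expired;

        if (nfree > 0) {
            /* A register is free: bind the local to it. */
            cur->reg = free_regs[--nfree];
            insert_active(active, &nactive, cur);
        } else {
            /* No register free: spill whichever interval ends last. */
            Interval *last = active[nactive - 1];
            if (last->end > cur->end) {
                cur->reg = last->reg;
                last->reg = -1;
                nactive--;
                insert_active(active, &nactive, cur);
            } else {
                cur->reg = -1;
            }
        }
    }
}
```

  Numbering the d2 computation hypothetically so that dx, dy and dz are
  defined at instructions 0, 1, 2 and last used at 3, 4, 5, a pool of
  eight registers binds all three locals, so none of the stores to the
  stack frame seen in the Mono listing would be needed; with a pool of
  only two, dz is spilled. Spilling the interval that ends last is the
  standard linear scan heuristic, since it frees a register for the
  longest stretch of code.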

  The problem is common to all programs that do a lot of numeric
  computation. Admittedly, such programs may be a minority of the
  software run on Mono, but still.

  Has anyone investigated this kind of optimization? Is it too hard to
  achieve due to the nature of Mini, or are there other problems? If I
  were interested in implementing such an optimization, where should I
  start reading Mini's code?

  Thanks in advance for any answers,
   regards,
   Konrad