[Mono-dev] inlining and performance of SIMD code
Erven Rohou
erven.rohou at inria.fr
Tue Oct 20 12:10:31 EDT 2009
Hello,
I have a few questions about inlining:
- I am curious what the heuristics are. I looked at the function
mono_method_check_inlining, but even when the function returns TRUE, the
function might not be inlined. Could you point me to the relevant piece of code? Is
there any high-level rule for guessing, such as complex control flow, use of
certain opcodes, etc.?
- Can I force inlining of a given function? Even a hack is fine: I am trying to
evaluate several code generation schemes, and I would like to measure the impact
of inlining.
- I tried to run code with calls to Mono.Simd on architectures that do not
support SIMD (or on x86 with the flag --optimize=-simd). A simple loop written
in C, a[i]=b[i]+c[i], gets vectorized by GCC; the bytecode essentially contains
calls to Mono.Simd.Vector4f::LoadAligned, StoreAligned and op_Addition, plus
address computations. The generated code, however, is very inefficient, with
values being copied around many times. Here is an example I captured with 'mono -v -v':
  f8:  8b 11              mov    (%ecx),%edx
  fa:  89 55 b8           mov    %edx,-0x48(%ebp)
 12e:  8b 4d b8           mov    -0x48(%ebp),%ecx
 131:  89 4d 88           mov    %ecx,-0x78(%ebp)
 15e:  d9 45 88           flds   -0x78(%ebp)
 161:  d9 45 98           flds   <...second op...>
 164:  de c1              faddp  %st,%st(1)
 19c:  d9 9d 4c ff ff ff  fstps  -0xb4(%ebp)
 1b6:  d9 85 4c ff ff ff  flds   -0xb4(%ebp)
 1bc:  d9 5d a8           fstps  -0x58(%ebp)
 1da:  8b 4d a8           mov    -0x58(%ebp),%ecx
 1dd:  89 4d d8           mov    %ecx,-0x28(%ebp)
 1f2:  8b 4d d8           mov    -0x28(%ebp),%ecx
 1f5:  89 08              mov    %ecx,(%eax)
It seems that a simple copy propagation followed by dead code elimination would
remove most of these copies, but I am not sure where I should look. Any comments
or suggestions?
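
To make the suggestion concrete, here is a minimal sketch in C of copy propagation followed by dead code elimination on a toy straight-line IR. Everything in it is hypothetical: the IR, struct insn, propagate_copies, eliminate_dead_code and run_demo are names invented for illustration and have nothing to do with Mono's actual JIT data structures. It also assumes that each temporary is assigned exactly once and that there is no control flow, which sidesteps the liveness analysis a real pass over reusable stack slots would need.

```c
#include <assert.h>

#define MAX_TEMPS 64

/* Hypothetical toy IR, not Mono's: each temp is assigned exactly once. */
enum op { OP_LOAD, OP_COPY, OP_ADD, OP_STORE };

struct insn {
    enum op op;
    int dst;   /* temp written, or -1 for OP_STORE */
    int src1;  /* first temp read, or -1 if none */
    int src2;  /* second temp read, or -1 if none */
    int dead;  /* set to 1 by eliminate_dead_code() */
};

/* Forward pass: replace each use of a copy's result with the copy's source. */
void propagate_copies(struct insn *code, int n)
{
    int root[MAX_TEMPS];
    for (int t = 0; t < MAX_TEMPS; t++)
        root[t] = t;                      /* every temp starts as its own root */

    for (int i = 0; i < n; i++) {
        struct insn *in = &code[i];
        assert(in->dst < MAX_TEMPS && in->src1 < MAX_TEMPS && in->src2 < MAX_TEMPS);
        if (in->src1 >= 0) in->src1 = root[in->src1];
        if (in->src2 >= 0) in->src2 = root[in->src2];
        if (in->op == OP_COPY)
            root[in->dst] = in->src1;     /* src1 was already rewritten to a root */
    }
}

/* Backward pass: mark instructions whose result is never read as dead.
 * Stores have a side effect and are always kept. Returns the live count. */
int eliminate_dead_code(struct insn *code, int n)
{
    int used[MAX_TEMPS] = {0};
    int live = 0;

    for (int i = n - 1; i >= 0; i--) {
        struct insn *in = &code[i];
        if (in->op != OP_STORE && !used[in->dst]) {
            in->dead = 1;                 /* result is never read later */
            continue;
        }
        in->dead = 0;
        if (in->src1 >= 0) used[in->src1] = 1;
        if (in->src2 >= 0) used[in->src2] = 1;
        live++;
    }
    return live;
}

/* Roughly the shape of the dump above: load, two copies, add, three copies,
 * store. Returns how many instructions survive both passes. */
int run_demo(void)
{
    struct insn code[] = {
        { OP_LOAD,   0, -1, -1, 0 },  /* t0 = load b[i]  */
        { OP_COPY,   1,  0, -1, 0 },  /* t1 = t0         */
        { OP_COPY,   2,  1, -1, 0 },  /* t2 = t1         */
        { OP_LOAD,   3, -1, -1, 0 },  /* t3 = load c[i]  */
        { OP_ADD,    4,  2,  3, 0 },  /* t4 = t2 + t3    */
        { OP_COPY,   5,  4, -1, 0 },  /* t5 = t4         */
        { OP_COPY,   6,  5, -1, 0 },  /* t6 = t5         */
        { OP_COPY,   7,  6, -1, 0 },  /* t7 = t6         */
        { OP_STORE, -1,  7, -1, 0 },  /* store t7        */
    };
    int n = (int)(sizeof code / sizeof code[0]);

    propagate_copies(code, n);
    return eliminate_dead_code(code, n);
}
```

In this toy case the nine instructions reduce to the two loads, the add and the store, since after propagation none of the five copies is read any more. Whether the same reasoning carries over to the code Mono emits for the non-SIMD fallback depends on how the JIT represents these stack slots, which is exactly what I am unsure about.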
Thanks a lot,
--
Erven.