[Mono-dev] mono benchmark on arm no FPU => division optimisation problem?
Martin Fuzzey
mfuzzey at parkeon.com
Wed Feb 24 13:50:40 EST 2010
Hi,
I have run the benchmarks included with mono on an ARM platform
(Freescale i.MX21, based on an ARM926EJ-S core)
using several runtimes:
* mono 2.4.2.3 (built from openembedded)
* mono 2.6.1 (built from the released tarball)
* mono svn revision 152005
All the above run under Linux 2.6.33-rc8 using EABI.
The CPU has no FPU (2.4.2.3 shows up as "vfp" due to an OpenEmbedded patch),
but soft float is used in all cases.
For good measure I also tried the Microsoft .NET Compact Framework under
WinCE 5.0 (on the same hardware).
Results are (times in seconds):
Test                   mono2.4.2.3-vfp  mono2.6.1-soft  monosvn-soft  ms-cf
boxtest                19               36              43            24
bulkcpy                31               32              32            26
castclass              59               59              60            105
cmov1                  68               69              68            81
cmov2                  64               64              64            67
cmov3                  11               10              11            16
cmov4                  9                9               9             11
cmov5                  88               86              86            130
commute                18               18              17            46
ctor-bench             344              347             424           CRASH
fib                    42               77              45            59
iconst-byte            3                3               3             7
initlocals             56               55              55            104
inline-readonly        80               77              76            159
inline1                18               18              18            25
inline2                18               18              18            43
inline3                22               22              22            114
inline4                10               10              10            11
inline5                46               46              46            98
inline6                17               17              17            86
isinst                 61               66              65            88
life                   16               26              31            15
logic                  66               65              66            126
loops                  7                7               7             11
math                   482              555             570           804
max-min                13               13              12            14
muldiv                 102              162             117           16
readonly               16               16              16            31
readonly-byte-array    23               23              23            33
readonly-inst          9                9               9             13
readonly-vt            10               10              10            CRASH
regalloc               25               25              25            30
regalloc-2             14               9               9             12
sbperf1                16               28              33            13
sbperf2                20               38              44            27
switch                 89               89              88            124
valuetype-hash-equals  44               54              57            89
vt2                    10               10              10            32
Things I noticed:
1) The muldiv test is very slow on all mono versions (compared with .NET CF).
The JIT-generated machine code shows that the n = (n / 256) operation is
not being converted to a shift (whereas the n = n * 128 operation _is_).
Using a shift at the C# source level brings it down to ~4s (vs 102s).
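One caveat when comparing against the hand-written shift: for a signed int, n >> 8 is not always the same as n / 256. Division truncates toward zero while an arithmetic right shift rounds toward negative infinity, so the two differ for negative n that are not multiples of 256. The small C sketch below makes that checkable; the helper name is hypothetical, and it assumes >> on a negative int32_t sign-extends, as GCC and Clang document.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reports whether an arithmetic right shift by k gives
 * the same result as signed division by 2^k (1 <= k <= 30).
 * Assumes >> on a negative int32_t sign-extends (GCC/Clang behaviour). */
static int shift_matches_div(int32_t n, unsigned k)
{
    assert(k >= 1 && k <= 30);
    int32_t by_div = n / ((int32_t)1 << k);   /* truncates toward zero */
    int32_t by_shift = n >> k;                /* rounds toward -infinity */
    return by_div == by_shift;
}
```

For example, -1 / 256 is 0 but -1 >> 8 is -1. So the source-level shift is only a drop-in replacement when n is known to be non-negative, and a JIT applying the same transformation to signed operands needs an extra correction step.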
2) boxtest has become significantly slower in more recent mono versions
3) Overall, mono compares pretty well with .NET CF.
I've looked at the code to figure out the cause of 1), and it seems that
mono emulates the IDIV opcodes because MONO_ARCH_EMULATE_DIV is defined
(the ARM926EJ-S has no hardware divide instruction):
#if defined(MONO_ARCH_EMULATE_MUL_DIV) || defined(MONO_ARCH_EMULATE_DIV)
    mono_register_opcode_emulation (CEE_DIV, "__emul_idiv", "int32 int32 int32", mono_idiv, FALSE);
    mono_register_opcode_emulation (CEE_DIV_UN, "__emul_idiv_un", "int32 int32 int32", mono_idiv_un, FALSE);
    mono_register_opcode_emulation (CEE_REM, "__emul_irem", "int32 int32 int32", mono_irem, FALSE);
    mono_register_opcode_emulation (CEE_REM_UN, "__emul_irem_un", "int32 int32 int32", mono_irem_un, FALSE);
    mono_register_opcode_emulation (OP_IDIV, "__emul_op_idiv", "int32 int32 int32", mono_idiv, FALSE);
    mono_register_opcode_emulation (OP_IDIV_UN, "__emul_op_idiv_un", "int32 int32 int32", mono_idiv_un, FALSE);
    mono_register_opcode_emulation (OP_IREM, "__emul_op_irem", "int32 int32 int32", mono_irem, FALSE);
    mono_register_opcode_emulation (OP_IREM_UN, "__emul_op_irem_un", "int32 int32 int32", mono_irem_un, FALSE);
#endif
This results in a dispatch to mono_idiv:
gint32
mono_idiv (gint32 a, gint32 b)
{
    MONO_ARCH_SAVE_REGS;

#ifdef MONO_ARCH_NEED_DIV_CHECK
    if (!b)
        mono_raise_exception (mono_get_exception_divide_by_zero ());
    else if (b == -1 && a == (0x80000000))
        mono_raise_exception (mono_get_exception_arithmetic ());
#endif
    return a / b;
}
However, by this point the fact that we were dividing by a power-of-2
constant has been lost.
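The check a JIT would need to make while the divisor is still visible as a constant, that is before the opcode is handed to the emulation path, can be sketched as follows. The helper name pow2_shift is hypothetical and not part of mono; it only covers the positive powers of two above 1 that fit in a signed 32-bit integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true and stores k in *shift when the
 * constant divisor d equals 2^k with 1 <= k <= 30, the powers of two
 * above 1 representable in a signed 32-bit integer.
 * d == 1 and negative divisors are left to other rules. */
static bool pow2_shift(int32_t d, unsigned *shift)
{
    assert(shift != NULL);
    if (d <= 1 || (d & (d - 1)) != 0)
        return false;               /* not a power of two above 1 */
    unsigned k = 0;
    while ((d >>= 1) != 0)          /* d > 0 here, so the shift is well defined */
        k++;
    *shift = k;
    return true;
}
```

For the muldiv case, pow2_shift(256, &k) succeeds with k == 8, whereas a divisor such as 100 falls through to the generic (emulated) path.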
Furthermore, the mechanics of getting to this function are quite heavy
(a call through an indirection table), as borne out by the disassembly
of the jitted code for muldiv.
For n = n / 256:
105c: e1a00006 mov r0, r6
1060: e3a01f40 mov r1, #256 ; 0x100
1064: eb000412 bl 20b4 <plt+0x14>
plt+0x14:
20b4: e28fc000 add ip, pc, #0 ; 0x0
20b8: e28ccc19 add ip, ip, #6400 ; 0x1900
20bc: e59cf058 ldr pc, [ip, #88]
Plus the cost of mono_idiv itself.
Compare this with the code generated for the next two lines:
n++;
n = n * 128;
1068: e2800001 add r0, r0, #1 ; 0x1
106c: e1a00380 lsl r0, r0, #7
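On a core without a hardware divide instruction, such as the ARM926EJ-S, the a / b inside mono_idiv is itself compiled into a call to a software division helper (under the ARM EABI, __aeabi_idiv). The sketch below shows the classic shift-and-subtract (restoring) algorithm such helpers typically build on, to illustrate why each emulated division costs a loop on top of the call overhead above. It is an illustration of the technique only, not the code mono, libgcc or the EABI runtime actually ship, which is more heavily optimised; the name soft_udiv is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive restoring (shift-and-subtract) unsigned division: one
 * compare/subtract step per quotient bit, so always 32 iterations.
 * Stores the remainder in *rem when rem is not NULL. */
static uint32_t soft_udiv(uint32_t n, uint32_t d, uint32_t *rem)
{
    assert(d != 0);                           /* caller handles divide by zero */
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);       /* bring down the next dividend bit */
        if (r >= d) {                         /* divisor fits: subtract, set bit */
            r -= d;
            q |= 1u << i;
        }
    }
    if (rem != NULL)
        *rem = r;
    return q;
}
```

A signed version would typically divide the magnitudes with a routine like this and fix up the sign of the result afterwards.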
Would the correct way to fix this be to translate the opcodes in
mono_arch_lowering_pass() of mini-arm.c?
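If the lowering pass does take this route, each of the four emulated families registered above (signed and unsigned division and remainder) has a cheap replacement when the divisor is a constant 2^k. The sketch below evaluates those replacement sequences in plain C so they can be checked against the / and % operators. The enum and function names are hypothetical, not mono's, and it assumes two's complement int32_t with arithmetic >> on negative values (GCC/Clang behaviour, ASR on ARM).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opcode kinds mirroring the four emulated families;
 * not mono's real opcode enum. */
enum div_kind { KIND_DIV, KIND_DIV_UN, KIND_REM, KIND_REM_UN };

/* Evaluates the shift/mask sequence a lowering pass could emit in place of
 * an emulation call when the divisor is the constant 2^k (1 <= k <= 30). */
static int32_t lowered_pow2(enum div_kind kind, int32_t n, unsigned k)
{
    assert(k >= 1 && k <= 30);
    uint32_t mask = (1u << k) - 1;                    /* 2^k - 1 */
    uint32_t bias = (uint32_t)(n >> 31) >> (32 - k);  /* 2^k - 1 if n < 0, else 0 */
    switch (kind) {
    case KIND_DIV_UN:                                 /* logical shift right */
        return (int32_t)((uint32_t)n >> k);
    case KIND_REM_UN:                                 /* mask off the low bits */
        return (int32_t)((uint32_t)n & mask);
    case KIND_DIV:                                    /* bias, then arithmetic shift */
        return (n + (int32_t)bias) >> k;
    case KIND_REM:                                    /* bias, mask, remove bias */
        return (int32_t)(((uint32_t)n + bias) & mask) - (int32_t)bias;
    }
    assert(!"unknown kind");
    return 0;
}
```

The bias is what makes the signed forms truncate toward zero like CIL div and rem; for non-negative n it is zero and the signed sequences reduce to the unsigned ones.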
Regards,
Martin