[Mono-dev] Mono generates inefficient vectorized code
Sergei Dyshel
qyron.private at gmail.com
Thu Mar 11 18:30:04 EST 2010
Hello,
I'm doing some research on vectorization using Mono. I've noticed that
code generated by Mono's JIT contains many unnecessary memory loads
and stores. Here is a simple example; the full code is attached:
public static unsafe int sum(int* a, int size) {
    Vector4i temp = new Vector4i();
    Vector4i* p = (Vector4i*) a;
    for (int i = 0; i < size/4; i++) {
        temp += *p;
        p += 1;
    }
    return temp.X + temp.Y + temp.Z + temp.W;
}
Here we have a simple 'sum' function that sums a given array of
integers by splitting it into 4-int vectors. The temporary 'temp' is used
as the accumulator in the loop, and the sum of its components is
returned at the end. Ideally, 'temp' should stay in a vector register
for the whole loop and leave it only at the end of the function. Now, let's
concentrate on a single loop iteration. Here is the result of the CIL->IR
translation (obtained with -v); I'm showing only the relevant part:
AFTER METHOD-TO-IR 3: [IN: BB0(0), OUT: BB2(0) ]
xzero R11 <- <============ initialize 'temp' with null value
iconst R12 <- [0]
iconst R13 <- [0]
AFTER METHOD-TO-IR 2: [IN: BB3(0), OUT: BB4(0) ]
xzero R11 <- <============= unnecessary but not critical (we're outside of the loop)
move R14 <- R9
move R12 <- R14
iconst R13 <- [0]
br [B4]
AFTER METHOD-TO-IR 5: [IN: BB4(0), OUT: BB4(0) ]
xmove R16 <- R11 <============= unnecessary, R16 isn't used anymore, should be removed in future passes
move R17 <- R12
loadx_membase R18 <- [R17 + 0x0] <========== loading of the next quadruple
paddd R19 <- R11 R18 clobbers: 1 <============ summation
xmove R11 <- R19 <============= R11 still holds 'temp'
move R20 <- R12
nop
int_add_imm R22 <- R20 [16] clobbers: 1
move R12 <- R22
move R23 <- R13
nop
int_add_imm R25 <- R23 [1] clobbers: 1
move R13 <- R25
AFTER METHOD-TO-IR 4: [IN: BB2(0) BB5(0), OUT: BB5(0) BB6(0) ]
move R26 <- R13
move R27 <- R10
nop
int_div_imm R29 <- R27 clobbers: d
icompare R26 R29
int_blt [B5B6]
Here we see some unnecessary instructions, but presumably they will be
removed in subsequent passes (to be safe, we are JITting with -O=all).
Now the final part, the assembly, and a big disappointment:
<BB>:5
28: 0f 10 0e movups (%esi),%xmm1 <====== OK, loading next quadruple
2b: 0f 10 45 d8 movups 0xffffffd8(%ebp),%xmm0 <====== superfluous load
2f: 66 0f fe c1 paddd %xmm1,%xmm0
33: 0f 11 45 d8 movups %xmm0,0xffffffd8(%ebp) <====== superfluous store
37: 8d 46 10 lea 0x10(%esi),%eax
3a: 43 inc %ebx
3b: 8b f0 mov %eax,%esi
<BB>:4
3d: 8b c7 mov %edi,%eax
3f: c1 f8 1f sar $0x1f,%eax
42: c1 e8 1e shr $0x1e,%eax
45: 03 c7 add %edi,%eax
47: c1 f8 02 sar $0x2,%eax
4a: 3b d8 cmp %eax,%ebx
4c: 7c da jl 28 <Test_sum+0x28> <======= end of loop
<BB>:6
4e: 8b 45 d8 mov 0xffffffd8(%ebp),%eax
51: 8b 4d dc mov 0xffffffdc(%ebp),%ecx
54: 03 c1 add %ecx,%eax
56: 8b 4d e0 mov 0xffffffe0(%ebp),%ecx
59: 03 c1 add %ecx,%eax
5b: 8b 4d e4 mov 0xffffffe4(%ebp),%ecx
5e: 03 c1 add %ecx,%eax
Here 'temp' is reloaded from the stack and stored back in every iteration.
This is very inefficient and should generally be avoided. Can one of the
Mono JIT gurus explain this behavior? Any help will be greatly appreciated!
I'm attaching all the intermediate results of compilation. Anyway,
you can recreate them using:
mcs -unsafe -reference:Mono.Simd.dll cs-sum.cs
mono -O=all -v -v -v cs-sum.exe
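To check the result of a rebuilt JIT, the arithmetic that 'sum' performs can be modelled without Mono.Simd. The sketch below is only an illustration and is not part of the attached code: the class name SumReference and the int[] signature are assumptions made for the example. It keeps one scalar accumulator per vector lane and, like the loop in 'sum', skips the trailing size % 4 elements.

```csharp
using System;

public static class SumReference
{
    // Scalar model of the vectorized loop: one accumulator per
    // Vector4i component (X, Y, Z, W). Hypothetical helper, not
    // taken from cs-sum.cs.
    public static int Sum(int[] a, int size)
    {
        int x = 0, y = 0, z = 0, w = 0;
        unchecked // wrap on overflow, as packed 32-bit addition does
        {
            for (int i = 0; i < size / 4; i++)
            {
                x += a[4 * i];
                y += a[4 * i + 1];
                z += a[4 * i + 2];
                w += a[4 * i + 3];
            }
            // Elements past the last full group of four are ignored,
            // matching the size/4 loop bound of the original.
            return x + y + z + w;
        }
    }
}
```

For the nine-element array 1..9, SumReference.Sum(data, 9) adds only the first eight elements and returns 36.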
--
Regards,
Sergei Dyshel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cs-sum.cil
Type: application/octet-stream
Size: 6905 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20100312/34e4b0c8/attachment-0003.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cs-sum.cs
Type: application/octet-stream
Size: 599 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20100312/34e4b0c8/attachment-0004.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cs-sum
Type: application/octet-stream
Size: 4096 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20100312/34e4b0c8/attachment-0005.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cs-sum.zip
Type: application/zip
Size: 5068 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20100312/34e4b0c8/attachment-0001.zip