[Mono-dev] Mono generates inefficient vectorized code

Sergei Dyshel qyron.private at gmail.com
Thu Mar 11 18:30:04 EST 2010


Hello,
I'm doing some research on vectorization using Mono. I've noticed that
code generated by Mono's JIT contains many unnecessary memory loads
and stores. Here is a simple example; the full code is attached:

public static unsafe int sum(int* a, int size) {
 Vector4i temp = new Vector4i();
 Vector4i* p = (Vector4i*) a;
 for (int i = 0; i < size/4; i++) {
   temp += *p;
   p += 1;
 }
 return temp.X + temp.Y + temp.Z + temp.W;
}

Here we have a simple 'sum' function which sums up a given array of
integers by splitting it into 4-int vectors. The temporary 'temp' is used
as an accumulator in the loop, and finally the sum of its components is
returned. Clearly, inside the loop 'temp' should be kept in a vector
register and "leave" it only at the end of the function. Now, let's
concentrate on the loop body. Here is the result of the CIL->IR
translation (obtained with -v); I'm showing only the relevant part:

AFTER METHOD-TO-IR 3: [IN:  BB0(0), OUT:  BB2(0) ]
 xzero R11 <-              <============ initialize 'temp' to zero
 iconst R12 <- [0]
 iconst R13 <- [0]

AFTER METHOD-TO-IR 2: [IN:  BB3(0), OUT:  BB4(0) ]
 xzero R11 <-           <============= unnecessary but not critical (we're outside of the loop)
 move R14 <- R9
 move R12 <- R14
 iconst R13 <- [0]
 br [B4]

AFTER METHOD-TO-IR 5: [IN:  BB4(0), OUT:  BB4(0) ]
 xmove R16 <- R11 <============= unnecessary, R16 isn't used anymore; should be removed in later passes
 move R17 <- R12
 loadx_membase R18 <- [R17 + 0x0] <========== loading of the next quadruple
 paddd R19 <- R11 R18 clobbers: 1 <============ summation
 xmove R11 <- R19 <============= R11 still holds 'temp'
 move R20 <- R12
 nop
 int_add_imm R22 <- R20 [16] clobbers: 1
 move R12 <- R22
 move R23 <- R13
 nop
 int_add_imm R25 <- R23 [1] clobbers: 1
 move R13 <- R25

AFTER METHOD-TO-IR 4: [IN:  BB2(0) BB5(0), OUT:  BB5(0) BB6(0) ]
 move R26 <- R13
 move R27 <- R10
 nop
 int_div_imm R29 <- R27 clobbers: d
 icompare R26 R29
 int_blt [B5B6]

Here we see some unnecessary instructions, but presumably they will be
removed in subsequent passes (to be safe, we are JITting with -O=all).
Now, the final part - the assembly, and a big disappointment:

<BB>:5
 28:   0f 10 0e                movups (%esi),%xmm1            <====== OK, loading next quadruple
 2b:   0f 10 45 d8             movups 0xffffffd8(%ebp),%xmm0  <====== superfluous load
 2f:   66 0f fe c1             paddd  %xmm1,%xmm0
 33:   0f 11 45 d8             movups %xmm0,0xffffffd8(%ebp)  <====== superfluous store
 37:   8d 46 10                lea    0x10(%esi),%eax
 3a:   43                      inc    %ebx
 3b:   8b f0                   mov    %eax,%esi
<BB>:4
 3d:   8b c7                   mov    %edi,%eax
 3f:   c1 f8 1f                sar    $0x1f,%eax
 42:   c1 e8 1e                shr    $0x1e,%eax
 45:   03 c7                   add    %edi,%eax
 47:   c1 f8 02                sar    $0x2,%eax
 4a:   3b d8                   cmp    %eax,%ebx
 4c:   7c da                   jl     28 <Test_sum+0x28> <======= end of loop
<BB>:6
 4e:   8b 45 d8                mov    0xffffffd8(%ebp),%eax
 51:   8b 4d dc                mov    0xffffffdc(%ebp),%ecx
 54:   03 c1                   add    %ecx,%eax
 56:   8b 4d e0                mov    0xffffffe0(%ebp),%ecx
 59:   03 c1                   add    %ecx,%eax
 5b:   8b 4d e4                mov    0xffffffe4(%ebp),%ecx
 5e:   03 c1                   add    %ecx,%eax

Here 'temp' is reloaded from the stack and stored back to it on every
iteration, instead of staying in %xmm0 for the whole loop. This is very
inefficient and should be avoidable. Can one of the Mono JIT gurus
explain this behavior? Any help will be greatly appreciated!
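
For anyone who wants to check results while experimenting with this, the
arithmetic of 'sum' can be modelled without Mono.Simd. The sketch below is
not part of the attached code: 'LaneSum' and its members are names made up
for illustration, and the four locals s0..s3 simply stand in for the X, Y,
Z and W components of 'temp'. Like the original loop, it runs size/4
iterations, so any trailing elements beyond the last full group of four
are not added.

```csharp
using System;

// Hypothetical scalar model of the vectorized 'sum' from the post.
public static class LaneSum
{
    public static int Sum(int[] a, int size)
    {
        // One accumulator per vector lane (X, Y, Z, W of 'temp').
        int s0 = 0, s1 = 0, s2 = 0, s3 = 0;

        // Same trip count as the original: size/4 full groups of four ints.
        for (int i = 0; i < size / 4; i++)
        {
            int b = 4 * i;          // index of the first element of the group
            s0 += a[b];
            s1 += a[b + 1];
            s2 += a[b + 2];
            s3 += a[b + 3];
        }

        // Horizontal add, as in 'temp.X + temp.Y + temp.Z + temp.W'.
        return s0 + s1 + s2 + s3;
    }

    public static void Main()
    {
        int[] data = new int[10];
        for (int i = 0; i < data.Length; i++)
            data[i] = i + 1;        // 1, 2, ..., 10

        // Only the first 8 elements form full groups of four.
        Console.WriteLine(Sum(data, data.Length));   // prints 36
    }
}
```

The vectorized version should return the same value for the same input,
which makes this a convenient baseline when comparing JIT options.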

I'm attaching all the intermediate results of the compilation. You can
also recreate them using:
mcs -unsafe -reference:Mono.Simd.dll cs-sum.cs
mono -O=all -v -v -v cs-sum.exe

--
Regards,
Sergei Dyshel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cs-sum.cil
Type: application/octet-stream
Size: 6905 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20100312/34e4b0c8/attachment-0003.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cs-sum.cs
Type: application/octet-stream
Size: 599 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20100312/34e4b0c8/attachment-0004.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cs-sum
Type: application/octet-stream
Size: 4096 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20100312/34e4b0c8/attachment-0005.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cs-sum.zip
Type: application/zip
Size: 5068 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20100312/34e4b0c8/attachment-0001.zip 