[Mono-dev] Mono generates inefficient vectorized code

Sergei Dyshel qyron.private at gmail.com
Tue Mar 16 16:44:43 EDT 2010


Hello,

As I said in my previous email, I'm trying to make Mono+LLVM work on PowerPC.
So far I got it to compile (Mono itself; you can see the changes I made in the
attached patch) and it "almost" produces some code, but there are a couple of
issues I don't know how to handle:

1. The 'mono_llvm_get_call_info' function in 'mini-ppc.c'. It was missing, so
I copied it from 'mini-arm.c' and 'mini-x86.c' and modified it a bit to match
the PPC implementation. I also don't know what to do with vtypes and
arguments on the stack, so I just disable LLVM in that case. But since I
still can't compile anything, I don't know whether my solution works. Can you
please take a look at my function?

2. There are some exception unwinding problems I don't know how to
overcome. I tried disabling LLVM in case of OP_THROW and
'implicit_exception' in 'mini-llvm.c', but some constructors still need this
unwinding support. You can see the exact error (together with the C# code
I'm using) in the attached logs. What else can be done there?

I want to emphasize that currently I need only a "quick" solution that will
work only for functions which don't have vtype arguments and don't call any
functions other than those from Mono.Simd (and these calls will surely be
inlined before the IR "goes" to LLVM). I understand that implementing fully
functional support is a huge amount of work, but maybe I can make some small
modification for my restricted case.
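
The restriction above can be sketched roughly as a pre-check that decides
whether a method goes to LLVM or falls back to the regular JIT. This is only
an illustration: every type and function name below (MethodSummary, ArgKind,
llvm_reject_reason, can_use_llvm) is hypothetical and is not taken from
Mono's sources or from the attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a method before it is handed to LLVM.
 * These are NOT Mono's real data structures. */
typedef enum { ARG_IN_REG, ARG_ON_STACK, ARG_VTYPE } ArgKind;

typedef struct {
    const ArgKind *args;      /* how each argument is passed */
    size_t n_args;
    bool has_throw;           /* contains a throw (OP_THROW-like) opcode */
    bool has_non_simd_calls;  /* calls left over after Mono.Simd inlining */
} MethodSummary;

typedef enum {
    REJECT_NONE,
    REJECT_VTYPE_ARG,
    REJECT_STACK_ARG,
    REJECT_THROW,
    REJECT_CALL
} RejectReason;

/* Return the first reason the method falls outside the restricted case,
 * or REJECT_NONE if it can be compiled with LLVM. */
static RejectReason llvm_reject_reason(const MethodSummary *m)
{
    assert(m != NULL);
    for (size_t i = 0; i < m->n_args; i++) {
        if (m->args[i] == ARG_VTYPE)
            return REJECT_VTYPE_ARG;
        if (m->args[i] == ARG_ON_STACK)
            return REJECT_STACK_ARG;
    }
    if (m->has_throw)
        return REJECT_THROW;
    if (m->has_non_simd_calls)
        return REJECT_CALL;
    return REJECT_NONE;
}

/* Convenience wrapper: true means "try LLVM", false means "use the old JIT". */
static bool can_use_llvm(const MethodSummary *m)
{
    return llvm_reject_reason(m) == REJECT_NONE;
}
```

A method like the 'sum' example, with two register arguments, no throw and
only inlined Mono.Simd calls, would pass such a check. Returning a reason
instead of a plain flag is just a convenience for logging why a method was
skipped.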

Any help and hints will be greatly appreciated!
-- 
Regards,
Sergei Dyshel


On Fri, Mar 12, 2010 at 03:07, Zoltan Varga <vargaz at gmail.com> wrote:

> Hi,
>
>   After some fixes to the llvm code in mono SVN, it now generates the
> following:
>
>    d:   0f 10 0f                movups (%rdi),%xmm1
>   10:   66 0f fe c1             paddd  %xmm1,%xmm0
>   14:   48 83 c7 10             add    $0x10,%rdi
>   18:   89 f1                   mov    %esi,%ecx
>   1a:   c1 f9 1f                sar    $0x1f,%ecx
>   1d:   83 e1 03                and    $0x3,%ecx
>   20:   01 f1                   add    %esi,%ecx
>   22:   c1 f9 02                sar    $0x2,%ecx
>   25:   ff c0                   inc    %eax
>   27:   39 c8                   cmp    %ecx,%eax
>   29:   0f 8c de ff ff ff       jl     d <Test_sum+0xd>
>
> which is pretty good.
>
>                              Zoltan
>
> On Fri, Mar 12, 2010 at 1:15 AM, Sergei Dyshel <qyron.private at gmail.com> wrote:
>
>> Hello Rodrigo,
>> Thanks for the quick answer! But do you mean that the only problem is
>> the lack of a global register allocator? What if 'temp' were not a
>> vector but a plain 'int' temporary: would it be loaded and stored in
>> each iteration?
>>
>> Another question. I know that there is also an LLVM engine in Mono and
>> that LLVM generally supports vector instructions in its IR. Is it hard
>> to add SIMD support to mini-llvm.c?
>> --
>> Regards,
>> Sergei Dyshel
>>
>>
>>
>> On Fri, Mar 12, 2010 at 01:56, Rodrigo Kumpera <kumpera at gmail.com> wrote:
>> > Hi Sergei,
>> >
>> > On Thu, Mar 11, 2010 at 8:30 PM, Sergei Dyshel <qyron.private at gmail.com> wrote:
>> >>
>> >> Hello,
>> >> I'm doing some research on vectorization using Mono. I've noticed that
>> >> code generated by Mono's JIT contains many unnecessary memory loads
>> >> and stores. Here is a simple example; the full code is attached:
>> >>
>> >> public static unsafe int sum(int* a, int size) {
>> >>  Vector4i temp = new Vector4i();
>> >>  Vector4i* p = (Vector4i*) a;
>> >>  for (int i = 0; i < size/4; i++) {
>> >>    temp += *p;
>> >>    p += 1;
>> >>  }
>> >>  return temp.X + temp.Y + temp.Z + temp.W;
>> >> }
>> >>
>> >
>> >
>> > The problem you're seeing, of going to memory when not needed, is due to
>> > the fact that Mono lacks a working global register allocator. If you use
>> > a value in a single basic block, you'll notice that it's kept in memory
>> > the whole time.
>> >
>> > We don't eliminate a lot of redundancies, even under SSA, because our JIT
>> > doesn't know how to handle SIMD ops under that form. It's an open problem
>> > requiring some work. The same applies to our global register allocator.
>> >
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cs-sum.logs.zip
Type: application/zip
Size: 10738 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20100316/c5ce7289/attachment-0001.zip 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cs-sum.cs
Type: application/octet-stream
Size: 599 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20100316/c5ce7289/attachment-0003.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cs-sum.rename-to-exe
Type: application/octet-stream
Size: 4096 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20100316/c5ce7289/attachment-0004.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mono-llvm-ppc.patch
Type: application/octet-stream
Size: 3660 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20100316/c5ce7289/attachment-0005.obj 