[Mono-dev] Mono.Simd - slower than the normal implementation

Alan McGovern alan.mcgovern at gmail.com
Sat Nov 15 21:37:01 EST 2008


Hey,

On Sat, Nov 15, 2008 at 3:50 PM, Rodrigo Kumpera <kumpera at gmail.com> wrote:
> Hi Alan,
> -Getters and setter are a hint of ill vectorized code.

In this particular scenario, I'm not sure how i can get rid of the use
of getters/setters unless I use even more unsafe code. I don't know
whether it's feasible or not, but it'd be great to be able to use this
API without having to use unsafe code. At the moment, I don't think
it's really possible to use this API without getters and setters.

> The last part of your unsafe code should use temps for the intermediate results.
Do you mean that I should copy the vector 'e', which i got from
XOR'ing my values, into another Vector4ui using the store operation?
Then I should do my bitshifting/storing into uint[] from that one?


> -For the safe case we still miss proper integration with arrays, in the form
> of methods to
> extract and store vectors from them.

I was thinking that the API could expose something like:
Vector4ui.Create (uint[] array, int offset, ref Vector4ui result)
which could be changed into:
result = *((Vector4ui*)&array [offset]);

Though I'm sure you have ideas already on this ;) A similar method for
storing the result into a uint[] would be great too.

>
> Your code looks a bit strange, the Vector4ui constructor indexes in
> particular. Have you checked that
> the output of the 3 methods are the same?
Yes, there is a bug in my implementation there, I left out a bracket
when setting the value of buff[t+3]. There should be an additional
bracket around (e.W << 1) | (e.W >> 31). Other than that, the
implementation is correct. I've pasted the correct implementation of
the unsafe and safe SIMD versions below. Just for reference purposes.

>
> I'll work on the Mono.Simd issues next week, getting setters to be
> accelerated, some methods
> to better integrate with arrays and other things like element extractors.

Great stuff. Give me a shout when you've done that and I'll try to
improve the above implementation. Though if you have time to spare
while writing the SIMD code, you could take a look at it yourself ;)

Thanks,
Alan.

Reference implementations (non buggy ;) ):
        public static void FillBuffSafe(uint[] buff)
        {
            for (int t = 16; t < buff.Length; t += 4)
            {
                Vector4ui e = new Vector4ui(buff[t - 3], buff[t - 2],
buff[t - 1], buff[t - 0]) ^
                              new Vector4ui(buff[t - 8], buff[t - 7],
buff[t - 6], buff[t - 5]) ^
                              new Vector4ui(buff[t - 14], buff[t -
13], buff[t - 12], buff[t - 11]) ^
                              new Vector4ui(buff[t - 16], buff[t -
15], buff[t - 14], buff[t - 13]);
                e.W ^= buff[t];

                buff[t] = (e.X << 1) | (e.X >> 31);
                buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
                buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
                buff[t + 3] = ((e.W << 1) | (e.W >> 31)) ^ ((e.X << 2)
| (e.X >> 30));
            }
        }


        public unsafe static void FillBuffUnsafe(uint[] buffb)
        {
            fixed (uint* buff = buffb)
            {
                for (int t = 16; t < buffb.Length; t += 4)
                {
                    Vector4ui e = *((Vector4ui*)&buff[t - 3]) ^
                                  *((Vector4ui*)&buff[t - 8]) ^
                                  *((Vector4ui*)&buff[t - 14]) ^
                                  *((Vector4ui*)&buff[t - 16]);
                    e.W ^= buff[t];

                    buff[t] = (e.X << 1) | (e.X >> 31);
                    buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
                    buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
                    buff[t + 3] = ((e.W << 1) | (e.W >> 31)) ^ ((e.X
<< 2) | (e.X >> 30));
                }
            }
        }

>
> Rodrigo
>
> On Sat, Nov 15, 2008 at 12:13 AM, Alan McGovern <alan.mcgovern at gmail.com>
> wrote:
>>
>> I found a bit of code in the SHA1 implementation which i thought was
>> ideal for SIMD optimisations. However, unless i resort to unsafe code,
>> it's actually substantially slower! I've attached three
>> implementations of the method here. The original, the safe SIMD and
>> the unsafe SIMD. The runtimes are as follows:
>>
>> Original: 600ms
>> Unsafe Simd: 450ms
>> Safe Simd: 1700ms
>>
>> Also, the method is always called with a uint[] of length 80.
>>
>> Is this just the wrong place to be using simd? It seemed ideal because
>> i need 75% less XOR's. If anyone has an ideas on whether SIMD could
>> actually be useful for this case or not, let me know.
>>
>> Thanks,
>> Alan.
>>
>>
>> The original code is:
>>
>> private static void FillBuff(uint[] buff)
>> {
>>        uint val;
>>        for (int i = 16; i < 80; i += 8)
>>        {
>>                val = buff[i - 3] ^ buff[i - 8] ^ buff[i - 14] ^ buff[i -
>> 16];
>>                buff[i] = (val << 1) | (val >> 31);
>>
>>                val = buff[i - 2] ^ buff[i - 7] ^ buff[i - 13] ^ buff[i -
>> 15];
>>                buff[i + 1] = (val << 1) | (val >> 31);
>>
>>                val = buff[i - 1] ^ buff[i - 6] ^ buff[i - 12] ^ buff[i -
>> 14];
>>                buff[i + 2] = (val << 1) | (val >> 31);
>>
>>                val = buff[i + 0] ^ buff[i - 5] ^ buff[i - 11] ^ buff[i -
>> 13];
>>                buff[i + 3] = (val << 1) | (val >> 31);
>>
>>                val = buff[i + 1] ^ buff[i - 4] ^ buff[i - 10] ^ buff[i -
>> 12];
>>                buff[i + 4] = (val << 1) | (val >> 31);
>>
>>                val = buff[i + 2] ^ buff[i - 3] ^ buff[i - 9] ^ buff[i -
>> 11];
>>                buff[i + 5] = (val << 1) | (val >> 31);
>>
>>                val = buff[i + 3] ^ buff[i - 2] ^ buff[i - 8] ^ buff[i -
>> 10];
>>                buff[i + 6] = (val << 1) | (val >> 31);
>>
>>                val = buff[i + 4] ^ buff[i - 1] ^ buff[i - 7] ^ buff[i -
>> 9];
>>                buff[i + 7] = (val << 1) | (val >> 31);
>>        }
>> }
>>
>> The unsafe SIMD code is:
>> public unsafe static void FillBuff(uint[] buffb)
>> {
>>    fixed (uint* buff = buffb) {
>>        Vector4ui e;
>>        for (int t = 16; t < buffb.Length; t += 4)
>>        {
>>            e = *((Vector4ui*)&(buff [t-16])) ^
>>                   *((Vector4ui*)&(buff [t-14])) ^
>>                   *((Vector4ui*)&(buff [t- 8])) ^
>>                   *((Vector4ui*)&(buff [t- 3]));
>>            e.W ^= buff[t];
>>
>>            buff[t] = (e.X << 1) | (e.X >> 31);
>>            buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
>>            buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
>>            buff[t + 3] = (e.W << 1) | (e.W >> 31) ^ ((e.X << 2) | (e.X >>
>> 30));
>>        }
>>    }
>> }
>>
>> The safe simd code is:
>>        public static void FillBuff(uint[] buff)
>>        {
>>            Vector4ui e;
>>            for (int t = 16; t < buff.Length; t += 4)
>>            {
>>                e = new Vector4ui (buff [t-16],buff [t-15],buff
>> [t-14],buff [t-13]) ^
>>                       new Vector4ui (buff [t-14],buff [t-13],buff
>> [t-12],buff [t-11]) ^
>>                       new Vector4ui (buff [t-8],  buff [t-7],  buff
>> [t-6],  buff [t-5]) ^
>>                       new Vector4ui (buff [t-3],  buff [t-2],  buff
>> [t-1],  buff [t-0]);
>>
>>                e.W ^= buff[t];
>>                buff[t] =        (e.X << 1) | (e.X >> 31);
>>                buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
>>                buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
>>                buff[t + 3] = (e.W << 1) | (e.W >> 31) ^ ((e.X << 2) |
>> (e.X >> 30));
>>            }
>>        }
>> _______________________________________________
>> Mono-devel-list mailing list
>> Mono-devel-list at lists.ximian.com
>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>
>


More information about the Mono-devel-list mailing list