[Mono-dev] Mono.Simd - slower than the normal implementation

Alan McGovern alan.mcgovern at gmail.com
Fri Nov 14 21:15:14 EST 2008


I forgot to mention that I'm on a 1.86 GHz Core 2 Duo and I was running
with --optimize=simd.

Alan.

On Sat, Nov 15, 2008 at 2:13 AM, Alan McGovern <alan.mcgovern at gmail.com> wrote:
> I found a bit of code in the SHA1 implementation which I thought was
> ideal for SIMD optimisations. However, unless I resort to unsafe code,
> it's actually substantially slower! I've attached three
> implementations of the method here: the original, the safe SIMD and
> the unsafe SIMD. The runtimes are as follows:
>
> Original: 600ms
> Unsafe Simd: 450ms
> Safe Simd: 1700ms
>
> Also, the method is always called with a uint[] of length 80.
>
> Is this just the wrong place to be using SIMD? It seemed ideal because
> doing the XORs four words at a time means 75% fewer XOR operations. If
> anyone has any ideas on whether SIMD could actually be useful for this
> case or not, let me know.
>
> Thanks,
> Alan.
>
>
> The original code is:
>
> private static void FillBuff(uint[] buff)
> {
>        uint val;
>        for (int i = 16; i < 80; i += 8)
>        {
>                val = buff[i - 3] ^ buff[i - 8] ^ buff[i - 14] ^ buff[i - 16];
>                buff[i] = (val << 1) | (val >> 31);
>
>                val = buff[i - 2] ^ buff[i - 7] ^ buff[i - 13] ^ buff[i - 15];
>                buff[i + 1] = (val << 1) | (val >> 31);
>
>                val = buff[i - 1] ^ buff[i - 6] ^ buff[i - 12] ^ buff[i - 14];
>                buff[i + 2] = (val << 1) | (val >> 31);
>
>                val = buff[i + 0] ^ buff[i - 5] ^ buff[i - 11] ^ buff[i - 13];
>                buff[i + 3] = (val << 1) | (val >> 31);
>
>                val = buff[i + 1] ^ buff[i - 4] ^ buff[i - 10] ^ buff[i - 12];
>                buff[i + 4] = (val << 1) | (val >> 31);
>
>                val = buff[i + 2] ^ buff[i - 3] ^ buff[i - 9] ^ buff[i - 11];
>                buff[i + 5] = (val << 1) | (val >> 31);
>
>                val = buff[i + 3] ^ buff[i - 2] ^ buff[i - 8] ^ buff[i - 10];
>                buff[i + 6] = (val << 1) | (val >> 31);
>
>                val = buff[i + 4] ^ buff[i - 1] ^ buff[i - 7] ^ buff[i - 9];
>                buff[i + 7] = (val << 1) | (val >> 31);
>        }
> }
>
> The unsafe SIMD code is:
>
> public unsafe static void FillBuff(uint[] buffb)
> {
>    fixed (uint* buff = buffb) {
>        Vector4ui e;
>        for (int t = 16; t < buffb.Length; t += 4)
>        {
>            e = *((Vector4ui*)&(buff [t-16])) ^
>                   *((Vector4ui*)&(buff [t-14])) ^
>                   *((Vector4ui*)&(buff [t- 8])) ^
>                   *((Vector4ui*)&(buff [t- 3]));
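>            // Lane W loaded the stale buff[t] (it is only computed below); XOR it back out.
>            e.W ^= buff[t];
>
>            buff[t] = (e.X << 1) | (e.X >> 31);
>            buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
>            buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
>            // rotl1(e.W ^ rotl1(e.X)); parenthesised because ^ binds tighter than | in C#
>            buff[t + 3] = ((e.W << 1) | (e.W >> 31)) ^ ((e.X << 2) | (e.X >> 30));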
>        }
>    }
> }
>
> The safe SIMD code is:
>
> public static void FillBuff(uint[] buff)
> {
>    Vector4ui e;
>    for (int t = 16; t < buff.Length; t += 4)
>    {
>        e = new Vector4ui (buff [t-16], buff [t-15], buff [t-14], buff [t-13]) ^
>            new Vector4ui (buff [t-14], buff [t-13], buff [t-12], buff [t-11]) ^
>            new Vector4ui (buff [t- 8], buff [t- 7], buff [t- 6], buff [t- 5]) ^
>            new Vector4ui (buff [t- 3], buff [t- 2], buff [t- 1], buff [t- 0]);
>
>        // Lane W picked up the stale buff[t]; XOR it back out.
>        e.W ^= buff[t];
>        buff[t]     = (e.X << 1) | (e.X >> 31);
>        buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
>        buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
>        // rotl1(e.W ^ rotl1(e.X)); parenthesised because ^ binds tighter than | in C#
>        buff[t + 3] = ((e.W << 1) | (e.W >> 31)) ^ ((e.X << 2) | (e.X >> 30));
>    }
> }
>
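
For comparison, here is a minimal sketch in plain C# (no Mono.Simd) of what the three versions compute. It assumes, as the original loop suggests, that FillBuff is the standard SHA-1 message-schedule expansion W[t] = rotl1(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16]) for t = 16..79. FillBuffFourLane emulates the four-lane SIMD step with scalars: lane 3 of each step needs W[t], which lane 0 only produces in that same step, so it XORs out the stale buff[t] it loaded and then uses rotl1(v ^ rotl1(x)) = rotl1(v) ^ rotl2(x). The class and method names are hypothetical, and the sketch is meant as a correctness check, not a performance comparison. The final assignment is parenthesised explicitly, because in C# `^` binds tighter than `|`.

```csharp
// Hypothetical helper class; names are illustrative only.
static class Sha1ScheduleSketch
{
    // Rotate a 32-bit word left by n bits (0 < n < 32).
    static uint Rotl(uint x, int n) => (x << n) | (x >> (32 - n));

    // Rolled-up equivalent of the unrolled scalar FillBuff: the SHA-1
    // message schedule, expanding w[0..15] into w[16..79] in place.
    public static void FillBuffReference(uint[] w)
    {
        for (int t = 16; t < 80; t++)
            w[t] = Rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
    }

    // Scalar emulation of the four-lane step used by the SIMD versions:
    // each iteration produces w[t], w[t+1], w[t+2] and w[t+3].
    public static void FillBuffFourLane(uint[] w)
    {
        for (int t = 16; t < 80; t += 4)
        {
            // Lanes X, Y and Z only read words that are already final.
            uint x = w[t - 16] ^ w[t - 14] ^ w[t - 8] ^ w[t - 3];
            uint y = w[t - 15] ^ w[t - 13] ^ w[t - 7] ^ w[t - 2];
            uint z = w[t - 14] ^ w[t - 12] ^ w[t - 6] ^ w[t - 1];

            // Lane W would need the new w[t]; the vector load picks up the
            // stale w[t] instead, so XOR it back out (the "e.W ^= buff[t]" step).
            uint v = (w[t - 13] ^ w[t - 11] ^ w[t - 5] ^ w[t]) ^ w[t];

            w[t]     = Rotl(x, 1);
            w[t + 1] = Rotl(y, 1);
            w[t + 2] = Rotl(z, 1);

            // rotl1(v ^ rotl1(x)) == rotl1(v) ^ rotl2(x), where rotl1(x) is the new w[t].
            w[t + 3] = ((v << 1) | (v >> 31)) ^ ((x << 2) | (x >> 30));
        }
    }
}
```

Both methods fill w[16..79] in place from w[0..15]. Running them on the same random buffer (with random junk already in w[16..79], to exercise the stale-value cancellation) and comparing the results is a quick way to check any SIMD variant before timing it.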


More information about the Mono-devel-list mailing list