[Mono-dev] Mono.Simd - slower than the normal implementation

Alan McGovern alan.mcgovern at gmail.com
Fri Nov 14 21:13:48 EST 2008


I found a bit of code in the SHA1 implementation which I thought was
ideal for SIMD optimisation. However, unless I resort to unsafe code,
it's actually substantially slower! I've attached three
implementations of the method here: the original, the safe SIMD and
the unsafe SIMD. The runtimes are as follows:

Original: 600ms
Unsafe SIMD: 450ms
Safe SIMD: 1700ms

Also, the method is always called with a uint[] of length 80.

Is this just the wrong place to be using SIMD? It seemed ideal because
I need 75% fewer XORs. If anyone has any ideas on whether SIMD could
actually be useful for this case or not, let me know.

Thanks,
Alan.


The original code is:

private static void FillBuff(uint[] buff)
{
	uint val;
	for (int i = 16; i < 80; i += 8)
	{
		val = buff[i - 3] ^ buff[i - 8] ^ buff[i - 14] ^ buff[i - 16];
		buff[i] = (val << 1) | (val >> 31);

		val = buff[i - 2] ^ buff[i - 7] ^ buff[i - 13] ^ buff[i - 15];
		buff[i + 1] = (val << 1) | (val >> 31);

		val = buff[i - 1] ^ buff[i - 6] ^ buff[i - 12] ^ buff[i - 14];
		buff[i + 2] = (val << 1) | (val >> 31);

		val = buff[i + 0] ^ buff[i - 5] ^ buff[i - 11] ^ buff[i - 13];
		buff[i + 3] = (val << 1) | (val >> 31);

		val = buff[i + 1] ^ buff[i - 4] ^ buff[i - 10] ^ buff[i - 12];
		buff[i + 4] = (val << 1) | (val >> 31);

		val = buff[i + 2] ^ buff[i - 3] ^ buff[i - 9] ^ buff[i - 11];
		buff[i + 5] = (val << 1) | (val >> 31);

		val = buff[i + 3] ^ buff[i - 2] ^ buff[i - 8] ^ buff[i - 10];
		buff[i + 6] = (val << 1) | (val >> 31);

		val = buff[i + 4] ^ buff[i - 1] ^ buff[i - 7] ^ buff[i - 9];
		buff[i + 7] = (val << 1) | (val >> 31);
	}
}

The unsafe SIMD code is:
public unsafe static void FillBuff(uint[] buffb)
{
    fixed (uint* buff = buffb) {
        Vector4ui e;
        for (int t = 16; t < buffb.Length; t += 4)
        {
            e = *((Vector4ui*)&(buff [t-16])) ^
                   *((Vector4ui*)&(buff [t-14])) ^
                   *((Vector4ui*)&(buff [t- 8])) ^
                   *((Vector4ui*)&(buff [t- 3]));
            e.W ^= buff[t];

            buff[t] = (e.X << 1) | (e.X >> 31);
            buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
            buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
            // '^' binds tighter than '|' in C#, so the rotate needs its own parentheses.
            buff[t + 3] = ((e.W << 1) | (e.W >> 31)) ^ ((e.X << 2) | (e.X >> 30));
        }
    }
}

The safe SIMD code is:
        public static void FillBuff(uint[] buff)
        {
            Vector4ui e;
            for (int t = 16; t < buff.Length; t += 4)
            {
                e = new Vector4ui (buff [t-16], buff [t-15], buff [t-14], buff [t-13]) ^
                    new Vector4ui (buff [t-14], buff [t-13], buff [t-12], buff [t-11]) ^
                    new Vector4ui (buff [t-8],  buff [t-7],  buff [t-6],  buff [t-5]) ^
                    new Vector4ui (buff [t-3],  buff [t-2],  buff [t-1],  buff [t-0]);

                e.W ^= buff[t];
                buff[t] =        (e.X << 1) | (e.X >> 31);
                buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
                buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
                // '^' binds tighter than '|' in C#, so the rotate needs its own parentheses.
                buff[t + 3] = ((e.W << 1) | (e.W >> 31)) ^ ((e.X << 2) | (e.X >> 30));
            }
        }
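
Both SIMD versions compute four words per step, but the fourth word
depends on the first word of the same step, so it needs a fix-up. Here
is a rough sketch for checking that fix-up against the plain
recurrence. It emulates the four lanes with ordinary uints, so it does
not need Mono.Simd at all. The class and method names
(Sha1ScheduleCheck, FillScalar, FillFourLanes, Matches) and the test
data are made up for illustration. Unlike the code above, it leaves the
not-yet-computed word out of the last lane instead of XORing the stale
buff[t] back out afterwards.

```csharp
using System;

static class Sha1ScheduleCheck
{
    // Rotate a 32-bit word left by n bits (0 < n < 32).
    static uint RotL(uint v, int n)
    {
        return (v << n) | (v >> (32 - n));
    }

    // Scalar reference: w[t] = rotl1(w[t-3] ^ w[t-8] ^ w[t-14] ^ w[t-16]).
    public static void FillScalar(uint[] w)
    {
        for (int t = 16; t < w.Length; t++)
            w[t] = RotL(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
    }

    // Four words per step, one local variable per emulated vector lane.
    public static void FillFourLanes(uint[] w)
    {
        for (int t = 16; t < w.Length; t += 4)
        {
            uint x = w[t - 16] ^ w[t - 14] ^ w[t - 8] ^ w[t - 3];
            uint y = w[t - 15] ^ w[t - 13] ^ w[t - 7] ^ w[t - 2];
            uint z = w[t - 14] ^ w[t - 12] ^ w[t - 6] ^ w[t - 1];
            // The last lane also needs the new w[t], which is not known yet,
            // so only the three known terms are combined here.
            uint last = w[t - 13] ^ w[t - 11] ^ w[t - 5];

            w[t] = RotL(x, 1);
            w[t + 1] = RotL(y, 1);
            w[t + 2] = RotL(z, 1);
            // Rotation distributes over XOR and w[t] == rotl1(x), so
            // rotl1(last ^ w[t]) == rotl1(last) ^ rotl2(x).
            w[t + 3] = RotL(last, 1) ^ RotL(x, 2);
        }
    }

    // Fills two identical 80-word buffers both ways and compares them.
    public static bool Matches()
    {
        uint[] a = new uint[80];
        uint[] b = new uint[80];
        uint seed = 1;
        for (int i = 0; i < 16; i++)
        {
            // Arbitrary pseudo-random input words (illustrative only).
            seed = seed * 1664525u + 1013904223u;
            a[i] = seed;
            b[i] = seed;
        }

        FillScalar(a);
        FillFourLanes(b);

        for (int i = 0; i < a.Length; i++)
            if (a[i] != b[i])
                return false;
        return true;
    }
}
```

If Sha1ScheduleCheck.Matches() returns true, the four-lane form
produces the same 80 words as the scalar loop. Note the parentheses
around each rotate: in C# '^' binds tighter than '|', so a rotate
written as (v << 1) | (v >> 31) has to be wrapped before it is XORed
with anything else.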

