[Mono-dev] 32-bit vs 64-bit Performance Discrepancy

Wed Oct 26 14:20:29 EDT 2011

Cross-posting to mono-dev since I'm not really sure where this belongs...

---------- Forwarded message ----------
From: Justin Holewinski <justin.holewinski at gmail.com>
Date: Wed, Oct 26, 2011 at 2:00 PM
Subject: 32-bit vs 64-bit Performance Discrepancy
To: mono-list at lists.ximian.com

I'm currently testing Mono on some single-precision FP-heavy workloads, and
I'm a bit surprised to see that the 64-bit VM is significantly slower than
the 32-bit VM, by more than 2x in many cases.  As an example, on Mac OS X
10.7, with Mono 2.10.6 compiled in both 32-bit and 64-bit modes:

Matrix Multiply Micro-Benchmark:

jholewinski@aquila [tests]$ ~/projects/mono/install/x86/bin/mono -O=all embed1-extract.exe
Scalar:    362.775 ms
Mono.Simd: 164.645 ms
jholewinski@aquila [tests]$ ~/projects/mono/install/x64/bin/mono -O=all embed1-extract.exe
Scalar:    841.482 ms
Mono.Simd: 131.844 ms

The Mono.Simd numbers look fine (the 64-bit VM is even slightly faster), but
the scalar code is roughly 2.3x slower on the 64-bit VM.  Further, if I look
at the disassembly from Mono, it looks like the 64-bit VM is using
double-precision arithmetic on single-precision data in the non-Mono.Simd
version:

000000000000001b        movss   0x00(%r13),%xmm0
0000000000000021        cvtss2sd        %xmm0,%xmm0
0000000000000025        movss   (%r14),%xmm1
000000000000002a        cvtss2sd        %xmm1,%xmm1
000000000000002e        mulsd   %xmm1,%xmm0
0000000000000032        movss   0x04(%r13),%xmm1
0000000000000038        cvtss2sd        %xmm1,%xmm1
000000000000003c        movss   0x10(%r14),%xmm2
0000000000000042        cvtss2sd        %xmm2,%xmm2
0000000000000046        mulsd   %xmm2,%xmm1
000000000000004a        addsd   %xmm1,%xmm0
000000000000004e        movss   0x08(%r13),%xmm1
0000000000000054        cvtss2sd        %xmm1,%xmm1
0000000000000058        movss   0x20(%r14),%xmm2
000000000000005e        cvtss2sd        %xmm2,%xmm2
0000000000000062        mulsd   %xmm2,%xmm1
0000000000000066        addsd   %xmm1,%xmm0
000000000000006a        movss   0x0c(%r13),%xmm1

This could definitely account for the performance discrepancy.  Why is Mono
up-converting to doubles for single-precision expressions?
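
To make the disassembly concrete, here is a minimal C sketch of the two
evaluation styles.  It is not the benchmark source or Mono's JIT output; the
function names and the stride parameter are hypothetical, chosen only to match
the access pattern in the listing (consecutive floats from one operand, a
16-byte stride in the other).  The first function stays in single precision;
the second widens every operand to double before multiplying and accumulating,
which is what the cvtss2sd/mulsd/addsd sequence does.  How a given C compiler
actually evaluates the float version depends on its target and flags.

```c
#include <stddef.h>

/* Single precision throughout: on SSE this would typically become
 * movss / mulss / addss with no conversions. */
float dot_single(const float *a, const float *b, size_t n, size_t stride_b)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += a[i] * b[i * stride_b];
    return acc;
}

/* Mirrors the 64-bit listing: each float is widened to double
 * (cvtss2sd), multiplied and accumulated in double (mulsd / addsd),
 * and narrowed back to float only at the end. */
float dot_promoted(const float *a, const float *b, size_t n, size_t stride_b)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc += (double)a[i] * (double)b[i * stride_b];
    return (float)acc;
}
```

For a 4x4 row-major matrix, calling either function with n = 4 and
stride_b = 4 computes one row-times-column product.  The promoted version
performs two extra conversions per multiply, which is consistent with the
slowdown measured above.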

The 32-bit VM appears to be using the x87 stack instead of SSE scalar
instructions, but at least it's using single-precision.

-- 

Thanks,

Justin Holewinski
