[Mono-list] 32-bit vs 64-bit Performance Discrepancy
Rodrigo Kumpera
kumpera at gmail.com
Wed Oct 26 18:15:05 EDT 2011
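The CLR demands that all floating-point calculations be conducted in
double precision. That is why the 64-bit JIT converts each float operand
to double (the cvtss2sd instructions in your listing) before operating on it.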
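As a rough illustration of the difference, here is a sketch in C (not the
C# benchmark, whose source is not shown in this thread). The function names
dot_single and dot_widened are hypothetical. The first keeps every multiply
and add in float; the second mirrors the pattern in the quoted disassembly:
widen each operand to double, do the arithmetic in double, and round back
to float only at the end. It assumes a target that evaluates float
expressions in float precision (the usual case with SSE on x86-64).

```c
#include <stddef.h>

/* Single-precision dot product: the accumulator and every product stay
   in float (a compiler targeting SSE would typically use mulss/addss). */
float dot_single(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}

/* Widened dot product: each float operand is converted to double first
   (the cvtss2sd step), the multiply and add run in double (mulsd/addsd),
   and the sum is rounded back to float once at the end. */
float dot_widened(const float *a, const float *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += (double)a[i] * (double)b[i];
    return (float)acc;
}
```

The widened version needs one extra conversion per loaded operand, which is
the overhead visible in the listing. It can also give a different answer:
for a = {1e8f, 1.0f, -1e8f} and b = {1.0f, 1.0f, 1.0f}, dot_widened returns
1.0f, whereas strict single-precision accumulation loses the 1 when it is
added to 1e8.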
On Wed, Oct 26, 2011 at 4:00 PM, Justin Holewinski <
justin.holewinski at gmail.com> wrote:
> I'm currently testing Mono on some single-precision FP-heavy workloads, and
> I'm a bit surprised to see that the performance of the 64-bit VM is
> significantly slower than the 32-bit VM, over 2x in many cases. As an
> example, on Mac OS X 10.7, with Mono 2.10.6 compiled in both 32-bit and
> 64-bit modes:
>
> Matrix Multiply Micro-Benchmark:
>
> jholewinski at aquila [tests]$ ~/projects/mono/install/x86/bin/mono -O=all
> embed1-extract.exe
> Scalar: 362.775 ms
> Mono.Simd: 164.645 ms
> jholewinski at aquila [tests]$ ~/projects/mono/install/x64/bin/mono -O=all
> embed1-extract.exe
> Scalar: 841.482 ms
> Mono.Simd: 131.844 ms
>
> The Mono.Simd case is good, but for the scalar code that is a
> large discrepancy. Further, if I look at the disassembly from Mono, it looks
> like the 64-bit VM is using double-precision arithmetic for single-precision
> data types with the non-Mono.Simd version:
>
> 000000000000001b movss 0x00(%r13),%xmm0
> 0000000000000021 cvtss2sd %xmm0,%xmm0
> 0000000000000025 movss (%r14),%xmm1
> 000000000000002a cvtss2sd %xmm1,%xmm1
> 000000000000002e mulsd %xmm1,%xmm0
> 0000000000000032 movss 0x04(%r13),%xmm1
> 0000000000000038 cvtss2sd %xmm1,%xmm1
> 000000000000003c movss 0x10(%r14),%xmm2
> 0000000000000042 cvtss2sd %xmm2,%xmm2
> 0000000000000046 mulsd %xmm2,%xmm1
> 000000000000004a addsd %xmm1,%xmm0
> 000000000000004e movss 0x08(%r13),%xmm1
> 0000000000000054 cvtss2sd %xmm1,%xmm1
> 0000000000000058 movss 0x20(%r14),%xmm2
> 000000000000005e cvtss2sd %xmm2,%xmm2
> 0000000000000062 mulsd %xmm2,%xmm1
> 0000000000000066 addsd %xmm1,%xmm0
> 000000000000006a movss 0x0c(%r13),%xmm1
>
> This could definitely account for the performance discrepancy. Why is Mono
> up-converting to doubles for single-precision expressions?
>
> The 32-bit VM appears to be using the x87 stack instead of SSE scalar
> instructions, but at least it's using single-precision.
>
> --
>
> Thanks,
>
> Justin Holewinski