[Mono-dev] Mono.Simd Acceleration Attributes

Fri Nov 7 06:30:11 EST 2008

On 11/07/08 Christophe Guillon wrote:
> It seems that as soon as the Mono.Simd primitives have a well defined
> semantic it is not useful to specify which architecture feature is able to
> emulate each of these primitives. I would have expected this to be the
> choice of the virtual execution environment.

It _is_ ultimately a choice of the runtime.
These attributes are never inspected by the runtime to decide whether to
optimize a method call or not.

> - if my underlying hardware XXX (not SSE2) is able to support efficiently
> add with saturation, I do not have to know whether SSE2 also supports it,
> the virtual machine for XXX can use the corresponding add with saturation
> instruction of XXX at the call sites of AddWithSaturation() anyway,

When the runtime implements that optimization, the attribute will be
changed to include both SSE2 and your architecture (say AltiVec, Neon,
etc.). Yes, this requires a re-release of Mono.Simd, but it's not a big
deal: the changes will be relatively rare, and if you are happy to
use unoptimized Mono.Simd anyway it doesn't matter.

> - if my underlying hardware features SSE2, the attribute is not useful, the
> virtual machine knows the underlying hardware and thus know that a SSE2
> instruction is able to emulate this,

It's useful to Mono.Simd programmers; the runtime doesn't use it.

> - if the attribute is there to restrict the mapping to only SSE2 (and above)
> machines, it is an important restriction to the usage of the library.
> Imagine as above that I have in the future a hardware support XXX that is
> able to do AddWithSaturation on Vector16b; if I want a virtual machine to
> execute efficiently this primitive on XXX I would first have to modify the
> Mono.Simd library to add the corresponding XXX attribute and modify the
> primitives declaration to account for it.

Nope, this is not correct.
The behaviour is as follows:
1) the runtime will choose whether a method is optimized or not
depending on the optimization flags (-O=simd, on by default) and on
the features of the current processor.
2) the attributes on the methods are never inspected by the runtime:
they are there to guide the programmers using Mono.Simd in determining
what kind of optimizations are usually available or currently enabled.

The reasoning is this: using unoptimized Mono.Simd is currently
significantly slower than the equivalent scalar code. This has mostly to
do with the additional copies that happen because of the operator
overloading. This overhead is expected to decrease as we add more jit
optimizations. So you have two cases:

1) the slowdown is not significant to you (you must test! Run your
program with mono -O=simd and with mono -O=-simd): in this case
you should completely ignore the acceleration attributes and just enjoy
the speedup that the jit will give you when it can optimize the methods.

2) if the slowdown is significant you might want to have two codepaths,
much as in C/C++ you have a C implementation and a simd implementation
of the critical functions. Now the question becomes: how do you choose
at runtime whether to use the Mono.Simd or the scalar codepath?
We offer two patterns:

a) make a coarse decision: you take a look at the methods you use in
your algorithms and see that they are optimized when SSE2 is enabled,
so you just do:
	static readonly bool use_mono_simd = (SimdRuntime.AccelMode & AccelMode.SSE2) != 0;
	...
	if (use_mono_simd) {
		// simd codepath
	} else {
		// scalar codepath
	}

b) make a fine-grained decision based on all or some of the methods you
use: for each method you check
	(SimdRuntime.MethodAccelerationMode (typeof(...), "...") & SimdRuntime.AccelMode) != 0
until you determine that enough of your methods are accelerated to make
it worth using the Mono.Simd codepath.
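
To make pattern b) a bit more concrete, here is a minimal sketch. It is
only an illustration, with several assumptions: the class and member
names (SimdChoice, needed, IsUsable, AllAccelerated) are made up, the
hot loop is assumed to need only Vector4f addition and multiplication,
and it assumes a MethodAccelerationMode overload that accepts the
parameter types after the method name. Adjust it to the methods your
own code actually calls.

```csharp
using System;
using Mono.Simd;

static class SimdChoice
{
	// Hypothetical list: the Vector4f operators our hot loop relies on.
	static readonly string[] needed = { "op_Addition", "op_Multiply" };

	// Evaluated once; later code just branches on this flag.
	public static readonly bool UseMonoSimd = AllAccelerated ();

	// A method is usable when at least one of the modes that can
	// accelerate it is also enabled in the running runtime.
	public static bool IsUsable (AccelMode methodMode, AccelMode runtimeMode)
	{
		return (methodMode & runtimeMode) != 0;
	}

	// Pattern b): check each method we care about, not the whole library.
	static bool AllAccelerated ()
	{
		foreach (string name in needed) {
			// Assumed overload: (Type, string, params Type[] signature).
			AccelMode mode = SimdRuntime.MethodAccelerationMode (
				typeof (Vector4f), name,
				typeof (Vector4f), typeof (Vector4f));
			if (!IsUsable (mode, SimdRuntime.AccelMode))
				return false;
		}
		return true;
	}
}
```

You would then test SimdChoice.UseMonoSimd exactly like use_mono_simd in
pattern a). Requiring every listed method to be accelerated is just one
possible policy; you could equally count the accelerated ones and
compare against a threshold you determined by benchmarking.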

Note that we may eventually return the acceleration mode based not on
the metadata in the assembly, but on the runtime's own knowledge: this
will avoid the need for an updated Mono.Simd assembly when new
optimizations are added. Just use pattern b) if you want to avoid
that issue, and remember that you don't usually need to check all the
methods, just the ones you actually need to be optimized.

lupus

-- 
-----------------------------------------------------------------
lupus at debian.org                                     debian/rules
lupus at ximian.com                             Monkeys do it better