[Mono-dev] Mono generates inefficient vectorized code

Sergei Dyshel qyron.private at gmail.com
Tue Apr 13 11:01:29 EDT 2010

Hello Rodrigo,
Regarding your question: unfortunately, I cannot apply for GSoC due to time
and other constraints.

With your tips I managed to extend linear scan to vector registers, and
now SIMD code runs much faster. Thank you!

My next (:]) question is about "scalarization", i.e. running programs with
SIMD intrinsics on non-SIMD platforms (for now I just simulate this with
-O=-simd). The current implementation in Mono simply treats vectors as vtypes
and passes them by value on the stack, thus doing a lot of superfluous memory
copies.
Therefore "scalarized" code runs slow, way behind code without vector

A better solution I'm thinking of is to "reduce" the vector size to 1, i.e.
interpret Mono.Simd datatypes as the corresponding scalar types. For example:
    Vector4i a;
    Vector4i b;
    Vector4i c = op_addition (a, b);
will be transformed to something like:
    int a;
    int b;
    int c = op_addition (a, b);

Of course, not all code allows such a transformation (it must not use a
hard-coded SIMD size but some kind of get-vector-size intrinsic). I tried
some tests by manually replacing the assembly and they showed great results.
But now I want to do the transformation inside the JIT.
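
To make the constraint above concrete, here is a minimal C sketch of a kernel that never hard-codes the vector width, so that "reducing" the width to 1 leaves it correct. Everything in it (VEC_WIDTH, vec_i, vec_size, vec_load, vec_store, vec_add, add_arrays) is a hypothetical name invented for illustration; none of it is Mono.Simd or Mono JIT API, and it only models the idea, not how the JIT would implement it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compile-time vector width: 4 mimics Vector4i,
 * 1 is the "scalarized" case. Not a Mono API. */
#ifndef VEC_WIDTH
#define VEC_WIDTH 4
#endif

/* Stand-in for a SIMD value type; with VEC_WIDTH == 1 it is
 * effectively a plain int. */
typedef struct { int lane[VEC_WIDTH]; } vec_i;

/* Plays the role of a "get-vector-size" intrinsic. */
size_t vec_size(void) { return VEC_WIDTH; }

vec_i vec_load(const int *p)
{
    vec_i v;
    for (size_t i = 0; i < VEC_WIDTH; i++)
        v.lane[i] = p[i];
    return v;
}

void vec_store(int *p, vec_i v)
{
    for (size_t i = 0; i < VEC_WIDTH; i++)
        p[i] = v.lane[i];
}

/* Lane-wise addition, the analogue of op_addition. */
vec_i vec_add(vec_i a, vec_i b)
{
    vec_i r;
    for (size_t i = 0; i < VEC_WIDTH; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* c[i] = a[i] + b[i]. The loop step comes from vec_size(), never
 * from a literal 4, so the same source works for any width. */
void add_arrays(const int *a, const int *b, int *c, size_t n)
{
    size_t step = vec_size();
    assert(n % step == 0); /* sketch only: no tail handling */
    for (size_t i = 0; i < n; i += step)
        vec_store(c + i, vec_add(vec_load(a + i), vec_load(b + i)));
}
```

Building the same file with -DVEC_WIDTH=1 gives the scalar variant: every vec_i holds a single int and add_arrays becomes an ordinary element-by-element loop. Code that assumed a step of 4 would break under that change, which is why the transformation only applies to width-agnostic code.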

Can you please help me find the place in the JIT where I can do the
transformation? I tried searching through 'method-to-ir.c' but could not work
out where exactly vtypes can be transformed to scalar types.
Sergei Dyshel

On Thu, Apr 8, 2010 at 18:08, Rodrigo Kumpera <kumpera at gmail.com> wrote:

> Hi Sergei,
> On Thu, Apr 8, 2010 at 11:59 AM, Sergei Dyshel <qyron.private at gmail.com> wrote:
>> Hello Rodrigo,
>> Just picking up this conversation we had some time ago. I was asking why
>> the JIT does unneeded loads and stores, and you answered that this behavior
>> is due to the lack of a global register allocator. As I understand it, any
>> vreg which is used in different basic blocks is "promoted" to a "memory
>> variable" and hence gets loaded and stored each time.
>> Then I asked why bare "global" 'ints' are treated differently (and more
>> efficiently), and you said that there are callee-saved iregs, so there is a
>> specialized allocator for them.
>> Can you please point at the relevant place in code?
> Look into liveness.c / linear_scan.c.
> In liveness.c look for mono_analyze_liveness
> In linear_scan.c look for mono_linear_scan
>> On Altivec we have callee-saved vector registers too. Is it possible to
>> use the same trick with them, in order to remove unnecessary loads/stores?
> Yes, it might be possible to do so, not sure how much work it will be though.