[Mono-dev] mcs math performance enhancement

Fri Oct 26 05:33:19 EDT 2007

Hello,
>>>>> problem:
>>>>> As C# does not have an exponentiation operator, programmers will
>>>>> typically use the
>>>>> expression "x*x" to get a square of x. There is Math.Pow, but only for
>>>>> doubles and likely
>>>>> to be inefficient for this simple case. This is a fairly common idiom
>>>>> even for C/C++ math programmers who can use pow(T,int).
>>>>> Mcs (1.2.5.1 release) does not recognize this special case, and emits
>>>>> two loads for the variable (or struct or class field, which is
>>>>> longer), instead of the obvious dup-mul sequence (which would also make
>>>>> the "calculate square" idiom easy for the JIT to identify).
>>>>>
>>>>>
>>>>>           
>>>> The only scenario which can benefit from this change is when both of
>>>> operands
>>>> are same instance variables. In all other cases this would produce
>>>> either invalid
>>>> or slower code.
>>>>
>>>>         
>>> Well, of course, it was supposed to only optimize this case (the same
>>> variable referenced twice) - this is what you normally do if you want
>>> to square a variable in C++. The bit about other cases makes no sense
>>> to me - there are no other cases. Perhaps I should have expressed
>>> myself more clearly.
>>>
>>>       
>> Yes, there are many different cases.
>>
>>     
>
>   
>> Consider differences in the following code.
>>
>> class C
>> {
>>   int x1;
>>   void Foo (int x2)
>>   {
>>     int temp;
>>     int x3 = 0;
>>     temp = x1 * x1; // Only this case could benefit from the patch
>>     temp = x2 * x2;
>>     temp = x3 * x3;
>>   }
>> }
>>
>>     
>
> yes, in the latter two cases, this would lead to replacing
>
> ldarg.1
> ldarg.1
> mul
>
> with
>
> ldarg.1
> dup
> mul
>
>
> (or ldloc instead of ldarg). Surely this is valid code, although there is no
> code size win in these cases (unlike the first one). I can't see why this should be
> slower, but of course I know nothing about how JIT works. Can you
> please explain?
>   
I am not an expert either, but the Mono JIT is generally slower when it has
to deal with dup(s), probably because there is stack logic behind it.

I also think the JIT already has a pattern for ldloc/ldloc/mul.

> As for the motivation - I've started writing a 3D vector class with
> x,y,z fields,
> and implemented a norm() function as
> public double norm(Vector3 v) { return Math.Sqrt(v.x*v.x+v.y*v.y+v.z*v.z); }
> (like I'd do in C++), out of curiosity I've looked at the disassembler
> (an assembler readable even for beginners - a big plus for CLI, IMHO)
> and disliked the redundant ldarga/ldfld load pairs.
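
A possible source-level workaround, as a sketch only: copy each field into
a local once, so that every square multiplies a local by itself. The Vector3
struct below is a hypothetical reconstruction from the description above
(public double fields x, y, z), and NormLocals is an illustrative name. The
generated IL and any speed difference have not been measured here.

```csharp
using System;

// Hypothetical reconstruction of the vector type described in the thread.
public struct Vector3
{
    public double x, y, z;

    public Vector3(double x, double y, double z)
    {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Straightforward form: each field is read twice from the argument.
    public static double Norm(Vector3 v)
    {
        return Math.Sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    }

    // Each field is read once into a local; the squares then operate on
    // locals only. Assumption: the compiler emits local loads for these.
    public static double NormLocals(Vector3 v)
    {
        double x = v.x, y = v.y, z = v.z;
        return Math.Sqrt(x * x + y * y + z * z);
    }
}

public static class Program
{
    public static void Main()
    {
        Vector3 v = new Vector3(3.0, 4.0, 12.0);
        Console.WriteLine(Vector3.Norm(v));        // 9 + 16 + 144 = 169
        Console.WriteLine(Vector3.NormLocals(v));  // same value
    }
}
```

Both calls return 13 for the vector (3, 4, 12); the two methods differ only
in how often the fields are read.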
>
> I agree this is not a very important issue, or perhaps this should not be solved
> separately but rather as a part of something bigger, I just wanted to
> try to solve it.
> Btw, the same could be done for any operator, but expressions like
> x-x or x/x are of no use, while x*x is seen often (as there is no x**2
> like in Python or Fortran).
>   
Marek