[Mono-dev] More updates on Mono (before the call)

Thu Sep 16 15:24:08 EDT 2010

Hello,

> I'm very sorry, this post was intended for another mailing-list
> (non-public).
> Moderators, please delete it. 

We do not maintain that infrastructure; it is now owned by a third
party.

I am afraid that this is also echoed on a dozen other mail archiving
sites.   

That being said, it looks like fascinating work.  

> -- 
> Regards,
> Sergei Dyshel
> 
> 
> On Wed, Sep 15, 2010 at 22:59, Sergei Dyshel <qyron.private at gmail.com>
> wrote:
>         Hi,
>         I've almost finished tuning Mono's Altivec performance. The
>         results are, as usual, in this table:
>         https://spreadsheets.google.com/ccc?key=0AhjvSAvEoHopdG1LUE9Zdkd1TTZIQ0FCWl82bU5Fa1E&hl=en&authkey=COqyrPMD
>         
>         
>         There are many more "blue" ratios now, but there are still
>         some optimization issues I couldn't solve:
>         
>         
>         1) 'mmm_intrchage' uses a different expression for alignment
>         checking (versioning) and this expression somehow isn't
>         constant-folded during JITing. This results in code that is
>         twice as big, and the register allocator just can't act
>         effectively there. By enabling full optimizations in Mono I
>         could partially solve this problem, but this is not the best
>         solution (since it increases compilation time).
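For readers unfamiliar with the term, versioning a loop on alignment can
be sketched in C roughly as below. This is only an illustration of the
general technique, not the code the vectorizer or Mono actually emits:
is_aligned16 and add_versioned are hypothetical names, and the 16-byte
boundary is an assumption based on the AltiVec vector width. When the
addresses are constants at JIT time, the check can in principle be folded
so that only one of the two loop versions survives.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when p sits on a 16-byte boundary
   (assumed here to match the width of one AltiVec vector). */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical versioned loop computing a[i] = b[i] + c[i].
   Returns 1 if the aligned version ran, 0 if the fallback ran. */
static int add_versioned(float *a, const float *b, const float *c, size_t n)
{
    size_t i;

    if (is_aligned16(a) && is_aligned16(b) && is_aligned16(c)) {
        /* Aligned version: a vectorizer could use aligned vector
           loads and stores here; it is shown as scalar code. */
        for (i = 0; i < n; i++)
            a[i] = b[i] + c[i];
        return 1;
    }

    /* Fallback version for pointers that are not all aligned. */
    for (i = 0; i < n; i++)
        a[i] = b[i] + c[i];
    return 0;
}
```

If the test is not folded, both loop bodies have to be compiled and kept,
which is consistent with the doubled code size mentioned above.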
>         
>         
>         2) 'video_dissolve_fp', 'saxpy_fp', 'dscal_fp' are all
>         variations of a simple 'a[i]=b*c[i]+d[i]' floating-point loop.
>         The aligned version, generated by the vectorizer, looks (in
>         Gimple) like: "*(&a+i) = b* (*(&c+i)) + *(&d+i)" and this is
>         converted further to CIL. Since Mono has no inter-bb constant
>         propagation and all the arrays' addresses are known at JIT
>         time, all 3 addresses are generated by Mono in each iteration
>         (and it takes 3 PPC instructions for each address). I think
>         this is the reason for the bad results, but the ratios for
>         these benchmarks behave rather differently. Anyway, it would
>         be much better if the arrays' addresses were saved to locals
>         in the loop prolog and then used in each iteration.
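The suggested transformation is ordinary loop-invariant hoisting of the
base addresses. A minimal C sketch of the two shapes follows. It assumes
fixed-address global arrays, and the names saxpy_rederive, saxpy_hoisted
and N are hypothetical. An optimizing C compiler would most likely
generate the same code for both functions, so this only illustrates the
source-level difference, not Mono's actual output.

```c
#include <stddef.h>

enum { N = 8 };                 /* hypothetical array length */
static float a[N], c[N], d[N];  /* arrays whose addresses are fixed */

/* Shape described above: every access names the array base inside the
   loop, so a naive code generator rebuilds all three addresses on
   each iteration. */
static void saxpy_rederive(float b)
{
    size_t i;
    for (i = 0; i < N; i++)
        *(&a[0] + i) = b * *(&c[0] + i) + *(&d[0] + i);
}

/* Suggested shape: the three base addresses are saved to locals once,
   in the loop prolog, and only the index changes per iteration. */
static void saxpy_hoisted(float b)
{
    float *pa = a;
    const float *pc = c;
    const float *pd = d;
    size_t i;
    for (i = 0; i < N; i++)
        pa[i] = b * pc[i] + pd[i];
}
```

Both functions compute the same values; the second just makes the
loop-invariant addresses explicit so they are materialized once.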
>         
>         
>         'video_dissolve_s8' and 'small_sad' still need to be
>         implemented/analyzed. Tomorrow I'll update the numbers for
>         SSE. I anticipate an improvement after recent tweaks I've
>         added to Mono but it won't be as good as with Altivec, mostly
>         because the x87 instruction set is more stack-based so
>         floating-point code doesn't get optimized as simply as on
>         PowerPC. Anyway, let's wait until tomorrow's results...
>         
>         
>         That's all, folks! (c)
>         -- 
>         Regards,
>         Sergei Dyshel
>         
> 
> 
> _______________________________________________
> Mono-devel-list mailing list
> Mono-devel-list at lists.ximian.com
> http://lists.ximian.com/mailman/listinfo/mono-devel-list