[Mono-dev] Delegates very slow on Mono 2.2/Linux (but not on Mono 2.4/Windows)

Alan McGovern alan.mcgovern at gmail.com
Sun Mar 15 18:13:01 EDT 2009


Just to clarify, you're using a comma as a decimal separator and a dot as a
thousands separator?

So: 766.6697 ns/call = ~766 thousand ns/call
and 13,0416 ns/call = ~13 ns/call
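
If the test formats its numbers with the current culture, the same value can
print differently on each machine. A minimal sketch of the effect, and of
forcing the invariant culture so every run prints a dot as the decimal
separator. The class name and the hand-built format are illustrative
assumptions, not code from the attached test:

```csharp
using System;
using System.Globalization;

class SeparatorDemo
{
    static void Main()
    {
        double nsPerCall = 13.0416;

        // Hand-built format that mimics a locale using ',' for decimals and
        // '.' for digit grouping (no locale data needs to be installed).
        NumberFormatInfo commaDecimal = new NumberFormatInfo();
        commaDecimal.NumberDecimalSeparator = ",";
        commaDecimal.NumberGroupSeparator = ".";

        // Same value, two renderings.
        Console.WriteLine(nsPerCall.ToString("F4", commaDecimal));                 // 13,0416
        Console.WriteLine(nsPerCall.ToString("F4", CultureInfo.InvariantCulture)); // 13.0416
    }
}
```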

Alan.

2009/3/15 StApostol <stapostol at gmail.com>

> Hi all,
>
> I just ran some tests to measure performance in OpenTK.Graphics and
> Tao.OpenGl and uncovered some surprising results.
>
> Some background first: OpenGL exports functions either statically ("core
> functions") or dynamically ("extensions"). While you use a simple
> [DllImport] to invoke core functions, you have to invoke extensions through
> function pointers. Different platforms, video cards, even drivers expose
> different subsets of OpenGL as extensions, which means you have to handle
> this issue during runtime.
>
> To deal with this problem, the aforementioned libraries implement a
> relatively complex solution:
>
>    - The union of all core functions is declared as [DllImport] in a
>    private class named "Core".
>    - The union of all core and extension functions is declared as
>    delegates in a private class named "Delegates".
>    - Each delegate has one or more "wrapper" functions. This is the public
>    API for the user.
>    - During initialization, we probe each OpenGL function and "arm" the
>    relevant delegate with Marshal.GetDelegateForFunctionPointer, a function
>    from the Core class or null (if it is exported dynamically, statically or
>    not at all, respectively).
>
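
For anyone who has not seen this pattern, the scheme above could be sketched
roughly as follows. This is only an illustration: the function names, the
GetAddress probe and the managed stand-ins for the [DllImport] methods are
assumptions, not code from OpenTK or Tao.

```csharp
using System;
using System.Runtime.InteropServices;

// Stand-in for the private "Core" class. In real bindings each method would
// be an extern declared with [DllImport]; plain managed methods are used
// here so the sketch runs without a native library.
static class Core
{
    public static void Clear(int mask) { Console.WriteLine("Core.Clear({0})", mask); }
}

// One delegate type and one static field per OpenGL function.
static class Delegates
{
    public delegate void ClearFn(int mask);
    public delegate void FancyExtFn(int value);

    public static ClearFn Clear;
    public static FancyExtFn FancyExt;
}

static class GL
{
    // Hypothetical probe. A real one would ask the platform for the entry
    // point (wglGetProcAddress, glXGetProcAddress, ...). Here nothing is
    // exported dynamically, so it always reports "not found".
    static IntPtr GetAddress(string name) { return IntPtr.Zero; }

    // "Arm" every delegate once during initialization.
    public static void Init()
    {
        Delegates.Clear = (Delegates.ClearFn)Arm(
            "glClear", typeof(Delegates.ClearFn), new Delegates.ClearFn(Core.Clear));
        Delegates.FancyExt = (Delegates.FancyExtFn)Arm(
            "glFancyExt", typeof(Delegates.FancyExtFn), null);
    }

    // Dynamic export first, then the static Core function, otherwise null.
    static Delegate Arm(string name, Type type, Delegate coreFallback)
    {
        IntPtr address = GetAddress(name);
        if (address != IntPtr.Zero)
            return Marshal.GetDelegateForFunctionPointer(address, type);
        return coreFallback;
    }

    // Public wrappers: the API the user actually calls.
    public static void Clear(int mask) { Delegates.Clear(mask); }
    public static bool SupportsFancyExt { get { return Delegates.FancyExt != null; } }
}

class Program
{
    static void Main()
    {
        GL.Init();
        GL.Clear(0x4000);  // dispatched through Delegates.Clear
        Console.WriteLine("FancyExt available: {0}", GL.SupportsFancyExt);
    }
}
```

Every public call goes through a delegate field, which is why the cost of a
delegate invocation matters so much here.
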
> Most of the types used in OpenGL interop are blittable, which makes most
> pinvokes pretty fast. The main bottleneck is the delegate call, which should
> be plenty fast (or so we thought).
>
> To test the performance of this approach, I wrote a simple test that
> simulates OpenGL calls (attached). The test measures the call overhead for
> two function prototypes that are very common in OpenGL:
>
>    - void SendFloat(int, int, int, float*)
>    - void Send(int, int, int, int, void*)
>
> The first function is wrapped as "void SendFloat(int, int, int, float[])"
> and the array is pinned and passed as a simple pointer.  The second becomes
> "void Send(int, int, int, int, object)" and the last parameter is also
> pinned (with GCHandle.Alloc) and passed as a simple pointer (we assume
> 'object' is a blittable struct). Each of these functions is tested twice,
> first through a delegate (as outlined above) and then directly with a simple
> pinvoke.
>
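
The two wrappers described above could look something like the sketch below.
The struct, the class name and the managed stand-ins for the unmanaged
functions are assumptions made so the example runs without the native dll;
it needs to be compiled with /unsafe.

```csharp
using System;
using System.Runtime.InteropServices;

// A blittable struct: only blittable types can be pinned with GCHandle.
[StructLayout(LayoutKind.Sequential)]
struct Vector3 { public float X, Y, Z; }

static unsafe class PinningDemo
{
    delegate void SendFloatFn(int a, int b, int c, float* data);
    delegate void SendFn(int a, int b, int c, int d, IntPtr data);

    // Managed stand-ins for the unmanaged functions.
    static void NativeSendFloat(int a, int b, int c, float* data)
    {
        Console.WriteLine("SendFloat saw {0}", data[0]);
    }

    static void NativeSend(int a, int b, int c, int d, IntPtr data)
    {
        Console.WriteLine("Send saw {0}", *(float*)data.ToPointer());
    }

    static SendFloatFn sendFloat = NativeSendFloat;
    static SendFn send = NativeSend;

    // float[] wrapper: 'fixed' pins the array for the duration of the call.
    public static void SendFloat(int a, int b, int c, float[] data)
    {
        fixed (float* p = data)
        {
            sendFloat(a, b, c, p);
        }
    }

    // object wrapper: the argument (a boxed struct here) is pinned with a
    // GCHandle and its address is passed as a raw pointer.
    public static void Send(int a, int b, int c, int d, object data)
    {
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            send(a, b, c, d, handle.AddrOfPinnedObject());
        }
        finally
        {
            handle.Free();  // always release the handle, even on exceptions
        }
    }

    static void Main()
    {
        SendFloat(1, 2, 3, new float[] { 1.5f, 2.5f });

        Vector3 v = new Vector3();
        v.X = 4.5f;
        Send(1, 2, 3, 4, v);  // passing a struct as object boxes it
    }
}
```

Note that the object overload allocates and frees a GCHandle on every call
(and boxes a struct argument), so it does more work than the float[]
overload regardless of how the native function is reached.
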
> The results are measured on a 2.66GHz Core 2 Duo with each function called
> 10^6 times (not nearly enough for ns accuracy, but the problem is
> nonetheless obvious). The binaries were compiled with gmcs 2.2 (every test
> used the same executable). The unmanaged dll was compiled with gcc on Linux
> (x86_64) and msvc on Windows (x86):
>
> [Mono 2.2, Linux x86_64]
> Timing SendFloat (delegate): 0.7666697 seconds (766.6697 ns/call) with
> 3/3/3 collections.
> Timing SendFloat (direct): 0.0170575 seconds (17.0575 ns/call) with
> 3/3/3 collections.
> Timing Send (delegate): 1.3894752 seconds (1389.4752 ns/call) with
> 3/3/3 collections.
> Timing Send (direct): 0.2461236 seconds (246.1236 ns/call) with
> 3/3/3 collections.
>
> [Mono 2.4 RC1, Windows x86 (VirtualBox)]
> Timing SendFloat (delegate): 0,0130416 seconds (13,0416 ns/call) with 1/1/1
> collections.
> Timing SendFloat (direct): 0,0140448 seconds (14,0448 ns/call) with 1/1/1
> collections.
> Timing Send (delegate): 0,1033469 seconds (103,3469 ns/call) with 1/1/1
> collections.
> Timing Send (direct): 0,1063392 seconds (106,3392 ns/call) with 1/1/1
> collections.
>
> [.Net 3.5 SP1, Windows x86 (VirtualBox)]
> Timing SendFloat (delegate): 0,0117486 seconds (11,7486 ns/call) with
> 0/0/0 collections.
> Timing SendFloat (direct): 0,0070824 seconds (7,0824 ns/call) with
> 0/0/0 collections.
> Timing Send (delegate): 0,1087277 seconds (108,7277 ns/call) with
> 0/0/0 collections.
> Timing Send (direct): 0,095304 seconds (95,304 ns/call) with
> 0/0/0 collections.
>
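
Since the attachment is not visible here, this is a guess at the shape of a
harness that would print lines like the ones above. It assumes the three
collection figures are gen0/gen1/gen2 GC counts, and it times a plain managed
method rather than a pinvoke; all names are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

static class CallTimer
{
    delegate int AddFn(int a, int b);

    static int Add(int a, int b) { return a + b; }

    // Snapshot of the gen0/gen1/gen2 collection counters.
    static int[] Collections()
    {
        return new int[] { GC.CollectionCount(0), GC.CollectionCount(1), GC.CollectionCount(2) };
    }

    // Prints one result line, always with '.' as the decimal separator.
    static void Report(string label, Stopwatch watch, int calls, int[] before)
    {
        int[] after = Collections();
        double seconds = watch.Elapsed.TotalSeconds;
        Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
            "Timing {0}: {1} seconds ({2} ns/call) with {3}/{4}/{5} collections.",
            label, seconds, seconds * 1e9 / calls,
            after[0] - before[0], after[1] - before[1], after[2] - before[2]));
    }

    static void Main()
    {
        const int Calls = 1000000;
        AddFn viaDelegate = Add;
        long sink = 0;  // accumulate results so the calls are not dead code

        int[] before = Collections();
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < Calls; i++)
            sink += viaDelegate(i, 1);
        watch.Stop();
        Report("Add (delegate)", watch, Calls, before);

        before = Collections();
        watch = Stopwatch.StartNew();
        for (int i = 0; i < Calls; i++)
            sink += Add(i, 1);  // the JIT may inline this direct call
        watch.Stop();
        Report("Add (direct)", watch, Calls, before);

        Console.WriteLine(sink);
    }
}
```
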
> As you can see, Mono 2.2 on Linux x86_64 is 5 - 40 times slower when
> calling a delegate - nearly 1us for a single delegate call! In comparison,
> calling a delegate on Windows x86 seems comparable to a simple virtual call
> (1 - 3ns overhead).
>
> A typical, state-of-the-art 3d program may contain somewhere between 1000
> and 5000 draw calls per frame. Assuming the above results hold, the interop
> layer will consume between 5% and 30% of your total frame budget (16.6ms) -
> not good!
>
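
The estimate above can be reproduced with a few lines of arithmetic. The
sketch assumes one delegate call per draw call at roughly 1us each (the
"nearly 1us" figure) and a 16.6ms frame; real programs usually issue several
GL calls per draw call, so treat it as a lower bound.

```csharp
using System;
using System.Globalization;

class FrameBudget
{
    static void Main()
    {
        const double frameBudgetMs = 16.6;  // frame budget quoted in the text
        const double overheadNs = 1000.0;   // assumed cost of one delegate call
        int[] drawCalls = { 1000, 5000 };

        foreach (int calls in drawCalls)
        {
            double costMs = calls * overheadNs / 1e6;       // ns -> ms
            double share = 100.0 * costMs / frameBudgetMs;  // percent of frame
            Console.WriteLine(string.Format(CultureInfo.InvariantCulture,
                "{0} calls: {1:F2} ms, {2:F1}% of the frame", calls, costMs, share));
        }
        // Prints:
        // 1000 calls: 1.00 ms, 6.0% of the frame
        // 5000 calls: 5.00 ms, 30.1% of the frame
    }
}
```
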
> Is there an explanation for this discrepancy? Can we expect better
> performance in some future version of the runtime? Should we bite the bullet
> and rewrite the bindings in ilasm (replacing pinvokes with calli
> instructions)? Any possible workarounds / alternatives?
>
> Thanks for your time!
>
>