[Mono-dev] Patch: Fast virtual generic method calls
kumpera at gmail.com
Fri Sep 26 10:56:44 EDT 2008
Since you started this discussion on MDL, we'd better keep it there.
On Fri, Sep 26, 2008 at 11:22 AM, Mark Probst <mark.probst at gmail.com> wrote:
> On Fri, Sep 26, 2008 at 2:19 PM, Rodrigo Kumpera <kumpera at gmail.com> wrote:
> > The advantage of imposing a limit is the fact that this could be used in
> > memory-constrained environments.
> > I agree it doesn't buy anything performance-wise.
> Well, actually, you might have a point. It could be a good idea to
> limit the thunks to some number of instantiations, but to keep track
> of how often each instantiation is called. Of course we could do the
> book-keeping only for those instantiations that fall back to the
> trampoline, but that's at least a crude way of counting. We could,
> for example, insert an instantiation only if it's been called at least
> a hundred times. Once we have, say, 8 different instantiations in the
> thunk, we might choose to add another one only if it has at least 50
> more counted invocations that another one, at which point we'd switch
> those two. I'll try implementing something like that.
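The promotion heuristic described above could be sketched roughly as below. This is purely illustrative, not Mono's actual code: the struct, function names, and the thresholds (100 calls to promote, 50 extra to evict) are all assumptions taken from the discussion.

```c
#include <assert.h>
#include <stddef.h>

#define THUNK_CAPACITY 8    /* max instantiations compiled into one thunk */
#define PROMOTE_MIN    100  /* trampoline hits before an entry may be promoted */
#define REPLACE_MARGIN 50   /* extra hits needed to evict a resident entry */

typedef struct {
    void *token;    /* identifies the instantiation */
    long  count;    /* trampoline hits observed */
    int   in_thunk; /* currently compiled into the thunk? */
} Entry;

/* Called from the trampoline fallback path; returns 1 if the thunk
 * should be rebuilt with a changed set of instantiations. */
static int record_trampoline_hit(Entry *entries, size_t n, void *token)
{
    Entry *hit = NULL, *coldest = NULL;
    size_t resident = 0;

    for (size_t i = 0; i < n; i++) {
        if (entries[i].token == token)
            hit = &entries[i];
        if (entries[i].in_thunk) {
            resident++;
            if (!coldest || entries[i].count < coldest->count)
                coldest = &entries[i];
        }
    }
    if (!hit)
        return 0;
    if (hit->in_thunk)
        return 0;           /* already resident; the thunk handles it */
    hit->count++;

    if (hit->count < PROMOTE_MIN)
        return 0;
    if (resident < THUNK_CAPACITY) {  /* room left: promote directly */
        hit->in_thunk = 1;
        return 1;
    }
    if (coldest && hit->count >= coldest->count + REPLACE_MARGIN) {
        coldest->in_thunk = 0;        /* swap the two and rebuild */
        hit->in_thunk = 1;
        return 1;
    }
    return 0;
}
```

The point of the margin is hysteresis: a borderline instantiation can't cause the thunk to be rebuilt back and forth on every call.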
Waiting for a bunch of instantiations to queue up is indeed a good idea and
should save some code space.
> Of course benchmarks would be nice, so if anybody has an application
> that uses lots of virtual generic method calls, please let me know.
I know that the DLR doesn't use virtual generics because they know the
performance implications. But if I remember correctly, the F# tests you did
had some.
> > Won't the constant rebuilding of the decision-tree impair performance?
> We're already doing that every time a new instantiation is added to a
> thunk. Whether we use one big contiguous region of memory or many
> small ones to implement a thunk doesn't change that.
I guess we can't avoid re-sorting the whole thing every time unless we keep it
in memory and do in-place insertions - which would be a bad idea.
> > Instead of building thunk code with the tokens embedded, we externalize
> > them into a table, and when a new element is added we can add it in place.
> > It would
> > be similar to the jit info table, where you insert stuff concurrently to
> > readers.
> > So the code instead of:
> > mov %eax, 0x838383
> > cmp %eax, %IMG_REG
> > ...
> > It would be
> > mov %eax, [0x33333]
> > cmp %eax, %IMG_REG
> I'm not sure that can be made to work. The table would have to
> contain both keys and values, and we can't fetch or update them
> together atomically.
Ok, but we could use a skip list instead of a tree. This should
allow us to patch the code and do incremental thunk building - which avoids
the release issue altogether.
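The skip-list idea hinges on what its bottom level relies on: a new node is fully initialized first and then published with a single atomic pointer store, so concurrent readers see either the old or the new list, never a half-built one. A single-writer sketch of that insertion, with illustrative names (not Mono code):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct Node {
    uintptr_t key;              /* e.g. the instantiation's token */
    void *target;               /* address of the compiled instantiation */
    _Atomic(struct Node *) next;
} Node;

/* Reader side: what the thunk's lookup would do on each call. */
static void *lookup(Node *head, uintptr_t key)
{
    for (Node *n = atomic_load(&n == NULL ? &head->next : &head->next); 0; )
        ;
    for (Node *n = atomic_load(&head->next); n; n = atomic_load(&n->next)) {
        if (n->key == key)
            return n->target;
        if (n->key > key)       /* list is sorted: we can stop early */
            break;
    }
    return NULL;                /* miss: fall back to the trampoline */
}

/* Writer side: insert in sorted position; a single writer is assumed
 * (e.g. insertions serialized under the loader lock). */
static void insert(Node *head, uintptr_t key, void *target)
{
    Node *pred = head;
    Node *succ = atomic_load(&pred->next);
    while (succ && succ->key < key) {
        pred = succ;
        succ = atomic_load(&succ->next);
    }
    Node *n = malloc(sizeof *n);
    n->key = key;
    n->target = target;
    atomic_store(&n->next, succ);  /* node is complete before... */
    atomic_store(&pred->next, n);  /* ...the one-word publish */
}
```

Because nodes are only ever added, nothing is freed while readers may hold a pointer into the structure, which is exactly what avoids the release problem.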
> > Anyway, the whole issue of having to manage live code makes me think that
> > an inline cache (like the one used in Self) could provide similar results
> > without the burden.
> The Self polymorphic inline caches only do linear search and are
> limited to a small number of items. The issue of freeing/reusing the
> thunk memory in a thread-safe way would still remain, though.
I was referring to an inline cache that is just inlined code at the call site
with one or two slots. This would avoid the whole thunk building machinery and
should have similar performance. Or would this still have the issue of needing
a per-instantiation wrapper?
If we could avoid the mrgctx wrapper, the inline cache would be as effective
as thunks for the shared case.
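A two-slot inline cache of the kind suggested here could look roughly like the sketch below. Everything is hypothetical: the struct, the resolver callback standing in for the real generic method lookup, and the round-robin fill policy are assumptions for illustration, and real JITted code would emit the two compares inline and need atomic publication of the slots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void *(*Resolver)(uintptr_t key);

typedef struct {
    uintptr_t key[2];    /* e.g. the generic context for each cached case */
    void     *target[2]; /* compiled code address for that context */
    int       victim;    /* next slot to overwrite on a miss */
} InlineCache;

static void *ic_lookup(InlineCache *ic, uintptr_t key, Resolver resolve)
{
    if (ic->key[0] == key) return ic->target[0];  /* fast path, slot 0 */
    if (ic->key[1] == key) return ic->target[1];  /* fast path, slot 1 */

    void *t = resolve(key);     /* slow path: full generic lookup */
    int v = ic->victim;
    ic->target[v] = t;
    ic->key[v] = key;
    ic->victim = v ^ 1;         /* simple round-robin eviction */
    return t;
}

/* Example resolver standing in for the real lookup; counts slow-path hits. */
static int resolver_calls;
static void *demo_resolve(uintptr_t key)
{
    resolver_calls++;
    return (void *)(key * 2);
}
```

With one or two slots the search is a couple of compares, so as long as a call site sees few distinct instantiations this should indeed perform like a small thunk, without any code being rebuilt or released.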