[Mono-dev] Generic sharing: Good news, bad news, how to win big

Mon Apr 14 10:13:11 EDT 2008

<snip>

>
> It seems that only a small number of RGCTXs is ever used, and the ones
> that are used could make do with 2 to 5 slots on average.  For FSharp,
> for instance, if we used an optimal allocation strategy (i.e. only
> allocate RGCTXs if needed, only allocate as many slots as needed and
> don't use any meta-data) we could get the 600k down to about 6k, which
> would be more than acceptable.
>
> There is a problem with allocating a RGCTX lazily, though.  Allocating
> a RGCTX requires some information.  Specifically, it requires the
> MonoVTable* for the class for which to allocate the RGCTX.  This is
> not an issue for non-static methods because the vtable is accessible
> through the "this" argument.  Static methods don't have that argument,
> though.  In fact, the RGCTX must contain a pointer to the vtable
> because we need it in static methods for some purposes, like exception
> handling.  So I think we'll need to switch to passing not the RGCTX,
> but the vtable to static methods, since the latter contains a pointer
> to the RGCTX anyway, which, for lazy allocation, could be NULL.  This
> would give us the additional advantage of not having to store the
> vtable in the RGCTX, saving us 4/8 bytes per RGCTX.
>
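A minimal sketch of the lazy-allocation idea above, assuming the vtable holds a pointer to the RGCTX that starts out NULL. The names `VTable`, `Rgctx`, `vtable_get_rgctx` and `shared_static_method` are hypothetical stand-ins, not Mono's actual definitions, and the sketch ignores threading entirely:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT Mono's real structures. */
typedef struct Rgctx {
    int n_slots;              /* placeholder payload */
} Rgctx;

typedef struct VTable {
    const char *class_name;   /* stands in for the class the vtable describes */
    Rgctx *rgctx;             /* NULL until the RGCTX is first needed */
} VTable;

/* Return the RGCTX for a vtable, allocating it on first use.
 * Single-threaded sketch: a real runtime would need synchronization. */
static Rgctx *vtable_get_rgctx(VTable *vt)
{
    assert(vt != NULL);
    if (vt->rgctx == NULL)
        vt->rgctx = calloc(1, sizeof(Rgctx));
    return vt->rgctx;
}

/* A shared static method receives the vtable (not the RGCTX) as its
 * hidden argument, so it can always reach or create the RGCTX. */
static Rgctx *shared_static_method(VTable *vt)
{
    return vtable_get_rgctx(vt);
}
```

Because the vtable is what gets passed, the RGCTX itself no longer needs a back pointer to the vtable, which is where the 4/8 bytes per RGCTX would be saved.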
> As for how to arrange the RGCTX itself I have the following proposal:
> Let's get rid of all the superclass and type argument type information
> and just concentrate on the extensible part.  I'd like to implement
> two different kinds of RGCTX - small ones and large ones.  Small
> RGCTXs would have space for up to some constant number of slots,
> preferably 3 or 7.  The layout would look like this on a 32-bit
> system:
>
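> struct small_rgctx {
>  guint8 size;            /* number of slots in use */
>  guint8 slot_numbers[3]; /* identifies the type information in each slot */
>  gpointer slots[0];      /* slot pointers follow the header */
> };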
>
> size would be the number of slots in the RGCTX.  slot_numbers would
> identify the type information stored in the slots array.
>
> To fetch an item out of the RGCTX we'd have to search through the
> slot_numbers array to find the required type information and then
> fetch the corresponding pointer from the slots array.  If the type
> information is not found we'd jump into unmanaged code and extend the
> RGCTX by allocating a new one with space for one more slot or, if the
> maximum number of slots is reached, upgrading to a large RGCTX.
> Another reason for using a large RGCTX would be if the values of the
> slot numbers exceeded 255, which should be very unlikely, though.
>
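The lookup and extend steps described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed implementation: `uint8_t` and `void *` stand in for `guint8` and `gpointer` so the code builds without glib, the function names are made up, a NULL result signals a miss, and the old RGCTX is simply left for the caller to recycle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SMALL_RGCTX_MAX_SLOTS 3   /* the "3 or 7" constant from the proposal */

typedef struct {
    uint8_t size;                                 /* slots currently in use */
    uint8_t slot_numbers[SMALL_RGCTX_MAX_SLOTS];  /* what each slot holds */
    void   *slots[];                              /* `size` pointers follow */
} SmallRgctx;

/* Fast path: linear search over slot_numbers; NULL means "not found". */
static void *small_rgctx_lookup(const SmallRgctx *ctx, uint8_t slot_number)
{
    if (ctx == NULL)
        return NULL;
    for (uint8_t i = 0; i < ctx->size; i++)
        if (ctx->slot_numbers[i] == slot_number)
            return ctx->slots[i];
    return NULL;
}

/* Slow path: build a copy with room for one more slot.
 * Returns NULL when the small RGCTX is already full (the caller would
 * then upgrade to a large RGCTX) or when allocation fails.
 * The old RGCTX is not freed here. */
static SmallRgctx *small_rgctx_extend(const SmallRgctx *old,
                                      uint8_t slot_number, void *info)
{
    assert(info != NULL);  /* NULL is reserved for "not found" */
    uint8_t old_size = old ? old->size : 0;
    if (old_size >= SMALL_RGCTX_MAX_SLOTS)
        return NULL;

    SmallRgctx *ctx = calloc(1, sizeof *ctx + (old_size + 1u) * sizeof(void *));
    if (ctx == NULL)
        return NULL;
    if (old != NULL) {
        memcpy(ctx->slot_numbers, old->slot_numbers, old_size);
        memcpy(ctx->slots, old->slots, old_size * sizeof(void *));
    }
    ctx->slot_numbers[old_size] = slot_number;
    ctx->slots[old_size] = info;
    ctx->size = (uint8_t)(old_size + 1);
    return ctx;
}
```

Allocating a fresh copy rather than growing in place is what makes the thrown-away RGCTXs, and hence the free lists, appear in the first place.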
> We could keep free lists per domain of the RGCTXs we've thrown away.
> It might be necessary to use hazard pointers for access to the RGCTX
> so as to avoid reusing memory of a thrown-away RGCTX if another thread
> is still accessing it.
>

Isn't it possible, or even better, to free RGCTXs at GC time? It would be
simpler. The hardest part would be guarding against parking threads inside
RGCTX-related code, which can be done with some link-time trickery and
changes to the stack scanning code.

>
> A large RGCTX could be some kind of small hash table.
>
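For illustration only, a large RGCTX as "some kind of small hash table" might look like the fixed-capacity, linear-probing table below. The capacity, the trivial hash and all names are assumptions made for this sketch, and growth beyond the fixed capacity is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LARGE_RGCTX_CAPACITY 16u   /* arbitrary power of two for the sketch */

typedef struct {
    uint32_t slot_number;   /* wide enough for slot numbers above 255 */
    void    *info;          /* NULL marks an empty bucket */
} LargeRgctxEntry;

typedef struct {
    uint32_t count;
    LargeRgctxEntry entries[LARGE_RGCTX_CAPACITY];
} LargeRgctx;

/* Insert or overwrite; returns 1 on success, 0 if the table is full. */
static int large_rgctx_insert(LargeRgctx *ctx, uint32_t slot_number, void *info)
{
    assert(info != NULL);
    uint32_t i = slot_number & (LARGE_RGCTX_CAPACITY - 1);
    for (uint32_t probes = 0; probes < LARGE_RGCTX_CAPACITY; probes++) {
        LargeRgctxEntry *e = &ctx->entries[i];
        if (e->info == NULL) {              /* empty bucket: claim it */
            e->slot_number = slot_number;
            e->info = info;
            ctx->count++;
            return 1;
        }
        if (e->slot_number == slot_number) { /* already present: update */
            e->info = info;
            return 1;
        }
        i = (i + 1) & (LARGE_RGCTX_CAPACITY - 1);
    }
    return 0;
}

/* Probe from the home bucket until a match or an empty bucket. */
static void *large_rgctx_lookup(const LargeRgctx *ctx, uint32_t slot_number)
{
    uint32_t i = slot_number & (LARGE_RGCTX_CAPACITY - 1);
    for (uint32_t probes = 0; probes < LARGE_RGCTX_CAPACITY; probes++) {
        const LargeRgctxEntry *e = &ctx->entries[i];
        if (e->info == NULL)
            return NULL;
        if (e->slot_number == slot_number)
            return e->info;
        i = (i + 1) & (LARGE_RGCTX_CAPACITY - 1);
    }
    return NULL;
}
```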
> With such data structures we could also do another optimization.
> Right now every class that is either a generic class or is a direct or
> indirect subclass of a generic class has its own RGCTX.  Non-generic
> classes can never give rise to the need for another piece of type
> information in a RGCTX, so they could share the RGCTXs of their
> generic superclasses.
>
> Does this sound like a sensible plan?  Am I missing something crucial?
>  Does anybody have any suggestions or better ideas?
>
> Mark

This is great news, Mark.

In Madrid we discussed using segfaults to trigger lazy filling of the
rgctx. Have you thought about using that?

I remember that a major issue with the rgctx layout was that you need to
coordinate slot filling between a type and all its parents to avoid
collisions. How would that work in your proposed scheme? How about using a
pointer to the parent context? This would eliminate the whole issue, could
save some bytes for parents with a fat rgctx, and would make it even less
likely that a large rgctx is needed.
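
One possible reading of the parent-pointer idea, as a rough sketch: each class keeps slots only for code defined at its own level, and code compiled for an ancestor walks a known number of parent links before searching. The `ChainedRgctx` layout, the `levels_up` parameter and the fixed slot array are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTX_SLOTS 3

/* Hypothetical layout: a per-class context plus a link to the
 * superclass context. Slot numbers are private to each level. */
typedef struct ChainedRgctx {
    const struct ChainedRgctx *parent;  /* NULL at the root */
    uint8_t size;
    uint8_t slot_numbers[CTX_SLOTS];
    void   *slots[CTX_SLOTS];
} ChainedRgctx;

/* Walk `levels_up` parent links, then search that level's own slots.
 * Returns NULL if the chain is too short or the slot is not filled. */
static void *chained_rgctx_lookup(const ChainedRgctx *ctx,
                                  unsigned levels_up, uint8_t slot_number)
{
    for (; levels_up > 0 && ctx != NULL; levels_up--)
        ctx = ctx->parent;
    if (ctx == NULL)
        return NULL;
    assert(ctx->size <= CTX_SLOTS);
    for (uint8_t i = 0; i < ctx->size; i++)
        if (ctx->slot_numbers[i] == slot_number)
            return ctx->slots[i];
    return NULL;
}
```

With this arrangement a class and its parent can both use slot number 0 without coordinating, since the two live in different contexts.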

One more thing: your stats are missing something I think is important. How
many generic sharing failures does each test suite have? This matters for
seeing how much further this could be improved once constrained and mixed
reference/valuetype sharing gets done.

It might be too early to think about this, but do you have any speed
results for these tests?

Cheers,
Rodrigo