[Mono-devel-list] [PATCH] GC Profiling Interfaces

Paolo Molaro lupus at ximian.com
Sun Jan 23 06:16:44 EST 2005

On 01/22/05 Ben Maurer wrote:
> This is the patch that is being used for the heap profiler. I would like
> to get this into the main repo, so that people don't have to patch the
> tree.

There are several problems with the patch.
First, it's not 64 bit clean. You need to use at least size_t
for heap sizes, but maybe it's better to just use a 64 bit int
Second, you broke the build for the non-internal libgc builds
and non-libgc builds.
More below.

>       * Get a call back during every gc. During the call back:
>               * The world is stopped
>               * Mark bits are set
>               * Objects that do not have the mark bit set are still
>                 intact (ie, they have not bee been cleared
>         The goal here is to allow the profiler to do a quick heap walk,
>         so that it can see what is still alive. Paolo before commented
>         that it would be better to make this a generalized heap walk.
>         However, this ends up being less useful for my profiler -- I
>         need to store some extra data for each object (the backtrace of
>         who allocated it). So, if I did a generalized heap walk, I would
>         need to store this information in a hashtable. Also, the heap
>         walk would tell me who *survived* each gc. But what I want to
>         know is who *died* in each gc. So I have to do even more work.
>         This just ends up being more useful.
>       * Test the mark bit for any object -- only useful during the above
>         callback.

The interface is wrong, IMHO. First, mono_profiler_mark_set() is misnamed:
it should be mono_object_is_alive () and it should not be included in
the profiler interface. But even with these changes, I'm not sure we
can support that interface and the way you use it. Think for example a GC
that sets the mark bits in the least significant bits of the object header:
the function can be implemented, but you can't use the object directly.
I don't want to export assumptions on a GC implementation in a public
As for knowing that an object died vs knowing which objects are alive, it's
a simple difference which you could do when post processing, too, if needed.
Either way, we may consider exporting such an ugly interface only if
the overhead with the heap walk is significant.

>       * Get a callback each time the heap is resized

This is ok, except the 64 bit brokeness and the name: it doesn't need
to have _profile_ in the name.

>       * Query the (upper bound on) the number of live bytes and the size
>         of the GC heap at any time.

This is done the wrong way. See below.

> Index: libgc/include/gc.h
> ===================================================================
> --- libgc/include/gc.h	(revision 39325)
> +++ libgc/include/gc.h	(working copy)
> @@ -91,7 +91,22 @@
>  			/* If it returns, it must return 0 or a valid	*/
>  			/* pointer to a previously allocated heap 	*/
>  			/* object.					*/
> +			
> +GC_API GC_PTR (*GC_profile_marks_set) GC_PROTO((int col_num));
> +			/* Invoked on every collection. At this time mark
> +			 * bits are set. A profiler would use this to do 
> +			 * a heap profile: so it can see what objects are
> +			 * alive at a given time.
> +			 */
> +			 
> +GC_API GC_PTR (*GC_profile_heap_resize) GC_PROTO((int new_size));
> +			/* Invoked when the heap grows                  */
> +/* Slow/general mark bit manipulation: */
> +GC_API int GC_is_marked GC_PROTO((char * p));
> +GC_API void GC_clear_mark_bit GC_PROTO((char * p));
> +GC_API void GC_set_mark_bit GC_PROTO((char * p));

Why do you export GC_clear_mark_bit and GC_set_mark_bit even if they are not used.
The fact that they are in the gc_priv.h header should be a strong enough
hint that care should be applied when exporting them.

> Index: mono/metadata/profiler.c
> ===================================================================
> --- mono/metadata/profiler.c	(revision 39325)
> +++ mono/metadata/profiler.c	(working copy)
> @@ -12,6 +12,7 @@
>  #include <execinfo.h>
>  #endif
> +#include <mono/os/gc_wrapper.h>

It's not acceptable to depend on the guts of the GC in this file.
Don't use any Boehm GC specific call here, we need to provide an API,
not hack the code until it seems to work.

> +void
> +mono_profiler_gc_get_heap_stats (int* arena_size, int* live_bytes)
> +{
> +	if (arena_size)
> +		*arena_size = GC_get_heap_size ();
> +	
> +	if (live_bytes)
> +		*live_bytes = GC_get_heap_size () - GC_get_free_bytes ();
> +}

arena_size is not needed, since you have already the heap_resized callback.
Also, there is no need to have this callback to get the live data size:
we have already a function in metadata/gc.c that does
	GC_get_heap_size () - GC_get_free_bytes ()
to calculate it, so we should simply export two non-GC specific functions:

	void   mono_gc_collect          (int generation_num);
	gint64 mono_gc_get_total_memory (void);

Note that you can use mono_gc_get_total_memory() when you want and your code
won't depend on the GC implementation details.

>  MonoProfileCoverageInfo* 
> Index: mono/metadata/profiler.h
> ===================================================================
> --- mono/metadata/profiler.h	(revision 39325)
> +++ mono/metadata/profiler.h	(working copy)
> @@ -64,6 +64,8 @@
>  typedef void (*MonoProfileThreadFunc)     (MonoProfiler *prof, guint32 tid);
>  typedef void (*MonoProfileAllocFunc)      (MonoProfiler *prof, MonoObject *obj, MonoClass *klass);
>  typedef void (*MonoProfileStatFunc)       (MonoProfiler *prof, guchar *ip, void *context);
> +typedef void (*MonoProfileGCFunc)       (MonoProfiler *prof, int gc_num);

This interface is wrong: it should provide GC events such as GC start
and GC end and it should include the generation number.

enum {
	/* maybe also add suspend and resume events */

typedef void (*MonoProfileGCFunc)         (MonoProfiler *prof, int event_type, int gen_num);

Of course for the current GC the generation number will be always 0, but
the interface will work with the new GC. Note also that the generation number
is not needed, since you can easily keep your counter when the MONO_GC_EVENT_START
event is received.

> +typedef void (*MonoProfileGCHeapResizeFunc)       (MonoProfiler *prof, int new_size);

Note again it's not 64 bit clean.

> +/* data gathering */
> +
> +gboolean mono_profiler_mark_set (MonoObject* o);
> +void mono_profiler_gc_get_heap_stats (int* arena_size, int* live_bytes);

These have nothing to do with the profiler, so they should not be here.


lupus at debian.org                                     debian/rules
lupus at ximian.com                             Monkeys do it better

More information about the Mono-devel-list mailing list