[Mono-dev] Question: goals for the allocation, collection and heap profiler

Massimiliano Mantione massi at ximian.com
Tue May 13 06:49:54 EDT 2008


Hello,
I have an issue in the "heap-desc" functionality of the new profiler
and wanted to have some feedback on its goals before reworking it.

The issue is the following: in the logging profiler there are two
more or less separated areas that profile memory usage:
- Allocation profiling (that records an event for every allocation).
- Heap profiling (that analyzes the whole heap at each collection).

Allocation profiling only knows about *allocations*, and nothing
about freed objects.
So, if the user chooses the "a" option, he will get a summary of the
total number of bytes allocated for each class.
If the "enter-leave" option is active, the profiler will also be
able to attribute each allocation to the calling method, but nothing
more.

On the other hand, heap profiling will scan the whole heap at each
collection, and report data on the objects it finds.
There are two levels of detail for heap profiling in the logging
profiler:
- "heap-shot", which dumps the whole heap, object by object, and
  for each object also dumps all its references.
- "unreachable", which only reports the objects freed in the current
  collection.

The idea was that the info provided by the "unreachable" option
would be like that of the "heap-desc" tool.

The problem is that in the current implementation I gather the
"allocated objects" data from the events of the "a" option.
These events are in buffers that are different from the heap ones,
and are typically saved in the profiler log at different times,
so the decoder cannot properly correlate them to the collection
it is working on.
I tried to dump the buffers at the right time, and it "mostly"
works, but there are still events that are not processed early
enough and spoil all the data.
The "heap-desc" tool did not have this problem because it acquired
a lock at *every* allocation, serializing all events, but this is
something we will not do in the logging profiler, where frequent
events are processed in a lock-free way.

One easy solution would be to ignore the allocation events for the
heap profiling, and just use the heap data itself.
This would mean that at each collection we should write some data
about the live objects, and not only about the freed ones.

I'd like to have feedback about what we should actually write on
the log file (of course a full heap snapshot would be just too
much: the data should be summarized in some way).
IMO, the best way would be to write, for each class:
- number of live instances,
- total bytes taken by the live instances,
- number of instances freed in this collection,
- total bytes freed in this collection.
The profiler will need to do some internal bookkeeping to do this,
but it's cheap and easy, and the decoder will have a very easy
time showing the data because it's already aggregated.
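
To make the proposal concrete, here is a rough sketch of the per-class
bookkeeping. It is only an illustration: the names (ClassHeapSummary,
HeapSummary, heap_summary_record_object), the numeric class IDs and
the fixed-size table are all assumptions made for the example, not
existing profiler code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical aggregated record: one per class, per collection. */
typedef struct {
    uint32_t class_id;        /* assumed: ID assigned by the profiler */
    uint64_t live_instances;  /* number of live instances */
    uint64_t live_bytes;      /* total bytes taken by live instances */
    uint64_t freed_instances; /* instances freed in this collection */
    uint64_t freed_bytes;     /* total bytes freed in this collection */
} ClassHeapSummary;

#define MAX_CLASSES 1024 /* fixed-size table, only to keep the sketch short */

typedef struct {
    ClassHeapSummary per_class[MAX_CLASSES];
} HeapSummary;

/* Reset all counters at the start of a collection. */
static void heap_summary_begin_collection(HeapSummary *s)
{
    memset(s, 0, sizeof *s);
    for (uint32_t i = 0; i < MAX_CLASSES; i++)
        s->per_class[i].class_id = i;
}

/* Called once for every object found while scanning the heap. */
static void heap_summary_record_object(HeapSummary *s, uint32_t class_id,
                                       size_t size, int is_live)
{
    assert(class_id < MAX_CLASSES);
    ClassHeapSummary *c = &s->per_class[class_id];
    if (is_live) {
        c->live_instances++;
        c->live_bytes += size;
    } else {
        c->freed_instances++;
        c->freed_bytes += size;
    }
}
```

At the end of the collection the table would be written to the log as
is, so the decoder only has to print the four counters of each class.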

If the user wants more details (like the size of each instance),
he should ask for a full snapshot.

At this point I would break the "heap data" block in the file in
two: one for this aggregated data, and one for full snapshots.
Currently the info about live and freed objects is intermixed in
the same block, but separating them would make the decoder simpler
and faster because skipping snapshots would be easier.
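
As an illustration of why separate blocks help the decoder, here is a
minimal sketch that walks a log buffer and skips snapshot blocks
without parsing them. The block layout (a type followed by a payload
size) and the names BlockHeader, BLOCK_HEAP_SUMMARY and
BLOCK_HEAP_SNAPSHOT are hypothetical; the real file format may look
quite different.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical block types: one for aggregated data, one for snapshots. */
enum { BLOCK_HEAP_SUMMARY = 1, BLOCK_HEAP_SNAPSHOT = 2 };

/* Hypothetical block header, followed by payload_size bytes of data. */
typedef struct {
    uint32_t type;
    uint32_t payload_size;
} BlockHeader;

/* Count the summary blocks in a log, skipping every payload unread. */
static size_t count_summary_blocks(const uint8_t *log, size_t log_size)
{
    size_t offset = 0, count = 0;

    while (log_size - offset >= sizeof(BlockHeader)) {
        BlockHeader h;
        memcpy(&h, log + offset, sizeof h); /* avoids unaligned reads */
        offset += sizeof h;
        if (h.payload_size > log_size - offset)
            break;                          /* truncated block: stop */
        if (h.type == BLOCK_HEAP_SUMMARY)
            count++;
        offset += h.payload_size;           /* a snapshot costs one add */
    }
    return count;
}
```

A decoder that only shows the aggregated data would never have to
look inside a snapshot block at all.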

Does this seem reasonable?

Another (minor) question.
The allocation profiler can attribute the object creation to the
calling method.
The default profiler can "skip" wrappers because it checks
"MonoMethod->wrapper_type", but in the new profiler this info
gets lost (it is the decoder that correlates allocations with
callers...).
Would it make sense to write some info about the method in the
log file together with the mapping info?
And if yes, which info would it make sense to write (besides
whether the method is a wrapper)?
I'd like to get these right, because at some point the file
format should be frozen, and then adding more info will be a
bit annoying.
Otherwise I can simply recognize wrappers by their name, but
would it be robust enough?
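
For example, if the mapping record carried the wrapper type, the
decoder could skip wrappers along these lines. This is just a sketch:
MethodMapEntry and first_non_wrapper_caller are made-up names, and it
assumes that a wrapper_type of 0 means "not a wrapper" and that the
value is copied from "MonoMethod->wrapper_type" when the mapping is
written.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical method mapping entry as the decoder would see it. */
typedef struct {
    uint32_t    method_id;    /* ID used in the allocation events */
    uint8_t     wrapper_type; /* assumed: 0 = regular method, else wrapper */
    const char *name;
} MethodMapEntry;

/*
 * callers[0] is the innermost frame. Return the first frame that is
 * not a wrapper, or NULL if every frame is a wrapper (or depth is 0).
 */
static const MethodMapEntry *
first_non_wrapper_caller(const MethodMapEntry *callers, size_t depth)
{
    for (size_t i = 0; i < depth; i++) {
        if (callers[i].wrapper_type == 0)
            return &callers[i];
    }
    return NULL;
}
```

The allocation would then be attributed to the returned method, with
no need to guess from the method name.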

Thanks!
  Massi



