[Mono-dev] New profiler (work in progress)

Fri Oct 12 12:57:27 EDT 2007

On 10/11/07 Massimiliano Mantione wrote:
> Motivations:
> 
> The main issues the current profiler has:
[...]
> - Since it stores everything in memory, its memory usage grows
>   with time.

Memory usage is not an issue in the old profiler: it uses very little
memory, likely less than this new profiler will.

> Output file format
> 
> We'll describe each block as a sequence of fields, where
> three data types are allowed:
> - BYTE (used only for block codes),
> - INT, and
> - STRING.
> We write strings as null terminated ASCII strings.

They should be 0-terminated UTF-8 strings.

> We store integers one byte at a time, seven bits per byte,
> starting from the least significant bits.
> We set the eighth bit of the byte to 1 when we write the
> last byte.
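
A minimal C sketch of the integer encoding quoted above, assuming unsigned
values of at most 64 bits (varint_encode/varint_decode are hypothetical
names). Note that the stop bit is the inverse of the usual LEB128
convention: here the high bit marks the last byte, not a continuation.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode value with the proposed scheme: 7 bits per byte, least
 * significant group first, high (eighth) bit set only on the last
 * byte. Returns the number of bytes written (at most 10). */
static size_t varint_encode(uint8_t *out, uint64_t value)
{
    size_t n = 0;
    while (value >= 0x80) {
        out[n++] = (uint8_t)(value & 0x7f);   /* more follows: high bit clear */
        value >>= 7;
    }
    out[n++] = (uint8_t)(value | 0x80);       /* last byte: high bit set */
    return n;
}

/* Decode a value written by varint_encode and store the number of
 * bytes consumed in *used. The caller must make sure the terminating
 * byte is inside the buffer; bits beyond 64 are ignored. */
static uint64_t varint_decode(const uint8_t *in, size_t *used)
{
    uint64_t value = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t b;
    do {
        b = in[n++];
        if (shift < 64)
            value |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while (!(b & 0x80));
    *used = n;
    return value;
}
```

For example, 300 encodes as the two bytes 0x2c 0x82, while 127 still fits
in the single byte 0xff.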

There must also be an int type that is always output as 4 bytes in
little-endian order.
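
A sketch of that fixed-width int in C (put_le32/get_le32 are hypothetical
names). Building the value byte by byte gives little-endian output
whatever the host byte order is.

```c
#include <stdint.h>

/* Write a 32-bit value as exactly 4 bytes, least significant first. */
static void put_le32(uint8_t *out, uint32_t value)
{
    out[0] = (uint8_t)(value);
    out[1] = (uint8_t)(value >> 8);
    out[2] = (uint8_t)(value >> 16);
    out[3] = (uint8_t)(value >> 24);
}

/* Read back a value written by put_le32. */
static uint32_t get_le32(const uint8_t *in)
{
    return (uint32_t)in[0]
         | (uint32_t)in[1] << 8
         | (uint32_t)in[2] << 16
         | (uint32_t)in[3] << 24;
}
```

A fixed width also lets a writer reserve the field and patch it once the
value is known, which a variable-length int can't do, since its length
depends on the value.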

> Here is the description of the various blocks:

Note that each block must have a size field (a 4-byte int) along
with the block code. Also, since blocks are not frequent, use at least
2 bytes for the block code, to allow for future expansion.
The size will include the size of the header plus any data belonging
to the block that follows the header. This way readers can easily
skip blocks they don't care about or don't recognize. It will also let
readers easily check whether the profiled program has finished writing
a block, by comparing the recorded size with the number of bytes
available, and load the entire block in a single read, which allows
simpler and faster parsing of the data.
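
A sketch of how such a header could be written and walked, assuming the
2-byte code comes first and the 4-byte size second, both little endian
(the field order and all helper names are assumptions). The reader skips
blocks it doesn't want using only the size field, and stops at a block
that isn't completely in the buffer yet.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed header layout: 2-byte block code, then 4-byte size, both
 * little endian. The size counts the header plus all block data. */
#define BLOCK_HEADER_SIZE 6

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    put_le16(p, (uint16_t)v);
    put_le16(p + 2, (uint16_t)(v >> 16));
}

static uint32_t get_le32(const uint8_t *p)
{
    return get_le16(p) | (uint32_t)get_le16(p + 2) << 16;
}

/* Write a header followed by len bytes of data. Returns the block size. */
static size_t write_block(uint8_t *out, uint16_t code, const uint8_t *data, size_t len)
{
    put_le16(out, code);
    put_le32(out + 2, (uint32_t)(BLOCK_HEADER_SIZE + len));
    memcpy(out + BLOCK_HEADER_SIZE, data, len);
    return BLOCK_HEADER_SIZE + len;
}

/* Walk avail bytes block by block, counting blocks whose code is wanted
 * and skipping all others by their size. Stops at a block that is
 * malformed or not completely available yet (still being written);
 * *complete receives the number of bytes in complete blocks. */
static size_t count_blocks(const uint8_t *buf, size_t avail, uint16_t wanted, size_t *complete)
{
    size_t pos = 0, count = 0;
    while (avail - pos >= BLOCK_HEADER_SIZE) {
        uint16_t code = get_le16(buf + pos);
        uint32_t size = get_le32(buf + pos + 2);
        if (size < BLOCK_HEADER_SIZE || size > avail - pos)
            break;
        if (code == wanted)
            count++;
        pos += size;
    }
    *complete = pos;
    return count;
}
```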

> Event block, for common "per thread" events:
> BYTE: block code = 6.
> INTEGER: start counter value.
> INTEGER: thread id.
> INTEGER: base counter value. # All counter values in events are deltas
> # A sequence of the following (events):
> BYTE: event code.
> INTEGER: main event value.
> INTEGER: secondary event value.

We need to optimize the space used for 3 very common events:
method entry, method exit and object allocation, so I suggest the
following encoding.
The 2 low bits of the first byte will encode a type:
	0 method enter
	1 method exit
	2 object allocation
	3 other event, with its type in the upper 6 bits

For the first 3 types, the 6 upper bits will be part of the main event
value. Together with the 7 bits of the next byte this gives 6 + 7 = 13
bits, so up to 8191 methods/classes can be encoded in 2 bytes, while
your encoding would need 3 bytes for them (it can store only 127 in
2 bytes, which is too few).
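
A sketch of this encoding for the three common events, assuming the 6
upper bits hold the low 6 bits of the main event value and the remaining
bits follow in the quoted 7-bits-per-byte int format (always at least one
byte). The enum and function names are hypothetical, and only the main
event value is shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the type in the low 2 bits of the first byte. */
enum { EV_METHOD_ENTER = 0, EV_METHOD_EXIT = 1, EV_ALLOC = 2, EV_OTHER = 3 };

/* Int format from the proposal: 7 bits per byte, high bit on the last. */
static size_t varint_encode(uint8_t *out, uint64_t v)
{
    size_t n = 0;
    for (; v >= 0x80; v >>= 7)
        out[n++] = (uint8_t)(v & 0x7f);
    out[n++] = (uint8_t)(v | 0x80);
    return n;
}

static uint64_t varint_decode(const uint8_t *in, size_t *used)
{
    uint64_t v = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t b;
    do {
        b = in[n++];
        if (shift < 64)
            v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while (!(b & 0x80));
    *used = n;
    return v;
}

/* Low 2 bits: type; upper 6 bits: low 6 bits of the id; the rest of
 * the id follows as a varint. Returns the number of bytes written. */
static size_t encode_common_event(uint8_t *out, unsigned type, uint64_t id)
{
    out[0] = (uint8_t)((type & 3) | (id & 0x3f) << 2);
    return 1 + varint_encode(out + 1, id >> 6);
}

/* Inverse of encode_common_event. Returns the number of bytes read. */
static size_t decode_common_event(const uint8_t *in, unsigned *type, uint64_t *id)
{
    size_t used;
    *type = in[0] & 3;
    *id = (uint64_t)(in[0] >> 2) | varint_decode(in + 1, &used) << 6;
    return 1 + used;
}
```

With this layout an id of 8191 takes 2 bytes and 8192 takes 3.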

All the other event types can fit into the 64 different values that
remain in the upper 6 bits of the first byte when the type is 3. One of
these values should be reserved for the event type LAST_METHOD_EXITED,
which won't emit the method id, saving more space.
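
A sketch of the remaining events, assuming subtype 0 is the value reserved
for LAST_METHOD_EXITED (the value and all names are hypothetical). Since
that event carries no method id, a reader has to recover it; one way is a
per-thread stack of entered methods, popped on each exit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { EV_OTHER = 3 };                    /* low 2 bits: "other event" */
enum { EV_SUB_LAST_METHOD_EXITED = 0 };   /* hypothetical subtype value */

/* First byte of an "other" event: type 3 in the low 2 bits and the
 * real event type (0..63) in the upper 6 bits. */
static uint8_t other_event_byte(unsigned subtype)
{
    return (uint8_t)(EV_OTHER | (subtype & 0x3f) << 2);
}

#define MAX_DEPTH 256

/* Reader-side shadow stack of entered methods, one per thread. */
typedef struct {
    uint64_t ids[MAX_DEPTH];
    size_t depth;
} call_stack;

static void on_method_enter(call_stack *s, uint64_t method_id)
{
    assert(s->depth < MAX_DEPTH);         /* sketch: fixed capacity */
    s->ids[s->depth++] = method_id;
}

/* LAST_METHOD_EXITED has no id: the method left is the innermost one
 * entered. Returns UINT64_MAX if the stack is empty. */
static uint64_t on_last_method_exited(call_stack *s)
{
    return s->depth > 0 ? s->ids[--s->depth] : UINT64_MAX;
}
```

The writer then emits a single byte for such an exit instead of a byte
plus the method id.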

For allocations it is likely worth using one of the upper bits of the
first byte to distinguish variable-size from fixed-size allocations. The
first kind of event will have the class id followed by the size. The
second kind will omit the size (which will need to be stored in the ID
mapping table or somewhere else). This will save further space in the
files.
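
A sketch of the allocation event, assuming bit 2 of the first byte is the
variable-size flag, so only 5 upper bits remain for the class id (5 + 7 =
12 bits, ids up to 4095, in 2 bytes). The fixed_sizes array stands in for
the ID mapping table; it and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

enum { EV_ALLOC = 2 };             /* low 2 bits: object allocation */
#define ALLOC_VARIABLE_SIZE 0x04   /* hypothetical: bit 2 = size follows */

/* Int format from the proposal: 7 bits per byte, high bit on the last. */
static size_t varint_encode(uint8_t *out, uint64_t v)
{
    size_t n = 0;
    for (; v >= 0x80; v >>= 7)
        out[n++] = (uint8_t)(v & 0x7f);
    out[n++] = (uint8_t)(v | 0x80);
    return n;
}

static uint64_t varint_decode(const uint8_t *in, size_t *used)
{
    uint64_t v = 0;
    unsigned shift = 0;
    size_t n = 0;
    uint8_t b;
    do {
        b = in[n++];
        if (shift < 64)
            v |= (uint64_t)(b & 0x7f) << shift;
        shift += 7;
    } while (!(b & 0x80));
    *used = n;
    return v;
}

/* First byte: type in bits 0-1, flag in bit 2, low 5 bits of the class
 * id in bits 3-7; then the rest of the id, then the size if variable. */
static size_t encode_alloc(uint8_t *out, uint64_t class_id, int variable, uint64_t size)
{
    size_t n;
    out[0] = (uint8_t)(EV_ALLOC | (variable ? ALLOC_VARIABLE_SIZE : 0)
                       | (class_id & 0x1f) << 3);
    n = 1 + varint_encode(out + 1, class_id >> 5);
    if (variable)
        n += varint_encode(out + n, size);
    return n;
}

/* fixed_sizes: hypothetical class id -> instance size table built from
 * the ID mapping data. Returns the number of bytes consumed. */
static size_t decode_alloc(const uint8_t *in, const uint64_t *fixed_sizes,
                           uint64_t *class_id, uint64_t *size)
{
    size_t used, n;
    *class_id = (uint64_t)(in[0] >> 3) | varint_decode(in + 1, &used) << 5;
    n = 1 + used;
    if (in[0] & ALLOC_VARIABLE_SIZE) {
        *size = varint_decode(in + n, &used);
        n += used;
    } else {
        *size = fixed_sizes[*class_id];
    }
    return n;
}
```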

> # And finally, a 0, which is the end because no event code can be 0.
> BYTE: 0.

It's better not to waste a value here: we will already have the size
stored in the block header.
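
A sketch of a reader loop that stops at the end of the block instead of at
a 0 byte. It assumes the caller has already consumed the block header and
the per-block fields, and that every event is one of the three common
ones (first byte, rest of the main value, secondary value); a real reader
would dispatch on the type bits. The names are hypothetical.

```c
#include <stddef.h>

/* Length of a varint (high bit marks the last byte) starting at in,
 * or 0 if no terminating byte lies within the avail bytes. */
static size_t varint_len(const unsigned char *in, size_t avail)
{
    for (size_t n = 0; n < avail; n++)
        if (in[n] & 0x80)
            return n + 1;
    return 0;
}

/* Count the events in len bytes of event data, where len comes from the
 * block size, so no terminator byte is needed. Returns -1 if an event
 * runs past the end of the block. */
static long count_events(const unsigned char *events, size_t len)
{
    size_t pos = 0;
    long count = 0;
    while (pos < len) {
        size_t rest = len - pos - 1;      /* bytes after the first byte */
        size_t v1 = varint_len(events + pos + 1, rest);
        if (v1 == 0)
            return -1;
        size_t v2 = varint_len(events + pos + 1 + v1, rest - v1);
        if (v2 == 0)
            return -1;
        pos += 1 + v1 + v2;
        count++;
    }
    return count;
}
```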

Thanks!

lupus

-- 
-----------------------------------------------------------------
lupus at debian.org                                     debian/rules
lupus at ximian.com                             Monkeys do it better