[Mono-dev] tuning sgen performance & bug
zeno490 at gmail.com
Mon Sep 3 15:10:13 UTC 2012
You might want to consider optimizing your memory usage. With time
series data, it is usually best to keep structs, not objects, in those
arrays. The time series itself can be an object, but keeping large
arrays of small objects is very bad performance-wise: you waste memory
due to the per-object allocation overhead, and the CPU cache will have
a very hard time keeping up (because every small object access can
cause a cache miss). Obviously the high allocation rates will also
cause many GC collections to happen as you touch those objects. Using
structs speeds up each GC collection as well, since far fewer
references need to be traversed/touched.
With arrays of structs, the allocations are likely to be large and go
into a separate heap. Even if they aren't that large (because maybe your
time series only keeps a small set of data), then the number of
allocations is still dramatically reduced and fixed.
In my own code, each time series keeps arrays of structs (typed on the
time series), and those arrays are managed as circular buffers. Cache
usage is optimal and the number of allocations is reduced. Furthermore,
since the size is fixed, once the buffer is allocated no further
allocation happens, even as I go through GBs of data.
Keep in mind that nothing I said applies to you if the objects in those
time series are very large and the struct copy overhead would be too
great for you. However, I would consider 88 bytes small considering
the potential memory/CPU savings here.
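A minimal sketch of the struct-array/circular-buffer pattern described above. The `Tick` and `TimeSeries` names and fields are illustrative assumptions, not code from the thread:

```csharp
using System;

// Hypothetical sketch: a fixed-capacity time series backed by an array of
// structs, managed as a circular buffer. After construction, appending
// performs no further allocations, so the GC never sees this data churn.
struct Tick
{
    public long Timestamp;
    public double Price;
    public int Volume;
}

class TimeSeries
{
    private readonly Tick[] _buffer;  // one contiguous allocation
    private int _head;                // index of the oldest element
    private int _count;

    public TimeSeries(int capacity) { _buffer = new Tick[capacity]; }

    public int Count { get { return _count; } }

    // Append overwrites the oldest element once the buffer is full.
    public void Add(Tick t)
    {
        int tail = (_head + _count) % _buffer.Length;
        _buffer[tail] = t;            // struct copy, no heap allocation
        if (_count < _buffer.Length)
            _count++;
        else
            _head = (_head + 1) % _buffer.Length;  // drop the oldest
    }

    // Index 0 is the oldest retained element.
    public Tick this[int i]
    {
        get { return _buffer[(_head + i) % _buffer.Length]; }
    }
}
```

Because `Tick` is a value type, the elements live inline in the array; appending copies values in place instead of allocating, which is the fixed-allocation behavior described above.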
On Fri, Aug 31, 2012 at 6:26 PM, Jonathan Shore
<jonathan.shore at gmail.com> wrote:
> With this specific application (which is single-threaded), I have a
> "volatile" working set of ~2GB. By volatile I mean that these are not
> application-lifetime objects; rather, they will be disposed of at some point
> during the run.
> More specifically, I read 1.6TB of data incrementally into 1600 timeseries
> (basically arrays of event objects). Each timeseries only holds a window
> of data (in my case, half with 25K items and half with 5K items). Once
> each timeseries has overrun by, say, 1024 elements, the 1024 oldest elements
> are shifted off for GC.
> So the pattern is that there are always 2GB of referenced objects, and
> periodically 1600 x 1024 old objects to be disposed of. Due to the large
> sizes, it would seem that these older objects get relegated to the main
> heap. This then requires a much more expensive GC (presumably).
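The windowed pattern described above might look something like this sketch (the `Event` and `Window` names and the field layout are my guesses, not the actual code). Every object that survives a nursery collection while still inside the window gets promoted, so by the time it is shifted off it is already major-heap garbage:

```csharp
using System;

// Hypothetical sketch of the windowed object-array pattern described above.
// Each element is a small heap object; once the window overruns, the oldest
// block of references is shifted off and becomes garbage in FIFO order,
// long after those objects were allocated (and promoted).
class Event { public long Timestamp; public double Value; }

class Window
{
    private readonly Event[] _events;
    private int _count;
    private readonly int _capacity;
    private const int Overrun = 1024;  // shift granularity from the post

    public Window(int capacity)
    {
        _capacity = capacity;
        _events = new Event[capacity + Overrun];
    }

    public int Count { get { return _count; } }

    public void Add(Event e)
    {
        _events[_count++] = e;
        if (_count == _capacity + Overrun)
        {
            // Drop the 1024 oldest references; the referenced objects
            // are now unreachable, but they already live in the major heap.
            Array.Copy(_events, Overrun, _events, 0, _capacity);
            Array.Clear(_events, _capacity, Overrun);
            _count = _capacity;
        }
    }
}
```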
> If I understand the sgen algorithm correctly, no matter what the size of the
> nursery (unless it were 1.6TB), my working set is going to land in the main heap
> given my object garbage pattern. I believe this is because when the nursery
> fills, any objects that are still referenced, regardless of age, are
> moved to the main heap. Once GC completes, the nursery is empty (except
> maybe for pinned objects)?
> My objects become garbage in a FIFO pattern and not something LIFO like.
> The garbage "pipeline" is 2GB large, so the nursery fails for this app.
> Assuming Boehm is my only choice, if I expand the series window or # of
> series, I quickly run into the maximum heap problem with Boehm.
> On Aug 31, 2012, at 5:29 PM, Rodrigo Kumpera wrote:
> There are two situations that make sgen slower than boehm.
> The first is a non-generational workload. If your survivor rate is too high,
> a generational collector can't compete with a single-space one like boehm.
> To some extent this is defined by the size of the nursery, no?
> The second is if you have too much of the old generation pointing to
> young objects, causing minor collections to scan way too much memory
> to be profitable.
> The nursery size should usually be a not-so-small fraction of the total heap
> you expect. As a rough guess you can use 1/10 - 1/20.
> Are you expecting to have a heap of multiple GBs? Because a 2GB nursery
> will need at least 8GB of major memory.
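Applying the 1/10 - 1/20 guideline above to, say, an expected ~5GB major heap would give a nursery in the 256m-512m range. A hedged config sketch (the size and the application name are illustrative, not from the thread):

```shell
# Rough sizing per the 1/10 - 1/20 guideline: ~5GB expected heap -> ~512m nursery.
export MONO_GC_PARAMS="nursery-size=512m"
# mono MyTimeSeriesApp.exe   (hypothetical application name)
```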
> About your crash: I just noticed a very silly thing. We have never
> tried sgen with huge nurseries, because there's a 128MB implicit limit
> due to some internal sizes.
> Jonathan, for such huge heaps, sgen will need the parallel collector to
> compete with boehm on linux, and that is not a very mature piece of code
> in either stability or performance.
> On Fri, Aug 31, 2012 at 2:03 PM, Jonathan Shore <jonathan.shore at gmail.com>
>> sgen is now working for me (thanks to a subtle bug fix for
>> thread-local storage by Zoltan). However, for one application, sgen is 25%
>> slower than the same code with the boehm collector. I am processing some GBs of
>> timeseries data, though only evaluating a window at a time. As the window
>> reaches some size, older objects in the timeseries are dereferenced. The
>> object size is 88 bytes, but I generate many millions over the course of a run.
>> I suspect that the nursery is too small, so that the objects I want to
>> collect end up in the main heap. To that end, I wanted to enlarge the
>> nursery, and attempted this:
>> export MONO_GC_PARAMS="nursery-size=2g"
>> This causes mono to crash immediately, with:
>> * Assertion at sgen-gc.c:1206, condition `idx < section->num_scan_start'
>> not met
>> (this is on linux with the latest code on master, roughly 2.11.3+)
>> I took a look at the code, but it requires too much context for me to
>> understand the real cause of the issue. I am guessing that there is some
>> assumption re: the size of the nursery, block size, etc.
>> Finally, I am interested in trying the "copying collector" discussed in
>> this blog entry:
>> I'm wondering if I will get some performance advantage with this approach,
>> given that the nursery may be too small for my garbage working set.
>> Mono-devel-list mailing list
>> Mono-devel-list at lists.ximian.com