[Mono-dev] High threadpool CPU usage

Chris Swiedler cswiedler at trionworlds.com
Wed May 25 18:29:56 UTC 2016


We have a server app which periodically goes into a mode where all of the threadpool threads run at very high CPU. I've run pstack while it's in this state, and every time I do, nearly all of the threadpool threads have this stack:

#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000000618817 in mono_cond_timedwait_ms (cond=cond@entry=0x7fe7ee1fddc0, mutex=0x241eb78, timeout_ms=<optimized out>) at mono-mutex.c:181
#2  0x0000000000586f28 in worker_park () at threadpool-ms.c:509
#3  worker_thread (data=<optimized out>) at threadpool-ms.c:607
#4  0x00000000005841e9 in start_wrapper_internal (data=<optimized out>) at threads.c:725
#5  start_wrapper (data=<optimized out>) at threads.c:772
#6  0x0000000000621026 in inner_start_thread (arg=0x7fe831df4650) at mono-threads-posix.c:97
#7  0x00007fe88a55edf5 in start_thread (arg=0x7fe7ee1fe700) at pthread_create.c:308
#8  0x00007fe88a28c1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

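My mental model of that park path is a plain timed condition wait, roughly like the sketch below (this is illustrative, not the actual worker_park code; the names and timeout handling are mine). If that model is right, the wait itself should sleep in the kernel rather than spin:

#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Illustrative park loop: block until work arrives or the timeout
 * expires. The thread sleeps inside pthread_cond_timedwait, so a
 * nonzero timeout should cost essentially no CPU. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool work_available;

static void
park_worker (unsigned timeout_ms)
{
    struct timespec ts;
    clock_gettime (CLOCK_REALTIME, &ts);
    ts.tv_sec += timeout_ms / 1000;
    ts.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock (&lock);
    while (!work_available) {
        if (pthread_cond_timedwait (&cond, &lock, &ts) == ETIMEDOUT)
            break; /* parked long enough; give the thread back */
    }
    pthread_mutex_unlock (&lock);
}

If that's accurate, the CPU would have to be burned by the threads being woken (or timing out) constantly rather than by the wait itself.
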
Usually one thread will have a stack like this:

#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1  0x000000000061aa37 in mono_sem_wait (sem=0x9542c0 <suspend_ack_semaphore>, alertable=alertable@entry=0) at mono-semaphore.c:107
#2  0x00000000005c77bd in sgen_wait_for_suspend_ack (count=count@entry=18) at sgen-os-posix.c:188
#3  0x00000000005c78f9 in sgen_thread_handshake (suspend=suspend@entry=1) at sgen-os-posix.c:224
#4  0x00000000005c7e92 in sgen_client_stop_world (generation=generation@entry=0) at sgen-stw.c:234
#5  0x00000000005e6aca in sgen_stop_world (generation=0) at sgen-gc.c:3389
#6  0x00000000005e6c29 in sgen_perform_collection (requested_size=4096, generation_to_collect=0, reason=0x6d9595 "Nursery full", wait_to_finish=0) at sgen-gc.c:2322
#7  0x00000000005da62a in sgen_alloc_obj_nolock (vtable=vtable@entry=0x7fe85c0343c0, size=size@entry=128) at sgen-alloc.c:291
#8  0x00000000005da913 in sgen_alloc_obj (vtable=vtable@entry=0x7fe85c0343c0, size=128) at sgen-alloc.c:457
#9  0x00000000005c86e9 in mono_gc_alloc_obj (vtable=vtable@entry=0x7fe85c0343c0, size=<optimized out>) at sgen-mono.c:936
#10 0x00000000005a8b54 in mono_object_new_alloc_specific (vtable=vtable@entry=0x7fe85c0343c0) at object.c:4385
#11 0x00000000005a8bf0 in mono_object_new_specific (vtable=0x7fe85c0343c0) at object.c:4379
#12 0x00000000005a8c8c in mono_object_new (domain=domain@entry=0x1ded1c0, klass=<optimized out>) at object.c:4318
#13 0x00000000005ac1c9 in mono_async_result_new (domain=domain@entry=0x1ded1c0, handle=handle@entry=0x0, state=0x0, data=data@entry=0x0, object_data=object_data@entry=0x7fe8838af020) at object.c:5768
#14 0x00000000005887ff in mono_threadpool_ms_begin_invoke (domain=0x1ded1c0, target=target@entry=0x7fe8838aee38, method=method@entry=0x2963d28, params=params@entry=0x7fe7ed9f8f10) at threadpool-ms.c:1300
#15 0x000000000054b547 in mono_delegate_begin_invoke (delegate=0x7fe8838aee38, params=0x7fe7ed9f8f10) at marshal.c:2111
#16 0x00000000416d29d8 in ?? ()
#17 0x0000000000000000 in ?? ()

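My reading of frames #1-#3 (count=18) is that the collector signals each thread to suspend and then waits for one acknowledgement per thread on a semaphore, along these lines (an illustrative sketch, not the actual sgen code):

#include <semaphore.h>

/* Illustrative: the semaphore is initialised elsewhere; each thread
 * posts it once from its suspend handler, and the collector waits
 * for `count` posts before moving on. */
static sem_t suspend_ack_semaphore;

static void
wait_for_suspend_acks (int count)
{
    int i;
    for (i = 0; i < count; i++)
        sem_wait (&suspend_ack_semaphore); /* one post per suspended thread */
}

So the sem_wait frame presumably just means the collector is waiting for the other 18 threads to acknowledge the stop-the-world suspend.
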
Going back to the first stack: just from reading it, it doesn't look like mono_cond_timedwait_ms should spin, at least as long as timeout_ms isn't 0. For the second stack, I can't tell whether it's a normal garbage collection pass or (given how often we catch it) a sign that garbage collection is happening too often.
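
One data point on the "too often" question: frame #13 shows mono_async_result_new allocating a 128-byte object for every BeginInvoke, and the collection reason is "Nursery full". Assuming the default 4 MB SGen nursery (an assumption on my part), a back-of-the-envelope upper bound on BeginInvoke calls between minor collections is:

#include <stdio.h>

int
main (void)
{
    /* Assumes the default 4 MB SGen nursery and ignores every other
     * allocation, so the real number is strictly lower. */
    const long nursery_bytes = 4L * 1024 * 1024;
    const long async_result_bytes = 128; /* size from frame #7 */
    printf ("BeginInvoke calls per minor GC (upper bound): %ld\n",
            nursery_bytes / async_result_bytes); /* prints 32768 */
    return 0;
}

So at a high call rate we could plausibly be triggering a nursery collection every few tens of thousands of BeginInvoke calls, which might explain why pstack lands on the stop-the-world stack so frequently.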

Can anyone give me some pointers for where to dig more deeply?

Thanks,
chris

