[Mono-dev] High threadpool CPU usage
Chris Swiedler
cswiedler at trionworlds.com
Fri Jun 3 17:11:00 UTC 2016
Well, looking at some pstack+top output, I’m not sure it’s GC anymore. I ran pstack and top as close to simultaneously as I could, and the high-CPU threads were just in pthread_cond_timedwait. Is it possible we get into a scenario where mono_cond_timedwait_ms is sleeping with a timeout_ms that's extremely short? Is there an easy way (a #pragma or something?) to force the compiler not to optimize out the timeout_ms value, so gdb can actually show it? I'd rather not build all of mono with optimizations off, but I could disable optimizations for a few files.
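Something like this is roughly what I had in mind, just a sketch (GCC-specific attribute, placeholder function name, not the real mono-mutex.c code): either keep the one function at -O0, or copy the argument into a volatile local the optimizer can't drop.

#include <pthread.h>

/* GCC-specific sketch: force -O0 for just this function, or stash the argument
 * in a volatile local so gdb can print the real value even with -O2 elsewhere.
 * mono_cond_timedwait_ms_debug is a made-up name, not actual mono code. */
__attribute__((optimize("O0")))
int
mono_cond_timedwait_ms_debug (pthread_cond_t *cond, pthread_mutex_t *mutex, int timeout_ms)
{
    volatile int timeout_ms_dbg = timeout_ms;  /* survives optimization; print with gdb */
    (void) timeout_ms_dbg;
    (void) cond;
    (void) mutex;
    /* ... the existing timedwait logic would go here ... */
    return 0;
}

The alternative would be rebuilding just mono-mutex.c and threadpool-ms.c with -O0 via per-file CFLAGS.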
chris
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17369 hydrogen 20 0 6491744 157928 17216 S 53.3 0.1 0:05.08 Threadpool work
17385 hydrogen 20 0 6491744 157928 17216 S 46.7 0.1 0:09.38 Threadpool work
Thread 8 (Thread 0x7f52c65fc700 (LWP 17369)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x0000000000618817 in mono_cond_timedwait_ms (cond=cond@entry=0x7f52c65fbdc0, mutex=0x2887808, timeout_ms=<optimized out>) at mono-mutex.c:181
#2 0x0000000000586f28 in worker_park () at threadpool-ms.c:509
Thread 4 (Thread 0x7f528a7fe700 (LWP 17385)):
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x0000000000618817 in mono_cond_timedwait_ms (cond=cond@entry=0x7f528a7fddc0, mutex=0x2887808, timeout_ms=<optimized out>) at mono-mutex.c:181
#2 0x0000000000586f28 in worker_park () at threadpool-ms.c:509
From: Rodrigo Kumpera [mailto:kumpera at gmail.com]
Sent: Friday, May 27, 2016 5:07 PM
To: Chris Swiedler <cswiedler at trionworlds.com>
Cc: Ludovic Henry <ludovic at xamarin.com>; mono-devel-list <mono-devel-list at lists.ximian.com>
Subject: Re: [Mono-dev] High threadpool CPU usage
Chris,
Can you at least enable GC logging? That will tell you how frequently and for how long the collector is running.
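If it helps, something along these lines should be enough with a stock 4.x build (variable names from memory, so double-check them against your install):

MONO_LOG_LEVEL=debug MONO_LOG_MASK=gc mono yourserver.exe 2> gc.log

That should log one line per collection, including the reason ("Nursery full", etc.) and the pause time, which will tell us whether the collector really is running back to back.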
On Fri, May 27, 2016 at 11:04 AM, Chris Swiedler <cswiedler at trionworlds.com> wrote:
Thanks. I’ll try to use the profiler, but this problem doesn’t happen at startup, and running the application under profiling for eight hours is probably infeasible.
I see that at one point mono supported signal-based enabling/disabling of the profiler, like how SIGUSR2 toggles the trace mechanism on and off. Is that sort of functionality ever going to be re-added? Currently there's the heapshot option via the TCP listener; perhaps there could be enable/disable commands via that mechanism as well?
chris
From: Ludovic Henry [mailto:ludovic at xamarin.com]
Sent: Friday, May 27, 2016 1:27 AM
To: Chris Swiedler <cswiedler at trionworlds.com>; mono-devel-list <mono-devel-list at lists.ximian.com>
Subject: Re: [Mono-dev] High threadpool CPU usage
Hi Chris,
The signal handler you are seeing is the GC signalling every thread to suspend it. So yes, you are right: that's the sgen collector stopping the world.
To track down the large number of allocations, I would encourage you to try the log profiler (documentation: http://www.mono-project.com/docs/debug+profile/profile/profiler/ ). That should give you some more insight into where the allocations are coming from.
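For example, something like this (option names as documented for 4.x; the file names are just placeholders):

mono --profile=log:alloc,output=app.mlpd yourserver.exe
mprof-report app.mlpd

The alloc option records allocation events (with backtraces, unless you disable them), so the allocation section of the mprof-report output should show which types and call sites allocate the most.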
Thank you,
Ludovic
On Thu, May 26, 2016 at 8:20 PM Chris Swiedler <cswiedler at trionworlds.com> wrote:
Thanks, Ludovic. I’m using mono-4.2.3. The ‘massive amounts of GC’ idea makes sense based on what I’m seeing. When I run pstack, I get a number of threadpool threads in stacks with:
#0 0x00007fdff1c6a952 in do_sigsuspend (set=0x954220 <suspend_signal_mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:32
#1 __GI___sigsuspend (set=set@entry=0x954220 <suspend_signal_mask>) at ../sysdeps/unix/sysv/linux/sigsuspend.c:46
#2 0x00000000005c7534 in suspend_thread (info=0x7fdf480008c0, context=context@entry=0x7fde997ea1c0) at sgen-os-posix.c:126
#3 0x00000000005c771f in suspend_handler (_dummy=<optimized out>, _info=<optimized out>, context=0x7fde997ea1c0) at sgen-os-posix.c:153
#4 <signal handler called>
I thought this was related to GDB / pstack attaching, but it’s actually the suspend handling for the sgen collector, right?
Is a thread suspending itself CPU-intensive? I see lots of threads with high CPU at any given time, which seems to indicate that this isn't just the CPU usage of the collector thread itself.
Do you have any suggestions for how to track down the large numbers of allocations? This isn’t very easy to reproduce, but now that I know what to look for, I might be able to get it to happen in a test environment.
chris
From: Ludovic Henry [mailto:ludovic at xamarin.com]
Sent: Thursday, May 26, 2016 8:43 AM
To: Chris Swiedler <cswiedler at trionworlds.com>; mono-devel-list <mono-devel-list at lists.ximian.com>
Subject: Re: [Mono-dev] High threadpool CPU usage
Hi Chris,
The first stacktrace you are observing is for threadpool thread parking. We use this technique for threads that are currently not doing anything, to keep them around for a little while (5-60 seconds), so that if we get a burst of usage we do not end up destroying and creating threads all the time.
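Roughly, the parking path boils down to something like this (a heavily simplified sketch, not the actual worker_park code in threadpool-ms.c):

#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Simplified sketch of a parked worker: block on a condition variable with an
 * absolute deadline a few seconds away; wake up early if work is signalled. */
void
park_worker (pthread_cond_t *cond, pthread_mutex_t *mutex, int timeout_ms)
{
    struct timespec deadline;
    clock_gettime (CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (long) (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock (mutex);
    /* A parked thread should spend essentially all of its time blocked here.
     * It only burns CPU if it keeps waking up over and over, e.g. if the
     * computed deadline were ever zero or already in the past. */
    while (pthread_cond_timedwait (cond, mutex, &deadline) != ETIMEDOUT) {
        /* spurious wakeup or unpark signal: the real code checks whether
         * there is work to do before going back to sleep */
    }
    pthread_mutex_unlock (mutex);
}

So a parked thread showing up in pthread_cond_timedwait is expected; what is not expected is for it to consume CPU while doing so.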
The second stacktrace you are observing is, as you noted, taken while performing a garbage collection, when the GC thread is suspending all the running threads. So if you are witnessing this trace very often, it means a thread is allocating memory very quickly and triggering collections very often.
I would need more information to tell you exactly why it would use 100% CPU. Also, which version of Mono are you running?
Thank you very much,
Ludovic
On Wed, May 25, 2016 at 8:30 PM Chris Swiedler <cswiedler at trionworlds.com> wrote:
We have a server app which periodically goes into a mode where all threadpool threads start running at very high CPU. I've run pstack while it's in this mode, and every time I do, nearly all the threadpool threads have this stack:
#0 pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1 0x0000000000618817 in mono_cond_timedwait_ms (cond=cond@entry=0x7fe7ee1fddc0, mutex=0x241eb78, timeout_ms=<optimized out>) at mono-mutex.c:181
#2 0x0000000000586f28 in worker_park () at threadpool-ms.c:509
#3 worker_thread (data=<optimized out>) at threadpool-ms.c:607
#4 0x00000000005841e9 in start_wrapper_internal (data=<optimized out>) at threads.c:725
#5 start_wrapper (data=<optimized out>) at threads.c:772
#6 0x0000000000621026 in inner_start_thread (arg=0x7fe831df4650) at mono-threads-posix.c:97
#7 0x00007fe88a55edf5 in start_thread (arg=0x7fe7ee1fe700) at pthread_create.c:308
#8 0x00007fe88a28c1ad in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Usually one thread will have a stack like this:
#0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1 0x000000000061aa37 in mono_sem_wait (sem=0x9542c0 <suspend_ack_semaphore>, alertable=alertable@entry=0) at mono-semaphore.c:107
#2 0x00000000005c77bd in sgen_wait_for_suspend_ack (count=count@entry=18) at sgen-os-posix.c:188
#3 0x00000000005c78f9 in sgen_thread_handshake (suspend=suspend@entry=1) at sgen-os-posix.c:224
#4 0x00000000005c7e92 in sgen_client_stop_world (generation=generation@entry=0) at sgen-stw.c:234
#5 0x00000000005e6aca in sgen_stop_world (generation=0) at sgen-gc.c:3389
#6 0x00000000005e6c29 in sgen_perform_collection (requested_size=4096, generation_to_collect=0, reason=0x6d9595 "Nursery full", wait_to_finish=0) at sgen-gc.c:2322
#7 0x00000000005da62a in sgen_alloc_obj_nolock (vtable=vtable@entry=0x7fe85c0343c0, size=size@entry=128) at sgen-alloc.c:291
#8 0x00000000005da913 in sgen_alloc_obj (vtable=vtable@entry=0x7fe85c0343c0, size=128) at sgen-alloc.c:457
#9 0x00000000005c86e9 in mono_gc_alloc_obj (vtable=vtable@entry=0x7fe85c0343c0, size=<optimized out>) at sgen-mono.c:936
#10 0x00000000005a8b54 in mono_object_new_alloc_specific (vtable=vtable@entry=0x7fe85c0343c0) at object.c:4385
#11 0x00000000005a8bf0 in mono_object_new_specific (vtable=0x7fe85c0343c0) at object.c:4379
#12 0x00000000005a8c8c in mono_object_new (domain=domain@entry=0x1ded1c0, klass=<optimized out>) at object.c:4318
#13 0x00000000005ac1c9 in mono_async_result_new (domain=domain@entry=0x1ded1c0, handle=handle@entry=0x0, state=0x0, data=data@entry=0x0, object_data=object_data@entry=0x7fe8838af020) at object.c:5768
#14 0x00000000005887ff in mono_threadpool_ms_begin_invoke (domain=0x1ded1c0, target=target@entry=0x7fe8838aee38, method=method@entry=0x2963d28, params=params@entry=0x7fe7ed9f8f10) at threadpool-ms.c:1300
#15 0x000000000054b547 in mono_delegate_begin_invoke (delegate=0x7fe8838aee38, params=0x7fe7ed9f8f10) at marshal.c:2111
#16 0x00000000416d29d8 in ?? ()
#17 0x0000000000000000 in ?? ()
Just from reading the first stack, it doesn't look like mono_cond_timedwait_ms would spin, at least as long as timeout_ms isn't 0. For the second stack, I don't know whether that's a normal garbage collection pass or (since we see it so frequently) a sign that garbage collection is happening too often.
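As a sanity check on that theory, a tiny standalone pthreads program (nothing to do with mono; the iteration count is arbitrary) would spin a core this way if the deadline is effectively "now", while stack samples still mostly land in pthread_cond_timedwait:

/* build: cc -O2 -pthread spin.c */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Standalone illustration of the suspected failure mode: waiting with an
 * already-expired deadline returns ETIMEDOUT immediately, so a loop around
 * it never blocks even though it is "in" pthread_cond_timedwait the whole time. */
int
main (void)
{
    pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    struct timespec deadline;
    unsigned long   timeouts = 0;

    pthread_mutex_lock (&mutex);
    while (timeouts < 10000000UL) {
        /* A deadline of "now" is the same as timeout_ms == 0: it has already
         * expired by the time the wait starts. */
        clock_gettime (CLOCK_REALTIME, &deadline);
        if (pthread_cond_timedwait (&cond, &mutex, &deadline) == ETIMEDOUT)
            timeouts++;
    }
    pthread_mutex_unlock (&mutex);

    printf ("%lu immediate timeouts\n", timeouts);
    return 0;
}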
Can anyone give me some pointers for where to dig more deeply?
Thanks,
chris
_______________________________________________
Mono-devel-list mailing list
Mono-devel-list at lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list