[Mono-devel-list] NPTL thread hang issue

Scott Mohekey scott.mohekey at telogis.com
Tue Sep 14 07:53:06 EDT 2004


Gonzalo Paniagua Javier wrote:

>On Mon, 2004-09-13 at 11:54, Scott Mohekey wrote:
>
>>I've spent the last few days looking into the issue that has been 
>>reported here (http://bugs.ximian.com/show_bug.cgi?id=60576) and here 
>>(http://bugs.gentoo.org/show_bug.cgi?id=63734).
>>
>>This first code snippet exhibits the bug on a pure NPTL system:
>>
>>using System;
>>using System.Threading;
>>
>>class Test
>>{
>>        public static void Main( String[] args )
>>        {
>>                int i = 0;
>>                while( true ) {
>>                        Thread t = new Thread( new ThreadStart(Blah) );
>>                        t.Start();
>>                        i++;
>>                        Console.WriteLine( i+" threads" );
>>                }
>>        }
>>
>>        private static void Blah() {
>>                Console.WriteLine( "starting thread" );
>>        }
>>}
>>
>>Running this program inside gdb reveals that each of the threads created 
>>becomes a zombie as soon as it exits. Further investigation reveals that 
>>none of the mono thread cleanup code is being run (breaking on 
>>thread_cleanup() or handle_remove() gives no results) for any of the 
>>zombie threads. After the program has run for a certain length of time, 
>>the garbage collector is invoked, which forces a world stop of all 
>>threads. If the garbage collector is invoked when a zombie thread is 
>>present, mono_gc_stop_world() is called which iterates over the threads 
>>hashtable calling gc_stop_world() on each entry, which in turn calls 
>>SuspendThread() on each one. But because the thread cleanup code hasn't 
>>been run (in particular, handle_remove() which removes the thread handle 
>>from the hashtable), SuspendThread() gets called for threads which don't 
>>exist, or are zombies. The offending code appears to be:
>>
>>while (MONO_SEM_WAIT (&thread->suspend_sem) != 0 && errno == EINTR);
>>
>>
>>in _wapi_timed_thread_suspend(), which indirectly calls 
>>pthread_cond_wait() for a condition variable that is not going to be 
>>triggered, because the other thread is a zombie.
>>
>>As far as I can tell, mono_thread_manage runs its 'join' loop once, at 
>>which point it waits for eternity inside WaitForMultipleObjectsEx() 
>>which is called from wait_for_tids(). Because of this, the 
>>thread_cleanup() call in wait_for_tids() is never reached, not for any 
>>of the threads.
>>
>>I'm going to continue investigating this. Any help would be greatly 
>>appreciated.
>>
>
>Is that happening with CVS HEAD and not on 1.0? I removed some code to
>fix this bug:
>http://bugzilla.ximian.com/show_bug.cgi?id=65379
>
>and that might be the cause of yours if you're using CVS HEAD.
>
>-Gonzalo
>
Yes, I have been using CVS HEAD. After switching back to version 1.0.1, 
the problem of handle_remove() not being called went away. However, the 
program still freezes during a garbage collection. Is it possible that a 
thread exits, and the garbage collector is called in another thread 
before handle_remove() is called for the exiting thread?
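
To make the question concrete, here is a minimal sketch in C of what a 
stale table entry would look like from the collector's side. It is not 
Mono's code: struct thread_entry, short_lived(), make_stale_entry() and 
wait_for_suspend_ack() are hypothetical names made up for illustration, 
and it assumes a Linux/POSIX system built with -pthread. It keeps the same 
EINTR-retry shape as the MONO_SEM_WAIT loop quoted above, but swaps in 
sem_timedwait() so that the wait gives up instead of blocking forever.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical stand-in for one entry in a runtime's threads table. */
struct thread_entry {
    pthread_t tid;
    sem_t suspend_sem; /* posted by the target thread to acknowledge a suspend */
};

/* Thread body that exits at once and never posts suspend_sem. */
static void *short_lived(void *arg)
{
    (void)arg;
    return NULL;
}

/* Start a thread, wait until it has exited, and leave its entry behind,
 * as if the cleanup that removes it from the table had not run yet.
 * Returns 0 on success, -1 on failure. */
static int make_stale_entry(struct thread_entry *e)
{
    if (sem_init(&e->suspend_sem, 0, 0) != 0)
        return -1;
    if (pthread_create(&e->tid, NULL, short_lived, NULL) != 0)
        return -1;
    return pthread_join(e->tid, NULL) == 0 ? 0 : -1;
}

/* Wait for the suspend acknowledgement, retrying on EINTR, but stop at a
 * deadline. Returns 0 if the semaphore was posted, -1 otherwise. */
static int wait_for_suspend_ack(struct thread_entry *e, long timeout_ms)
{
    struct timespec deadline;
    int rc;

    assert(timeout_ms >= 0);

    /* sem_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    do {
        rc = sem_timedwait(&e->suspend_sem, &deadline);
    } while (rc != 0 && errno == EINTR);

    return rc == 0 ? 0 : -1;
}
```

Calling make_stale_entry() and then wait_for_suspend_ack() on the same 
entry times out, because nothing is left to post the semaphore. With an 
untimed wait in its place, the caller would block at that point for good.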

Judging by the following output, that does not appear to be what 
happens. I built mono-1.0.1 with THREAD_DEBUG, THREAD_WAIT_DEBUG, and 
LIBGC_DEBUG defined in metadata/threads.c, and I modified the g_message 
calls in handle_store and handle_remove to be a bit more descriptive. 
This is the output of the thread test case run inside gdb.

starting thread
** Message: : returning 0x8230700
** Message: : (1098329008) Done launching thread 0x8230700 (1100430256)
197 threads
** Message: : returning 0x80fae00
** Message: : joining thread handle 0xd7d, -1 ms
** Message: : (1100430256) Abort requested for 0x8230700 (1100430256)
** Message: : returning 0x8230700
** Message: : returning 0x8230700
** Message: : removing thread ID 1100430256
** Message: : returning (nil)
** Message: : join successful
** Message: : Trying to start a new thread: this (0x8230690) start (0x8241b18)
[Thread 1100430256 (zombie) exited]
[New Thread 1102531504 (LWP 1187)]
** Message: : Started thread ID 1102531504 (handle 0xd80)
** Message: : (1098329008) Launching thread 0x8230690 (1102531504)
** Message: : storing thread 0x8230690 ID 1102531504
** Message: : (1098329008) waiting for thread 0x8230690 (1102531504) to start
** Message: : (1102531504) Start wrapper
** Message: : returning 0x8230690
** Message: : (1102531504,985) Setting thread stack to 0x41b749ec
** Message: : (1102531504) Setting current_object_key to 0x8230690
** Message: : (1098329008) Done launching thread 0x8230690 (1102531504)
198 threads
** Message: : returning 0x80fae00
** Message: : joining thread handle 0xd80, -1 ms
** Message: : returning 0x8230690
starting thread
** Message: : returning 0x8230690
** Message: : (1102531504) Abort requested for 0x8230690 (1102531504)
** Message: : returning 0x8230690
** Message: : returning 0x8230690
** Message: : join successful
** Message: : Trying to start a new thread: this (0x8230620) start (0x8241a78)
** Message: : removing thread ID 1102531504
** Message: : returning (nil)
[Thread 1102531504 (zombie) exited]
[New Thread 1100430256 (LWP 1188)]
Initiating full world-stop collection 4 after 466576 allocd bytes


Everything seems to be OK here. However, if I break on 
mono_gc_stop_world(), the breakpoint isn't hit.

Scott.