[Mono-devel-list] NPTL thread hang issue

Scott Mohekey scott.mohekey at telogis.com
Mon Sep 13 11:54:51 EDT 2004


I've spent the last few days looking into the issue that has been 
reported here (http://bugs.ximian.com/show_bug.cgi?id=60576) and here 
(http://bugs.gentoo.org/show_bug.cgi?id=63734).

The following code snippet reproduces the bug on a pure NPTL system:

using System;
using System.Threading;

class Test
{
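        public static void Main( String[] args )
        {
                int i = 0;
                // Spawn short-lived threads forever; none of them is ever joined.
                while( true ) {
                        Thread t = new Thread( new ThreadStart(Blah) );
                        t.Start();
                        i++;
                        Console.WriteLine( i+" threads" );
                }
        }

        // Each thread prints one line and then exits immediately.
        private static void Blah() {
                Console.WriteLine( "starting thread" );
        }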
}

Running this program inside gdb shows that each thread becomes a zombie 
as soon as it exits. Further investigation shows that none of the mono 
thread cleanup code runs for any of the zombie threads: breakpoints on 
thread_cleanup() and handle_remove() are never hit. After the program 
has run for a while, the garbage collector is invoked, which stops the 
world (suspends all threads). If a zombie thread is present at that 
point, mono_gc_stop_world() iterates over the threads hashtable calling 
gc_stop_world() on each entry, which in turn calls SuspendThread() on 
each one. But because the thread cleanup code hasn't run (in particular 
handle_remove(), which removes the thread handle from the hashtable), 
SuspendThread() gets called for threads that no longer exist or are 
zombies. The offending code appears to be:

while (MONO_SEM_WAIT (&thread->suspend_sem) != 0 && errno == EINTR);


in _wapi_timed_thread_suspend(), which indirectly calls 
pthread_cond_wait() on a condition variable that will never be 
signalled, because the other thread is a zombie.

As far as I can tell, mono_thread_manage() runs its 'join' loop once, 
then waits forever inside WaitForMultipleObjectsEx(), which is called 
from wait_for_tids(). Because of this, the thread_cleanup() call in 
wait_for_tids() is never reached for any of the threads.

I'm going to continue investigating this. Any help would be greatly 
appreciated.

Scott.


