[Mono-devel-list] NPTL thread hang issue

Mon Sep 13 18:23:27 EDT 2004

On Mon, 2004-09-13 at 11:54, Scott Mohekey wrote:
> I've spent the last few days looking into the issue that has been 
> reported here (http://bugs.ximian.com/show_bug.cgi?id=60576) and here 
> (http://bugs.gentoo.org/show_bug.cgi?id=63734).
> 
> This first code snippet exhibits the bug on a pure NPTL system:
> 
> using System;
> using System.Threading;
> 
> class Test
> {
>         public static void Main( String[] args )
>         {
>                 int i = 0;
>                 while( true ) {
>                         Thread t = new Thread( new ThreadStart(Blah) );
>                         t.Start();
>                         i++;
>                         Console.WriteLine( i+" threads" );
>                 }
>         }
> 
>         private static void Blah() {
>                 Console.WriteLine( "starting thread" );
>         }
> }
> 
> Running this program inside gdb shows that each thread becomes a
> zombie as soon as it exits. Further investigation shows that none of
> the mono thread cleanup code runs for any of the zombie threads
> (breakpoints on thread_cleanup() and handle_remove() are never hit).
> After the program has run for a while, the garbage collector is
> invoked, which forces a stop of all threads. If that happens while a
> zombie thread is present, mono_gc_stop_world() iterates over the
> threads hashtable calling gc_stop_world() on each entry, which in turn
> calls SuspendThread() on each thread. Because the thread cleanup code
> has not run (in particular handle_remove(), which removes the thread
> handle from the hashtable), SuspendThread() gets called for threads
> that no longer exist or are zombies. The offending code appears to be:
> 
> while (MONO_SEM_WAIT (&thread->suspend_sem) != 0 && errno == EINTR);
> 
> 
> in _wapi_timed_thread_suspend(), which indirectly calls 
> pthread_cond_wait() on a condition variable that is never going to be 
> signalled, because the other thread is a zombie.
> 
> As far as I can tell, mono_thread_manage() runs its 'join' loop once
> and then waits forever inside WaitForMultipleObjectsEx(), which is
> called from wait_for_tids(). Because of this, the thread_cleanup()
> call in wait_for_tids() is never reached for any of the threads.
> 
> I'm going to continue investigating this. Any help would be greatly 
> appreciated.
> 

Is that happening with CVS HEAD and not on 1.0? I removed some code to
fix this bug:
http://bugzilla.ximian.com/show_bug.cgi?id=65379

and that might be the cause of yours if you're using CVS HEAD.

-Gonzalo