[Mono-bugs] [Bug 60576][Maj] Changed - Bad interaction - Mono, Gentoo (nptl) & Muine

bugzilla-daemon@bugzilla.ximian.com
Fri, 24 Sep 2004 08:30:33 -0400 (EDT)


Please do not reply to this email- if you want to comment on the bug, go to the
URL shown below and enter your comments there.

Changed by ed@catmur.co.uk.

http://bugzilla.ximian.com/show_bug.cgi?id=60576

--- shadow/60576	2004-09-24 08:06:18.000000000 -0400
+++ shadow/60576.tmp.384	2004-09-24 08:30:33.000000000 -0400
@@ -633,6 +633,46 @@
 other was originally created by a ximianite, which should lend some
 extra credibility to it (it seems we need that).
 
 ------- Additional Comments From dick@ximian.com  2004-09-24 08:06 -------
 Don't mark it a dup just yet.
 
+
+------- Additional Comments From ed@catmur.co.uk  2004-09-24 08:30 -------
+More info:
+
+GC_thread_exit_proc() (libgc/pthread_support.c) is not getting called
+when threads exit. At all. As a result, although the thread in
+question has exited, it remains in the libgc thread table (GC_threads[]).
+
+If that were all, we'd be OK: the dead thread is not signalable, so
+pthread_kill() returns ESRCH and n_live_threads in GC_suspend_all()
+(libgc/pthread_stop_world.c) is decremented, so the number of
+semaphore posts expected equals the number of live threads available
+to post them.
+
+However, the pthread library can reuse that thread ID, so the
+pthread_kill() in GC_suspend_all() succeeds multiple times on the same
+thread. That thread handles only one signal, so the semaphore is posted
+fewer times than expected, and the wait on it never returns.
+
+This suggests a hack:
+
+--- libgc/pthread_support.c     2004-09-06 19:06:56.000000000 +0100
++++ libgc/pthread_support.c     2004-09-24 12:48:23.621325151 +0100
+@@ -603,6 +603,11 @@ GC_thread GC_new_thread(pthread_t id)
+     GC_thread result;
+     static GC_bool first_thread_used = FALSE;
+
++    for (result = GC_threads[hv]; result; result = result->next)
++           if (pthread_equal(result->id, id)) {
++                   WARN("Thread 0x%lx already in table!", id);
++                   return result;
++           }
+     if (!first_thread_used) {
+        result = &first_thread;
+        first_thread_used = TRUE;
+
+This indeed makes the test case run OK. So now the question is why
+GC_thread_exit_proc() is not getting called (it is registered with
+pthread_cleanup_push()) even though the (nptl) pthread library
+thinks the thread has exited.
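
The mismatch described in the comment above can be sketched with a toy model. This is not libgc code: struct thread_table, register_thread(), distinct_ids() and missing_posts() are hypothetical names made up for illustration, and the model assumes that the collector signals once per table entry and waits for one post per successful signal, while each distinct thread posts only once.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SZ 8

/* Hypothetical stand-in for the libgc thread table: thread IDs only. */
struct thread_table {
    unsigned long ids[TABLE_SZ];
    size_t count;
};

/* Register a thread ID. With check_dup set, an ID that is already in the
 * table is not added again (the idea behind the hack above); otherwise the
 * ID is appended blindly, as when a stale entry was never removed. */
static void register_thread(struct thread_table *t, unsigned long id,
                            int check_dup)
{
    size_t i;

    if (check_dup) {
        for (i = 0; i < t->count; i++)
            if (t->ids[i] == id)
                return;
    }
    assert(t->count < TABLE_SZ);
    t->ids[t->count++] = id;
}

/* Count distinct IDs, i.e. the threads that can actually post. */
static size_t distinct_ids(const struct thread_table *t)
{
    size_t i, j, n = 0;

    for (i = 0; i < t->count; i++) {
        for (j = 0; j < i; j++)
            if (t->ids[j] == t->ids[i])
                break;
        if (j == i)   /* no earlier entry carries this ID */
            n++;
    }
    return n;
}

/* Posts the collector would wait for that never arrive: one signal is
 * sent per table entry, but each distinct thread posts only once. */
size_t missing_posts(int check_dup)
{
    struct thread_table t = { {0}, 0 };

    register_thread(&t, 7, check_dup);   /* long-lived thread */
    register_thread(&t, 42, check_dup);  /* exits, entry never removed */
    register_thread(&t, 42, check_dup);  /* new thread, ID 42 reused */
    return t.count - distinct_ids(&t);
}
```

In this model, missing_posts(0) returns 1 (one post never arrives, so the wait would hang) and missing_posts(1) returns 0, which is consistent with the test case running OK once duplicate IDs are rejected.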