[Mono-bugs] [Bug 60576][Maj] Changed - Bad interaction - Mono, Gentoo (nptl) & Muine
bugzilla-daemon@bugzilla.ximian.com
Fri, 24 Sep 2004 08:30:33 -0400 (EDT)
Please do not reply to this email. If you want to comment on the bug, go to the
URL shown below and enter your comments there.
Changed by ed@catmur.co.uk.
http://bugzilla.ximian.com/show_bug.cgi?id=60576
--- shadow/60576 2004-09-24 08:06:18.000000000 -0400
+++ shadow/60576.tmp.384 2004-09-24 08:30:33.000000000 -0400
@@ -633,6 +633,46 @@
other was originally created by a ximianite, which should lend some
extra credibility to it (it seems we need that).
------- Additional Comments From dick@ximian.com 2004-09-24 08:06 -------
Don't mark it a dup just yet.
+
+------- Additional Comments From ed@catmur.co.uk 2004-09-24 08:30 -------
+More info:
+
+GC_thread_exit_proc() (libgc/pthread_support.c) is not getting called
+when threads exit. At all. As a result, although the thread in
+question has exited, it remains in the libgc thread table (GC_threads[]).
+
+If that were all, we'd be OK: the thread (which is dead) is not
+signalable, so pthread_kill() returns ESRCH and n_live_threads in
+GC_suspend_all() (libgc/pthread_stop_world.c) is decremented. The
+number of semaphore posts expected then equals the number of threads
+alive to post them.
+
+However, pthread can reuse that thread ID for a new thread, so the
+pthread_kill() in GC_suspend_all() succeeds multiple times on the same
+thread. The thread handles only one signal, so the semaphore is posted
+fewer times than expected, leading to a deadlock on the semaphore.
+
+This suggests a hack:
+
+--- libgc/pthread_support.c 2004-09-06 19:06:56.000000000 +0100
++++ libgc/pthread_support.c 2004-09-24 12:48:23.621325151 +0100
+@@ -603,6 +603,11 @@ GC_thread GC_new_thread(pthread_t id)
+     GC_thread result;
+     static GC_bool first_thread_used = FALSE;
+
++    for (result = GC_threads[hv]; result; result = result->next)
++        if (pthread_equal(result->id, id)) {
++            WARN("Thread 0x%lx already in table!", id);
++            return result;
++        }
+     if (!first_thread_used) {
+         result = &first_thread;
+         first_thread_used = TRUE;
+
+This indeed makes the test case run OK. So now the question is why
+GC_thread_exit_proc() is not getting called (it is registered with
+pthread_cleanup_push()) even though the (nptl) pthread library thinks
+the thread has exited.
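
The signal/semaphore accounting described in the comment above can be sketched as a small single-threaded model. This is only an illustration under stated assumptions, not libgc code: struct thread_table, register_thread(), posts_expected() and posts_delivered() are hypothetical names, the table is a flat array rather than the hashed GC_threads[] chains, and the real GC_suspend_all() also skips the calling thread, which the model ignores.

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SZ 16

/* Hypothetical, simplified stand-in for libgc's GC_threads[] table. */
struct thread_table {
    unsigned long ids[TABLE_SZ];  /* thread IDs as recorded at registration */
    size_t count;
};

/* Returns 1 if some table entry carries this thread ID. */
int table_contains(const struct thread_table *t, unsigned long id)
{
    for (size_t i = 0; i < t->count; i++)
        if (t->ids[i] == id)
            return 1;
    return 0;
}

/* Register a thread. With dedup set, an ID that is already present is
 * reused instead of being added again (the idea behind the hack). */
void register_thread(struct thread_table *t, unsigned long id, int dedup)
{
    if (dedup && table_contains(t, id))
        return;
    assert(t->count < TABLE_SZ);
    t->ids[t->count++] = id;
}

/* Returns 1 if id belongs to a thread that currently exists. */
int is_live(const unsigned long *live, size_t n_live, unsigned long id)
{
    for (size_t i = 0; i < n_live; i++)
        if (live[i] == id)
            return 1;
    return 0;
}

/* Model of the signalling loop: one signal per table entry. An entry
 * whose ID is not live models pthread_kill() returning ESRCH and is not
 * counted. The result is the number of posts the collector waits for. */
int posts_expected(const struct thread_table *t,
                   const unsigned long *live, size_t n_live)
{
    int n = 0;
    for (size_t i = 0; i < t->count; i++)
        if (is_live(live, n_live, t->ids[i]))
            n++;
    return n;
}

/* Each live, registered thread posts once per cycle, no matter how many
 * signals targeted its ID. Assumes the IDs in live[] are distinct. */
int posts_delivered(const struct thread_table *t,
                    const unsigned long *live, size_t n_live)
{
    int n = 0;
    for (size_t i = 0; i < n_live; i++)
        if (table_contains(t, live[i]))
            n++;
    return n;
}
```

In this model, registering IDs 1, 2 and then 2 again without dedup (a stale entry plus a reused ID), with threads 1 and 2 live, makes posts_expected() return 3 while posts_delivered() returns 2, so a waiter would block forever. With dedup set, both return 2. A stale entry whose ID is not reused is harmless in the model, matching the ESRCH case.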