[Mono-bugs] [Bug 77470][Nor] Changed - mono_thread_attach/mono_thread_detach can cause deadlock/segfault on OS X

Mon Mar 20 20:41:09 EST 2006

Please do not reply to this email. If you want to comment on the bug, go to the
URL shown below and enter your comments there.

Changed by bryan at imeem.com.

http://bugzilla.ximian.com/show_bug.cgi?id=77470

--- shadow/77470	2006-02-09 19:38:03.000000000 -0500
+++ shadow/77470.tmp.619	2006-03-20 20:41:09.000000000 -0500
@@ -101,6 +101,79 @@
 Program will always segfault or deadlock.
 
 Additional Information:
 
 On both ia32 Linux and OS X 10.4.4, the sample code causes a lot of warning output as 
 described in Bug #77468.
+
+------- Additional Comments From bryan at imeem.com  2006-03-20 20:41 -------
+So Allan and I revisited this today with mono HEAD (r58196), and here is our assessment 
+of what's happening.  On both OS X on Intel and OS X on PPC, racy still crashes while 
+attempting to dereference an element of the GC_mach_threads array.  Here is the backtrace 
+(on PPC, THREAD_COUNT = 256):
+
+Program received signal EXC_BAD_ACCESS, Could not access memory.
+Reason: KERN_PROTECTION_FAILURE at address: 0x01278e58
+[Switching to process 11906 thread 0x5803]
+0x011b5fa0 in GC_suspend_thread_list (act_list=0xa4000, count=247, old_list=0x0, 
+old_count=0) at darwin_stop_world.c:316
+316           GC_mach_threads[GC_mach_threads_count].already_suspended = 0;
+(gdb) bt
+#0  0x011b5fa0 in GC_suspend_thread_list (act_list=0xa4000, count=247, old_list=0x0, 
+old_count=0) at darwin_stop_world.c:316
+#1  0x011b6228 in GC_stop_world () at darwin_stop_world.c:409
+#2  0x0119ddf4 in GC_stopped_mark (stop_func=0x119d020 <GC_never_stop_func>) at 
+alloc.c:504
+#3  0x0119da1c in GC_try_to_collect_inner (stop_func=0x119d020 
+<GC_never_stop_func>) at alloc.c:386
+#4  0x0119f1c4 in GC_collect_or_expand (needed_blocks=1, ignore_off_page=0) at 
+alloc.c:1046
+#5  0x0119f568 in GC_allocobj (sz=60, kind=1) at alloc.c:1126
+#6  0x011a60d8 in GC_generic_malloc_inner (lb=176, k=1) at malloc.c:136
+#7  0x011a62b0 in GC_generic_malloc (lb=176, k=1) at malloc.c:192
+#8  0x011a669c in GC_malloc (lb=176) at malloc.c:297
+#9  0x010d2668 in mono_object_allocate (size=176, vtable=0x1803e2c) at object.c:2301
+#10 0x010d25c8 in mono_object_new_alloc_specific (vtable=0x1803e2c) at object.c:2398
+#11 0x010d2514 in mono_object_new_specific (vtable=0x1803e2c) at object.c:2384
+#12 0x010d239c in mono_object_new (domain=0x5cf00, klass=0x160d7a0) at object.c:
+2345
+#13 0x0110d3cc in mono_thread_attach (domain=0x5cf00) at threads.c:408
+#14 0x000028ac in thread_function ()
+#15 0x9002b1e0 in _pthread_body ()
+(gdb) 
+
+GC_mach_threads_count is:
+(gdb) p GC_mach_threads_count 
+$1 = 44035
+...which is obviously wrong (you can see that count = 247 above).
+
+In the source, GC_mach_threads_count is statically defined right below GC_mach_threads, 
+so my guess is that once GC_mach_threads_count exceeds THREAD_TABLE_SZ, writes into 
+the GC_mach_threads array run past its end and overwrite the GC_mach_threads_count 
+variable itself, and things go wrong from there.  THREAD_TABLE_SZ is #define'd to be 128 
+elsewhere in the source.
+
+As a workaround, we've defined THREAD_TABLE_SZ to be 2048 in our local tree, and this particular 
+crash appears to be at least mitigated.  However, we then eventually get (at least on Intel):
+
+warning: Error 6 getting port names from mach_port_names
+warning: Error 6 getting port names from mach_port_names
+warning: Error 6 getting port names from mach_port_names
+warning: Error 6 getting port names from mach_port_names
+warning: Error 6 getting port names from mach_port_names
+[...]
+warning: Error 6 getting port names from mach_port_names
+warning: Error 6 getting port names from mach_port_names
+warning: Error 6 getting port names from mach_port_names
+warning: Error 6 getting port names from mach_port_names
+warning: Error 6 getting port names from mach_port_names
+
+Program received signal EXC_BAD_ACCESS, Could not access memory.
+Reason: KERN_INVALID_ADDRESS at address: 0x00374084
+[Switching to process 25049 local thread 0x180f]
+Cannot remove breakpoints because program is no longer writable.
+It might be running in another process.
+Further execution is probably impossible.
+0x00374084 in __i686.get_pc_thunk.bx ()
+
+If not running under GDB, you get an Illegal instruction error and then the program exits.
+
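To illustrate the guess above about THREAD_TABLE_SZ, here is a minimal sketch of a fixed-size thread table whose counter sits directly after the array, together with the kind of bounds check that would stop the append from running past the end. This is not the code in darwin_stop_world.c: the struct names, the field names other than already_suspended, and both helper functions are hypothetical, and the table and counter are grouped into one struct only so that their adjacency can be shown without relying on undefined behavior.

```c
#include <stddef.h>

#define THREAD_TABLE_SZ 128  /* the value quoted in the comment above */

/* Hypothetical stand-in for the collector's per-thread record. */
struct mach_thread_entry {
    unsigned int thread;      /* placeholder for the Mach thread port */
    int already_suspended;
};

/* Hypothetical grouping: the counter is declared right after the table,
 * mirroring how the two statics are described as adjacent in the source. */
struct thread_table {
    struct mach_thread_entry entries[THREAD_TABLE_SZ];
    int count;
};

/* Append one thread. Returns 0 on success, or -1 when the table is full
 * (or the counter is already out of range). Without this check, the
 * (THREAD_TABLE_SZ + 1)-th append would write past the end of entries[]
 * and could land on whatever is stored next, such as the counter. */
static int table_append(struct thread_table *t, unsigned int thread)
{
    if (t->count < 0 || t->count >= THREAD_TABLE_SZ)
        return -1;
    t->entries[t->count].thread = thread;
    t->entries[t->count].already_suspended = 0;
    t->count++;
    return 0;
}

/* Returns nonzero when the counter starts exactly where the array ends,
 * which is the layout that lets an overflowing write clobber the counter.
 * This typically holds here because both members are int-aligned. */
static int count_follows_entries(void)
{
    return offsetof(struct thread_table, count)
        == sizeof(((struct thread_table *)0)->entries);
}
```

With a check like this, a 129th thread is refused rather than silently corrupting the counter. Raising THREAD_TABLE_SZ, as in our workaround, only moves the limit further out.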