[Mono-devel-list] [PATCH] Race condition when restarting threads

Sun Jul 3 11:29:19 EDT 2005

Hey,

While looking into a Mono bug report, we noticed a very rare race in the
GC when restarting the world. The comment in GC_restart_handler states:

    /* Let the GC_suspend_handler() know that we got a SIG_THR_RESTART. */
    /* The lookup here is safe, since I'm doing this on behalf  */
    /* of a thread which holds the allocation lock in order	*/
    /* to stop the world.  Thus concurrent modification of the	*/
    /* data structure is impossible.				*/

However, this comment is not always true. When restarting the world, the
thread that does the restarting releases the GC_lock *without* waiting
for every restarted thread to get past the point where it needs the
structures used by the lookup.

So the sequence of events looked something like:

      * T1, which is restarting the world, sends T2 the restart signal
      * T1 releases the GC_lock
      * T3 is a newborn thread and adds itself to the table
      * T2 gets the signal and sees a corrupt table because T3 is
        concurrently modifying it.

When we hit the race, the result was either a deadlock or a SIGSEGV.
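
One way to close a window like this is to have the restarting thread
collect an acknowledgment from every restarted thread before it drops the
lock. The sketch below only illustrates that ordering; it is not the
attached patch. All names in it (alloc_lock, thread_table, restart_sem,
restart_ack, run_restart_demo) are hypothetical, and a semaphore stands in
for delivery of the restart signal so the example stays self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>

#define MAX_THREADS 8

/* Hypothetical stand-ins for collector state; not the real libgc names. */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER; /* plays the GC_lock */
static pthread_t thread_table[MAX_THREADS]; /* plays the table used by the lookup */
static int table_len;
static sem_t restart_sem; /* stands in for delivery of the restart signal */
static sem_t restart_ack; /* posted once a thread no longer needs the table */

static void wait_sem(sem_t *s) {
    while (sem_wait(s) == -1 && errno == EINTR) { /* retry if interrupted */ }
}

/* Find the calling thread's slot, as a restart handler's lookup would. */
static int lookup_self(void) {
    for (int i = 0; i < table_len; i++)
        if (pthread_equal(thread_table[i], pthread_self()))
            return i;
    return -1;
}

static void *suspended_thread(void *arg) {
    (void)arg;
    wait_sem(&restart_sem);      /* "suspended" until the restart arrives */
    int slot = lookup_self();    /* the lookup that must not race */
    assert(slot >= 0);
    (void)slot;
    sem_post(&restart_ack);      /* tell the restarter we are done with the table */
    return NULL;
}

/* Restart every thread, and release the lock only after all have acked. */
static int restart_world(int nthreads) {
    int acks = 0;
    pthread_mutex_lock(&alloc_lock);
    for (int i = 0; i < nthreads; i++)
        sem_post(&restart_sem);
    for (int i = 0; i < nthreads; i++) {
        wait_sem(&restart_ack);
        acks++;
    }
    /* Only now could a newborn thread take the lock and modify the table. */
    pthread_mutex_unlock(&alloc_lock);
    return acks;
}

/* Returns the number of acknowledgments seen before the lock was released. */
int run_restart_demo(int nthreads) {
    if (nthreads <= 0 || nthreads > MAX_THREADS)
        return -1;
    sem_init(&restart_sem, 0, 0);
    sem_init(&restart_ack, 0, 0);
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&thread_table[i], NULL, suspended_thread, NULL) != 0)
            abort();
    table_len = nthreads;
    int acks = restart_world(nthreads);
    for (int i = 0; i < nthreads; i++)
        pthread_join(thread_table[i], NULL);
    sem_destroy(&restart_sem);
    sem_destroy(&restart_ack);
    return acks;
}
```

Calling run_restart_demo(4) starts four threads and returns the number of
acknowledgments gathered while the lock was still held. The cost of this
approach is that restarting the world blocks until every thread has run
its handler.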

The race is extremely rare: it took 1-2 hours to reproduce on an SMP
machine. With the attached patch applied, it has run for 21 hours without
segfaulting or hanging.

-- Ben
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gc.patch
Type: text/x-patch
Size: 1309 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20050703/e09d1119/attachment.bin