[Mono-bugs] [Bug 341923] New: thread race during code generation

bugzilla_noreply at novell.com bugzilla_noreply at novell.com
Thu Nov 15 08:14:57 EST 2007


https://bugzilla.novell.com/show_bug.cgi?id=341923

           Summary: thread race during code generation
           Product: Mono: Runtime
           Version: unspecified
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: misc
        AssignedTo: mono-bugs at ximian.com
        ReportedBy: lupus at novell.com
         QAContact: mono-bugs at ximian.com
          Found By: ---


[From a mono-devel mail by David Miller]
Sometimes a thread abort is missed while running the nunit tests,
specifically when running the following under mcs/class/corlib/:

MONO_REGISTRY_PATH="/home/davem/.mono/registry"
MONO_PATH="../../class/lib/default::$MONO_PATH"
/home/davem/src/MONO/mono/runtime/mono-wrapper --debug
./../class/lib/default/nunit-console.exe  
/exclude:NotWorking,ValueAdd,CAS,InetAccess
/output:TestResult-default.log /xml:TestResult-default.xml 
corlib_test_default.dll

Here is a trace of what happens. I think the troublesome cases that
trigger this are the ones exercising class C2Test in
mcs/class/corlib/Test/System.Threading/ThreadTest.cs

After all the tests run the threading layer spits out an error because
it cannot abort a thread referencing the
domain-corlib_test_default.dll domain:

--------------------
Tests run: 6368, Failures: 14, Not run: 37, Time: 235.439826 seconds

ABORTING THREAD 4123343760 BECAUSE IT REFERENCES DOMAIN
domain-corlib_test_default.dll.
DEBUG: Sending signal 35 to TID 4123343760
DEBUG: pthread_kill() returns 0
DEBUG: TID 4123343760 receiving signal.
DEBUG: IP(0xf7e2365c) ji=(nil)
ABORTING THREAD 4117756816 BECAUSE IT REFERENCES DOMAIN
domain-corlib_test_default.dll.
DEBUG: exc((nil))
Waiting for 2 TIDs
ABORTING THREAD 4117756816 BECAUSE IT REFERENCES DOMAIN
domain-corlib_test_default.dll.
Waiting for 1 TIDs
ABORTING THREAD 4117756816 BECAUSE IT REFERENCES DOMAIN
domain-corlib_test_default.dll.
Waiting for 1 TIDs
 ...

** (../../class/lib/default/nunit-console.exe:30276): WARNING **: Aborting of
threads in
domain domain-corlib_test_default.dll timed out.

--------------------

Earlier in the run we initially tried to kill this seemingly
stuck thread 4117756816:

--------------------
DEBUG: Sending signal 35 to TID 4117756816
DEBUG: pthread_kill() returns 0
DEBUG: TID 4117756816 receiving signal.
DEBUG: IP(0x13b234) ji=(nil)
DEBUG: exc((nil))
--------------------

That IP (program counter) value is in the JIT optimizer, specifically
in mono_aliasing_get_affected_variables_for_inst_traversing_code()

I am pretty sure it is compiling C2Test.TestMethod() and then
jumping to it.  Since the signal arrives while the thread is
not executing the managed code yet, the signal is basically
ignored.

At this point the event is lost because the thread state already has
the ThreadState_AbortRequested bit set, therefore subsequent abort
attempts will skip trying to resend the signal.
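
The lost event can be illustrated with a small model. This is only a
sketch of the logic described above, not the runtime's actual code:
model_thread, MODEL_ABORT_REQUESTED, model_deliver_signal() and
model_request_abort() are hypothetical names, and in_managed_code is
assumed to stand in for the ji=(nil) check seen in the trace.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the ThreadState_AbortRequested bit. */
enum { MODEL_ABORT_REQUESTED = 0x80 };

typedef struct {
    unsigned state;        /* thread state bit flags */
    bool in_managed_code;  /* false while the thread is still JIT compiling */
    int signals_sent;      /* how many abort signals were actually sent */
    bool abort_delivered;  /* true once the abort was acted upon */
} model_thread;

/* Models the signal handler: the abort is only acted upon when the
 * interrupted IP belongs to managed code; otherwise it is dropped. */
static void model_deliver_signal(model_thread *t)
{
    if (t->in_managed_code)
        t->abort_delivered = true;
}

/* Models the abort request: once the bit is set, later requests
 * return early and never resend the signal. */
static void model_request_abort(model_thread *t)
{
    if (t->state & MODEL_ABORT_REQUESTED)
        return;
    t->state |= MODEL_ABORT_REQUESTED;
    t->signals_sent++;
    model_deliver_signal(t);
}
```

If the first request arrives while in_managed_code is false, the bit
stays set, and every later request is skipped even after the thread
has entered managed code.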

So, it will loop there forever in C2Test.TestMethod().

The only solutions I see to this problem are:

1) Check for pending exceptions and events such as a thread
   state change right before every time we jump into managed
   code.

   This, unfortunately, is also racy: the signal can arrive
   right after we check and before we jump into the managed
   code.

2) Mask out the thread event signal, and somehow atomically
   unmask that signal and branch to the managed code.  Perhaps
   using a setcontext() call from a trampoline of some sort.

Or, we could simply allow thread abort events to be processed
even if they arrive while compiling, if that can be managed
properly wrt. dropping domain locks and things of that
nature.

