[Mono-dev] thread race during code generation

David Miller davem at davemloft.net
Sun Nov 4 03:34:28 EST 2007


Sometimes a thread abort is missed while running the NUnit tests,
specifically when running the following under mcs/class/corlib/:

MONO_REGISTRY_PATH="/home/davem/.mono/registry" MONO_PATH="../../class/lib/default::$MONO_PATH" /home/davem/src/MONO/mono/runtime/mono-wrapper --debug  ../../class/lib/default/nunit-console.exe   /exclude:NotWorking,ValueAdd,CAS,InetAccess /output:TestResult-default.log /xml:TestResult-default.xml  corlib_test_default.dll

Here is a trace of what happens. I think the troublesome cases that
trigger this are the ones exercising class C2Test in
mcs/class/corlib/Test/System.Threading/ThreadTest.cs

After all the tests run, the threading layer spits out an error because
it cannot abort a thread that references the
domain-corlib_test_default.dll domain:

--------------------
Tests run: 6368, Failures: 14, Not run: 37, Time: 235.439826 seconds

ABORTING THREAD 4123343760 BECAUSE IT REFERENCES DOMAIN domain-corlib_test_default.dll.
DEBUG: Sending signal 35 to TID 4123343760
DEBUG: pthread_kill() returns 0
DEBUG: TID 4123343760 receiving signal.
DEBUG: IP(0xf7e2365c) ji=(nil)
ABORTING THREAD 4117756816 BECAUSE IT REFERENCES DOMAIN domain-corlib_test_default.dll.
DEBUG: exc((nil))
Waiting for 2 TIDs
ABORTING THREAD 4117756816 BECAUSE IT REFERENCES DOMAIN domain-corlib_test_default.dll.
Waiting for 1 TIDs
ABORTING THREAD 4117756816 BECAUSE IT REFERENCES DOMAIN domain-corlib_test_default.dll.
Waiting for 1 TIDs
 ...

** (../../class/lib/default/nunit-console.exe:30276): WARNING **: Aborting of threads in domain domain-corlib_test_default.dll timed out.

--------------------

Earlier in the run, we initially tried to kill this seemingly
stuck thread 4117756816:

--------------------
DEBUG: Sending signal 35 to TID 4117756816
DEBUG: pthread_kill() returns 0
DEBUG: TID 4117756816 receiving signal.
DEBUG: IP(0x13b234) ji=(nil)
DEBUG: exc((nil))
--------------------

That IP (program counter) value is in the JIT optimizer, specifically
in mono_aliasing_get_affected_variables_for_inst_traversing_code().

I am pretty sure it is compiling C2Test.TestMethod() and will then
jump to it.  Since the signal arrives before the thread is executing
managed code (there is no JIT info for that IP, hence ji=(nil) in the
trace), the signal is effectively ignored.
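The handler-side half of the problem can be modelled in a few lines of C. This is a simplified sketch, not Mono's actual signal handler: `JitInfo`, `jit_table`, `lookup_jit_info()` and `abort_signal_handler()` are hypothetical names, and the address range in the table is made up. Only the IP 0x13b234 comes from the trace above. The point it shows is that when the lookup for the interrupted IP returns NULL, the handler simply returns and nothing remembers that an abort was requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a JIT info record: the code range of one
 * managed method whose compilation has finished. */
typedef struct {
    uintptr_t start;
    uintptr_t end;          /* exclusive */
    const char *name;
} JitInfo;

/* Hypothetical table of finished methods. The address range is made up for
 * illustration; a method still inside the optimizer has no entry yet. */
static const JitInfo jit_table[] = {
    { 0x08000000u, 0x08001000u, "AlreadyCompiledMethod" },
};

/* Return the record covering ip, or NULL (the "ji=(nil)" case in the log). */
static const JitInfo *lookup_jit_info(uintptr_t ip)
{
    for (size_t i = 0; i < sizeof jit_table / sizeof jit_table[0]; i++) {
        if (ip >= jit_table[i].start && ip < jit_table[i].end)
            return &jit_table[i];
    }
    return NULL;
}

/* Simplified abort handler: returns true if it would raise the abort
 * exception, false if it returns without doing anything. */
static bool abort_signal_handler(uintptr_t interrupted_ip)
{
    const JitInfo *ji = lookup_jit_info(interrupted_ip);
    if (ji == NULL) {
        /* Interrupted in native runtime code such as the JIT optimizer:
         * nothing is raised, and nothing records that the signal arrived. */
        return false;
    }
    /* Interrupted in managed code: the abort can be raised here. */
    return true;
}
```

With this model, `abort_signal_handler(0x13b234)` returns false, matching the `ji=(nil)` / `exc((nil))` lines logged for thread 4117756816.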

At this point the abort is lost: the thread state already has the
ThreadState_AbortRequested bit set, so subsequent abort attempts
skip resending the signal.

So, the thread will loop forever in C2Test.TestMethod().
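The whole sequence (first signal dropped, later attempts suppressed by the state bit) can be sketched as a small C model. Everything here is hypothetical and simplified: `ThreadModel`, `deliver_abort_signal()` and `request_abort()` are illustrative names, not Mono functions. The only value taken from a real API is `ThreadState_AbortRequested = 0x80`, which matches `System.Threading.ThreadState.AbortRequested`.

```c
#include <assert.h>
#include <stdbool.h>

/* Same value as System.Threading.ThreadState.AbortRequested. */
#define ThreadState_AbortRequested 0x80u

/* Hypothetical, simplified view of a thread as the abort logic sees it. */
typedef struct {
    unsigned state;          /* ThreadState bit flags */
    bool in_managed_code;    /* false while the JIT is still compiling */
    int signals_sent;        /* how many times a signal would be sent */
    bool aborted;            /* did the abort actually take effect? */
} ThreadModel;

/* Model of signal delivery: the handler only acts in managed code
 * (ji != NULL); otherwise the signal is silently dropped. */
static void deliver_abort_signal(ThreadModel *t)
{
    t->signals_sent++;
    if (t->in_managed_code)
        t->aborted = true;
}

/* Model of an abort request that skips re-signalling once
 * AbortRequested is already set, as described in the report. */
static void request_abort(ThreadModel *t)
{
    if (t->state & ThreadState_AbortRequested)
        return;                 /* assumes an abort is already in flight */
    t->state |= ThreadState_AbortRequested;
    deliver_abort_signal(t);
}
```

Calling `request_abort()` once while `in_managed_code` is false, then setting it to true (the thread has entered the compiled test method) and calling `request_abort()` again any number of times, leaves `signals_sent` at 1 and `aborted` false. That is the state the "Waiting for 1 TIDs" loop in the trace never gets out of.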

With this information, I hope it will be pretty easy for someone to
fix this bug. :-)



