[Mono-dev] Mono CI weather report 9/15

Andi McClure anmccl at microsoft.com
Thu Sep 15 23:28:07 UTC 2016


What this is: The Mono team has a CI (continuous integration) system which builds and runs automated tests on every commit checked in to git (specifically the master branch). We have a test log viewer<https://jenkins.mono-project.com/view/All/job/jenkins-testresult-viewer/Test_Result_View/> on Jenkins that tracks the results. Once a week I sweep through and write an email with a list of the most frequently-failing automated tests. This is both so that everyone on the team is aware of our current stability level, and so that when people see failures in the github PR tests they know whether to treat them as known bugs or new failures. In the interest of making our development process more open, I’m going to start crossposting this weekly email on the public mailing list.

Here’s an overview of the top recurring failures currently ruining Jenkins builds:

1. MonoTests.runtime.reference-loader.exe failing 100%

This morning the fix for bug 42584 was reverted in order to fix a failure in the build of the GTK+ package. The corresponding unit test for 42584 did not get reverted and that test is now failing on every single build. This will be fixed soon either by removing the test or by re-applying the 42584 patch but for the moment, this failure is seen in every build.

2. MonoTests.System.Net.Sockets.SocketTest.SendAsyncFile

Filed as https://bugzilla.xamarin.com/show_bug.cgi?id=43172 , currently assigned to Marcos Heinrich.

This has been failing for a pretty long time. It only occurs on Linux but on Linux it fails over 20% of the time. (It has also been seen on Android.) It is possible this is only an issue in CI (see akoeplinger note in bug).

The failure is consistent and looks like:


                                                MESSAGE:
                                                System.Exception : Could not abort registered blocking threads before closing socket.
Thread StackTrace:
  at System.Net.Sockets.SafeSocketHandle.RegisterForBlockingSyscall () [0x00057] in /mnt/jenkins/workspace/test-mono-mainline-linux/label/ubuntu-1404-amd64/mcs/class/System/System.Net.Sockets/SafeSocketHandle.cs:114
  at System.Net.Sockets.Socket.SendFile_internal (System.Net.Sockets.SafeSocketHandle safeHandle, System.String filename, System.Byte[] pre_buffer, System.Byte[] post_buffer, System.Net.Sockets.TransmitFileOptions flags) [0x00000] in /mnt/jenkins/workspace/test-mono-mainline-linux/label/ubuntu-1404-amd64/mcs/class/System/System.Net.Sockets/Socket.cs:2944
  at System.Net.Sockets.Socket.SendFile (System.String fileName, System.Byte[] preBuffer, System.Byte[] postBuffer, System.Net.Sockets.TransmitFileOptions flags) [0x00028] in /mnt/jenkins/workspace/test-mono-mainline-linux/label/ubuntu-1404-amd64/mcs/class/System/System.Net.Sockets/Socket.cs:2893

[snip]

Examples:

https://jenkins.mono-project.com/job/test-mono-mainline-linux/label=ubuntu-1404-amd64/556/testReport/MonoTests.System.Net.Sockets/SocketTest/SendAsyncFile/https://jenkins.mono-project.com/job/test-mono-mainline-linux/label=ubuntu-1404-i386/558/testReport/MonoTests.System.Net.Sockets/SocketTest/SendAsyncFile/

2.5. MonoTests.Remoting.RemotingServicesTest.MarshalThrowException

On ARM64 only, when this test calls ChannelServices.UnregisterChannel(), sometimes a KeyNotFoundException is generated somewhere in the guts of Socket.Close. This is filed as https://bugzilla.xamarin.com/show_bug.cgi?id=43727 . It is possible this is the same issue as #2 above (see akoeplinger note in bug).

Examples:

https://jenkins.mono-project.com/job/test-mono-mainline-linux/label=debian-8-arm64/641/testReport/MonoTests.Remoting/RemotingServicesTest/MarshalThrowException/
https://jenkins.mono-project.com/job/test-mono-mainline-linux/label=debian-8-arm64/636/testReport/MonoTests.Remoting/RemotingServicesTest/MarshalThrowException/

3. ThreadAbortException in System.Threading.Timer+Scheduler.SchedulerThread
Filed as https://bugzilla.xamarin.com/show_bug.cgi?id=43320 , currently assigned to Rodrigo.

This occurs in many different places but the crash message always looks the same. It is believed to be existing bad behavior brought into the light by recent fixes by Vargaz around finalizers and VM shutdown.


Unhandled Exception:

System.TypeInitializationException: The type initializer for 'System.Collections.Generic.List`1' threw an exception. ---> System.Threading.ThreadAbortException

   --- End of inner exception stack trace ---

  at System.Threading.Timer+Scheduler.SchedulerThread () [0x0000f] in <filename unknown>:0

  at System.Threading.ThreadHelper.ThreadStart_Context (System.Object state) [0x00017] in <filename unknown>:0

  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x0008d] in <filename unknown>:0

  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00000] in <filename unknown>:0

  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state) [0x00031] in <filename unknown>:0

  at System.Threading.ThreadHelper.ThreadStart () [0x0000b] in <filename unknown>:0

[MVID] 0deb57f9de664ff681556c641423618d 0,1,2,3,4,5

[ERROR] FATAL UNHANDLED EXCEPTION: Nested exception trying to figure out what went wrong


Some places this failure is seen include MonoTests.gshared.generic-marshalbyref.2.exe, MonoTests.runtime.bug-415577.exe, and as an unknown-test failure when a test suite (such as mcs/class/corlib) is shutting down.

Examples:

https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-amd64/4606/testReport/MonoTests/gshared/generic_marshalbyref_2_exe_3/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-amd64/4607/testReport/MonoTests/gshared/generic_marshalbyref_2_exe/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4608/testReport/MonoTests/runtime/bug_415577_exe/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4656/parsed_console/log_content.html#WARNING1 (test shutdown)

4. __icall_wrapper_mono_gc_alloc_vector crash during thread start / domain unload

Filed as https://bugzilla.xamarin.com/show_bug.cgi?id=43921 , currently assigned to Aleksey. We have started seeing SIGSEGVs in a range of tests related to domain unloading, or thread creation around the same time as the GC stopping the world. Mac only (?).

https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4742/testReport/MonoTests/sgen-regular-tests-ms-split-95/sgen_domain_unload_2_exe/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4744/testReport/MonoTests/sgen-regular-tests-ms-split-clear-at-gc/sgen_new_threads_dont_join_stw_2_exe/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4812/parsed_console/log_content.html#WARNING2

4.5 (?). AppDomain.internalUnload crash

This is also mac-only and might be the same failure as #4? Aleksey is looking into it.

https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4812/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4811/<https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4811/parsed_console/log_content.html#WARNING1>
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-amd64/4813/testReport/MonoTests/sgen-regular-tests-plain/sgen_domain_unload_exe_timedout/
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-amd64/4812/testReport/ (both failures)
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-amd64/4811/testReport/MonoTests/runtime/remoting4_exe_timedout/
Crashes, managed stack looks like:

  at (wrapper managed-to-native) System.AppDomain.InternalUnload (int) <0x00012>
  at System.AppDomain.Unload (System.AppDomain) [0x00011] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-i386/mcs/class/corlib/System/AppDomain.cs:1200
  at MonoTests.System.AppDomainTest.TearDown () [0x0000b] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-i386/mcs/class/corlib/Test/System/AppDomainTest.cs:71
  at (wrapper runtime-invoke) object.runtime_invoke_void__this__ (object,intptr,intptr,intptr) <IL 0x0004f, 0x00092>


...

5. Crash doing thread join while closing process after ServiceModel tests

Both crashes and hangs have been seen recently while running the ServiceModel test suites. This has been seen on both Mac and Linux. Nothing is filed. When the crash occurs, it happens in the test runner itself, waiting for the tests to finish.

https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-amd64/4808/parsed_console/log_content.html#WARNING1
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-amd64/4794/parsed_console/log_content.html#WARNING2
https://jenkins.mono-project.com/job/test-mono-mainline-linux/label=debian-8-arm64/739/parsed_console/log_content.html#WARNING1

Managed stack looks like:

  at (wrapper managed-to-native) System.Threading.Thread.JoinInternal (System.Threading.Thread,int) <IL 0x00014, 0x00067>
  at System.Threading.Thread.Join () [0x00000] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/class/referencesource/mscorlib/system/threading/thread.cs:697
  at NUnit.Core.TestRunnerThread.Wait () [0x00010] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/TestRunnerThread.cs:118
  at NUnit.Core.ThreadedTestRunner.Wait () [0x0000b] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/ThreadedTestRunner.cs:63
  at NUnit.Core.ThreadedTestRunner.EndRun () [0x00000] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/ThreadedTestRunner.cs:55
  at NUnit.Core.ThreadedTestRunner.Run (NUnit.Core.EventListener,NUnit.Core.ITestFilter) [0x00008] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/ThreadedTestRunner.cs:36
  at NUnit.Core.ProxyTestRunner.Run (NUnit.Core.EventListener,NUnit.Core.ITestFilter) [0x00007] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/ProxyTestRunner.cs:133
  at NUnit.Core.RemoteTestRunner.Run (NUnit.Core.EventListener,NUnit.Core.ITestFilter) [0x0002b] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/RemoteTestRunner.cs:63


Native stack, when we get one, looks like:

        0   mono                                0x00000001073a7d5a mono_handle_native_sigsegv + 282
        1   libsystem_platform.dylib            0x00007fff91ff152a _sigtramp + 26
        2   ???                                 0x00000001081b9a00 0x0 + 4430993920
        3   mono                                0x000000010756f763 mono_os_cond_timedwait + 163
        4   mono                                0x000000010756e326 mono_w32handle_timedwait_signal_handle + 358
        5   mono                                0x000000010756e0e1 mono_w32handle_wait_one + 897
        6   mono                                0x00000001075535f9 wapi_WaitForSingleObjectEx + 9
        7   mono                                0x00000001074a2cfe ves_icall_System_Threading_Thread_Join_internal + 174


6. Tarjan GC bridge crashing in “major fragmentation” test

sgen-bridge-major-fragmentation.exe, which runs with a simulated version of the Android GC bridge, has in the last 24 hours started segfaulting about 1/3 of the time on ÅRM soft float (but never on any other platform). The stacks are consistent. Filed as https://bugzilla.xamarin.com/show_bug.cgi?id=44397 , currently assigned to me.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.dot.net/pipermail/mono-devel-list/attachments/20160915/476dcff0/attachment-0001.html>


More information about the Mono-devel-list mailing list