[Mono-dev] Mono CI weather report 10/12

Andi McClure anmccl at microsoft.com
Wed Oct 12 18:54:31 UTC 2016

What this is: The Mono team has a CI (continuous integration) system which builds and runs automated tests on every commit checked in to git (specifically the master branch). We have a test log viewer<https://jenkins.mono-project.com/view/All/job/jenkins-testresult-viewer/Test_Result_View/> on Jenkins that tracks the results (currently only accessible to github project admins, sorry). Once a week I sweep through and write an email with a list of the most frequently-failing automated tests.

We have had a pretty rough couple weeks on the master branch and I actually have not even been able to create this list for the last two weeks.  We’re starting to get back to stable but now there are some new issues, some with strange signatures and frustratingly low frequencies. #s 1, 2, 3 and 6 in particular this week are new or effectively new and need someone to look at them.

Here are top recurring failures currently ruining Jenkins builds:

0. System.Security.Cryptography.X509Certificates, various [New]

A couple of weeks ago we checked in a major set of changes— adding the BoringTLS system— to master without going through our normal PR process. This has lead to a large number of consistent failures on some lanes. This is being worked on, but there are still every-build failures on Mac Intel64 and Mac Intel32. It is not totally clear yet whether the final failures are due to a bug or due to certificates not being registered properly. The main person working on fixing this is Martin Baulig.

1. Null reference exceptions in System.Text.StringBuilder.Append, etc [New]

Filed as https://bugzilla.xamarin.com/show_bug.cgi?id=45335 , examples in bugzilla. Null reference exceptions are turning up in various runtime tests, most often in StringBuilder.Append. It is not immediately clear if the problem is in the class libraries or if the runtime is turning references null.

2. AppDomain.internalUnload crash [Existing, but has increased in frequency]

Filed as https://bugzilla.xamarin.com/show_bug.cgi?id=45337 , examples in bugzilla. We are seeing segfaults whose stacktraces have the signature:

  at (wrapper managed-to-native) System.AppDomain.InternalUnload (int) <0x00012>
  at System.AppDomain.Unload (System.AppDomain) <0x0002c>

We were seeing this a couple weeks ago when I last sent a CI weather report. A few people have told me this will possibly be fixed if we merge https://github.com/mono/mono/pull/3364 .

3. Hang doing thread join while closing process after ServiceModel tests [Existing, but has increased in frequency]

Hangs have been seen for the last few weeks while running the ServiceModel test suites. This has been seen on both Mac and Linux (I think it likes to crash on mac and hang on Linux?). Nothing is filed. When the crash occurs, it happens in the test runner itself, waiting for the tests to finish.

An example from this week:


Some more from a couple weeks ago:


Managed stack looks like:

  at (wrapper managed-to-native) System.Threading.Thread.JoinInternal (System.Threading.Thread,int) <IL 0x00014, 0x00067>
  at System.Threading.Thread.Join () [0x00000] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/class/referencesource/mscorlib/system/threading/thread.cs:697
  at NUnit.Core.TestRunnerThread.Wait () [0x00010] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/TestRunnerThread.cs:118
  at NUnit.Core.ThreadedTestRunner.Wait () [0x0000b] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/ThreadedTestRunner.cs:63
  at NUnit.Core.ThreadedTestRunner.EndRun () [0x00000] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/ThreadedTestRunner.cs:55
  at NUnit.Core.ThreadedTestRunner.Run (NUnit.Core.EventListener,NUnit.Core.ITestFilter) [0x00008] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/ThreadedTestRunner.cs:36
  at NUnit.Core.ProxyTestRunner.Run (NUnit.Core.EventListener,NUnit.Core.ITestFilter) [0x00007] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/ProxyTestRunner.cs:133
  at NUnit.Core.RemoteTestRunner.Run (NUnit.Core.EventListener,NUnit.Core.ITestFilter) [0x0002b] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/RemoteTestRunner.cs:63

Native stack, when we get one, looks like:

        0   mono                                0x00000001073a7d5a mono_handle_native_sigsegv + 282
        1   libsystem_platform.dylib            0x00007fff91ff152a _sigtramp + 26
        2   ???                                 0x00000001081b9a00 0x0 + 4430993920
        3   mono                                0x000000010756f763 mono_os_cond_timedwait + 163
        4   mono                                0x000000010756e326 mono_w32handle_timedwait_signal_handle + 358
        5   mono                                0x000000010756e0e1 mono_w32handle_wait_one + 897
        6   mono                                0x00000001075535f9 wapi_WaitForSingleObjectEx + 9
        7   mono                                0x00000001074a2cfe ves_icall_System_Threading_Thread_Join_internal + 174

4. ThreadAbortException in System.Threading.Timer+Scheduler.SchedulerThread  (the "List`1 issue")  [Existing]

Filed as https://bugzilla.xamarin.com/show_bug.cgi?id=43320 , currently assigned to Rodrigo. I thought this had gotten better for a week or so, but in the last 48 hours it’s occurred on at least one lane on almost every build, so maybe it was random fluctuations.

This occurs in many different places but the crash message always looks the same. It is believed to be existing bad behavior brought into the light by recent fixes by Vargaz around finalizers and VM shutdown.

Unhandled Exception:

System.TypeInitializationException: The type initializer for 'System.Collections.Generic.List`1' threw an exception. ---> System.Threading.ThreadAbortException

   --- End of inner exception stack trace ---

  at System.Threading.Timer+Scheduler.SchedulerThread () [0x0000f] in <filename unknown>:0

  at System.Threading.ThreadHelper.ThreadStart_Context (System.Object state) [0x00017] in <filename unknown>:0

  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x0008d] in <filename unknown>:0

  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00000] in <filename unknown>:0

  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state) [0x00031] in <filename unknown>:0

  at System.Threading.ThreadHelper.ThreadStart () [0x0000b] in <filename unknown>:0

[MVID] 0deb57f9de664ff681556c641423618d 0,1,2,3,4,5

[ERROR] FATAL UNHANDLED EXCEPTION: Nested exception trying to figure out what went wrong

Some places this failure is seen include MonoTests.gshared.generic-marshalbyref.2.exe, MonoTests.runtime.bug-415577.exe, and as an unknown-test failure when a test suite (such as mcs/class/corlib) is shutting down.

A recent example:


An old examples:

https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4656/parsed_console/log_content.html#WARNING1 (test shutdown)

5. MonoTests.System.Net.Sockets.SocketTest.SendAsyncFile [Existing]

Filed as https://bugzilla.xamarin.com/show_bug.cgi?id=43172 , currently assigned to Marcos Heinrich.

This has been failing for a pretty long time. It only occurs on Linux but on Linux it fails over 20% of the time. (It has also been seen on Android.) It is possible this is only an issue in CI (see akoeplinger note in bug).

The failure is consistent and looks like:

                                                System.Exception : Could not abort registered blocking threads before closing socket.
Thread StackTrace:
  at System.Net.Sockets.SafeSocketHandle.RegisterForBlockingSyscall () [0x00057] in /mnt/jenkins/workspace/test-mono-mainline-linux/label/ubuntu-1404-amd64/mcs/class/System/System.Net.Sockets/SafeSocketHandle.cs:114
  at System.Net.Sockets.Socket.SendFile_internal (System.Net.Sockets.SafeSocketHandle safeHandle, System.String filename, System.Byte[] pre_buffer, System.Byte[] post_buffer, System.Net.Sockets.TransmitFileOptions flags) [0x00000] in /mnt/jenkins/workspace/test-mono-mainline-linux/label/ubuntu-1404-amd64/mcs/class/System/System.Net.Sockets/Socket.cs:2944
  at System.Net.Sockets.Socket.SendFile (System.String fileName, System.Byte[] preBuffer, System.Byte[] postBuffer, System.Net.Sockets.TransmitFileOptions flags) [0x00028] in /mnt/jenkins/workspace/test-mono-mainline-linux/label/ubuntu-1404-amd64/mcs/class/System/System.Net.Sockets/Socket.cs:2893




6. Hang(?) during MS acceptance tests [New]

Not filed. See https://jenkins.mono-project.com/view/All/job/jenkins-testresult-viewer/Test_Result_View/builds.html#filterTestStep=make%20-w%20-C%20acceptance-tests%20check-ms-test-suite&groupBy=Builds&massFail=Hide&pr=Hide&span=Last7Days

The ms-test-suite normally takes six to nine minutes to run. Four times in the last week at least, always on Mac, it takes up the full 15 minutes and then times out. The failure occurs in different places so it doesn’t obviously appear one single test is hanging. This seems a bit high to be normal variance.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.dot.net/pipermail/mono-devel-list/attachments/20161012/58214fee/attachment-0001.html>

More information about the Mono-devel-list mailing list