[Mono-dev] Mono CI weather report 10/12
anmccl at microsoft.com
Wed Oct 12 18:54:31 UTC 2016
What this is: The Mono team has a CI (continuous integration) system which builds and runs automated tests on every commit checked in to git (specifically the master branch). We have a test log viewer<https://jenkins.mono-project.com/view/All/job/jenkins-testresult-viewer/Test_Result_View/> on Jenkins that tracks the results (currently only accessible to github project admins, sorry). Once a week I sweep through and write an email with a list of the most frequently-failing automated tests.
We have had a pretty rough couple weeks on the master branch and I actually have not even been able to create this list for the last two weeks. We’re starting to get back to stable but now there are some new issues, some with strange signatures and frustratingly low frequencies. #s 1, 2, 3 and 6 in particular this week are new or effectively new and need someone to look at them.
Here are top recurring failures currently ruining Jenkins builds:
0. System.Security.Cryptography.X509Certificates, various [New]
A couple of weeks ago we checked in a major set of changes— adding the BoringTLS system— to master without going through our normal PR process. This has lead to a large number of consistent failures on some lanes. This is being worked on, but there are still every-build failures on Mac Intel64 and Mac Intel32. It is not totally clear yet whether the final failures are due to a bug or due to certificates not being registered properly. The main person working on fixing this is Martin Baulig.
1. Null reference exceptions in System.Text.StringBuilder.Append, etc [New]
Filed as https://bugzilla.xamarin.com/show_bug.cgi?id=45335 , examples in bugzilla. Null reference exceptions are turning up in various runtime tests, most often in StringBuilder.Append. It is not immediately clear if the problem is in the class libraries or if the runtime is turning references null.
2. AppDomain.internalUnload crash [Existing, but has increased in frequency]
Filed as https://bugzilla.xamarin.com/show_bug.cgi?id=45337 , examples in bugzilla. We are seeing segfaults whose stacktraces have the signature:
at (wrapper managed-to-native) System.AppDomain.InternalUnload (int) <0x00012>
at System.AppDomain.Unload (System.AppDomain) <0x0002c>
We were seeing this a couple weeks ago when I last sent a CI weather report. A few people have told me this will possibly be fixed if we merge https://github.com/mono/mono/pull/3364 .
3. Hang doing thread join while closing process after ServiceModel tests [Existing, but has increased in frequency]
Hangs have been seen for the last few weeks while running the ServiceModel test suites. This has been seen on both Mac and Linux (I think it likes to crash on mac and hang on Linux?). Nothing is filed. When the crash occurs, it happens in the test runner itself, waiting for the tests to finish.
An example from this week:
Some more from a couple weeks ago:
Managed stack looks like:
at (wrapper managed-to-native) System.Threading.Thread.JoinInternal (System.Threading.Thread,int) <IL 0x00014, 0x00067>
at System.Threading.Thread.Join () [0x00000] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/class/referencesource/mscorlib/system/threading/thread.cs:697
at NUnit.Core.TestRunnerThread.Wait () [0x00010] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/TestRunnerThread.cs:118
at NUnit.Core.ThreadedTestRunner.Wait () [0x0000b] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/ThreadedTestRunner.cs:63
at NUnit.Core.ThreadedTestRunner.EndRun () [0x00000] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/ThreadedTestRunner.cs:55
at NUnit.Core.ThreadedTestRunner.Run (NUnit.Core.EventListener,NUnit.Core.ITestFilter) [0x00008] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/ThreadedTestRunner.cs:36
at NUnit.Core.ProxyTestRunner.Run (NUnit.Core.EventListener,NUnit.Core.ITestFilter) [0x00007] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/ProxyTestRunner.cs:133
at NUnit.Core.RemoteTestRunner.Run (NUnit.Core.EventListener,NUnit.Core.ITestFilter) [0x0002b] in /Users/builder/jenkins/workspace/test-mono-mainline/label/osx-amd64/mcs/nunit24/NUnitCore/core/RemoteTestRunner.cs:63
Native stack, when we get one, looks like:
0 mono 0x00000001073a7d5a mono_handle_native_sigsegv + 282
1 libsystem_platform.dylib 0x00007fff91ff152a _sigtramp + 26
2 ??? 0x00000001081b9a00 0x0 + 4430993920
3 mono 0x000000010756f763 mono_os_cond_timedwait + 163
4 mono 0x000000010756e326 mono_w32handle_timedwait_signal_handle + 358
5 mono 0x000000010756e0e1 mono_w32handle_wait_one + 897
6 mono 0x00000001075535f9 wapi_WaitForSingleObjectEx + 9
7 mono 0x00000001074a2cfe ves_icall_System_Threading_Thread_Join_internal + 174
4. ThreadAbortException in System.Threading.Timer+Scheduler.SchedulerThread (the "List`1 issue") [Existing]
Filed as https://bugzilla.xamarin.com/show_bug.cgi?id=43320 , currently assigned to Rodrigo. I thought this had gotten better for a week or so, but in the last 48 hours it’s occurred on at least one lane on almost every build, so maybe it was random fluctuations.
This occurs in many different places but the crash message always looks the same. It is believed to be existing bad behavior brought into the light by recent fixes by Vargaz around finalizers and VM shutdown.
System.TypeInitializationException: The type initializer for 'System.Collections.Generic.List`1' threw an exception. ---> System.Threading.ThreadAbortException
--- End of inner exception stack trace ---
at System.Threading.Timer+Scheduler.SchedulerThread () [0x0000f] in <filename unknown>:0
at System.Threading.ThreadHelper.ThreadStart_Context (System.Object state) [0x00017] in <filename unknown>:0
at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x0008d] in <filename unknown>:0
at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00000] in <filename unknown>:0
at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state) [0x00031] in <filename unknown>:0
at System.Threading.ThreadHelper.ThreadStart () [0x0000b] in <filename unknown>:0
[MVID] 0deb57f9de664ff681556c641423618d 0,1,2,3,4,5
[ERROR] FATAL UNHANDLED EXCEPTION: Nested exception trying to figure out what went wrong
Some places this failure is seen include MonoTests.gshared.generic-marshalbyref.2.exe, MonoTests.runtime.bug-415577.exe, and as an unknown-test failure when a test suite (such as mcs/class/corlib) is shutting down.
A recent example:
An old examples:
https://jenkins.mono-project.com/job/test-mono-mainline/label=osx-i386/4656/parsed_console/log_content.html#WARNING1 (test shutdown)
5. MonoTests.System.Net.Sockets.SocketTest.SendAsyncFile [Existing]
Filed as https://bugzilla.xamarin.com/show_bug.cgi?id=43172 , currently assigned to Marcos Heinrich.
This has been failing for a pretty long time. It only occurs on Linux but on Linux it fails over 20% of the time. (It has also been seen on Android.) It is possible this is only an issue in CI (see akoeplinger note in bug).
The failure is consistent and looks like:
System.Exception : Could not abort registered blocking threads before closing socket.
at System.Net.Sockets.SafeSocketHandle.RegisterForBlockingSyscall () [0x00057] in /mnt/jenkins/workspace/test-mono-mainline-linux/label/ubuntu-1404-amd64/mcs/class/System/System.Net.Sockets/SafeSocketHandle.cs:114
at System.Net.Sockets.Socket.SendFile_internal (System.Net.Sockets.SafeSocketHandle safeHandle, System.String filename, System.Byte pre_buffer, System.Byte post_buffer, System.Net.Sockets.TransmitFileOptions flags) [0x00000] in /mnt/jenkins/workspace/test-mono-mainline-linux/label/ubuntu-1404-amd64/mcs/class/System/System.Net.Sockets/Socket.cs:2944
at System.Net.Sockets.Socket.SendFile (System.String fileName, System.Byte preBuffer, System.Byte postBuffer, System.Net.Sockets.TransmitFileOptions flags) [0x00028] in /mnt/jenkins/workspace/test-mono-mainline-linux/label/ubuntu-1404-amd64/mcs/class/System/System.Net.Sockets/Socket.cs:2893
6. Hang(?) during MS acceptance tests [New]
Not filed. See https://jenkins.mono-project.com/view/All/job/jenkins-testresult-viewer/Test_Result_View/builds.html#filterTestStep=make%20-w%20-C%20acceptance-tests%20check-ms-test-suite&groupBy=Builds&massFail=Hide&pr=Hide&span=Last7Days
The ms-test-suite normally takes six to nine minutes to run. Four times in the last week at least, always on Mac, it takes up the full 15 minutes and then times out. The failure occurs in different places so it doesn’t obviously appear one single test is hanging. This seems a bit high to be normal variance.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Mono-devel-list