[Mono-list] Looking for mono expert to help debug a hanging process

River Satya river.satya at gmail.com
Thu Nov 12 05:54:23 UTC 2015


Great, thanks for the helpful response, Edward. My responses to your
comments are inline, and updated info is at the end:

On 10 November 2015 at 08:56, Edward Ned Harvey (mono) <
edward.harvey.mono at clevertrove.com> wrote:

> > From: mono-list-bounces at lists.ximian.com [mailto:mono-list-
> > bounces at lists.ximian.com] On Behalf Of River Satya
> >
> > We have a c# binary running under mono on Ubuntu 14.04 which hangs
> > periodically.
> >
> > When it hangs, SIGQUIT does not generate a thread dump, and all threads,
> > including one heartbeat thread that does very little but pulse the logs
> > once a minute, seem to stop.
>
> First and foremost, make sure you're running the latest version of mono.
> What version are you on?
>

Mono JIT compiler version 4.0.4 (Stable 4.0.4.1/5ab4c0d Tue Aug 25 23:11:51 UTC 2015)


>
> You should also be aware, that Xamarin has a list of 3rd party contractors
> for support work like this. You should be able to find that on their
> website.
>

Great, thanks!


> Sounds like (probably) a deadlock. But a deadlock between some other
> threads shouldn't affect your heartbeat thread - unless your heartbeat
> thread is dependent on something. How is your heartbeat thread written?
>

It's a bit of a stretch to call it a heartbeat thread. It's actually the
main thread, and it writes a log line once per minute. It also watches a
CancellationTokenSource for cancellation via a Unix signal (to allow
graceful shutdown on SIGTERM/SIGINT), and it cleans up other completed
Tasks, etc. It's certainly not impossible that it's blocking on another
thread.
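
A minimal Python sketch of the main-loop shape described above, offered only as an analogy and not as Mono or .NET code. A flag set from a signal handler stands in for the CancellationTokenSource, a counter stands in for the once-a-minute log line, and the interval is shortened to 50 ms. `run_main_loop` and its `self_signal_after` test hook are hypothetical names.

```python
import os
import signal
import threading
import time


def run_main_loop(interval=0.05, self_signal_after=0.3):
    """Heartbeat loop on the main thread that exits cleanly on SIGTERM or SIGINT.

    Must be called from the main thread (Python only installs handlers there).
    self_signal_after is a hypothetical test hook: if not None, a helper thread
    sends SIGTERM to this process after that many seconds.
    Returns the number of heartbeats written before shutdown.
    """
    stop_requested = False

    def on_signal(signum, frame):
        nonlocal stop_requested
        stop_requested = True  # only set a flag; the loop does the shutdown work

    previous = {s: signal.signal(s, on_signal) for s in (signal.SIGTERM, signal.SIGINT)}
    sender = None
    try:
        if self_signal_after is not None:
            sender = threading.Timer(self_signal_after, os.kill,
                                     args=(os.getpid(), signal.SIGTERM))
            sender.start()
        beats = 0
        while not stop_requested:
            beats += 1            # stand-in for "write a log line"
            time.sleep(interval)  # wait until the next heartbeat
        return beats
    finally:
        if sender is not None:
            sender.cancel()       # no-op if the timer has already fired
            sender.join()
        for s, handler in previous.items():
            signal.signal(s, handler)  # restore the handlers installed before


if __name__ == "__main__":
    print("heartbeats before shutdown:", run_main_loop())
```

The handler only sets a flag because Python runs signal handlers on the main thread between bytecodes. Doing heavier work there, or taking a lock the main thread may already hold, risks re-entrancy problems.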


> For example, if you have a heartbeat thread that uses a Timer, the Timer
> needs to raise an async event from the threadpool, so if the threadpool is
> drained by some other threads, then your Timer event might not occur. But
> if you created a managed instance of System.Threading.Thread, and then
> launched it into a while(true) loop, that uses
> System.Threading.Thread.Sleep(), you can be assured you don't have a
> dependency on the threadpool.


We don't use the threadpool from the main thread (apart from at startup),
though it is used elsewhere in the app. I'd be very surprised if we're
maxing out the threadpool (default number is 100?), unless there's a leak
somewhere, which is possible, though I think I'd have seen it. We never
seem to go above 40 total threads.
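
To make the Timer-versus-dedicated-thread point concrete, here is a small Python analogue. It is an illustration under stated assumptions, not Mono code. `ThreadPoolExecutor` stands in for the .NET threadpool, a tick submitted to it plays the role of a Timer callback, and a plain `threading.Thread` with `time.sleep` plays the role of the dedicated heartbeat thread. The pool size, the intervals and the `demo` function are hypothetical choices for the demo.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2         # deliberately tiny pool, chosen only for the demo
BEAT_INTERVAL = 0.05  # seconds between dedicated-thread heartbeats
OBSERVE_FOR = 0.5     # seconds to watch both heartbeats


def demo():
    """Return (pool ticks while saturated, dedicated ticks, pool ticks after release)."""
    release = threading.Event()  # lets the "stuck" pool workers finish at the end
    stop = threading.Event()     # tells the dedicated heartbeat thread to exit
    pool_ticks, thread_ticks = [], []

    pool = ThreadPoolExecutor(max_workers=POOL_SIZE)
    # Saturate the pool: every worker blocks, like a deadlock elsewhere in the app.
    for _ in range(POOL_SIZE):
        pool.submit(release.wait)

    # Pool-based heartbeat: each tick needs a free worker, so these only queue up.
    for _ in range(5):
        pool.submit(pool_ticks.append, "tick")

    # Dedicated heartbeat: its own thread, a loop and a sleep; no pool involved.
    def beat_loop():
        while not stop.is_set():
            thread_ticks.append("tick")
            time.sleep(BEAT_INTERVAL)

    heartbeat = threading.Thread(target=beat_loop, name="heartbeat", daemon=True)
    heartbeat.start()

    time.sleep(OBSERVE_FOR)
    # Snapshot while the pool is still saturated.
    observed = (len(pool_ticks), len(thread_ticks))

    stop.set()
    heartbeat.join()
    release.set()
    pool.shutdown(wait=True)  # the queued pool ticks only run now
    return observed + (len(pool_ticks),)


if __name__ == "__main__":
    saturated, dedicated, drained = demo()
    print("pool heartbeat ticks while pool was saturated:", saturated)
    print("dedicated-thread ticks in the same window:", dedicated)
    print("pool ticks once the pool was released:", drained)
```

While the pool is saturated, the queued ticks never run, but the dedicated thread keeps ticking. The queued ticks only execute once the blocked workers are released.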


> But if you accidentally drop reference to your heartbeat thread, some time
> later it will be collected by the GC (while it's still running) which is no
> bueno.


It's the main thread, so presumably this isn't a problem.


> If the heartbeat thread is using any locking, that's a possible issue. If
> it's writing to some log resource, or file, which is shared by other
> threads, that's a possible issue.
>

It writes logs using log4net, and there is definitely some locking code in
it.
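
A similar sketch covers the log-locking concern. It is again in Python and purely hypothetical: `LockedLog` stands in for any logger that serializes writes through a single lock, and log4net's real internals are not modeled. If another thread stalls while holding that lock, a heartbeat that blocks on it goes quiet too. Bounding the wait lets the heartbeat notice the stall.

```python
import threading
import time


class LockedLog:
    """Hypothetical stand-in for a logger whose writes are serialized by one lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.lines = []

    def write(self, msg, timeout=None):
        """Append msg under the lock; return False if it could not be taken in time.

        timeout=None waits forever (a plain blocking write); a number bounds the wait.
        """
        acquired = self.lock.acquire(timeout=-1 if timeout is None else timeout)
        if not acquired:
            return False
        try:
            self.lines.append(msg)
            return True
        finally:
            self.lock.release()


def demo(hold_for=0.5, beat_timeout=0.1):
    """Show a heartbeat write stalling behind another thread that holds the log lock."""
    log = LockedLog()
    lock_taken = threading.Event()

    def stuck_writer():
        # Simulates a worker that stalls (or deadlocks) while holding the log lock.
        with log.lock:
            lock_taken.set()
            time.sleep(hold_for)

    worker = threading.Thread(target=stuck_writer)
    worker.start()
    lock_taken.wait()

    # A bounded wait lets the heartbeat notice the stall instead of freezing silently.
    during = log.write("heartbeat", timeout=beat_timeout)
    worker.join()
    after = log.write("heartbeat", timeout=beat_timeout)
    return during, after, log.lines


if __name__ == "__main__":
    print(demo())  # (False, True, ['heartbeat'])
```

A bounded wait does not fix the underlying deadlock. It only turns a silent hang into something observable, for example by reporting the failed write directly to stderr.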

Time passes...

Okay, so it turns out that the machine was low on memory at the time this
happened (~80 MiB free). I'm not sure whether this was a symptom of the
problem or its cause. Either way, I spun up a new instance with double
the memory and retested.

Now I'm seeing different symptoms, which are still concerning and possibly
related.

Several times a day, we get segfaults printed to stdout, often with no
stack trace, i.e.:

    Stacktrace:

    Native stacktrace:


and sometimes with a stack trace, e.g.:

    * Assertion at mono-internal-hash.c:125, condition `0' not met

    Stacktrace:

    Native stacktrace:

            /usr/bin/mono() [0x4b23dc]
            /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7f9f06cbc340]
            /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39) [0x7f9f0691dcc9]
            /lib/x86_64-linux-gnu/libc.so.6(abort+0x148) [0x7f9f069210d8]
            /usr/bin/mono() [0x629869]
            /usr/bin/mono() [0x629a77]
            /usr/bin/mono() [0x629bc6]
            /usr/bin/mono() [0x6193ac]
            /usr/bin/mono() [0x422086]
            /usr/bin/mono() [0x5a6f02]
            /usr/bin/mono() [0x5b0610]
            /usr/bin/mono() [0x5a1c89]
            /usr/bin/mono() [0x5a1cc0]
            /usr/bin/mono() [0x5a215d]
            /usr/bin/mono() [0x5874e8]
            /usr/bin/mono() [0x623a36]
            /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7f9f06cb4182]
            /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f9f069e147d]

    Debug info from gdb:

    =================================================================
    Got a SIGABRT while executing native code. This usually indicates
    a fatal error in the mono runtime or one of the native libraries
    used by your application.
    =================================================================

Thanks again for your help!

Cheers,

River