[Mono-dev] FW: Random hangs while running mono app

Burkhard Linke blinke at CeBiTec.Uni-Bielefeld.DE
Thu May 19 14:30:59 UTC 2016


Hi,

On 04/29/2016 04:12 PM, Rodrigo Kumpera wrote:
> This looks like a shutdown bug in mono.
>
> Do you have a reliable way to reproduce it?
> How loaded are the machines running your workload?

We have encountered the same(?) bug on our compute cluster. Applications
process their data and write their output files, but do not terminate.

(gdb) info threads
   Id   Target Id         Frame
   6    Thread 0x2b1f83200700 (LWP 63141) "mono" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
   5    Thread 0x2b1f84cf3700 (LWP 63142) "Finalizer" sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
   4    Thread 0x2b1f87ee1700 (LWP 63143) "mono" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
   3    Thread 0x2b1f8c81d700 (LWP 63148) "Timer-Scheduler" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
   2    Thread 0x2b1fe1133700 (LWP 63248) "mono" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
* 1    Thread 0x2b1f81c98580 (LWP 63140) "mono" pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
(gdb) thread apply all bt

Thread 6 (Thread 0x2b1f83200700 (LWP 63141)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000005f9aec in ?? ()
#2  0x00002b1f8259b182 in start_thread (arg=0x2b1f83200700) at pthread_create.c:312
#3  0x00002b1f828ab47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 5 (Thread 0x2b1f84cf3700 (LWP 63142)):
#0  sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
#1  0x000000000061de28 in mono_sem_wait ()
#2  0x00000000005a2076 in ?? ()
#3  0x00000000005843d3 in ?? ()
#4  0x0000000000624666 in ?? ()
#5  0x00002b1f8259b182 in start_thread (arg=0x2b1f84cf3700) at pthread_create.c:312
#6  0x00002b1f828ab47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 4 (Thread 0x2b1f87ee1700 (LWP 63143)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00002b1f867ce29c in cl_thread_wait_for_thread_condition () from /usr/lib/gridengine-drmaa/lib/libdrmaa.so
#2  0x00002b1f867ce6d3 in cl_thread_wait_for_event () from /usr/lib/gridengine-drmaa/lib/libdrmaa.so
#3  0x00002b1f867b297f in ?? () from /usr/lib/gridengine-drmaa/lib/libdrmaa.so
#4  0x00002b1f8259b182 in start_thread (arg=0x2b1f87ee1700) at pthread_create.c:312
#5  0x00002b1f828ab47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 3 (Thread 0x2b1f8c81d700 (LWP 63148)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00000000005fef47 in ?? ()
#2  0x000000000061101b in ?? ()
#3  0x000000000058415e in ?? ()
#4  0x0000000000585309 in ?? ()
#5  0x0000000041806ecd in ?? ()
#6  0x00002b1f90004990 in ?? ()
#7  0xffffffffffffffff in ?? ()
#8  0x7fffffffffffffff in ?? ()
#9  0x00002b1f82e1b1b0 in ?? ()
#10 0xffffffffffffffff in ?? ()
#11 0x00002b1f90004880 in ?? ()
#12 0x0000000041806e4a in ?? ()
#13 0x00002b1f8c81c780 in ?? ()
#14 0x00002b1f8c81c6f0 in ?? ()
/build/buildd/gdb-7.7.1/gdb/dwarf2-frame.c:692: internal-error: Unknown CFI encountered.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)

(The gdb crash might or might not be part of the problem.)
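
To collect the same thread list and backtraces from a hung process
without an interactive session, a minimal sketch in Python follows. It
assumes gdb is on the PATH and that the caller is allowed to attach to
the process. The function names are hypothetical. Batch mode runs the
given commands and exits without waiting at confirmation prompts, but it
does not prevent gdb itself from failing on an internal error as above.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that lists threads, dumps all backtraces, and exits."""
    return [
        "gdb",
        "-p", str(pid),                 # attach to the running process
        "-batch",                       # run the -ex commands, then exit
        "-ex", "info threads",
        "-ex", "thread apply all bt",
    ]


def dump_backtraces(pid, timeout=60):
    """Run gdb against `pid` and return its combined stdout/stderr as text.

    Assumption: attaching is permitted (same user or root, ptrace not restricted).
    """
    result = subprocess.run(
        gdb_backtrace_command(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,       # keep gdb warnings in order with the output
        universal_newlines=True,
        timeout=timeout,                # give up if gdb itself hangs
    )
    return result.stdout
```

For the process shown above this would be called as
dump_backtraces(63140), and the returned text can be saved next to the
job's output files for each hung instance.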

OS is Ubuntu 14.04, with mono from the xamarin repositories:
# mono --version
Mono JIT compiler version 4.2.3 (Stable 4.2.3.4/832de4b Wed Mar 16 13:19:08 UTC 2016)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
     TLS:           __thread
     SIGSEGV:       altstack
     Notifications: epoll
     Architecture:  amd64
     Disabled:      none
     Misc:          softdebug
     LLVM:          supported, not enabled.
     GC:            sgen

The process is still running if you need further debugging information.
The problem does not affect all instances, only about 20% of them, so it
cannot be reproduced reliably.

Regards,
Burkhard

