[Mono-dev] FW: Random hangs while running mono app
George, Glover E ERDC-RDE-ITL-MS CIV
Glover.E.George at erdc.dren.mil
Fri Jun 3 19:12:22 UTC 2016
I've been trying to reproduce the case where my jobs get a core dump when
the finalizer doesn't return in time, but I've only been able to reproduce
the previously posted stack trace, which causes the job to hang in "Sl"
state (according to ps). I'm posting the stack trace again (below) because
I somehow didn't paste one thread's entire stack trace last time.
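For anyone not familiar with the ps STAT column: each character is a separate flag. Below is a small Python sketch that decodes a STAT value using a subset of the codes listed in the Linux ps(1) man page. The table and the function name describe_ps_state are only illustrative and not part of any tool.

```python
# Subset of the process state codes documented in the Linux ps(1) man page.
PS_STATE_CODES = {
    "D": "uninterruptible sleep (usually IO)",
    "R": "running or runnable (on run queue)",
    "S": "interruptible sleep (waiting for an event to complete)",
    "Z": "defunct (zombie) process",
    "s": "session leader",
    "l": "multi-threaded",
    "+": "in the foreground process group",
}


def describe_ps_state(stat):
    """Return one description per character of a ps STAT value."""
    return [PS_STATE_CODES.get(ch, "not in this table") for ch in stat]


print(describe_ps_state("Sl"))
```

So "Sl" only says the process is multi-threaded and sleeping while it waits for something. It does not say what it is waiting for, which is why the gdb backtrace is needed.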
Rodrigo, can you please follow up on the email I posted about not seeing
anything in STDOUT/STDERR? My code doesn't seem to make it to where you
are referring to.
Burkhard, what file system are you guys using on your cluster? NFS,
Gluster, Lustre?
(gdb) thread apply all bt
Thread 3 (Thread 0x7fffebfff700 (LWP 2269)):
#0 0x00007fffeccca66c in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1 0x000000000060c873 in mono_os_cond_wait (mutex=0x97e640 <lock>,
cond=0x97e600 <work_cond>) at ../../mono/utils/mono-os-mutex.h:105
#2 thread_func (thread_data=0x0) at sgen-thread-pool.c:118
#3 0x00007fffeccc6806 in start_thread () from /lib64/libpthread.so.0
#4 0x00007fffec80a9bd in clone () from /lib64/libc.so.6
#5 0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7fffec637700 (LWP 2272)):
#0 0x00007fffec75ec8b in sigsuspend () from /lib64/libc.so.6
#1 0x000000000063cda6 in suspend_signal_handler (_dummy=<optimized out>,
info=<optimized out>, context=0x7fffec633f80) at
mono-threads-posix-signals.c:209
#2 <signal handler called>
#3 0x00007fffed8faf97 in open64 () from /lib64/ld-linux-x86-64.so.2
#4 0x00007fffed8ea82d in open_verify () from /lib64/ld-linux-x86-64.so.2
#5 0x00007fffed8ecca1 in _dl_map_object () from /lib64/ld-linux-x86-64.so.2
#6 0x00007fffed8f7400 in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#7 0x00007fffed8f2e86 in _dl_catch_error () from
/lib64/ld-linux-x86-64.so.2
#8 0x00007fffed8f6e3b in _dl_open () from /lib64/ld-linux-x86-64.so.2
#9 0x00007fffecedcf9b in dlopen_doit () from /lib64/libdl.so.2
#10 0x00007fffed8f2e86 in _dl_catch_error () from
/lib64/ld-linux-x86-64.so.2
#11 0x00007fffecedd33c in _dlerror_run () from /lib64/libdl.so.2
#12 0x00007fffecedcf01 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#13 0x0000000000631345 in mono_dl_open_file (file=<optimized out>,
flags=<optimized out>) at mono-dl-posix.c:67
#14 0x0000000000630b79 in mono_dl_open (name=name@entry=0x19839c0
"/p/home/apps/unsupported/NAVAIR/build/mono-4.3.2/lib/libSystem.Data.dll.so",
flags=flags@entry=1, error_msg=error_msg@entry=0x7fffec634e80) at
mono-dl.c:150
#15 0x000000000054b9f0 in cached_module_load (name=name@entry=0x19839c0
"/p/home/apps/unsupported/NAVAIR/build/mono-4.3.2/lib/libSystem.Data.dll.so",
err=err@entry=0x7fffec634e80, flags=1) at loader.c:1398
Python Exception <type 'exceptions.ValueError'> zero length field name in
format:
#16 0x000000000054cc78 in mono_lookup_pinvoke_call (method=method@entry=,
exc_class=exc_class@entry=0x7fffec635f00,
exc_arg=exc_arg@entry=0x7fffec635f08) at loader.c:1641
Python Exception <type 'exceptions.ValueError'> zero length field name in
format:
#17 0x0000000000562ce6 in mono_marshal_get_native_wrapper
(method=method@entry=, check_exceptions=check_exceptions@entry=1, aot=0)
at marshal.c:7396
Python Exception <type 'exceptions.ValueError'> zero length field name in
format:
#18 0x0000000000452912 in mono_method_to_ir (cfg=cfg@entry=0x1984120,
method=method@entry=, start_bblock=<optimized out>,
start_bblock@entry=0x0, end_bblock=<optimized out>, end_bblock@entry=0x0,
return_var=return_var@entry=0x0, inline_args=inline_args@entry=0x0,
inline_offset=0,
is_virtual_call=0) at method-to-ir.c:9280
Python Exception <type 'exceptions.ValueError'> zero length field name in
format:
#19 0x00000000005097d9 in mini_method_compile (method=method@entry=,
opts=opts@entry=370239999, domain=domain@entry=0x9d9e00,
flags=flags@entry=JIT_FLAG_RUN_CCTORS, parts=parts@entry=0,
aot_method_index=aot_method_index@entry=-1) at mini.c:3608
Python Exception <type 'exceptions.ValueError'> zero length field name in
format:
#20 0x000000000050afb5 in mono_jit_compile_method_inner
(method=method@entry=, target_domain=target_domain@entry=0x9d9e00,
opt=opt@entry=370239999, jit_ex=jit_ex@entry=0x7fffec636678) at mini.c:4263
Python Exception <type 'exceptions.ValueError'> zero length field name in
format:
#21 0x0000000000428458 in mono_jit_compile_method_with_opt
(method=method@entry=, opt=370239999, ex=ex@entry=0x7fffec636678) at
mini-runtime.c:1952
Python Exception <type 'exceptions.ValueError'> zero length field name in
format:
#22 0x0000000000428c1b in mono_jit_compile_method (method=) at
mini-runtime.c:2008
Python Exception <type 'exceptions.ValueError'> zero length field name in
format:
#23 0x00000000004ad743 in common_call_trampoline_inner
(regs=regs@entry=0x7fffec636890, code=code@entry=0x40244e34 "\270\001",
m=m@entry=, vt=vt@entry=0x0, vtable_slot=<optimized out>,
vtable_slot@entry=0x0) at mini-trampolines.c:694
Python Exception <type 'exceptions.ValueError'> zero length field name in
format:
#24 0x00000000004adea0 in common_call_trampoline (regs=0x7fffec636890,
code=0x40244e34 "\270\001", m=, vt=0x0, vtable_slot=0x0) at
mini-trampolines.c:808
#25 0x0000000040000289 in ?? ()
#26 0x0000000000a35cc5 in ?? ()
#27 0x0000000040244e34 in ?? ()
#28 0x0000000000a35cc5 in ?? ()
#29 0x00007fffec6369d0 in ?? ()
#30 0x00007fffec636890 in ?? ()
#31 0x00007fffec637698 in ?? ()
#32 0x00007fffec67a188 in ?? ()
#33 0x00007fffec67a1a0 in ?? ()
#34 0x00007fffec636af0 in ?? ()
#35 0x0000000000000003 in ?? ()
#36 0x00007fffec6369d0 in ?? ()
#37 0x00007fffec636a60 in ?? ()
#38 0x0000000000000001 in ?? ()
#39 0x00007fffec67a188 in ?? ()
#40 0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7fffedae7780 (LWP 2226)):
#0 0x00007fffecccd324 in __lll_lock_wait () from /lib64/libpthread.so.0
#1 0x00007fffeccc8684 in _L_lock_1091 () from /lib64/libpthread.so.0
#2 0x00007fffeccc84f6 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00007fffed8f6dcc in _dl_open () from /lib64/ld-linux-x86-64.so.2
#4 0x00007fffec842530 in do_dlopen () from /lib64/libc.so.6
#5 0x00007fffed8f2e86 in _dl_catch_error () from
/lib64/ld-linux-x86-64.so.2
#6 0x00007fffec8425e5 in dlerror_run () from /lib64/libc.so.6
#7 0x00007fffec8426d7 in __libc_dlopen_mode () from /lib64/libc.so.6
#8 0x00007fffec81d2e5 in init () from /lib64/libc.so.6
#9 0x00007fffecccbd03 in pthread_once () from /lib64/libpthread.so.0
#10 0x00007fffec81d43c in backtrace () from /lib64/libc.so.6
#11 0x00000000004ac025 in mono_handle_native_sigsegv (signal=<optimized
out>, ctx=<optimized out>, info=<optimized out>) at mini-exceptions.c:2309
#12 <signal handler called>
#13 0x00007fffec75e875 in raise () from /lib64/libc.so.6
#14 0x00007fffec75fe51 in abort () from /lib64/libc.so.6
#15 0x000000000064528a in monoeg_log_default_handler (log_domain=0x0,
log_level=G_LOG_LEVEL_ERROR, message=0x17b4f20 "suspend_thread suspend
took 200 ms, which is more than the allowed 200 ms", unused_data=0x0) at
goutput.c:233
#16 0x0000000000645077 in monoeg_g_logv (log_domain=0x0,
log_level=G_LOG_LEVEL_ERROR, format=0x7015d8 "suspend_thread suspend took
%d ms, which is more than the allowed %d ms", args=0x7fffffffce48) at
goutput.c:113
#17 0x000000000064512d in monoeg_g_log (log_domain=0x0,
log_level=G_LOG_LEVEL_ERROR, format=0x7015d8 "suspend_thread suspend took
%d ms, which is more than the allowed %d ms") at goutput.c:123
#18 0x000000000063a13f in mono_threads_wait_pending_operations () at
mono-threads.c:238
#19 0x000000000063a8cd in suspend_sync (interrupt_kernel=1,
tid=140737159329536) at mono-threads.c:877
#20 suspend_sync_nolock (interrupt_kernel=1, id=140737159329536) at
mono-threads.c:892
#21 mono_thread_info_safe_suspend_and_run (id=140737159329536,
interrupt_kernel=interrupt_kernel@entry=1,
callback=callback@entry=0x58d5c0 <abort_thread_critical>,
user_data=user_data@entry=0x7fffffffd3d0) at mono-threads.c:935
#22 0x0000000000591a86 in abort_thread_internal
(thread=thread@entry=0x7fffec6e0230,
install_async_abort=install_async_abort@entry=1, can_raise_exception=1) at
threads.c:4728
#23 0x0000000000591b29 in mono_thread_internal_stop
(thread=0x7fffec6e0230) at threads.c:2385
#24 0x00000000005b123e in mono_gc_cleanup () at gc.c:842
#25 0x00000000005aab8e in mono_runtime_cleanup
(domain=domain@entry=0x9d9e00) at appdomain.c:356
#26 0x0000000000426c8b in mini_cleanup (domain=0x9d9e00) at
mini-runtime.c:4017
#27 0x000000000047fac6 in mono_main (argc=11, argv=<optimized out>) at
driver.c:2115
#28 0x0000000000424c68 in mono_main_with_options (argv=0x7fffffffd688,
argc=11) at main.c:20
#29 main (argc=<optimized out>, argv=<optimized out>) at main.c:53
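A side note for anyone reading the traces together: the decimal thread id that Thread 1 passes to suspend_sync (frames #19 to #21) can be converted to hex to see which thread it is trying to suspend. Here is a minimal Python sketch. The variable names are only illustrative, and the values are copied from the gdb output above.

```python
# Thread id from Thread 1, frames #19-#21 (tid=140737159329536).
target_tid = 140737159329536

# pthread ids from the "Thread N (Thread 0x... (LWP ...))" headers above.
gdb_threads = {
    3: 0x7fffebfff700,
    2: 0x7fffec637700,
    1: 0x7fffedae7780,
}

# Find which gdb thread number carries the id being suspended.
matches = [num for num, ptid in gdb_threads.items() if ptid == target_tid]
print(hex(target_tid), matches)
```

This prints 0x7fffec637700 and [2], so the thread being suspended appears to be Thread 2. Thread 2 is itself sitting in suspend_signal_handler partway through a dlopen() call, while Thread 1 is blocked in pthread_mutex_lock under _dl_open. That reading is inferred only from the frames shown here.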
- - -
Glover E. George
Computer Scientist
Information Technology Laboratory
US Army Engineer Research and Development Center
Vicksburg, MS 39180
601-634-4730
On 6/2/16, 7:34 AM, "mono-devel-list-bounces at lists.ximian.com on behalf of
Burkhard Linke" <mono-devel-list-bounces at lists.ximian.com on behalf of
blinke at CeBiTec.Uni-Bielefeld.DE> wrote:
>Hi,
>
>any updates on this? The bug affects the latest stable packages in the
>official xamarin repository, and nightly builds or building from source
>are not options.
>
>Regards,
>Burkhard
>
>On 05/19/2016 04:30 PM, Burkhard Linke wrote:
>> Hi,
>>
>> On 04/29/2016 04:12 PM, Rodrigo Kumpera wrote:
>>> This looks like a shutdown bug in mono.
>>>
>>> Do you have a reliable way to reproduce it?
>>> How loaded are the machines running your workload?
>>
>> We have encountered the same(?) bug on our compute cluster.
>> Applications process data, write output files, but do not terminate.
>>
>> (gdb) info threads
>> Id Target Id Frame
>> 6 Thread 0x2b1f83200700 (LWP 63141) "mono"
>> pthread_cond_wait@@GLIBC_2.3.2 () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>> 5 Thread 0x2b1f84cf3700 (LWP 63142) "Finalizer" sem_wait () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
>> 4 Thread 0x2b1f87ee1700 (LWP 63143) "mono"
>> pthread_cond_timedwait@@GLIBC_2.3.2 () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
>> 3 Thread 0x2b1f8c81d700 (LWP 63148) "Timer-Scheduler"
>> pthread_cond_wait@@GLIBC_2.3.2 () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>> 2 Thread 0x2b1fe1133700 (LWP 63248) "mono"
>> pthread_cond_wait@@GLIBC_2.3.2 () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>> * 1 Thread 0x2b1f81c98580 (LWP 63140) "mono"
>> pthread_cond_wait@@GLIBC_2.3.2 () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>> (gdb) thread apply all bt
>>
>> Thread 6 (Thread 0x2b1f83200700 (LWP 63141)):
>> #0 pthread_cond_wait@@GLIBC_2.3.2 () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>> #1 0x00000000005f9aec in ?? ()
>> #2 0x00002b1f8259b182 in start_thread (arg=0x2b1f83200700) at
>> pthread_create.c:312
>> #3 0x00002b1f828ab47d in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>
>> Thread 5 (Thread 0x2b1f84cf3700 (LWP 63142)):
>> #0 sem_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85
>> #1 0x000000000061de28 in mono_sem_wait ()
>> #2 0x00000000005a2076 in ?? ()
>> #3 0x00000000005843d3 in ?? ()
>> #4 0x0000000000624666 in ?? ()
>> #5 0x00002b1f8259b182 in start_thread (arg=0x2b1f84cf3700) at
>> pthread_create.c:312
>> #6 0x00002b1f828ab47d in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>
>> Thread 4 (Thread 0x2b1f87ee1700 (LWP 63143)):
>> #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
>> #1 0x00002b1f867ce29c in cl_thread_wait_for_thread_condition () from
>> /usr/lib/gridengine-drmaa/lib/libdrmaa.so
>> #2 0x00002b1f867ce6d3 in cl_thread_wait_for_event () from
>> /usr/lib/gridengine-drmaa/lib/libdrmaa.so
>> #3 0x00002b1f867b297f in ?? () from
>> /usr/lib/gridengine-drmaa/lib/libdrmaa.so
>> #4 0x00002b1f8259b182 in start_thread (arg=0x2b1f87ee1700) at
>> pthread_create.c:312
>> #5 0x00002b1f828ab47d in clone () at
>> ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>>
>> Thread 3 (Thread 0x2b1f8c81d700 (LWP 63148)):
>> #0 pthread_cond_wait@@GLIBC_2.3.2 () at
>> ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>> #1 0x00000000005fef47 in ?? ()
>> #2 0x000000000061101b in ?? ()
>> #3 0x000000000058415e in ?? ()
>> #4 0x0000000000585309 in ?? ()
>> #5 0x0000000041806ecd in ?? ()
>> #6 0x00002b1f90004990 in ?? ()
>> #7 0xffffffffffffffff in ?? ()
>> #8 0x7fffffffffffffff in ?? ()
>> #9 0x00002b1f82e1b1b0 in ?? ()
>> #10 0xffffffffffffffff in ?? ()
>> #11 0x00002b1f90004880 in ?? ()
>> #12 0x0000000041806e4a in ?? ()
>> #13 0x00002b1f8c81c780 in ?? ()
>> #14 0x00002b1f8c81c6f0 in ?? ()
>> /build/buildd/gdb-7.7.1/gdb/dwarf2-frame.c:692: internal-error:
>> Unknown CFI encountered.
>> A problem internal to GDB has been detected,
>> further debugging may prove unreliable.
>> Quit this debugging session? (y or n)
>>
>> (The gdb crash might or might not be part of the problem).
>>
>> OS is Ubuntu 14.04, with mono from the xamarin repositories:
>> # mono --version
>> Mono JIT compiler version 4.2.3 (Stable 4.2.3.4/832de4b Wed Mar 16
>> 13:19:08 UTC 2016)
>> Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors.
>> www.mono-project.com
>> TLS: __thread
>> SIGSEGV: altstack
>> Notifications: epoll
>> Architecture: amd64
>> Disabled: none
>> Misc: softdebug
>> LLVM: supported, not enabled.
>> GC: sgen
>>
>> The process is still running if you need further debugging
>> information. The problem does not affect all instances, but about 20%.
>> It thus cannot be reproduced reliably.
>>
>> Regards,
>> Burkhard
>> _______________________________________________
>> Mono-devel-list mailing list
>> Mono-devel-list at lists.ximian.com
>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>
>_______________________________________________
>Mono-devel-list mailing list
>Mono-devel-list at lists.ximian.com
>http://lists.ximian.com/mailman/listinfo/mono-devel-list
>