[Mono-dev] Mono hangs on shutdown when /dev/ttySx ports were opened.

Thu Sep 17 09:40:55 EDT 2009

We are checking that. However, using serial ports is not the only scenario
in which this behaviour occurs; it's just the easiest to replicate. We are
now looking through some "mono --trace" outputs that contain no
SerialPort interaction at all, and [probably] the same lockup took
place there.

Mostly unrelated: printing Thread.Name from an aborted thread usually
produces some Unicode garbage, although the number of printed
characters stays the same.
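
To make the explanation quoted below concrete (the runtime waits for
every thread at shutdown, so a thread stuck in a blocking read keeps the
process from exiting), here is a minimal sketch of a reader that stays
interruptible. It is an illustration in Python for a POSIX system, not
Mono's serial.c or SerialPort code; reader, run_demo and the 100 ms
timeout are hypothetical names and values chosen for the example. The
idea is to wait for data with a timeout and re-check a stop flag on every
wake-up, instead of sitting in an unbounded read().

```python
import os
import select
import threading
import time


def reader(fd, stop, chunks):
    """Read from fd until EOF or until stop is set."""
    while not stop.is_set():
        # Wait at most 100 ms for data so the stop flag is re-checked regularly.
        ready, _, _ = select.select([fd], [], [], 0.1)
        if not ready:
            continue  # timed out with no data: loop and look at the flag again
        data = os.read(fd, 64)
        if not data:
            break  # EOF: the write end was closed
        chunks.append(data)


def run_demo():
    """Start a reader on a pipe, feed it once, then ask it to stop."""
    r, w = os.pipe()
    stop = threading.Event()
    chunks = []
    worker = threading.Thread(target=reader, args=(r, stop, chunks))
    worker.start()

    os.write(w, b"ping")
    # Give the reader up to 2 s to pick the data up.
    deadline = time.monotonic() + 2.0
    while not chunks and time.monotonic() < deadline:
        time.sleep(0.01)

    stop.set()                # stand-in for the runtime's shutdown request
    worker.join(timeout=2.0)  # bounded wait, unlike the hang discussed here
    exited = not worker.is_alive()

    os.close(w)
    os.close(r)
    return exited, b"".join(chunks)


if __name__ == "__main__":
    print(run_demo())
```

With a plain blocking os.read(fd, 64) and no timeout, the same join would
only return once data arrived or the pipe was closed, which is the shape
of the hang discussed here.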

On Thu, Sep 17, 2009 at 2:43 PM, Zoltan Varga <vargaz at gmail.com> wrote:
> Hi,
>
>   The runtime tries to abort all threads and waits for them to terminate, so
> if a thread refuses to die for some reason, the runtime will hang. It's
> possible that the serial port code doesn't check for thread
> aborts/interruptions.
>
>               Zoltan
>
> On Thu, Sep 17, 2009 at 2:33 PM, Leszek Ciesielski <skolima at gmail.com>
> wrote:
>>
>> Oh! We found something with "mono --trace" that we missed before. It
>> seems that we are calling Thread.Abort() on a thread that's inside
>> SerialPort.Read() (and, through it and serial.c, in kernel mode), and
>> the abort gets ignored. However, on the managed side, everything
>> proceeds as though the thread was killed, until only unmanaged code
>> remains running, including the JITed rogue thread. I am now checking
>> this with a small test case and will send it along once I am able to
>> reproduce the problem.
>>
>> On Thu, Sep 17, 2009 at 1:49 PM, Leszek Ciesielski <skolima at gmail.com>
>> wrote:
>> > That's the
>> >
>> >> kill -3 PID prints:
>> >
>> >> "0" tid=0x0xb7d206f0 this=0x0x2fed8 thread handle 0x404 state: waiting on 0x400 : Event owns ()
>> >
>> > result, nothing more is printed...
>> >
>> > On Thu, Sep 17, 2009 at 1:25 PM, Zoltan Varga <vargaz at gmail.com> wrote:
>> >> Hi,
>> >>
>> >>   My mistake. You should send a SIGQUIT signal.
>> >>
>> >>              Zoltan
>> >>
>> >> On Thu, Sep 17, 2009 at 12:58 PM, Leszek Ciesielski <skolima at gmail.com>
>> >> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> kill -SIGUSR1 PID prints
>> >>>
>> >>> User definied signal 1
>> >>>
>> >>> And Mono terminates. Does this suggest no managed threads were left
>> >>> (there are 10 or 11 while the application is running)? gdb native
>> >>> stack trace follows:
>> >>>
>> >>> 0xffffe430 in __kernel_vsyscall ()
>> >>> (gdb) thread apply all bt
>> >>>
>> >>> Thread 4 (Thread 0xb7573b90 (LWP 25150)):
>> >>> #0  0xffffe430 in __kernel_vsyscall ()
>> >>> #1  0xb7ee73f6 in nanosleep () from /lib/libpthread.so.0
>> >>> #2  0x081a91f8 in collection_thread (unused=0x0) at collection.c:34
>> >>> #3  0xb7ee01b5 in start_thread () from /lib/libpthread.so.0
>> >>> #4  0xb7e263be in clone () from /lib/libc.so.6
>> >>>
>> >>> Thread 3 (Thread 0xb754fb90 (LWP 25151)):
>> >>> #0  0xffffe430 in __kernel_vsyscall ()
>> >>> #1  0xb7ee5ef5 in sem_wait@@GLIBC_2.1 () from /lib/libpthread.so.0
>> >>> #2  0x0812eed9 in finalizer_thread (unused=0x0) at gc.c:1058
>> >>> #3  0x08153188 in start_wrapper (data=0x8305078) at threads.c:623
>> >>> #4  0x081c5d66 in thread_start_routine (args=0x82faaa4) at threads.c:286
>> >>> #5  0x081e5aa5 in GC_start_routine (arg=0x26f20) at pthread_support.c:1382
>> >>> #6  0xb7ee01b5 in start_thread () from /lib/libpthread.so.0
>> >>> #7  0xb7e263be in clone () from /lib/libc.so.6
>> >>>
>> >>> Thread 2 (Thread 0xb565ab90 (LWP 25339)):
>> >>> #0  0xb7efe3da in clock_gettime () from /lib/librt.so.1
>> >>> #1  0x081d5705 in mono_100ns_ticks () at mono-time.c:107
>> >>> #2  0xb568bf66 in ?? ()
>> >>> #3  0xb568bf23 in ?? ()
>> >>> #4  0xb568af80 in ?? ()
>> >>> #5  0xb7916ba0 in ?? ()
>> >>> #6  0x08110f14 in mono_runtime_delegate_invoke (delegate=0x1a6b712, params=0xb565a2e4, exc=0x0) at object.c:2943
>> >>> #7  0x0815320f in start_wrapper (data=0x0) at threads.c:629
>> >>> #8  0x081c5d66 in thread_start_routine (args=0x82faff4) at threads.c:286
>> >>> #9  0x081e5aa5 in GC_start_routine (arg=0x2dffe0) at pthread_support.c:1382
>> >>> #10 0xb7ee01b5 in start_thread () from /lib/libpthread.so.0
>> >>> #11 0xb7e263be in clone () from /lib/libc.so.6
>> >>>
>> >>> Thread 1 (Thread 0xb7d206f0 (LWP 25117)):
>> >>> #0  0xffffe430 in __kernel_vsyscall ()
>> >>> #1  0xb7ee3c35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
>> >>> #2  0x081af0b1 in _wapi_handle_timedwait_signal_handle (handle=0x400, timeout=0x0, alertable=1, poll=0) at handles.c:1605
>> >>> #3  0x081af1b7 in _wapi_handle_wait_signal (poll=0) at handles.c:1534
>> >>> #4  0x081cac2b in WaitForMultipleObjectsEx (numobjects=2, handles=0x8c0a900, waitall=1, timeout=4294967295, alertable=0) at wait.c:723
>> >>> #5  0x081510b1 in wait_for_tids (wait=0x8c0a900, timeout=365) at threads.c:2443
>> >>> #6  0x0815488c in mono_thread_manage () at threads.c:2733
>> >>> #7  0x080b25cd in mono_main (argc=2, argv=0xbfafbdb4) at driver.c:1648
>> >>> #8  0x0805af21 in main (argc=Cannot access memory at address 0x80
>> >>> ) at main.c:34
>> >>> #0  0xffffe430 in __kernel_vsyscall ()
>> >>>
>> >>> Regards,
>> >>>
>> >>> skolima
>> >>>
>> >>> On Thu, Sep 17, 2009 at 12:25 PM, Zoltan Varga <vargaz at gmail.com>
>> >>> wrote:
>> >>> > Hi,
>> >>> >
>> >>> >   You can attach to the hung process with gdb and type
>> >>> > 'thread apply all bt' to get a native backtrace, and/or
>> >>> > send a SIGUSR1 signal to the process to print a managed backtrace.
>> >>> >
>> >>> >                Zoltan
>> >>> >
>> >>> > On Thu, Sep 17, 2009 at 12:15 PM, Leszek Ciesielski
>> >>> > <skolima at gmail.com>
>> >>> > wrote:
>> >>> >>
>> >>> >> Hi,
>> >>> >>
>> >>> >> We have tried to isolate the problem for almost a month; the best we
>> >>> >> managed to get is a hardware configuration for our application that
>> >>> >> hangs on every exit, but this is with about 8MB of binaries, probably
>> >>> >> over 100k SLOC. What I am hoping for now are some gdb guidelines to
>> >>> >> pinpoint the problem.
>> >>> >>
>> >>> >> Regards
>> >>> >>
>> >>> >> On Thu, Sep 17, 2009 at 12:02 PM, Zoltan Varga <vargaz at gmail.com>
>> >>> >> wrote:
>> >>> >> > Hi,
>> >>> >> >
>> >>> >> >   Could you create some kind of test case to help us debug this issue?
>> >>> >> >
>> >>> >> >             Zoltan
>> >>> >> >
>> >>> >> > On Thu, Sep 17, 2009 at 11:28 AM, Leszek Ciesielski
>> >>> >> > <skolima at gmail.com>
>> >>> >> > wrote:
>> >>> >> >>
>> >>> >> >> Hi,
>> >>> >> >>
>> >>> >> >> I am experiencing a Mono hang when my application should terminate.
>> >>> >> >> The application opens multiple serial ports, but the bug has also
>> >>> >> >> manifested when network sockets were hanging on reads or writes. It
>> >>> >> >> seems to be related to a pending I/O operation; asynchronous
>> >>> >> >> networking helps somewhat. Anyway, the managed code exits, Mono CPU
>> >>> >> >> usage jumps to 100%, /proc/PID/status shows 4 threads and the
>> >>> >> >> application never exits. kill -3 PID prints:
>> >>> >> >>
>> >>> >> >> "0" tid=0x0xb7d0f6f0 this=0x0x2fed8 thread handle 0x404 state: waiting on 0x400 : Event owns ()
>> >>> >> >>
>> >>> >> >> and that's all. What can I do to help debug this?
>> >>> >> >>
>> >>> >> >> BTW this happens on 1.9 (Debian and Gentoo) and 2.4.2.3 (Debian and
>> >>> >> >> OpenSuse) [so I'm pretty sure it's not distribution-specific], more
>> >>> >> >> often if the application uses System.Windows.Forms.
>> >>> >> >>
>> >>> >> >> Regards,
>> >>> >> >>
>> >>> >> >> Leszek 'skolima' Ciesielski
>> >>> >> >> _______________________________________________
>> >>> >> >> Mono-devel-list mailing list
>> >>> >> >> Mono-devel-list at lists.ximian.com
>> >>> >> >> http://lists.ximian.com/mailman/listinfo/mono-devel-list