[Mono-dev] WebRequest timeouts after ThreadPool exhaustion

Alan alan.mcgovern at gmail.com
Fri Feb 12 15:59:05 UTC 2016


Ah, sorry, I meant this commit:
https://github.com/mono/mono/commit/578a2327b8a216a2b59e9fc430ae5d77af2616bd

On 12 February 2016 at 14:58, Alexander Köplinger <
alexander.koeplinger at xamarin.com> wrote:

> Sorry, turns out I made an error when testing on master. I can actually
> see the request timeout there too, so it's not fixed on master.
>
> I filed a bug with your repro code:
> https://bugzilla.xamarin.com/show_bug.cgi?id=38715
>
> - Alex
>
>
>
> 2016-02-12 15:13 GMT+01:00 Alexander Köplinger <
> alexander.koeplinger at xamarin.com>:
>
>> Happens on mono-4.3.2-branch/9f44a62 as well...
>>
>> Alan: the PR you linked doesn't seem to be related. Did you have another
>> PR in mind?
>>
>> - Alex
>>
>> 2016-02-12 15:07 GMT+01:00 Alexander Köplinger <
>> alexander.koeplinger at xamarin.com>:
>>
>>> I tried the testcase on master and couldn't reproduce there. I could,
>>> however, reproduce it on the 4.3.2 build I had installed
>>> (mono-4.3.2-branch/0df254d). I'm downloading a later 4.3.2 build right now
>>> to see if it still happens there. If it does, we need to backport
>>> something from master.
>>>
>>> - Alex
>>>
>>> 2016-02-12 15:04 GMT+01:00 Seif Attar <iam at seifattar.net>:
>>>
>>>> Great, I'll try it out. Is the console app in that gist enough for a
>>>> test case?
>>>>
>>>> @Mike @Jonathan we've hit networking bugs with older versions of libc
>>>> before, and some kernel issues too. Update to the latest if you can. I
>>>> can't reproduce with 3.12: I get timeouts, but it recovers once threads
>>>> are available again, unlike with 4.x.
>>>>
>>>> On Fri, 12 Feb 2016 13:50 Alan <alan.mcgovern at gmail.com> wrote:
>>>>
>>>>> It's also worth pointing out that the threadpool implementation has
>>>>> changed completely since mono 4.0. I believe the new threadpool
>>>>> implementation shipped as the default starting with mono 4.2 (or
>>>>> thereabouts). If you're on an older Mono, the odds are high that whatever
>>>>> issue you have has been fixed already.
>>>>>
>>>>> Alan
>>>>>
>>>>> On 12 February 2016 at 13:48, Alan <alan.mcgovern at gmail.com> wrote:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> We have just fixed some issues in that area. They are expected to
>>>>>> ship as part of the next mono 4.3+ release. If you want to test them out
>>>>>> in the meantime, you could try building mono with this PR [0] and see if it
>>>>>> resolves all your issues. If it doesn't, then a testcase and bug report on
>>>>>> http://bugzilla.xamarin.com would be awesome!
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> [0] https://github.com/mono/mono/pull/2576
>>>>>>
>>>>>> On 12 February 2016 at 12:33, Mike Horsley <mhorsley at vqcomms.com>
>>>>>> wrote:
>>>>>>
>>>>>>> We’ve also seen WebRequest timeouts that don’t recover (while curl
>>>>>>> still worked), but we haven’t got to the bottom of it yet.
>>>>>>>
>>>>>>> We ran your test app and see the same issue with mono 3.12 on
>>>>>>> OpenSUSE 13.2 (kernel 3.16.7, libc 2.19).
>>>>>>>
>>>>>>> We’ll add the diagnostics from your test app to ours, which will
>>>>>>> tell us whether we are seeing the same threadpool issue.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Mike
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *From:* mono-devel-list-bounces at lists.ximian.com [mailto:
>>>>>>> mono-devel-list-bounces at lists.ximian.com] *On Behalf Of *Seif Attar
>>>>>>> *Sent:* Friday, February 12, 2016 10:43 AM
>>>>>>> *To:* mono-devel-list at lists.ximian.com
>>>>>>> *Subject:* [Mono-dev] WebRequest timeouts after ThreadPool
>>>>>>> exhaustion
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> We are running mono 4.2.2 in prod. The VM hosting the process had a SAN
>>>>>>> failure, and after storage recovered all outgoing requests were timing
>>>>>>> out, even though curl was working fine.
>>>>>>>
>>>>>>> Our theory was that the thread pool was starved and somehow didn't
>>>>>>> recover properly.
>>>>>>>
>>>>>>> Managed to reproduce with:
>>>>>>>
>>>>>>>
>>>>>>> https://gist.githubusercontent.com/seif/ae2defbfa5afa26fa8f6/raw/bef351eded56c882658a4bad60fa4818ad32d629/timeout.cs
>>>>>>>
>>>>>>> Even after the ThreadPool finishes its tasks and has available threads,
>>>>>>> it never recovers, and all subsequent WebRequests time out.
>>>>>>>
>>>>>>> Running on mono 4.2.2, linux kernel 4.2.0-27 and libc 2.21.
>>>>>>>
>>>>>>> Output from gdb is:
>>>>>>>
>>>>>>> (gdb) info threads
>>>>>>>   Id   Target Id         Frame
>>>>>>>   13   Thread 0x7fca903ff700 (LWP 27944) "cli" pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>>>>>>>   12   Thread 0x7fca90b34700 (LWP 27945) "Finalizer" 0x00007fca911d70c9 in futex_abstimed_wait (cancel=true, private=<optimised out>, abstime=0x0, expected=0, futex=0x948ae0) at sem_waitcommon.c:42
>>>>>>>   11   Thread 0x7fca8dfff700 (LWP 27946) "Timer-Scheduler" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
>>>>>>>   10   Thread 0x7fca91ba1700 (LWP 27947) "cli" __clock_nanosleep (clock_id=1, flags=1, req=0x7fca91ba0dc0, rem=0x7fca90f134aa <__clock_nanosleep+138>) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:48
>>>>>>>   9    Thread 0x7fca8ddfe700 (LWP 27948) "Threadpool work" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
>>>>>>>   8    Thread 0x7fca8dbfd700 (LWP 27949) "Threadpool work" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
>>>>>>>   7    Thread 0x7fca8d7fb700 (LWP 27951) "Threadpool work" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
>>>>>>>   6    Thread 0x7fca8d3f9700 (LWP 27953) "Threadpool work" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
>>>>>>>   5    Thread 0x7fca8d1f8700 (LWP 27954) "Threadpool work" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
>>>>>>>   4    Thread 0x7fca8cff7700 (LWP 27955) "Threadpool work" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
>>>>>>>   3    Thread 0x7fca8cdf6700 (LWP 27956) "Threadpool work" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
>>>>>>>   2    Thread 0x7fca8cbf5700 (LWP 27957) "Threadpool work" pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
>>>>>>> * 1    Thread 0x7fca91cf0780 (LWP 27942) "cli" 0x00007fca911d7e0d in read () at ../sysdeps/unix/syscall-template.S:81
>>>>>>>
>>>>>>> I could not reproduce on mono 3.12, but it happens on 4.0.3 and 4.2.2.
>>>>>>>
>>>>>>> Is this a known issue? Are there any workarounds? I tried setting
>>>>>>> MONO_DNS=1 to use the CLR DNS, but that didn't help.
>>>>>>>
>>>>>>> Let me know if there is any more info I need to provide.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Seif
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Mono-devel-list mailing list
>>>>>>> Mono-devel-list at lists.ximian.com
>>>>>>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>
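
For readers who cannot open the gist, here is a rough sketch of the kind of
test case the original report describes: saturate the ThreadPool with blocking
work items, try a WebRequest while it is starved, then try again once the pool
has drained. This is not the code from the gist. The class name, the
placeholder URL, the pool cap, the item count and the sleep durations are all
assumptions for illustration, and whether this particular sketch triggers the
bug on an affected Mono build is untested.

```csharp
// Hypothetical sketch only: this is NOT the code from the gist in the thread.
using System;
using System.Net;
using System.Threading;

static class ThreadPoolStarvationSketch
{
    // Placeholder URL (an assumption); use any endpoint reachable from the test machine.
    const string Url = "http://example.com/";

    // Print how many worker and IO threads the pool reports as free.
    static void PrintPool(string label)
    {
        int worker, io, maxWorker, maxIo;
        ThreadPool.GetAvailableThreads(out worker, out io);
        ThreadPool.GetMaxThreads(out maxWorker, out maxIo);
        Console.WriteLine("{0}: worker {1}/{2} free, IO {3}/{4} free",
                          label, worker, maxWorker, io, maxIo);
    }

    // Issue one synchronous request with a short timeout and report the outcome.
    static void TryRequest(string label)
    {
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(Url);
            request.Timeout = 5000; // milliseconds
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                Console.WriteLine("{0}: HTTP {1}", label, (int)response.StatusCode);
            }
        }
        catch (WebException e)
        {
            // A timeout surfaces as WebExceptionStatus.Timeout.
            Console.WriteLine("{0}: failed with {1}", label, e.Status);
        }
    }

    static void Main()
    {
        // Cap the pool so it is easy to saturate. SetMaxThreads returns false
        // if the value is below the processor count or the current minimum.
        int cap = Environment.ProcessorCount;
        Console.WriteLine("SetMaxThreads({0}) accepted: {1}",
                          cap, ThreadPool.SetMaxThreads(cap, cap));

        int items = cap * 4; // more blocking work items than the pool has threads
        using (var done = new CountdownEvent(items))
        {
            for (int i = 0; i < items; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    Thread.Sleep(10000); // stand-in for blocked I/O
                    done.Signal();
                });
            }

            Thread.Sleep(1000); // give the pool time to pick up work
            PrintPool("while starved");
            TryRequest("request while starved");

            done.Wait(); // all blocking work items have finished
        }

        PrintPool("after drain");
        TryRequest("request after drain");
    }
}
```

Going by the reports in this thread, the line to watch is the last one: on
3.12 Seif saw requests recover once threads were free again, while on 4.x
they kept timing out.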