[Mono-bugs] [Bug 472732] mod-mono spawns many process and fails to respond when using AutoRestartMode

Wed Feb 11 09:32:37 EST 2009

https://bugzilla.novell.com/show_bug.cgi?id=472732

User mhabersack at novell.com added comment
https://bugzilla.novell.com/show_bug.cgi?id=472732#c14

Marek Habersack <mhabersack at novell.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P1 - Urgent                 |P3 - Medium

--- Comment #14 from Marek Habersack <mhabersack at novell.com>  2009-02-11 07:32:36 MST ---
Thanks for the test case, I will look at it next week. In the meantime, let me
provide more in-depth analysis of what, in my opinion, is going on.

You need to understand the way apache2+mod_mono+backend work regarding the
requests and restarts. Apache MPM uses either a process (prefork) or a
process+threads (worker) model. Each instance of the apache daemon, with either
MPM, loads its own copy of mod_mono and is handed connections by the main
apache process listening on the configured ports. When a request gets to
mod_mono, it checks whether the appropriate backend(s) are running and, if not,
starts them and then sends them the request. The request is sent synchronously
and the backend answer is received in the same fashion - during that time this
particular instance of httpd process (in the prefork MPM) or thread inside a
process (the worker MPM) is not able to service any other requests. If mod_mono
notices that a restart criterion (like time elapsed or requests served) is met,
it restarts the backend (by killing it and starting at the next request) in the
_synchronous_ way again - that is, it can happen only after the backend has
finished processing the previous request and the httpd instance needs to wait
for the shutdown request to finish. Depending on your application, either of
those can take a long time and if the backend locks up or blocks for some
reason, it can be a very long time. At the same time each of the spawned
backends consumes a number of file descriptors that is not known in advance
(you can check it by looking at the /proc/BACKENDPID/fd directory), and the
descriptors are consumed on behalf of the httpd worker/prefork instance which
spawned the backend. Now, if you restart your backend every 100 requests, then
each time it happens you might have an unknown number of backends waiting to
finish, which
means apache needs to spawn new processes/threads and those, in turn, need to
spawn new backends - all of them under the same process which spawned the
backends waiting to finish. Even if you have a limit of 32k file descriptors,
this is a limit per entire group of processes led by a specific httpd instance,
it's not a per-process limit for your backend. There's also the question of
whether your _soft_ limit was raised along with the _hard_ limit (soft limits
are those which the application itself can adjust within the boundaries set by
the hard limits) - if not, then raising the hard limit serves no purpose.
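
To check whether the soft limit actually followed the hard limit, something
like the sketch below could help. It assumes a Linux kernel that exposes
/proc/PID/limits; the function names and the sample values are made up for
illustration and are not part of mod_mono or apache.

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of a limits table.

    Each value is an int, or None where the kernel reports 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files <soft> <hard> files
            soft, hard = line.split()[3:5]
            return tuple(None if v == "unlimited" else int(v)
                         for v in (soft, hard))
    raise ValueError("no 'Max open files' row found")


def read_fd_limits(pid):
    """Read the fd limits of a running process (Linux-specific)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())


def soft_limit_lags(soft, hard):
    """True when the soft limit is lower than the hard limit."""
    if soft is None:
        return False
    return hard is None or soft < hard


# Made-up sample, only to show the shape of the table being parsed.
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max open files            1024                 32768                files
"""

soft, hard = parse_max_open_files(SAMPLE)
print(soft, hard, soft_limit_lags(soft, hard))
```

Running read_fd_limits() against the pid of an httpd instance would show
whether that instance still runs with the old soft limit.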
After this description it should be clear that the situation you're facing is
not a bug in mod_mono (although mod_mono could handle this more robustly, and
I'm working on that at the moment) but rather a configuration issue arising
from a miscalculation of the resource limits. The error message you're seeing
(the one regarding process handles) demonstrates that.
You mentioned that your application survived 4 days without problems after you
increased the fd limit, which supports the analysis above. I would suggest
doing the following:

- determine the peak fd usage of your application
- multiply that by 100 (or take your operating system's maximum value minus,
say, 1000 file descriptors for administrative usage) and make the result both
the hard and the soft limit of apache
- do not restart the application every 100 requests; increase that value to
1000, which should really be enough even with big memory usage
- if raising the restart limit is not possible, disable it altogether and
restart apache from a cronjob during the week's quietest period
- if you have long-running requests, increase the number of concurrent requests
served by mod_mono (or disable the limit)
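
The first two suggestions can be sketched roughly as follows. This is only an
illustration under a few assumptions: a Linux /proc filesystem, hypothetical
function names, and my reading of the sizing rule as "peak times 100, capped
at the OS maximum minus the administrative reserve".

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count the entries in <proc_root>/<pid>/fd (Linux-specific).

    Sampling this periodically for each backend and keeping the largest
    value gives an estimate of the peak fd usage.
    """
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def suggested_fd_limit(peak_fds, os_max, factor=100, reserve=1000):
    """Peak usage times `factor`, capped at `os_max` minus `reserve`."""
    return min(peak_fds * factor, os_max - reserve)
```

For example, suggested_fd_limit(50, 1048576) gives 5000, while a peak of
20000 descriptors would be capped at 1047576. The result would then be set as
both the soft and the hard limit for apache.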

You might want to graph the memory usage and requests served to get a picture
of what your application's usage pattern is, and tune the above settings
accordingly.

As for the compacting GC - it is not a panacea for all the memory problems. The
current GC is well capable of freeing up memory given that it has enough time
to do it. To give it the time needed, allocate more resources to apache and
run a larger number of concurrent mono instances. Also, you might want to
consider using a newer mono (even the 2.4 prerelease) - we've observed better
memory usage in that version for ASP.NET applications. Another possibility to
improve GC performance is to enable the parallel marker code. For that you
would have to recompile mono, passing the --with-parallel-mark flag to the
configure script.

hope that helps
