[Mono-dev] signal.c cross-thread access

Wed Feb 4 13:42:24 EST 2009

Before I forget, you should also write additional tests for this
behavior in mcs/class/Mono.Posix/Test/Mono.Unix/UnixSignalTest.cs.

An example test would be e.g.

	[Test]
	public void TestNestedInvocations ()
	{
		UnixSignal s = new UnixSignal (Signum.SIGINT);
		Thread a = CreateWaitSignalThread (s, 1000)
		Thread b = CreateWaitSignalThread (s, 500);
		a.Start (); b.Start ();
		a.Join ();  b.Join ();
	}

i.e. "nest" the WaitOne() calls, thus ensuring that WaitOne() doesn't
block other invocations of WaitOne() on the same UnixSignal instance.

On Mon, 2009-02-02 at 08:46 +0000, tim.jenks at realtimeworlds.com wrote:
> Couple of questions though:
> 
> > 1. Somehow find out which is preferable for "normal" UnixSignal
> usage:
> > 
> >   a. One pipe/UnixSignal instance for the lifetime of the
> UnixSignal.
> >   b. One pipe/UnixSignal instance for the lifetime of a WaitAny()
> call.
> > 
> > I'm assuming "normal" is that a UnixSignal will last for the life of
> > the application; that is, UnixSignal instances won't be short lived.
> > Meanwhile, I'd expect WaitAny() to be comparatively infrequent, but
> > have less clues about how it's used in practice.
> 
> We're using SIGRT* raised a bunch of times a second for the update
> tick of a game server, I'm talking 20-30Hz. For our use case I think
> a) is preferable, though I've implemented this as b) as suggested just
> now. 
> 
> What are your thoughts?

How much free time do you have? :-)

In an ideal world, we would write it both ways, then profile your app to
see how much of a performance impact (1.b) makes over (1.a).  If it's a
negligible change, then (1.b) should be used.

If necessary, we could add flags to UnixSignal to select between these
two behaviors, but before doing that I'd like to know if it's necessary.

> > You might need to change wait_for_any() to actually perform error
> > checking on the read(h->read_fd, &c, 1) call, as this could now
> > error if multiple threads are calling WaitAny().
> 
> What kind of error conditions are you expecting in this instance?

EINTR, for one. :-)

(Which, come to think of it, I should have handled in the first
place...)

> And finally, here's some comments on mph_int*:
> 
> Obviously the signal handler can be executing concurrently with the
> other functions in signal.c that are guarded by the mutex; to keep a
> consistent view of pipecnt in the signal handler I have used atomics:

Which is why pipecnt is incremented *after* the pipe(2) call completes
and h->read_fd & h->write_fd are set (as you do).  This way, if a signal
occurs while default_handler() is executing, it doesn't matter -- the
"new" pipe won't be used during that default_handler() invocation.

Though there could be a race if
Mono_Unix_UnixSignal_WaitAny()/setup_pipes() and default_handler() run
concurrently with each other, and _somehow_ (1) default_handler() read
pipecnt before setup_pipes() added the pipe, and (2) the same
default_handler() invocation was being processed when setup_pipes()
returned to _WaitAny() and wait_for_any() was invoked.  In this case
we'd have (from default_handler()'s perspective) pipecnt+1 threads
blocking on the pipe, and thus we'd send out one too few write(2) calls.

This seems _highly_ improbable to me, though.

The other solution is to increment pipecnt *before* the pipe(2) call,
and check in default_handler() that write_fd is valid before writing to
it (which is also currently done).  This is still subject to a race,
though, resulting in "too many" writes to write_fd, which will cause the
next call to WaitOne() to exit prematurely.

I think the current behavior is preferable -- better to signal too few
than too many (as the next default_handler() invocation for that signal
should wake up the right number next time).

> 1. There does not appear to be an equivalent to InterlockedIncrement
> in the glib functions (returning the value after atomic inc), so there
> is code in setup_pipes that would race if it were re-entrant. It isn't
> re-entrant due to the global mutex being held but this needs tidying
> to avoid confusion. Effectively, I need to: if
> (InterlockedIncrement(pipecnt) == 1) { setup pipes }. 
> 
> 2. Conversely, I needed an atomic decrement. Having looked at the glib
> functions I could only find g_atomic_int_dec_and_test in 2.4. Is this
> OK, or is there an a g_atomic_int_dec ? This is to achieve: if
> (InterlockedDecrement(pipecnt) == 0) { teardown pipes }.
> g_atomic_int_dec_and_test clearly does the job, but it's a little
> inconsistent with InterlockedDecrement with respect to the mph_int*
> macros.

It looks like you already went with (2), which seems fine to me as I
don't see an equivalent to InterlockedIncrement in glib either.

I'm also not sure what you mean by "[setup_pipes] needs tidying up to
avoid confusion."  Which confusion?

> Index: signal.c
> ===================================================================
> --- signal.c    (revision 123183)
> +++ signal.c    (working copy)
> @@ -101,16 +101,19 @@
>  #ifdef WAPI_ATOMIC_ASM
>         #define mph_int_get(p)     InterlockedExchangeAdd ((p), 0)
>         #define mph_int_inc(p)     InterlockedIncrement ((p))
> +       #define mph_int_dec_test(p)     InterlockedDecrement ((p)) == 0

This should be wrapped in parens: (InterlockedDecrement ((p)) == 0)

> @@ -128,7 +131,8 @@
>  static void
>  default_handler (int signum)
>  {
> -       int i;
> +       int i,j;
> +       int pipecounter;

Please narrow down variable declarations to their minimum scope.  In
this case, instead of declaring `j' and `pipecounter' here...

>         for (i = 0; i < NUM_SIGNALS; ++i) {
>                 int fd;
>                 signal_info* h = &signals [i];
> @@ -138,7 +142,9 @@
>                 fd = mph_int_get (&h->write_fd);
>                 if (fd > 0) {

...declare them here.

>                         char c = signum;
> -                       write (fd, &c, 1);
> +                       pipecounter = mph_int_get (&h->pipecnt);
> +                       for (j = 0; j < pipecounter; ++j)
> +                               write (fd, &c, 1);

We should check for EINTR, thus:

	int r;
	do { r = write (fd, &c, 1); } while (r == -1 && errno == EINTR);

> @@ -182,7 +188,7 @@
>                                 break;
>                         }
>                         else {
> -                               h->have_handler = 1;
> +                               h->state = 1;

We should either stick with have_handler (annoying, I know, as I first
suggested `state'), or use a HAVE_HANDLER #define.  'have_handler'
implies a boolean value, so 1/0 are fine, but `state' makes no such
implications, so HAVE_HANDLER would be better.

If we do follow HAVE_HANDLER, we should state that `state' is a
bitfield, and behave appropriately.

> @@ -195,13 +201,14 @@
>         }
>  
>         if (h && have_handler) {
> -               h->have_handler = 1;
> +               h->state = 1;
>                 h->handler      = handler;
>         }
>  
>         if (h) {
>                 mph_int_set (&h->count, h->count, 0);
>                 mph_int_set (&h->signum, h->signum, sig);
> +               mph_int_set (&h->pipecnt, h->pipecnt, 0);
>         }
>  
>         pthread_mutex_unlock (&signals_mutex);
> @@ -239,12 +246,12 @@
>                 errno = EINVAL;
>         else {
>                 /* last UnixSignal -- we can unregister */
> -               if (h->have_handler && count_handlers (h->signum) == 1) {
> +               if (h->state && count_handlers (h->signum) == 1) {

so `if ((h->state & HAVE_HANDLER) && count_handlers (h->signum) == 1)'
if state were a bitfield...

> @@ -348,9 +362,19 @@
>         FD_ZERO (&read_fds);
>  
>         r = setup_pipes (signals, count, &read_fds, &max_fd);
> +
> +       pthread_mutex_unlock (&signals_mutex);

We should check for errors here; pthread_mutex_unlock(3p) documents
EAGAIN as an error that we could handle:

	if( r == 0 ) {
		while ((r = pthread_mutex_unlock (&signals_mutex)) == EAGAIN) {
			/* keep trying */
		}
	}

The other uses of pthread_mutex_unlock() in this file should be fixed
for this as well.

>         if (r == 0) {
>                 r = wait_for_any (signals, count, max_fd, &read_fds, timeout);
>         }
> +
> +       mr = pthread_mutex_lock (&signals_mutex);
> +       if (mr != 0) {
> +               errno = mr;
> +               return -1;
> +       }

Similarly here, pthread_mutex_lock() could return EAGAIN, so we may want
to try again.  Or it could return EDEADLK, which means that we already
own the thread, and thus can continue anyway.

Overall, it looks good.

If you make the above changes, add some tests, and explicitly state that
this contribution is under MIT/X11 this should be good to check in.

Thanks,
 - Jon