Berkeley DB infinite sleep on Solaris [#3245]

Alex Kiernan alexk at demon.net
Mon Jan 8 21:36:50 UTC 2001


Keith Bostic <bostic at sleepycat.com> writes:

> Hi, my name is Keith Bostic and I'm with Sleepycat Software.
> I'll own your Support Request for now.
> 
> > From: "Kiernan, Alex" <alexk at demon.net>
> >
> > Running inn (from CVS) on Solaris we were seeing regular infinite sleeps
> > during expire. The Berkeley DB code was sleeping on _lwp_cond_wait due to
> > apparent lost wakeup from _lwp_cond_signal, this patch appears to fix it -
> > or at least we've run for a week without it dying (plus a couple of other
> > Solaris sillies). In fact I can't think why this wouldn't affect every
> > platform.
> 
> We'd already made the first change (fixing the #define for
> pthread_mutex_destroy), so no problem there.  The second
> change (checking for EINTR) also seems harmless.  Did you
> actually catch it returning EINTR?
> 

Once, and looking at the time in the logs it was during a debug
session, so I'm not convinced it wasn't some strange effect from that.

> I don't understand the third change, though.  Why switch the
> order of the XXX_mutex_unlock and XXX_cond_signal() calls?
> 

That's the real fix (the EINTR was my first attempt :-). With the
releasing side ordered lock, unlock, signal, the waiting side (lock,
wait, unlock) tests the condition, finds it locked and goes on to
call cond_wait, which unlocks the mutex and sleeps.

The releasing side, however, can send its signal between the waiter's
test of the condition and its cond_wait (which it can do because it is
no longer holding the mutex). The cond_signal doesn't queue, so it's
lost forever and the waiter never wakes.

The Solaris man page (_lwp_cond_signal) is explicit:

    "Both functions should be called under the protection of the
     same LWP mutex lock that is used with the LWP condition
     variable being signaled. Otherwise, the condition variable
     may be signalled between the test of the associated condition
     and blocking in _lwp_cond_wait(). This can cause an infinite
     wait."
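
For illustration, here is a minimal sketch of the ordering the patch
switches to, written with POSIX pthread calls as a stand-in for the
_lwp_* primitives. The struct gate type and the gate_* functions are
hypothetical names made up for this example and are not the Berkeley DB
code; the only point is that the signal is sent while the mutex is
still held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical example type; NOT Berkeley DB's actual mutex structure. */
struct gate {
    pthread_mutex_t mu;     /* protects 'locked' */
    pthread_cond_t  cv;     /* signalled when 'locked' becomes false */
    bool            locked; /* the condition the waiter tests */
};

void gate_init(struct gate *g)
{
    pthread_mutex_init(&g->mu, NULL);
    pthread_cond_init(&g->cv, NULL);
    g->locked = false;
}

/* Waiting side: lock, test, wait, unlock. */
void gate_lock(struct gate *g)
{
    pthread_mutex_lock(&g->mu);
    while (g->locked)                      /* re-test after every wakeup */
        pthread_cond_wait(&g->cv, &g->mu); /* releases mu while asleep */
    g->locked = true;
    pthread_mutex_unlock(&g->mu);
}

/* Releasing side: lock, signal, unlock (signal sent under the mutex). */
void gate_unlock(struct gate *g)
{
    pthread_mutex_lock(&g->mu);
    assert(g->locked);             /* releasing a free gate is a bug */
    g->locked = false;
    pthread_cond_signal(&g->cv);   /* signal BEFORE dropping the mutex */
    pthread_mutex_unlock(&g->mu);
}

/* Second thread: blocks in gate_lock until the first thread releases. */
static void *waiter(void *arg)
{
    struct gate *g = arg;
    gate_lock(g);
    gate_unlock(g);
    return NULL;
}

/* Small hand-off demo; returns 0 if the waiter thread ran to completion. */
int gate_demo(void)
{
    struct gate g;
    pthread_t t;

    gate_init(&g);
    gate_lock(&g);
    if (pthread_create(&t, NULL, waiter, &g) != 0)
        return -1;
    gate_unlock(&g);
    if (pthread_join(t, NULL) != 0)
        return -1;
    return g.locked ? -1 : 0;
}
```

The while loop around the wait is there so the condition is re-tested
after every wakeup rather than trusted blindly.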

-- 
Alex Kiernan, Principal Engineer, Development, Thus PLC


