BIND 10 #2198: make sure InterprocessSyncLocker is thread safe

Tue Sep 18 10:30:19 UTC 2012

#2198: make sure InterprocessSyncLocker is thread safe
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:  muks
  jinmei                             |                Status:  reviewing
                       Type:         |             Milestone:
  defect                             |  Sprint-20120918
                   Priority:         |            Resolution:
  medium                             |             Sensitive:  0
                  Component:         |           Sub-Project:  Core
  logging                            |  Estimated Difficulty:  5
                   Keywords:         |           Total Hours:  0
            Defect Severity:  N/A    |
Feature Depending on Ticket:         |
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------
Changes (by vorner):

 * owner:  vorner => muks

Comment:

 Replying to [comment:7 muks]:
 > > First, the branch #2202 introduces a wrapped lock (and a thread, but
 it shouldn't be needed here). I think we should not be dealing directly
 with pthreads through the code, both because of exception safety and error
 handling and because of portability. Is there any reason why you didn't
 use the lock from there?
 >
 > I didn't know that there's a new lock class introduced in the #2202
 branch. :) That is the first answer for why I didn't use that lock. For
 the RAII style lock, #2202's locker seems relevant. Ignoring that the
 branch is still in review, we still can't use this implementation here as
 the interface provided by that lock class is insufficient. It's not
 possible to unlock the mutex before its destruction (i.e., before that
 basic block is exited). I have not read more into #2202.

 Looking at it again, I think the SyncMapMutex is more or less the same as
 the
 Locker from #2202. I don't think we want to have two implementations of
 the
 same thing, we have more than enough code already. Could it be reused at
 least
 for this mutex?

 For the rest, well, the interface of #2202 mutex is minimalistic, since I
 didn't need more. But it could indeed be extended. From a longer
 perspective, I
 think it is better to have one complete interface than two half-complete.

 > > Furthermore, the code seems to be doing no error checking whatsoever.
 Most of the pthread functions can return error codes and they should be
 handled. Also, the new or insertion to the map can throw. If that happens,
 the mutex stays locked, causing a deadlock the next time anyone tries to
 acquire it.
 >
 > I agree with the use of a RAII-style lock. A simple wrapper around a
 pthread mutex has been introduced. All pthread calls are now checked. Even
 then, I don't suppose our code can catch such exceptions and continue in
 case the map throws in this codepath.

 Well, it may or may not crash on these. It should be at least possible to
 do
 some kind of graceful shutdown or recovery by dropping the current query
 and
 trying again. Our policy says the classes should not get to invalid state
 on
 exception.

 Which brings me to these two parts of code:
 {{{#!c++
 data = new SyncMapData;
 }}}

 {{{#!c++
 sync_map.erase(it);

 int ret = pthread_mutex_destroy(&data->second);
 if (ret != 0) {
     isc_throw(isc::InvalidOperation,
               "Error destroying InterprocessSyncFile mutex: "
               << strerror(ret));
 }

 delete data;
 }}}

 In both cases, it can leak if the pthread function returns error. The
 second
 one could be solved by deleting data even before the erase. The first one
 could
 be done with something like auto_ptr.

 However, these are all patches for fragile design I think. I'd like to
 propose
 an alternative approach:
  * Use the mutex from #2202. That one tries to be very paranoid about
 errors
    and be exception safe. Also, it reuses current code and makes future
 porting
    to windows easier.
  * Have one global mutex for the map. This can use the locker on stack in
 the
    usual way. You unlock it sooner than you exit the method sometimes, but
 I
    don't think it makes a big difference on how long it is held and if it
 did,
    there can be an inner block inside the method to invoke the destructor
    sooner.
  * Each task has a mutex (wrapped one, from #2202) allocated dynamically.
 The
    `InterprocessSyncFile` holds a shared pointer to it (so more than one
 can have
    the same one) and the map (which is protected by the above mutex)
 contains a
    weak_ptr to the mutex. This way unused mutexes get cleaned up
 automatically.
  * An `InterprocessSyncFile` contains a scoped_ptr which holds a Locker
 when the
    corresponding mutex is locked.

 This has the advantage that whenever anything is lost due to exiting a
 method
 or exception or something like that, everything gets cleared and unlocked
 automatically. Also, it could become slightly more readable, there'd be
 less
 error checking.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2198#comment:8>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development