BIND 10 #2202: introduce a lock for the data source client in auth

Fri Sep 28 23:06:15 UTC 2012

#2202: introduce a lock for the data source client in auth
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:  jinmei
  jinmei                             |                Status:  reviewing
                       Type:  task   |             Milestone:
                   Priority:         |  Sprint-20121009
  medium                             |            Resolution:
                  Component:         |             Sensitive:  0
  b10-auth                           |           Sub-Project:  DNS
                   Keywords:         |  Estimated Difficulty:  5
            Defect Severity:  N/A    |           Total Hours:  0
Feature Depending on Ticket:         |
  background zone loading            |
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------

Comment (by jinmei):

 Responding to some earlier points first:

 Replying to [comment:10 vorner]:

 > > - Do we really need the recursive mode, at least right now?
 >
 > I don't know. Probably not now, but we may need it soon or later. And as
 the
 > extra work is really small (just a condition in the constructor changing
 the
 > attributes), I thought I just add it for completeness, so the interface
 won't
 > need an update soon.

 If we don't see sooner need, I'd rather skip it for now, partly
 because of the basic YAGNI principle (even if the amount is small,
 more code increases maintenance cost and risk of having bugs, and
 decreases readability).  Also, perhaps it's a personal opinion, but my
 general understanding and experiences suggest recursive mode locking
 is quite error prone and should generally be avoided.  If we share
 this view, it's even safer to prohibit the use of it by the interface
 from the beginning.

 Even if we really want to keep compatibility in case we ever want to
 allow the recursive mode, we could at least ensure source level
 compatibility if the revised interface uses a parameter with the
 default value like the current one.

 Replying to [comment:20 vorner]:

 > > - It's (now) probably better to avoid 'using namespace std' because
 > >   this space is getting larger with C++ 0x11, increasing the risk of
 > >   name collisions.
 >
 > I don't know, we may want to explicitly ask the compiler for the old
 standard
 > by some `--std=iso-cpp` or whatever it is. Then it should not give us
 more
 > things in the namespace. But I'm not really against using more concrete
 use
 > statements.

 I'd avoid relying on specific compiler options, but I see it'd be
 beyond the scope of this ticket.  I realized the code was changed as
 suggested, which is fine of course, and even if it were left
 untouched for now, that would have been okay for this ticket.  Maybe
 we should discuss this at the biweekly call.

 > > - Mutex constructor: may not be a big deal, but you should be able to
 > >   avoid using auto_ptr if you move mutex_init inside the Impl class
 > >   constructor.
 >
 > This goes against the way I think about the impl class. I consider it
 being a
 > data-only, so such complicated initialization doesn't belong there.
 Also,
 > having the initialization there and everything else outside seems
 strange and
 > inconsistent. The overhead for the `auto_ptr` should be small (and it's
 during
 > initialization only).

 Hmm, at least that's not against the sense of pimpl at all.  My
 understanding of the point of pimpl is to hide all detailed
 definitions from the public interface to maximize binary
 compatibility, minimize unnecessary build overhead (as header files
 contain minimal info/code), and even more flexibility of changing
 internal details without worrying about breaking anything.

 Also, my concern wasn't the performance overhead.  It's more about
 readability/complexity by the use of auto_ptr and get()/release()
 trick.

 But, in this case it's a minor concern and probably a matter of
 taste.  I don't insist my suggestion.

 Now, here's the remaining of my review comments:

 '''general'''

 - About performance impact: here are some query_bench results, using
   root zone in-memory, comparing 6e3d1a5 and f2408e0 (branch point).
   - with real query examples: 37061.27 vs 37689.95 (-1.67%)
   - with crafted nxdomain only (i.e. lightweight) case:
     72550.51 vs 76248.69 (-4.85%)

 The 1.67% drop in the average case is probably acceptable, but frankly
 I'd be concerned about the 5%-ish worse case overhead, especially
 because the lock should be unnecessary in the vast majority of cases.
 In future versions we should probably think about making the
 performance sensitive code lock-free, but for now, I think we should
 accept the overhead and complete the feature.

 - Maybe we should document somewhere why we introduce an in-house
   wrapper interface for threads rather than using something like boost
   classes, because it's against our general policy of "don't try to
   reinvent wheels".

 '''lock.h'''

 - "or even destructors" is now incorrect (see the next bullet)
 {{{#!cpp
 /// constructors or even destructors in this class can throw. Allocation
 errors
 }}}
 - Mutex destructor: it's now exception free.  Same for ~Locker().

 '''thread.h'''

 - The `Thread` constructor: it may not be obvious that `main` will be
   internally copied so a state inside the original object may not be
   shared (if that's the intended way to "return" value).  I'd note
   that in the documentation.
 {{{#!cpp
     /// If you need to pass parameters to body, or return some result, you
     /// may just want to use boost::bind or alike to store them within the
     /// body functor.
 }}}

 '''thread.cc'''

 - I'd avoid hardcoded constant:
 {{{#!cpp
         waiting_(2),
 }}}
 - the variable naming policy for `Thread::Impl` class members is
   inconsistent in terms of whether to add trailing underscores.
 - I'd provide some text for `exception_text_` in this case:
 {{{#!cpp
         } catch (...) {
             Mutex::Locker locker(impl->mutex);
             impl->exception_ = true;
         }
 }}}
 - `Thread` class constructor: did you intentionally use "zero" in '0K'?
 {{{#!cpp
         case 0: // All 0K
 }}}
 - I'd add comments about which member variables are subject to the
   protection of `mutex`.
 - `Thread::run()`: this cast can (should) be static_cast:
 {{{#!cpp
         Impl* impl = reinterpret_cast<Impl*>(impl_raw);
 }}}
 - `Thread::run()`: do we need to protect `exception_` and
 `exception_text_`?
 {{{#!cpp
         } catch (const exception& e) {
             Mutex::Locker locker(impl->mutex);
             impl->exception_ = true;
             impl->exception_text_ = e.what();
         } catch (...) {
             Mutex::Locker locker(impl->mutex);
             impl->exception_ = true;
         }
 }}}
  they are only referred to by the other thread after pthread_join, so
  the lock doesn't seem to be necessary.
 - `Thread::wait()`: we can use boost::scoped_ptr instead of auto_ptr
   here:
 {{{#!cpp
     auto_ptr<UncaughtException> ex;
 }}}
   And I generally prefer scoped_ptr over auto_ptr in the sense of
   preferring weaker primitives.  But in this case it's quite minor.

 - `Thread::wait()`: mostly a theoretical concern, but since this
   method can throw, it can make the object a strange state.  For
   example, consider a very rare case of new'ing `UncaughtException`
   throws.  Then even if pthread_join() was called impl_ will be still
   non NULL.  But I have no good idea to address this concern in a
   concise and clean way.  Maybe it should be very minor that we can
   ignore it.  In that case I'd add some note about it at least.

 '''lock_unittest'''

 - Shouldn't this be 2012?  Same for thread_unittest.
 {{{#!cpp
 // Copyright (C) 2010  Internet Systems Consortium, Inc. ("ISC")
 }}}
 - I suggest using "<util/thread/lock.h>", etc, here:
 {{{#!cpp
 #include "../lock.h"
 }}}
   as test directories could sometimes be moved to a different place
   due to reasons like resolving originally unexpected dependencies.  I
   actually encountered this for datasrc/memory/tests.

 - `DISABLED_destroyLocked`: forgetting the FIXME issue, this test
   isn't correct any more anyway because ~Mutex() now doesn't throw.
   If we want to test the condition, we'd need to change it to a death
   test (and I'm not sure if it would work as a workaround for the
   FIXME issue.  Maybe you can try it).
 - `DISABLED_destroyLocked` has another exception safety issue.  I'd
   change it as follows to avoid that:
 {{{#!cpp
     boost::scoped_ptr<Mutex> mutex(new Mutex);
     new Mutex::Locker(*mutex);
     EXPECT_DEATH(mutex.reset(), "");
 }}}
 - `DISABLED_destroyLocked`: this comment doesn't make perfect sense:
 {{{#!cpp
     // Note: This leaks the locker. But this is a test for development aid
     // exception. The exception won't happen in normal build anyway and
 seeing
     // it means there's a bug. And we can't delete the locker now, since
 it
     // would access uninitialized memory.
 }}}
   the exception (or death) should normally happen except for some
   deviant, non-compliant systems because pthread_mutex_destroy()
   should fail (recall the discussion in the previous comment).  Also
   "uninitialized" doesn't sound very like.  Shouldn't it rather be
   called something like "invalidated memory"?
 - `DISABLED_destroyLocked`: for compliant systems regarding
   pthread_mutex_destroy(), the leak is real, but I have to admit it's
   difficult to avoid (and in the "normal" case the process that has
   the leak immediately dies anyway).  So I'm okay with leaving the
   leak open with comments.
 - swarm test: I was impressed your imagination:-) but
   performStrangeOperation seems to be way too much complicated for the
   purpose.  After reading the comment and code 10 minutes, I found I
   still don't fully understand the code, much less whether it does
   what the document says.  I could spend for a few more hours to fully
   understand it with confidence, but before that please let me ask: do
   we really need this level of complexity?  I suspect, in practice,
   a much simpler test case is sufficient.
 - swarm test: independent from the test case details, when we test
   locks, there's a risk of having deadlock due to a bug.  That would
   be bad for auto-running tests.  Should we add some safety net like a
   deadline timer (if it expires it terminates the entire process
   unconditionally).
 - swarm test: not necessarily wrong, but it looks confusing to use the
   same name variable `value` for different purposes. I'd rename one of
   them:
 {{{#!cpp
 const unsigned long long value = 2000;
 ...
         unsigned long long value = array[position % length];
 }}}
 - swarm test: technically, we cannot use a variable length array
   without requiring C99:
 {{{#!cpp
     long long unsigned array[length];
 }}}
   although I admit there are other places in our code that implicitly
   relies on C99.  I' also not sure if it's valid even without C99 when
   length is a const variable, but the safest way would be to use
   std::vector.
 {{{#!cpp
     std::vector<long long unsigned> array_storage(length);
     long long unsigned* array = &array_storage[0];
 }}}

 '''thread_unittest.cc'''

 - detached test: this type of new-delete pair makes me nervous because
   such usage is often not exception safe.  If the purpose of
   `doSomething()` is really just `doSomething()`, I'll simply pass
   some pointer and have `doSomething()` ignore or play with it:
 {{{#!cpp
 void
 doSomething(int* x) {
     *x = 42;                  // may cause race, but it doesn't matter
 }

 TEST(ThreadTest, detached) {
     int x;
     for (size_t i = 0; i < detached_iterations; ++i) {
         Thread thread(boost::bind(&doSomething, &x));
     }
 }
 }}}

 - I think we should also test the case when the thread throws
   std::exception variants.  After all, that's the more likely case in
   practice.

 '''auth_srv.cc'''
 - `AuthSrvImpl::processNormalQuery`: as already commented, I suspect
   the lock should be held for a longer period, like this:
 {{{#!cpp
     isc::util::thread::Mutex::Locker locker(mutex_);
     try {
         const ConstQuestionPtr question = *message.beginQuestion();
 }}}

 '''auth_srv.h'''

 - comment style is different than the expected one per guideline
 {{{#!cpp
     /**
      * \brief Return a mutex for the client lists.
      *
 }}}
   etc.

 '''auth_srv_unittest.cc'''
 - mutex test: I don't understand this comment.  How would the test
   fail in the non-debug case?
 {{{#!cpp
     // TODO: Once we have non-debug build, this one will not work.
     // Skip then.
 }}}

-- 
Ticket URL: <http://bind10.isc.org/ticket/2202#comment:21>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development