BIND 10 #2400: notify auth::DataSrcClientsMgr when the builder thread dies

Wed Oct 24 17:35:58 UTC 2012

#2400: notify auth::DataSrcClientsMgr when the builder thread dies
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:
  jinmei                             |                Status:  new
                       Type:         |             Milestone:  Next-Sprint-
  defect                             |  Proposed
                   Priority:         |            Resolution:
  medium                             |             Sensitive:  0
                  Component:         |           Sub-Project:  DNS
  b10-auth                           |  Estimated Difficulty:  0
                   Keywords:         |           Total Hours:  0
            Defect Severity:  N/A    |
Feature Depending on Ticket:         |
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------

Comment (by jinmei):

 Replying to [comment:1 vorner]:

 This point first:

 > > - the manager checks variable every time it sends a new command to the
 > >   builder or allow the application to access the client lists via the
 > >   holder.  If the value has been changed to false, it takes an
 > >   appropriate action (in practice, it would throw a fatal exception to
 > >   terminate the process anyway).
 >
 > I'm not sure about this. So, on Monday an update to a zone happens and
 the
 > thread crashes during the time. But the server happily runs with the old
 > version of zone, because the server didn't exchange it. Than the next
 update
 > happens on for example Friday and then the server unexpectedly crashes
 during
 > an unrelated update, so the admin starts looking into what is wrong with
 the
 > Friday update.

 Yeah, I know this is suboptimal in terms of the notification timing.
 But in case you didn't notice it, note the "(every time) allow the
 application to access the client lists".  So, the main thread will
 notice it at the next query handling after the builder thread dies.
 It's still possible that there's been no query between Monday and
 Friday, so in theory the same worst case scenario could happen.  But
 in this case the data source is not really used, so it's not really
 incorrect in terms of externally visible behavior.

 > What I think could work is send the main thread a message over the msgq,
 for
 > example. Or signal it (though that's tricky). Or somehow notify it right
 away.

 I agree it's better if the main thread can notice failure of the
 builder thread without any effective delay, if it can be done easily.
 These two ways seem to be much more complicated than the idea I
 explained, however, so I'm not so sure if it's definitely better
 regarding pros and cons balance.

 > > - on unexpected termination, the builder sets it to false.  (this
 > >   could be protected by a mutex, but that's not absolutely necessary)
 >
 > Is there a reason why this should be safe not to protect it by mutex? If
 the
 > reason is that boolean is probably small, well, it doesn't sound like a
 very
 > good reason for me. I heard that even the `sig_atomic_t` is not safe to
 be used
 > this way.

 This is not really safe in its strictest sense.  But the race
 condition in this case is that the main thread may not immediately
 notice the change of the boolean.  Since we already made this
 compromise as discussed above (as long as we use this approach), it
 shouldn't be a big deal whether the main thread handles one more query
 than the ideal case before it notices the failure.

 BTW, when hinting omitting the lock, I was thinking it wouldn't be
 good if we need to acquire one more extra lock in every query handling
 only for a basically "should not happen" event.  But on second thought
 we can probably just use the same lock that currently protects the
 access to the client list map.

 > Hmm, I didn't notice yesterday during the review, but is `assert(false)`
 the
 > best way? Asserts can be turned off, this probably should have been
 `abort()`.
 > Should we fast-fix this somehow?

 As you know this is done.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2400#comment:2>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development