BIND 10 #2244: remove ddns component, but boss still keeps trying to start it

BIND 10 Development do-not-reply at isc.org
Fri Oct 5 05:38:41 UTC 2012


#2244: remove ddns component, but boss still keeps trying to start it
-------------------------------------------+----------------------------------------
                   Reporter:  jreed        |                 Owner:  jinmei
                       Type:  defect       |                Status:  accepted
                   Priority:  medium       |             Milestone:  Sprint-20121009
                  Component:  Boss of BIND |            Resolution:
                   Keywords:               |             Sensitive:  0
            Defect Severity:  N/A          |           Sub-Project:  Core
Feature Depending on Ticket:               |  Estimated Difficulty:  6
        Add Hours to Ticket:  0            |           Total Hours:  0
                  Internal?:  0            |
-------------------------------------------+----------------------------------------

Comment (by jinmei):

 trac2244 is ready for review.

 First off, I'd argue the subject is misleading.  After investigating
 it, I realized the problem occurs primarily because b10-ddns
 repeatedly terminates and is then restarted.  "remove ddns" didn't
 work as expected in that specific situation.  If you remove ddns
 while it is working fine, it should be cleanly stopped as expected.

 As far as I can see, what happened is this:

 b10-ddns repeatedly fails and then restarts.  When it fails, its
 internal component state is set to "FAILED", and it's pushed onto
 the restart schedule queue of the main bind10 process.  If the
 "remove b10-ddns" command takes place between the failure and the
 restart, the configurator class effectively ignores this command
 because it includes the removed component in the planned task list
 only when the component's state is "running":
 {{{#!python
                 if component.running():
                     plan.append({
                         'command': STOP_CMD,
                         'component': component,
                         'name': cname
                     })
 }}}
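
 To make the effect of that condition concrete, here is a minimal toy
 model.  It is only a sketch: ToyComponent, build_removal_plan() and
 the lowercase state strings are hypothetical names made up for the
 illustration, not the real configurator code; only STOP_CMD,
 running() and the plan entry layout come from the excerpt above.

```python
# Toy model only: ToyComponent and build_removal_plan are hypothetical,
# not the real bind10 configurator.
STOP_CMD = 'stop'  # assumed value; the real constant may differ


class ToyComponent:
    def __init__(self, state):
        self.state = state  # 'running' or 'failed' in this sketch

    def running(self):
        return self.state == 'running'


def build_removal_plan(old, new):
    """Plan a stop for each component present in `old` but not in `new`."""
    plan = []
    for cname, component in old.items():
        if cname not in new:
            # Same condition as in the excerpt: only running components
            # make it into the plan.
            if component.running():
                plan.append({
                    'command': STOP_CMD,
                    'component': component,
                    'name': cname
                })
    return plan


# Removing a running b10-ddns yields a stop task...
plan_running = build_removal_plan({'b10-ddns': ToyComponent('running')}, {})
# ...but removing it while it is in the failed state yields nothing.
plan_failed = build_removal_plan({'b10-ddns': ToyComponent('failed')}, {})
print(len(plan_running), len(plan_failed))
```

 In this model the second plan is empty, which corresponds to the
 "remove" command being silently ignored.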

 And, even if the specified component was removed from the
 configurator, the restart queue of the bind10 process is not checked
 for consistency with the underlying configuration, so the restart
 would take place anyway:
 {{{#!python
     def restart_processes(self):
 ...
         for component in self.components_to_restart:
             if not component.restart(now):
 }}}
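
 The following toy model sketches why the queue alone is enough to
 bring the component back.  ToyBoss, its `configured` dict and the
 handling of a failed restart are assumptions made for the
 illustration; only components_to_restart, restart_processes() and
 restart(now) are taken from the excerpt above.

```python
# Toy model only: ToyBoss and ToyComponent are hypothetical stand-ins.
class ToyComponent:
    def __init__(self, name):
        self.name = name
        self.restarts = 0

    def restart(self, now):
        # Pretend the restart always succeeds and count it.
        self.restarts += 1
        return True


class ToyBoss:
    def __init__(self):
        self.configured = {}             # assumed view of the configuration
        self.components_to_restart = []  # the restart queue

    def restart_processes(self, now):
        still_pending = []
        for component in self.components_to_restart:
            # Note: no check against self.configured here.
            if not component.restart(now):
                still_pending.append(component)
        self.components_to_restart = still_pending


boss = ToyBoss()
ddns = ToyComponent('b10-ddns')
boss.configured['b10-ddns'] = ddns
boss.components_to_restart.append(ddns)  # b10-ddns failed and was queued
del boss.configured['b10-ddns']          # "remove" arrives before the restart
boss.restart_processes(now=0.0)
print(ddns.restarts)
```

 Here the component is restarted once even though it is no longer in
 the configuration.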

 So the fix is two-fold:

 - Make sure that the configurator honors the remove command as long as
   the state is "running" or "failed"
 - In bind10's restart_processes(), skip the restart if a component has
   been removed from the configuration
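
 As a rough sketch of these two points (not the code in the branch),
 the toy model below applies both changes.  ToyComponent, is_failed(),
 build_removal_plan() and the `configured` argument are hypothetical
 names introduced only for this illustration.

```python
# Sketch of the two-fold idea on a toy model; not the branch's code.
STOP_CMD = 'stop'  # assumed value


class ToyComponent:
    def __init__(self, name, state='running'):
        self.name = name
        self.state = state
        self.restarts = 0

    def running(self):
        return self.state == 'running'

    def is_failed(self):  # hypothetical helper
        return self.state == 'failed'

    def restart(self, now):
        self.state = 'running'
        self.restarts += 1
        return True


def build_removal_plan(old, new):
    """Point 1: honor removal when the state is running or failed."""
    plan = []
    for cname, component in old.items():
        if cname not in new and (component.running() or
                                 component.is_failed()):
            plan.append({'command': STOP_CMD,
                         'component': component,
                         'name': cname})
    return plan


def restart_processes(configured, components_to_restart, now):
    """Point 2: skip components no longer in the configuration."""
    still_pending = []
    for component in components_to_restart:
        if component.name not in configured:
            continue  # removed from the configuration, so drop it
        if not component.restart(now):
            still_pending.append(component)
    return still_pending


ddns = ToyComponent('b10-ddns', state='failed')
plan = build_removal_plan({'b10-ddns': ddns}, {})
pending = restart_processes({}, [ddns], now=0.0)
print(len(plan), ddns.restarts, pending)
```

 With both changes the failed component gets a stop task and is not
 restarted after it has been removed.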

 The branch implements this idea with a few additional cleanups (see
 commit logs).

 Frankly, I didn't like this fix very much - "if the state is xxx then
 do something" type of code is generally quite fragile, and this fix
 adds yet another piece of such logic.  But to fix the fundamental
 issue cleanly, I guess we need some design-level overhaul of the
 component framework.  I hope we eventually have time for that (the
 current implementation seems to be too fragile in general, and is
 difficult to maintain due to this logic and due to the fact that
 various modules can change the state), but that would be far beyond
 the scope of this ticket.  So, I reluctantly chose the band-aid fix.

 This is the proposed changelog entry:
 {{{
 486.    [bug]           jinmei
         The bind10 process now terminates a component (subprocess) in
         response to the "config remove Boss/components" bindctl command
         even if the process crashes immediately before the command is
         sent to bind10.
         Previously this led to an inconsistent state between the
         configuration and an internal component list of bind10, and bind10
         kept trying to restart the component.  A known specific case of
         this problem is that b10-ddns could keep failing (due to lack of
         dependency modules) and the administrator couldn't stop the
         restart via bindctl.
         (Trac #2244, git TBD)
 }}}

-- 
Ticket URL: <http://bind10.isc.org/ticket/2244#comment:4>
BIND 10 Development <http://bind10.isc.org>