BIND 10 #2244: remove ddns component, but boss still keeps trying to start it
BIND 10 Development
do-not-reply at isc.org
Fri Oct 5 05:38:41 UTC 2012
#2244: remove ddns component, but boss still keeps trying to start it
-------------------------------------+-------------------------------------
Reporter: jreed                      | Owner: jinmei
Type: defect                         | Status: accepted
Priority: medium                     | Milestone: Sprint-20121009
Component: Boss of BIND              | Resolution:
Keywords:                            | Sensitive: 0
Defect Severity: N/A                 | Sub-Project: Core
Feature Depending on Ticket:         | Estimated Difficulty: 6
Add Hours to Ticket: 0               | Total Hours: 0
Internal?: 0                         |
-------------------------------------+-------------------------------------
Comment (by jinmei):
trac2244 is ready for review.
First off, I'd argue the subject is misleading. After investigating
it, I realized the problem arises primarily because b10-ddns repeatedly
terminates and is then restarted; "remove ddns" didn't work as expected
in that specific situation. If you remove ddns while it is working fine,
it should be cleanly stopped as expected.
As far as I can see, what happened is this:
b10-ddns repeatedly fails and then restarts. When it fails, its
internal component state is set to "FAILED", and it's pushed onto
the restart schedule queue of the main bind10 process. If the
"remove b10-ddns" command arrives between the failure and the restart,
the configurator class effectively ignores the command, because it
adds the removed component to the planned task list only when the
component's state is "running":
{{{#!python
if component.running():
    plan.append({
        'command': STOP_CMD,
        'component': component,
        'name': cname
    })
}}}
And even if the specified component has been removed from the
configurator, the restart queue of the bind10 process does not check
for consistency with the underlying configuration, so the restart
takes place anyway:
{{{#!python
def restart_processes(self):
    ...
    for component in self.components_to_restart:
        if not component.restart(now):
            ...
}}}
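To make the sequence concrete, here is a minimal, self-contained sketch of the race. It is not the actual bind10 code: the `Component` class, `build_remove_plan()`, and the simplified `restart_processes()` are hypothetical stand-ins that only mimic the two checks quoted above.

```python
# Hypothetical, simplified model of the race; not the real bind10 classes.
STATE_RUNNING = 'running'
STATE_FAILED = 'failed'


class Component:
    def __init__(self, name):
        self.name = name
        self.state = STATE_RUNNING

    def running(self):
        return self.state == STATE_RUNNING

    def restart(self):
        self.state = STATE_RUNNING
        return True


def build_remove_plan(configured, cname):
    """Mimic the configurator: plan a stop only for a running component."""
    component = configured.pop(cname)  # gone from the configuration
    plan = []
    if component.running():
        plan.append({'command': 'stop', 'component': component,
                     'name': cname})
    return plan


def restart_processes(components_to_restart):
    """Mimic the restart queue: it never looks at the configuration."""
    restarted = []
    for component in components_to_restart:
        if component.restart():
            restarted.append(component.name)
    return restarted


ddns = Component('b10-ddns')
configured = {'b10-ddns': ddns}

# 1. b10-ddns crashes: it is marked failed and queued for restart.
ddns.state = STATE_FAILED
restart_queue = [ddns]

# 2. The remove command arrives before the restart: nothing is planned.
plan = build_remove_plan(configured, 'b10-ddns')

# 3. The restart timer fires: the removed component comes back to life.
restarted = restart_processes(restart_queue)
```

In this model the plan ends up empty while the component is still restarted, which is the inconsistency described above.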
So the fix is two-fold:
- Make sure that the configurator honors the remove command when the
component's state is either "running" or "failed"
- In bind10's restart_processes(), skip the restart if a component has
been removed from the configuration
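The two points can be sketched roughly as follows. This is only an illustration of the idea, not the code on the branch: the `Component` class, `is_stoppable()`, `remove_component()`, and the extra `configured` argument to `restart_processes()` are all assumptions made for the example.

```python
# Hypothetical sketch of the two-fold fix; names are illustrative only.
STATE_RUNNING = 'running'
STATE_FAILED = 'failed'
STATE_STOPPED = 'stopped'


class Component:
    def __init__(self, name, state=STATE_RUNNING):
        self.name = name
        self.state = state

    def is_stoppable(self):
        # Point 1: a failed component awaiting restart also counts.
        return self.state in (STATE_RUNNING, STATE_FAILED)

    def stop(self):
        self.state = STATE_STOPPED

    def restart(self):
        self.state = STATE_RUNNING
        return True


def remove_component(configured, cname):
    """Honor a remove command for running and failed components alike."""
    component = configured.pop(cname)
    if component.is_stoppable():
        component.stop()
    return component


def restart_processes(restart_queue, configured):
    """Point 2: skip components that are no longer configured."""
    restarted = []
    for component in restart_queue:
        if configured.get(component.name) is not component:
            continue  # removed from the configuration; do not restart
        if component.restart():
            restarted.append(component.name)
    return restarted


ddns = Component('b10-ddns', STATE_FAILED)
auth = Component('b10-auth', STATE_FAILED)
configured = {'b10-ddns': ddns, 'b10-auth': auth}
restart_queue = [ddns, auth]

# The remove command arrives while b10-ddns is waiting to be restarted.
remove_component(configured, 'b10-ddns')
restarted = restart_processes(restart_queue, configured)
```

Here only `b10-auth` is restarted; `b10-ddns` stays stopped because it is no longer in the configuration when the restart queue is processed.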
The branch implements this idea with a few additional cleanups (see
commit logs).
Frankly, I don't like this fix very much: "if the state is xxx then
do something" type of code is generally quite fragile, and this fix
adds yet another piece of such logic. But to fix the fundamental issue
cleanly, I guess we need a design-level overhaul of the component
framework. I hope we eventually have time for that (the current
implementation seems too fragile in general, and is difficult to
maintain because of this kind of logic and because various modules
can change the state), but that would be far beyond the scope of this
ticket. So I reluctantly chose the band-aid fix.
This is the proposed changelog entry:
{{{
486.	[bug]		jinmei
	The bind10 process now terminates a component (subprocess) by the
	"config remove Boss/components" bindctl command even if the
	process crashes immediately before the command is sent to bind10.
	Previously this led to an inconsistent state between the
	configuration and an internal component list of bind10, and bind10
	kept trying to restart the component. A known specific case of
	this problem is that b10-ddns could keep failing (due to lack of
	dependency modules) and the administrator couldn't stop the
	restart via bindctl.
	(Trac #2244, git TBD)
}}}
--
Ticket URL: <http://bind10.isc.org/ticket/2244#comment:4>
BIND 10 Development <http://bind10.isc.org>