BIND 10 #213: Change hard-coded process startups to configuration-driven
BIND 10 Development
do-not-reply at isc.org
Thu Oct 20 03:57:07 UTC 2011
#213: Change hard-coded process startups to configuration-driven
-------------------------------------+-------------------------------------
Reporter: shane | Owner: jinmei
Type: | Status: reviewing
enhancement | Milestone:
Priority: major | Sprint-20111025
Component: Boss | Resolution:
of BIND | Sensitive: 0
Keywords: | Sub-Project: Core
Defect Severity: N/A | Estimated Difficulty: 9
Feature Depending on Ticket: | Total Hours:
Add Hours to Ticket: |
Internal?: 0 |
-------------------------------------+-------------------------------------
Comment (by jinmei):
I'll first provide comments on the component module (and its tests).
(In order to not disappoint you due to the amount of comment:-) In
general the design looks sensible to me. But since this is a newly
introduced concept, I encountered many points to discuss.
'''Regressions'''
- This branch seems to break system tests (maybe related to the next one)
- When I stopped the system via bindctl 'Boss shutdown', the bind10
process raised an exception:
{{{
Traceback (most recent call last):
File "bind10", line 952, in <module>
main()
File "bind10", line 947, in main
boss_of_bind.shutdown()
File "bind10", line 646, in shutdown
component.pid())
File
"/Users/jinmei/src/isc/git/bind10-213/src/lib/python/isc/bind10/component.py",
line 232, in pid
return self.__creator.pid()
AttributeError: 'NoneType' object has no attribute 'pid'
}}}
and b10-msgq process remained alive.
'''Design/General issues'''
- Do we allow multiple instances (processes) of the same component?
Like multiple auth processes for multiple cores? If we do, can we
handle that scenario in this framework?
- Don't we need an interval (and/or allow having an interval) before
restarting a component? (and should the interval also be
configured?). This is probably a TODO thing, which is okay for now,
but just checking.
- I wouldn't consider Auth/Resolver/CmdCtl "needed" components. For
example, if the system is intended to be DHCP only, we don't need
either auth or resolver.
- I'd avoid using magic numbers/values directly in the source code as
much as possible: e.g. 10 seconds, some well-known command names
like 'start', 'stop', so that we can unify the place where they are
defined and can be easily changed at once (when necessary) and also
so that the intent of the magic values is clearer.
- I'd keep this module independent from the knowledge of which
component is special for the boss, and let it focus on the generic
framework. In that sense, it's awkward to see classes like
SockCreator, Msgq, etc and the mapping dictionary of 'special' be
defined in this module. It seems to me to be more reasonable if
they are defined in the boss module (or even independently from the
module implementation itself, e.g. in a separate configuration file
or something) and passed to this module as kind of generic
parameters (see also the comment about 'special' in Configurator
below).
'''Component class'''
- An object of this class is a sort of finite state machine, but since
its transition is represented via changes to multiple mutable
attributes (__running (which can be True or False), __dead) it's
difficult to follow at least to me (I needed to draw a state
transition diagram to understand the code and be sure about its
soundness). I'd suggest making this point more explicit by, e.g.,
introducing a single attribute representing the current state. (see
bullets below for more specifics)
- Whether or not adopting the idea of explicit state transition, a
diagram in the documentation would still help for others to read and
understand the code. It would look something like:
{{{
start_intl fail
OR failed()
start() /---------->Failed----*----->Dead
Init----------->Running<----+----/(call
(call failed_intl)
start_intl)
^ | *: if kind == core
start()| |stop() or kind == needed and
| v it fails too soon
Stopped +: otherwise
(call stop_intl)
}}}
- What if stop_internal() raises an exception?
- The notion of "dead" is not documented.
- failed() doesn't check if it's called with __dead being True while
start() does this check. While it may not be a bug because if
__dead is true __running must be False, in order to understand that
you need to follow how and where these values can be changed. If we
had a single attribute of e.g., __state, it would be immediately
clear that the check of "__state != RUNNING" covers "__state ==
DEAD". (You may or may not agree with this idea itself, but at
least it was the case for me that it took time to understand this
implementation was okay due to the existence of multiple
attributes).
- It seems to make sense to allow derived class to override some of
the methods. But I suspect the requirement to the overridden
methods should be more clarified in terms of which part (attribute)
of the base class they can modify/ignore/refer to, or even which
method can be overridden:
- Can running() be overridden? If it can, and if the overridden
version ignores __running, behaviors that depend on the result of
running() may not meet the assumption of the implementation (and
may cause disruption).
- if a derived class overrides start_internal() and in its version
it doesn't set _procinfo, then pid() won't work unless pid() is
also overridden.
- there may be some more similar cases that I'm missing
- A related point to the previous general issue, but not specific to
that: strange thing will happen if pid() is called before _procinfo
is set (e.g., at the "Init" state). Also, what should happen if
this method is called without the underlying process? Further, this
is actually not specific to PID, but also generally applicable to
_procinfo.
- On the specific point of the relationship between start_internal()
and _procinfo, I guess it's better to request derived method not
modify base attributes and instead return an ProcessInfo object as a
contract. The start() method will then set _procinfo to the
returned value, and we can make sure _procinfo is always valid as
long as start() succeeds.
- Shouldn't xxx_internal better be named like "protected" methods,
e.g. _start_internal()?
'''SockCreator class'''
- stop_internal: as long as the guideline is held __creator should
never be None when this method is called (in other words should this
ever happen it means a bug). So I guess raising an exception is
probably better.
'''CfgMgr class, etc'''
- isn't it better (at least in terms of conciseness) to pass 'address'
to the base class constructor rather than setting it by themselves?
that is, instead of
{{{
self._address = 'ConfigManager'
}}}
do this:
{{{
Component.__init__(self, process, boss, kind, 'ConfigManager')
}}}
'''Configurator class'''
- The syntax and semantics of the 'configuration' (argument to
startup() and reconfigure()) are not clear enough in the
documentation. I needed to look into the implementation and test
cases to understand how it should be and how it works. This should
be documented somewhere (probably part of the class doc string),
more explicitly. It would cover the following points:
- 'configuration' is a dictionary, whose keys are the identifiers of
components (often derived from the program invoked for the
component)
- each value of a dictionary entry is another dictionary, which
gives the configuration for the component. The (currently
available) keys are: 'special', 'process', 'kind', 'address',
and 'param'. The expected type of each key value is explained
with other restrictions (e.g., 'kind' can only be one of the
well-known kinds). For those configuration parameters that can be
omitted, the default values are shown. And explain how these
values are used.
- (A related point) Isn't it okay to not perform syntax check against
the given parameters? What if an unknown parameter is given? What
if a parameter value is of an unexpected type? Or is this assumed
to be checked at the caller side? (if so, it should be documented
so).
- Configurator.running doesn't have doc string.
- The notion of 'special' is confusing, at least to me. This
essentially specifies a specific derived Component class (if
explicitly specified). So I'd rather simply call it something like
'component_class', specify the actual class Component class name
(e.g. 'SockCreator'), and use some reflection technique to
instantiate an object of the specified class.
- _run_plan(): Like the configuration, I'd like to see doc string that
explains more details of 'plan'.
- _run_plan(): s/beforehead/beforehand/?
{{{
Run a plan, created beforehead by _build_plan.
}}}
- _run_plan(): this doesn't seem to provide the strong exception
guarantee, i.e. self._components can have an incomplete update when
an exception is raised. Is that okay? (for example, _old_config
and _components will be inconsistent once this happens).
- A related comment: since _old_config and _components are inter
related (and have one-to-one relationships if I understand it
correctly), I'd rather manage them in a combined way, e.g.
{{{
_components = { 'msgq': (comonent_obj, componet_config), ... }
}}}
That way we can be more sure, when reading the code, about the
consistency between them, and it will be easier to ensure the
consistency at least per component basis.
- The exception message seems to be too broad:
{{{
raise NotImplementedError('Changing configuration
of' +
' a running component is
' +
'not yet supported.
Remove' +
' and re-add ' + cname +
'to get the same
effect')
}}}
because it only checks a subset of known configuration parameters
(e.g. even if 'address' is modified it doesn't complain)
- reconfigure() docstring: I'd like to see more desc, like the fact
that currently running components that are not specified in
'configuration' will be stopped.
- build_plan: using "params" as a local variable while also having
"'params'" in the configuration is a bit confusing. I'd rename the
former to something else.
- is it okay to allow Configurator.reconfigure() to even stop core
components? It seems to contradict the idea that the system cannot
even start if such components fail to start.
- what should happen if we do Configurator.shutdown() and try to
start it again? Is it expected to be an okay behavior? Unexpected
but not prohibited? Or should be prohibited?
'''component_test.py'''
=== ComponentTests ===
- __check_dead: what about calling failed()? (also see above for the
comments about the class)
- __check_dead: 'failed' instead of 'stopped'? ('dead' is a result of
'failed' not 'stop')
{{{
# Surely it can't be stopped again
}}}
- __check_dead: should also check it's not None?
{{{
self.assertNotEqual(0, self._exitcode)
}}}
=== ConfiguratorTest ===
- Is it intentional to test these two? (same for the shutdown case.
also note it's inconsistent with the other two cases)
{{{
self.assertTrue(configurator._running)
self.assertTrue(configurator.running())
}}}
=== main ===
- I don't know about this, but referring to other similar cases it
doesn't seem to be necessary:
{{{
isc.log.init("bind10") # FIXME Should this be needed?
}}}
--
Ticket URL: <http://bind10.isc.org/ticket/213#comment:14>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list