BIND 10 #213: Change hard-coded process startups to configuration-driven
BIND 10 Development
do-not-reply at isc.org
Tue Oct 25 18:29:01 UTC 2011
-------------------------------------+-------------------------------------
Reporter: shane                      | Owner: vorner
Type: enhancement                    | Status: reviewing
Priority: major                      | Milestone: Sprint-20111108
Component: Boss of BIND              | Resolution:
Keywords:                            | Sensitive: 0
Defect Severity: N/A                 | Sub-Project: Core
Feature Depending on Ticket:         | Estimated Difficulty: 9
Add Hours to Ticket:                 | Total Hours:
Internal?: 0                         |
-------------------------------------+-------------------------------------
Comment (by jinmei):
Some response:
> > - Do we allow multiple instances (processes) of the same component?
> > Like multiple auth processes for multiple cores? If we do, can we
> > handle that scenario in this framework?
>
> This framework should be able to handle that, provided their names are
> different. I actually expected it to be used that way (or maybe with a
> 'count': 64 option for a component sometime in the future, if copy-pasting
> a bunch of components is deemed uncomfortable).
>
> However, the rest of the system can't handle it yet (e.g., the instances
> would need different addresses on the message bus). Maybe we should warn
> the user in the config that starting two auths won't do what he wants.
It would be nice to document it somewhere. In any case, actually
realizing it is far beyond the scope of this ticket.
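To make the "distinct names" requirement concrete, here is a minimal sketch, assuming components are configured as a dict keyed by component name and that a future 'count' key would simply be expanded into that many uniquely named copies. The `expand_components` helper, the "<name>-<index>" naming scheme, and the dict layout are all hypothetical, not part of bob.spec.

```python
import copy


def expand_components(components):
    """Expand a hypothetical 'count' key into uniquely named components.

    'count' is only the possible future option floated above, not an
    existing bob.spec setting, and the "<name>-<index>" naming scheme is
    an assumption. Each resulting entry has a distinct name, which is what
    the framework needs to treat instances as separate components.
    """
    expanded = {}

    def add(name, spec):
        # Two entries with the same name would silently collapse into one.
        if name in expanded:
            raise ValueError(f"duplicate component name: {name}")
        expanded[name] = spec

    for name, spec in components.items():
        count = spec.get("count", 1)
        base = {key: value for key, value in spec.items() if key != "count"}
        if count == 1:
            add(name, base)
        else:
            for index in range(count):
                add(f"{name}-{index}", copy.deepcopy(base))
    return expanded
```

For example, `{"b10-auth": {"kind": "needed", "count": 2}}` would become `b10-auth-0` and `b10-auth-1`. As noted above, this only solves the configuration side; the instances would still need distinct message-bus addresses before starting two auths does anything useful.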
> > - I wouldn't consider Auth/Resolver/CmdCtl "needed" components. For
> > example, if the system is intended to be DHCP only, we don't need
> > either auth or resolver.
>
> I'm not sure; maybe the name "needed" is a bit misleading. It means the
> system should not start if these components can't be started, but it
> should not be brought down if they crash later on.
>
> If someone uses boss to start DHCP, he would just remove auth and
> resolver from the configuration and mark the DHCP part as needed.
Perhaps the point to consider is what should be specified as 'needed'
by default in bob.spec. If we see BIND 10 as a generic framework for
various kinds of Internet servers (starting with DNS, then DHCP, and
perhaps even HTTP, etc.), it would be more reasonable to begin with an
empty list of specific servers. If a user wants to use the framework
for DNS services, auth (and/or resolver) would then be specified as
'needed'. On the other hand, realistically speaking, most people will
see BIND 10 as DNS software (at least for the first N years), so this
might be an over-generalization that just increases the configuration
overhead. Right now I don't have a strong opinion either way. One
option may be to decide it at ./configure time and make the DNS-related
servers the default. But in any case I'm okay with deferring this point.
I don't have a strong opinion about the naming of 'needed', btw.
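The 'needed' semantics and the DNS-versus-DHCP defaults discussed above can be sketched roughly as follows. This assumes a component spec is a dict with a 'kind' key; `on_failure`, `Reaction`, the two default dicts, and the `b10-dhcp` name are hypothetical, and restarting a 'needed' component that crashes after startup (rather than merely "not bringing the system down") is an assumption.

```python
from enum import Enum


class Reaction(Enum):
    ABORT_STARTUP = "abort startup"
    RESTART = "restart"


def on_failure(spec, during_startup):
    """Decide how the boss reacts when a component fails.

    Sketch of the 'needed' semantics described above: failing to start a
    'needed' component prevents the system from starting, while a later
    crash does not bring the system down. Restarting the component in that
    case, and treating every other kind the same way, are assumptions.
    """
    if spec.get("kind") == "needed" and during_startup:
        return Reaction.ABORT_STARTUP
    return Reaction.RESTART


# A DNS-oriented default versus a DHCP-only setup in which auth and resolver
# are simply removed. Names and dict layout are illustrative, not the
# actual bob.spec contents.
DNS_DEFAULT = {
    "b10-auth": {"kind": "needed"},
    "b10-resolver": {"kind": "needed"},
}
DHCP_ONLY = {
    "b10-dhcp": {"kind": "needed"},  # hypothetical component name
}
```

Choosing between `DNS_DEFAULT` and an empty dict at ./configure time would correspond to the option mentioned above.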
> > - I'd keep this module independent from the knowledge of which
> > component is special for the boss, and let it focus on the generic
> > framework. [...]
>
> I moved it to a different module.
Okay.
> > - An object of this class is a sort of finite state machine, [...]
>
> Yes, you're right. It happened in a kind of evolutionary way: the
> __running one was there first, then the __dead one appeared later on and
> I didn't think about it. This way it looks simpler. I also added the
> diagram.
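Treating the component as a finite state machine means one state variable and an explicit transition table instead of the separate __running and __dead flags. A minimal sketch follows; the state names, the event names, and the transition table (including restarting from DEAD) are assumptions and may not match the diagram added on the branch.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    RUNNING = "running"
    DEAD = "dead"


# (current state, event) -> next state. Hypothetical table for illustration.
TRANSITIONS = {
    (State.STOPPED, "start"): State.RUNNING,
    (State.RUNNING, "stop"): State.STOPPED,
    (State.RUNNING, "failed"): State.DEAD,
    (State.DEAD, "start"): State.RUNNING,
}


class Component:
    def __init__(self, name):
        self.name = name
        self._state = State.STOPPED  # a single variable replaces two flags

    def _transition(self, event):
        try:
            self._state = TRANSITIONS[(self._state, event)]
        except KeyError:
            raise ValueError(
                f"{self.name}: cannot {event} in state {self._state.value}"
            ) from None

    def start(self):
        self._transition("start")

    def stop(self):
        self._transition("stop")

    def failed(self):
        self._transition("failed")

    def running(self):
        return self._state is State.RUNNING
```

Any combination not in the table (for example, stopping a component that has already died) is rejected instead of leaving two flags in an inconsistent combination.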
>
> > - What if stop_internal() raises an exception?
>
> Then we have a problem.
>
> Actually, the component is considered shut down at that point and the
> exception is propagated. The idea behind this is that we can't really
> consider it running, because it might already be stopped, and if there
> is a problem stopping it, trying again (during system shutdown or some
> other time) would just fail again. This way, if it happens during a real
> shutdown and the process is still running, it will eventually be killed.
> If it happens during reconfiguration, I don't know. Any ideas what to do
> then?
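The stop() behavior described in that reply (mark the component as shut down first, then let any exception from stop_internal() propagate) could look roughly like this. It is a simplified sketch: stop_internal() is the hook named in the discussion, while the boolean flag, start_internal(), and the error on stopping a non-running component are assumptions for illustration.

```python
class Component:
    """Simplified sketch of the stop() behavior described above.

    stop_internal() is the hook named in the discussion; start_internal()
    and the boolean flag are simplifications for illustration.
    """

    def __init__(self, name):
        self.name = name
        self._running = False

    def start(self):
        self.start_internal()
        self._running = True

    def stop(self):
        if not self._running:
            raise ValueError(f"{self.name} is not running")
        # Consider the component shut down *before* asking it to stop:
        # if stop_internal() fails, retrying later would most likely fail
        # again, so the component is not left marked as running.
        self._running = False
        # Any exception from stop_internal() propagates to the caller.
        self.stop_internal()

    def running(self):
        return self._running

    def start_internal(self):
        """Start the underlying process (no-op in this sketch)."""

    def stop_internal(self):
        """Stop the underlying process (no-op in this sketch)."""
```

The trade-off is visible here: after a failing stop_internal() the component reports itself as not running even though its process may still exist, which is exactly the gap discussed in the reply below.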
Having thought about it more now that I've been explicitly asked, I
think we should keep track of the status of spawned processes more
precisely. Right now (both before and after this branch), it seems
that we are not very accurate on this point.
A child process can have the following states:
- dead (process doesn't exist)
- alive but not ready to run (in initialization)
- alive and running
- alive and shutting down (boss has sent a shutdown command)
- alive but hung (process exists but cannot do any active work and
cannot even receive a shutdown command)
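The five child-process states listed above can be written down explicitly, together with which transitions between them are legal. The enum names and the transition table below are a hypothetical sketch; in particular, which states can lead to "hung" and restarting from "dead" are assumptions, not existing BoB behavior.

```python
from enum import Enum, auto


class ChildState(Enum):
    DEAD = auto()           # process doesn't exist
    INITIALIZING = auto()   # alive but not ready to run
    RUNNING = auto()        # alive and running
    SHUTTING_DOWN = auto()  # boss has sent a shutdown command
    HUNG = auto()           # alive, but does no work and ignores shutdown


# Hypothetical transition table for the states listed above.
ALLOWED = {
    ChildState.DEAD: {ChildState.INITIALIZING},  # (re)spawn
    ChildState.INITIALIZING: {ChildState.RUNNING, ChildState.SHUTTING_DOWN,
                              ChildState.HUNG, ChildState.DEAD},
    ChildState.RUNNING: {ChildState.SHUTTING_DOWN, ChildState.HUNG,
                         ChildState.DEAD},
    ChildState.SHUTTING_DOWN: {ChildState.DEAD, ChildState.HUNG},
    ChildState.HUNG: {ChildState.DEAD},          # only via a forceful kill
}


def can_transition(old, new):
    """Return True if moving from `old` to `new` is a legal transition."""
    return new in ALLOWED[old]


def is_alive(state):
    """Every state except DEAD still has an existing process."""
    return state is not ChildState.DEAD
```

Making the table explicit is one way to pin down the relationship between BoB.processes and the component list: each tracked process would carry exactly one of these states.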
We (at least partly) manage these states via BoB.processes and
BoB._component_configurator(._components), but the relationship
between them doesn't seem to be clearly defined, and this causes some
real problems:
- since we don't explicitly recognize the 'not ready' state, we have
problems like #1271. We can (and should) fix individual problems, but
I suspect they are just the tip of the iceberg.
- as far as I know, we don't have any explicit way to detect the
"hung" state.
- we don't explicitly recognize the "shutting down" state: once the
boss sends a shutdown command, it basically forgets that component
(and cannot deal with the case where the process somehow doesn't die).
My original question about stop_internal() is related to the last
point. Based on this observation, for this particular issue I believe
we should keep track of the transition from "shutting down" to "dead"
more closely. For example, instead of immediately removing the
component on stop(), we could keep it in some "shutting down queue"
and watch the process. If it doesn't die within a certain period, the
boss would kill it more forcefully. (This is just an example sketch of
the idea rather than a concrete proposal.)
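Following the "shutting down queue" idea, a rough sketch might look like this. It assumes the shutdown command has already been sent before a pid is added, and that the boss calls poll() periodically from its main loop. `ShutdownQueue`, `pid_alive`, and the 5-second grace period are hypothetical choices; `os.kill` and `signal.SIGKILL` are standard POSIX-only Python calls.

```python
import os
import signal
import time


def pid_alive(pid):
    """Return True if a process with this pid exists (POSIX signal 0 probe).

    Note that an exited but not yet reaped child (a zombie) still counts as
    existing until the parent collects it with os.waitpid().
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by someone else
    return True


class ShutdownQueue:
    """Watch processes that were asked to shut down; escalate if they linger.

    Hypothetical sketch: the shutdown command is assumed to have been sent
    before add() is called. The 5-second grace period is arbitrary, and
    is_alive/kill/clock are injectable so the logic can be tested without
    real processes.
    """

    def __init__(self, grace=5.0, is_alive=pid_alive, kill=os.kill,
                 clock=time.monotonic):
        self.grace = grace
        self._is_alive = is_alive
        self._kill = kill
        self._clock = clock
        self._pending = {}  # pid -> deadline, or None once SIGKILL was sent

    def add(self, pid):
        self._pending[pid] = self._clock() + self.grace

    def poll(self):
        """Forget processes that are gone; SIGKILL overdue ones.

        Returns the pids killed during this call.
        """
        now = self._clock()
        killed = []
        for pid, deadline in list(self._pending.items()):
            if not self._is_alive(pid):
                del self._pending[pid]
            elif deadline is not None and now >= deadline:
                self._kill(pid, signal.SIGKILL)
                self._pending[pid] = None  # keep watching until it is gone
                killed.append(pid)
        return killed

    def __len__(self):
        return len(self._pending)
```

The key difference from the current behavior is that a component stays visible to the boss after stop() until its process is confirmed gone, so a process that ignores the shutdown command is not silently forgotten.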
In the longer term, I believe we should clarify the above relationship,
then define, implement, and test the behavior based on that
clarification.
I think we should also check how other multi-process systems such as
postfix and xorp handle the issue of managing child processes.
All that said, this is beyond the scope of this already-fat task.
After all, the pre-213 implementation isn't good in this respect
either, and since we are porting the current behavior to a new
framework, we don't have to solve it now. So, for the moment, I'm okay
with just leaving a comment that, e.g., stop_process() is generally
expected to be exception-free (for now) and that the behavior is
undefined if and when it does raise.
--
Ticket URL: <http://bind10.isc.org/ticket/213#comment:25>
BIND 10 Development <http://bind10.isc.org>