BIND 10 #213: Change hard-coded process startups to configuration-driven
BIND 10 Development
do-not-reply at isc.org
Fri Oct 28 19:17:51 UTC 2011
#213: Change hard-coded process startups to configuration-driven
-------------------------------------+-------------------------------------
Reporter: shane | Owner: vorner
Type: | Status: reviewing
enhancement | Milestone:
Priority: major | Sprint-20111108
Component: Boss | Resolution:
of BIND | Sensitive: 0
Keywords: | Sub-Project: Core
Defect Severity: N/A | Estimated Difficulty: 9
Feature Depending on Ticket: | Total Hours:
Add Hours to Ticket: |
Internal?: 0 |
-------------------------------------+-------------------------------------
Comment (by jinmei):
Replying to [comment:23 vorner]:
> Hello
>
> Replying to [comment:16 jinmei]:
> > - kill_started_processes(): from a quick look at the library
> > reference, os.kill(signal.SIGTERM) is less portable than
> > Poepen.kill(), which was used for the original code (especially if
> > we consider Windows support), and in that sense this is a
> > regression. The same comment applies to other cases where os.kill()
> > is used.
>
> The killing was moved into the components which handle them in concrete
way.
As for kill(), I'd use higher level abstraction than SIGKILL instead
of SIGTERM. to match the abstraction level of the python interface.
> > - read_bind10_config(): the method's doc string may need to be
> > updated, depending on whether this work also means we are going to
> > decide how/where a list of running components should be defined.
>
> I believe that this ticket explicitly requires it to be in boss (it
doesn't seem doable to search for the correct configuration everywhere).
Besides, configuration for a module should be something the module itself
needs for correct working. As it is not the job of each component to start
itself, it doesn't seem to be the best place to put the configuration into
each respective component. But if you think we should talk this over on
the dev list, it is possible.
I already start forgetting my original intent on this comment:-),
but I guess it's just that the doc string did now not seem to match
what the code does and they should be consistent, rather than what
we actually should do. Your argument rather seems to focus on the latter.
As for what we should do, I personally believe what a module should start
should be determined and defined by the module itself, not around in
the boss as I believe I already mentioned. But I'm okay with leaving
that discussion out of scope of this ticket.
BTW, I guess this comment shouldn't be affected (thus shouldn't be
removed) because of this branch:
{{{
# [...] Note that the logging
# configuration may override the "-v" switch set on the command
line.
}}}
> > - start_auth(), etc: While the comment above start_auth() seems to
> > suggest the use of them in tests, they now are not used in
> > tests (overkilling?). Also, as already stated, ideally I'd move
> > these outside the bind10 module.
>
> I'd like that too, in the long term. But as they use quite a lot from
boss internals, I think we should defer it.
Deferring it is fine.
> > - start_all_processes(): is it okay to remove this?
> > {{{
> > - # Everything after the main components can run as non-root.
> > - # TODO: this is only temporary - once the privileged socket
creator is
> > - # fully working, nothing else will run as root.
> > - if self.uid is not None:
> > - posix.setuid(self.uid)
> > }}}
>
> Not really, I missed that one. But it's unhappy in the current state, it
doesn't do what it should, as there's no right place. We should finish the
socket creator soon enough.
Hmm...I'm especially not happy about letting xfrin with the root
privilege as a result of this. I suspect it's unlikely that
the socketcreator work is completed and merged before the boss config
stuff, I would introduce some intermediate workaround to retain the
previous behavior on this point. This would also be one example case
where incremental upgrade should have been beneficial.
> > - stop_process(): I was not sure how we could remove this comment due
> > to this branch:
> > {{{
> > - # TODO: Some timeout to solve processes that don't want to
die would
> > - # help. We can even store it in the dict, it is used only as
a set
> > }}}
>
> Ah, that one was removed because it now belongs somewhere else. But it
got lost in the ether until it got there, so I put it there now.
I don't understand this. If you are referring to a post-review
change, please be more specific about the exact commit ID (but of
course the more fundamental issue is that this branch is too big, so
it's extremely difficult for a reviewer to keep truck of every small
change).
> > and this line (or the concept of "expected_shutdowns"):
> > {{{
> > - self.expected_shutdowns[process] = 1
> > }}}
> > (I'm not necessarily saying we can't. I simply didn't understand).
>
> The expected shutdowns are no longer needed. A component is told to shut
down, which puts it into a stopped state. A failed is called only on the
running components. So the component remembers itself instead of relying
on external dictionary.
On further look based on this comment, I was first afraid that
removing expected_shutdowns might introduce a bad side effect in
restart_processes() because it changed the condition of how the for
loop goes. I then noticed that dead_processes were effectively
removed (it's never updated), so we could probably remove this for
loop and dead_processes altogether.
Frankly, this branch has too many such things, and it made me
(as a reviewer) frustrated. In your brain as the main architect you
may have a perfect view of everything and can be sure which one is
safe, which one is now unnecessary, etc, but as a reviewer who is
suddenly shown the whole set of changes, it's mostly impossible to
have that higher level confidence. I could point out individual
things that may break other parts (and that's what I've done so far),
but I can never be sure about the big picture, and can never feel
safe.
This again leads to the possibility of incremental upgrade. I'll see
what we can do on this point later.
> > - component_shutdown(): if I read the code correctly, `__stopping` can
> > never be True in the context where this method can be called:
> > {{{
> > if self.__stopping:
> > return
> > }}}
> > If the intent is to prevent a duplicate call, we'll probably need a
> > dedicated attribute like "shutting_down", and set it to True in this
> > own method.
>
> I don't think it can be called twice currently and it probably wouldn't
hurt either. I don't know what I was thinking before, when I added it, so
I removed it.
This doesn't seem to be removed.
> > - reap_children(): I was not sure if it's safe to remove this:
> > {{{
> > - self.dead_processes[proc_info.pid] = proc_info
> > }}}
>
> The dead_processes were used for the delayed starts. We need to
reintroduce them sometime, but maybe in a completely different way. So
that's why they are removed from the code now.
See above for expected_shutdowns. It may be safe, but it's unclear
and probably incomplete, and made me so confused.
> > - I'm not sure "b10-xxx" is the best choice for the component name.
>
> That is because I use the name as the default value of process to start,
to save the user some typing when introducing a component. I'd like to
unify it with the address on the message queue somehow as well, so the
only thing needed to add a new component would be to type:
>
> {{{
> config add Boss/components b10-component
> }}}
>
> And be done.
Then please describe the intent somewhere (it's okay to defer it to
the user doc task).
> > - likewise, while "params" are defined as a list of strings,
>
> > - I can't find "priority" in the spec.
> > - dhcp related config is missing, too.
>
> What dhcp related config? I think the DHCP wasn't started before (and it
didn't really have a start_dhcp option yet). But if a user wants, he can
always add the component, right? There's no special need for DHCP. Or did
I overlook something?
Ah, I thought DHCP daemons were invoked by the boss process, if that's
not the case, forget the DHCP comment per se. But if we'd say "if a
user wants...", I'd also argue that it's the same for
xfrin/xfrout/auth, etc. (or is this perhaps an intermediate state so
we preserve the current operational practice and it will eventually
move outside the boss spec? if so, that's okay to me)
> > General comment: TestStartStopProcessesBob was completely removed and
> > TestBossComponents was completely newly introduced. Both are pretty
> > big, and the changes are not trivial, so I was not sure such changes
> > in tests ensure safe refactoring. [...]
>
> Well, it's simply because the old way of configuration no longer exists,
so there's no reason (or way) to test it. Instead of testing that if
there's a start_resolver, the resolver is started, we test that if there's
a component in configuration, that component is started (by passing it to
the configurator and by the configurator tests).
With all due respect, it might be "simple" in the architect's brain,
but not for someone suddenly introduced to the whole big changes.
This is actually another instance of whether we couldn't make it
incremental introduction. I'll re-consider it in that context later.
> > - TestBossComponents.test_correct_run: about this point, my
> > understanding is that we are expected to return an error answer in
such
> > cases:
> > {{{
> > # Is it OK to raise, or should it catch also and convert to
error
> > # answer?
> > self.assertRaises(Exception, bob.config_handler,
> > {'components': compconf})
> > }}}
> > See also the comment about config_handler() above.
>
> I was thinking it should be the job of whatever calls the handler to
catch the exceptions and convert them to error messages. But I probably
didn't have a reason to expect that to do so.
config_handler is a callback function, and the caller is
ModuleCCSession. Since it's beyond the module boundary, IMO it's
generally safer not to propagate arbitrary exceptions to the caller
side (even if re-raise is okay it's safer to catch them once and make
sure the re-raised exception is one of the expected exceptions for the
caller). Besides, in my understanding, ModuleCCSession doesn't
actually expect to get an exception on the call to the callback
handler.
--
Ticket URL: <http://bind10.isc.org/ticket/213#comment:27>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list