BIND 10 #213: Change hard-coded process startups to configuration-driven

Fri Oct 28 19:17:51 UTC 2011

#213: Change hard-coded process startups to configuration-driven
-------------------------------------+-------------------------------------
                   Reporter:  shane  |                 Owner:  vorner
                       Type:         |                Status:  reviewing
  enhancement                        |             Milestone:
                   Priority:  major  |  Sprint-20111108
                  Component:  Boss   |            Resolution:
  of BIND                            |             Sensitive:  0
                   Keywords:         |           Sub-Project:  Core
            Defect Severity:  N/A    |  Estimated Difficulty:  9
Feature Depending on Ticket:         |           Total Hours:
        Add Hours to Ticket:         |
                  Internal?:  0      |
-------------------------------------+-------------------------------------

Comment (by jinmei):

 Replying to [comment:23 vorner]:
 > Hello
 >
 > Replying to [comment:16 jinmei]:
 > > - kill_started_processes(): from a quick look at the library
 > >   reference, os.kill(signal.SIGTERM) is less portable than
 > >   Poepen.kill(), which was used for the original code (especially if
 > >   we consider Windows support), and in that sense this is a
 > >   regression.  The same comment applies to other cases where os.kill()
 > >   is used.
 >
 > The killing was moved into the components which handle them in concrete
 way.

 As for kill(), I'd use higher level abstraction than SIGKILL instead
 of SIGTERM. to match the abstraction level of the python interface.

 > > - read_bind10_config(): the method's doc string may need to be
 > >   updated, depending on whether this work also means we are going to
 > >   decide how/where a list of running components should be defined.
 >
 > I believe that this ticket explicitly requires it to be in boss (it
 doesn't seem doable to search for the correct configuration everywhere).
 Besides, configuration for a module should be something the module itself
 needs for correct working. As it is not the job of each component to start
 itself, it doesn't seem to be the best place to put the configuration into
 each respective component. But if you think we should talk this over on
 the dev list, it is possible.

 I already start forgetting my original intent on this comment:-),
 but I guess it's just that the doc string did now not seem to match
 what the code does and they should be consistent, rather than what
 we actually should do.  Your argument rather seems to focus on the latter.
 As for what we should do, I personally believe what a module should start
 should be determined and defined by the module itself, not around in
 the boss as I believe I already mentioned.  But I'm okay with leaving
 that discussion out of scope of this ticket.

 BTW, I guess this comment shouldn't be affected (thus shouldn't be
 removed) because of this branch:
 {{{
         # [...]  Note that the logging
         # configuration may override the "-v" switch set on the command
 line.
 }}}

 > > - start_auth(), etc: While the comment above start_auth() seems to
 > >   suggest the use of them in tests, they now are not used in
 > >   tests (overkilling?).  Also, as already stated, ideally I'd move
 > >   these outside the bind10 module.
 >
 > I'd like that too, in the long term. But as they use quite a lot from
 boss internals, I think we should defer it.

 Deferring it is fine.

 > > - start_all_processes(): is it okay to remove this?
 > > {{{
 > > -        # Everything after the main components can run as non-root.
 > > -        # TODO: this is only temporary - once the privileged socket
 creator is
 > > -        # fully working, nothing else will run as root.
 > > -        if self.uid is not None:
 > > -            posix.setuid(self.uid)
 > > }}}
 >
 > Not really, I missed that one. But it's unhappy in the current state, it
 doesn't do what it should, as there's no right place. We should finish the
 socket creator soon enough.

 Hmm...I'm especially not happy about letting xfrin with the root
 privilege as a result of this.  I suspect it's unlikely that
 the socketcreator work is completed and merged before the boss config
 stuff, I would introduce some intermediate workaround to retain the
 previous behavior on this point.  This would also be one example case
 where incremental upgrade should have been beneficial.

 > > - stop_process(): I was not sure how we could remove this comment due
 > >   to this branch:
 > > {{{
 > > -        # TODO: Some timeout to solve processes that don't want to
 die would
 > > -        # help. We can even store it in the dict, it is used only as
 a set
 > > }}}
 >
 > Ah, that one was removed because it now belongs somewhere else. But it
 got lost in the ether until it got there, so I put it there now.

 I don't understand this.   If you are referring to a post-review
 change, please be more specific about the exact commit ID (but of
 course the more fundamental issue is that this branch is too big, so
 it's extremely difficult for a reviewer to keep truck of every small
 change).

 > >   and this line (or the concept of "expected_shutdowns"):
 > > {{{
 > > -        self.expected_shutdowns[process] = 1
 > > }}}
 > >   (I'm not necessarily saying we can't.  I simply didn't understand).
 >
 > The expected shutdowns are no longer needed. A component is told to shut
 down, which puts it into a stopped state. A failed is called only on the
 running components. So the component remembers itself instead of relying
 on external dictionary.

 On further look based on this comment, I was first afraid that
 removing expected_shutdowns might introduce a bad side effect in
 restart_processes() because it changed the condition of how the for
 loop goes.  I then noticed that dead_processes were effectively
 removed (it's never updated), so we could probably remove this for
 loop and dead_processes altogether.

 Frankly, this branch has too many such things, and it made me
 (as a reviewer) frustrated.  In your brain as the main architect you
 may have a perfect view of everything and can be sure which one is
 safe, which one is now unnecessary, etc, but as a reviewer who is
 suddenly shown the whole set of changes, it's mostly impossible to
 have that higher level confidence.  I could point out individual
 things that may break other parts (and that's what I've done so far),
 but I can never be sure about the big picture, and can never feel
 safe.

 This again leads to the possibility of incremental upgrade.  I'll see
 what we can do on this point later.

 > > - component_shutdown(): if I read the code correctly, `__stopping` can
 > >   never be True in the context where this method can be called:
 > > {{{
 > >         if self.__stopping:
 > >             return
 > > }}}
 > >   If the intent is to prevent a duplicate call, we'll probably need a
 > >   dedicated attribute like "shutting_down", and set it to True in this
 > >   own method.
 >
 > I don't think it can be called twice currently and it probably wouldn't
 hurt either. I don't know what I was thinking before, when I added it, so
 I removed it.

 This doesn't seem to be removed.

 > > - reap_children(): I was not sure if it's safe to remove this:
 > > {{{
 > > -                self.dead_processes[proc_info.pid] = proc_info
 > > }}}
 >
 > The dead_processes were used for the delayed starts. We need to
 reintroduce them sometime, but maybe in a completely different way. So
 that's why they are removed from the code now.

 See above for expected_shutdowns.  It may be safe, but it's unclear
 and probably incomplete, and made me so confused.

 > > - I'm not sure "b10-xxx" is the best choice for the component name.
 >
 > That is because I use the name as the default value of process to start,
 to save the user some typing when introducing a component. I'd like to
 unify it with the address on the message queue somehow as well, so the
 only thing needed to add a new component would be to type:
 >
 > {{{
 > config add Boss/components b10-component
 > }}}
 >
 > And be done.

 Then please describe the intent somewhere (it's okay to defer it to
 the user doc task).

 > > - likewise, while "params" are defined as a list of strings,
 >
 > > - I can't find "priority" in the spec.
 > > - dhcp related config is missing, too.
 >
 > What dhcp related config? I think the DHCP wasn't started before (and it
 didn't really have a start_dhcp option yet). But if a user wants, he can
 always add the component, right? There's no special need for DHCP. Or did
 I overlook something?

 Ah, I thought DHCP daemons were invoked by the boss process, if that's
 not the case, forget the DHCP comment per se.  But if we'd say "if a
 user wants...", I'd also argue that it's the same for
 xfrin/xfrout/auth, etc.  (or is this perhaps an intermediate state so
 we preserve the current operational practice and it will eventually
 move outside the boss spec?  if so, that's okay to me)

 > > General comment: TestStartStopProcessesBob was completely removed and
 > > TestBossComponents was completely newly introduced.  Both are pretty
 > > big, and the changes are not trivial, so I was not sure such changes
 > > in tests ensure safe refactoring.  [...]
 >
 > Well, it's simply because the old way of configuration no longer exists,
 so there's no reason (or way) to test it. Instead of testing that if
 there's a start_resolver, the resolver is started, we test that if there's
 a component in configuration, that component is started (by passing it to
 the configurator and by the configurator tests).

 With all due respect, it might be "simple" in the architect's brain,
 but not for someone suddenly introduced to the whole big changes.
 This is actually another instance of whether we couldn't make it
 incremental introduction.  I'll re-consider it in that context later.

 > > - TestBossComponents.test_correct_run: about this point, my
 > >   understanding is that we are expected to return an error answer in
 such
 > >   cases:
 > > {{{
 > >         # Is it OK to raise, or should it catch also and convert to
 error
 > >         # answer?
 > >         self.assertRaises(Exception, bob.config_handler,
 > >                           {'components': compconf})
 > > }}}
 > >  See also the comment about config_handler() above.
 >
 > I was thinking it should be the job of whatever calls the handler to
 catch the exceptions and convert them to error messages. But I probably
 didn't have a reason to expect that to do so.

 config_handler is a callback function, and the caller is
 ModuleCCSession.  Since it's beyond the module boundary, IMO it's
 generally safer not to propagate arbitrary exceptions to the caller
 side (even if re-raise is okay it's safer to catch them once and make
 sure the re-raised exception is one of the expected exceptions for the
 caller).  Besides, in my understanding, ModuleCCSession doesn't
 actually expect to get an exception on the call to the callback
 handler.

-- 
Ticket URL: <http://bind10.isc.org/ticket/213#comment:27>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development