BIND 10 #213: Change hard-coded process startups to configuration-driven

Mon Oct 31 11:52:11 UTC 2011

#213: Change hard-coded process startups to configuration-driven
-------------------------------------+-------------------------------------
                   Reporter:  shane  |                 Owner:  jinmei
                       Type:         |                Status:  reviewing
  enhancement                        |             Milestone:
                   Priority:  major  |  Sprint-20111108
                  Component:  Boss   |            Resolution:
  of BIND                            |             Sensitive:  0
                   Keywords:         |           Sub-Project:  Core
            Defect Severity:  N/A    |  Estimated Difficulty:  9
Feature Depending on Ticket:         |           Total Hours:
        Add Hours to Ticket:         |
                  Internal?:  0      |
-------------------------------------+-------------------------------------
Changes (by vorner):

 * owner:  vorner => jinmei

Comment:

 Hello

 Replying to [comment:28 jinmei]:
 > I've created a kind of quick hack branch ("trac213-incremental") to
 > see how the step-by-stp approach was difficult or could be feasible.
 > It begins with the latest component stuff and the master version of
 > boss code, and modifies the latter so that only the start and stop
 > processes are updated to use the component/configurator framework.

 Hmm, that looks like it could work. As it seems my current approach
 doesn't, as it is hard to review, let's try it.

 So, I'll try to address everything related to the configurator framework
 and the stuff that's already in the branch and then branch from it to add
 a little bit more.

 Anyway, I noticed two things:
  - The start_all_pracesses method starts only b10-auth, not all the xfrin,
 xfrout, etc. That doesn't seem to matter much, as the configuration
 handler is called soon enough anyway. Is there a reason for this? Anyway,
 this is temporary, so it doesn't matter.
  - The start_xfrin in master now has some kind of hack with paths. Should
 we preserve that one?

 > Isn't it possible we can once step back to something like this and
 > rebuild the things one by one?  If you still think it's impossible...,

 Yes, it seems doable now. I had some kind of mental block, most probably
 because when I started the task, I didn't have really clear idea how the
 result will look like.

 > BTW, in either case I believe we need commit c916095.  Without this
 > killing the boss process by a single would also kill sockcreator, and
 > it will make the shutdown process a bit unexpected (this problem
 > exists in the current code, but the configurator code logic will make
 > it more visible).

 Ah, now I see why the socket creator threw so often on shutdown for me.
 That was one of the TODO items up there.

 > > > - `BoB.__init__()`: ideally I'd like to move the setting of
 > > >   `__core_components` outside of the bind10 module (and even more
 > > >   ideally make it "configurable" by a command line option, etc).
 > >
 > > I don't think configuration on command line makes any sense. These are
 the real core components. The boss won't be able to work (and load the
 configuration) without them. I don't see a reason why boss shouldn't know
 the bootstrap process and why a user would want to modify it (or in which
 way it would make any sense). In fact, I'd like to keep users as far from
 these as possible. If anybody has a reason to have a different set of
 these in bootstrap process, they are coding already anyway. I don't expect
 even the authors of additional bind10 components to need to modify them.
 >
 > I'm not 100% sure if we never want to modularize (e.g.) the config
 > manager and make it replaceable, but I admit that may be
 > over-generalization.  So I don't oppose to leaving this point as is.

 We may want to have some modularization. But it wouldn't work from command
 line anyway, as these core components are tightly coupled into the rest of
 the code (the socket creator will be used to answer requests for sockets,
 we need to connect to the c_channel using the message queue, etc.). So, I
 think we should leave this for something in the year 5 or so.

 Replying to [comment:25 jinmei]:
 > > If someone uses boss to start dhcp, he would just remove auth and
 resolver from configuration and have the dhcp part as needed.
 >
 > Perhaps the point to consider is what should be specified as 'needed'
 > by default in bob.spec.  If we see BIND 10 as a generic framework for
 > various kind of Internet servers (starting with DNS, then DHCP, and
 > perhaps even HTTP, etc), it would be more reasonable to begin with an
 > empty list of specific servers.  If a user wants to use the framework
 > for DNS services, auth (and/or resolver) will then be specified as
 > 'needed'.  On the other hand, realistically speaking most people will
 > see BIND 10 as DNS software (at least for initial N years), so it
 > might be over generalization and just increase the configuration
 > overhead.  Right now I have a strong opinion either way.  Maybe one
 > option is to decide it at ./configure time, and make its default DNS
 > related servers.  But in any case I'm okay with deferring this point.

 When we introduced the start_auth and start_resolver configuration
 options, we discussed this AFAIK and we decided to have start_auth as true
 as the default. The current (nonincremental) version starts the same set
 of processes as the default. I did want the default to be empty and let
 user start what he wants at the time and we might bring it up sometime for
 wider discussion. But I don't think it should be just changed in the
 ticket, there'll be enough new things for users in that anyway.
 >
 > > > - An object of this class is a sort of finite state machine, [...]
 > >
 > > Yes, you're right. It happened in kind of evolutionary way, the
 __running one was there first, then the __dead appeared later on and I
 didn't think about it. This way it looks simpler. I also added the
 diagram.
 > >
 > > > - What if stop_internal() raises an exception?
 > >
 > > Then we have a problem.
 > >
 > > Actually, the component is considered shut down at the time and the
 exception is propagated. The idea behind this is, we can't really consider
 it running, because it might be already stopped and if there's problem
 stopping, if we try again (during system shutdown or sometime else), it
 would fail again. This way, if it happens during real shutdown and the
 process is still running, it will be at last killed. If it happens during
 reconfiguration, I don't know. Any ideas what to do then?
 >
 > On thinking about it more as being explicitly asked, I think we should
 > keep truck of the status of spawned processes more precisely.  Right
 > now (both before or after this branch), it seems that we are not very
 > accurate on this point.
 >
 > [...]
 >
 > All that said, this will be beyond the scope of this already-fat
 > task.  After all, the pre-213 implementation isn't good in this sense,
 > so in the sense of porting the current behavior under a new framework
 > we don't have to solve it now.  So, at the moment, I'm okay with just
 > leaving a comment that e.g. stop_process() is generally expected to be
 > exception free (for now) and the behavior is undefined if and when
 > that happens.

 Yes, you're right, that would help. We could require components to report
 to boss „I'm ready“ when it starts (which could solve the problem of
 explicitly configuring address) and have few more states and more things
 like that. I also agree with leaving it out for now, but I added little
 bit of documentation with that regard.

 Replying to [comment:26 jinmei]:
 > Replying to (the rest of) [comment:19 vorner]:
 >
 > > > - It seems to make sense to allow derived class to override some of
 > > >   the methods.  But I suspect the requirement to the overridden
 > > >   methods should be more clarified [...]
 > >
 > > I added some documentation.
 >
 > Ack.  One small point in the revised doc:
 > {{{
 >         You should also register any processes started within boss.
 > }}}
 > If this is something that all derived class (whether overridden or
 > not) should meet, maybe we should move it to start().

 Hmm, I'm not sure about that. Actually, I envisioned the component could
 have more than one process running (in which case it would register
 multiple times) or no process at all. So, doing this would decrease the
 flexibility. Should I document this idea somewhere?

 And, well, I don't expect that many components that are not started the
 usual way. In the long term, I'd really like to see only the real core
 components to be inherited.

 > documented.  As for pid(), I guess the (more) proper condition is
 > `if self._procinfo is not None` here:
 > {{{
 >         return self._procinfo.pid if self._procinfo else None
 > }}}

 Ah, right. Too much perl education in my case O:-). The original was in
 fact correct, as the _procinfo is an object, so it can be False only if it
 is None, but this is more the python way.

 > Also, for pid() to always work correctly, I suspect we should reset
 > _procinfo to None once the component stops running.

 Then kill wouldn't work in some cases, or I'd need to store the ProcInfo
 object twice, once for kill and once for pid. Or check the state in pid,
 but then, it wouldn't be there for the log message when it is killed.

 So, instead of making it more complicated this way, I believe nothing is
 hurt by the way it is now, so I just added a note.

 > > I added some documentation there.
 >
 > Maybe my intent wasn't clearly delivered...let me rephrase it:
 > Assume I wanted to completely override `_start_internal()` for some
 > reason.  Then I should set `_procinfo` in the override version;

 As the current version of _start_internal only calls the _start_func(),
 stores the procinfo and registers the process, the only reason I can come
 up with would be to not use ProcInfo at all (the way the socket creator
 does). If you have the ProcInfo object, you are OK by providing the
 _start_func hook.

 > otherwise kill() won't work correctly.  And, just like I commented
 > about "You should also register..." part, if this is something all
 > overridden the original implementation should do, I'd move it to
 > start(), and (in this specific case) ask the implementor to return a
 > `ProcessInfo` object from the overridden version of `_start_internal()`.
 > Or, if you don't like that, at least we should document that
 > `_procinfo` must be set in the overridden version of
 `_start_internal()`.

 Ah, I added the kill later than the documentation. It should say kill
 should be overridden in the case you provide _start_internal. In that
 case, everything that uses _procinfo would be overriden.

 > I'd also envision some external developers want to use the BIND 10
 > framework to control a third party server process and want the boss
 > process to invoke the third party server without modifying the BIND 10
 > source code.  My overall point is that I'd like this framework to have
 > this type of flexibility.  Whether to use the so-called "reflection"
 > is implementation details.  But in any case, I don't think we need it
 > today, especially not in this ticket, so I'm not insisting this should
 > be solved within this task.

 Well, in that case we would need a way to push the code into the running
 master somehow. I'm not against that, but I had hoped that in the long
 term, the only 'special' components would be the real core ones (msgq,
 sockcreator, cfgmgr). The rest would have all the parameters & other
 needed info in the specification and started the usual way. Then we could
 take 'special' out of the spec file as well.

 If it turns out we really need a way to push additional code into boss
 that does a different startup, I hope we would come up with a way how to
 modify the dictionary with special components (eg. each plugin or whatever
 it would be would plug itself into some place to be known).

 > Okay, but do you mean this?
 > {{{
 >     Note that this will allow you to stop (by invoking reconfigure) a
 core
 >     component. There should be some kind of layer protecting users from
 ever
 >     doing so (users must not stop the config manager, message queue and
 stuff
 >     like that or the system won't start again).
 > }}}
 > I fail to understand that this means that the caller shouldn't stop a
 > core component.  It's probably better to describe that more
 > explicitly.

 No, that doesn't mean caller shouldn't stop a core component, that is
 completely safe. The caller shouldn't stop a component which can't be
 started again. Stoping b10-auth set as core is completely OK. I updated
 the doc to be more clear about it.

 Replying to [comment:27 jinmei]:
 > Replying to [comment:23 vorner]:
 > > Hello
 > >
 > > Replying to [comment:16 jinmei]:
 > > > - kill_started_processes(): from a quick look at the library
 > > >   reference, os.kill(signal.SIGTERM) is less portable than
 > > >   Poepen.kill(), which was used for the original code (especially if
 > > >   we consider Windows support), and in that sense this is a
 > > >   regression.  The same comment applies to other cases where
 os.kill()
 > > >   is used.
 > >
 > > The killing was moved into the components which handle them in
 concrete way.
 >
 > As for kill(), I'd use higher level abstraction than SIGKILL instead
 > of SIGTERM. to match the abstraction level of the python interface.

 I'm not sure I understand here. Currently the component uses the
 terminate() and kill() calls from the procinfo.process(). That is what was
 used before.

 Do you mean something else?

 > > Not really, I missed that one. But it's unhappy in the current state,
 it doesn't do what it should, as there's no right place. We should finish
 the socket creator soon enough.
 >
 > Hmm...I'm especially not happy about letting xfrin with the root
 > privilege as a result of this.  I suspect it's unlikely that
 > the socketcreator work is completed and merged before the boss config
 > stuff, I would introduce some intermediate workaround to retain the
 > previous behavior on this point.  This would also be one example case
 > where incremental upgrade should have been beneficial.

 This one is not in there right now, but I might have a proposal for a
 workaround. We may define a special drop-root-privileges component to
 place the dropping in the middle of the startup by the priorities. Does it
 sound reasonable?

 > > > -        # TODO: Some timeout to solve processes that don't want to
 die would
 > > > -        # help. We can even store it in the dict, it is used only
 as a set
 > > > }}}
 > >
 > > Ah, that one was removed because it now belongs somewhere else. But it
 got lost in the ether until it got there, so I put it there now.
 >
 > I don't understand this.   If you are referring to a post-review
 > change, please be more specific about the exact commit ID (but of
 > course the more fundamental issue is that this branch is too big, so
 > it's extremely difficult for a reviewer to keep truck of every small
 > change).

 It's git fd3c952098c46d84c9a277b1409442813a263876.

 > > That is because I use the name as the default value of process to
 start, to save the user some typing when introducing a component. I'd like
 to unify it with the address on the message queue somehow as well, so the
 only thing needed to add a new component would be to type:
 > >
 > > {{{
 > > config add Boss/components b10-component
 > > }}}
 > >
 > > And be done.
 >
 > Then please describe the intent somewhere (it's okay to defer it to
 > the user doc task).

 Yes, I think it should be in the user docs. The programmer doc's describe
 the behaviour, but it's true not really the rationale behind that, but I
 don't think it belongs there.

 > > What dhcp related config? I think the DHCP wasn't started before (and
 it didn't really have a start_dhcp option yet). But if a user wants, he
 can always add the component, right? There's no special need for DHCP. Or
 did I overlook something?
 >
 > Ah, I thought DHCP daemons were invoked by the boss process, if that's
 > not the case, forget the DHCP comment per se.  But if we'd say "if a
 > user wants...", I'd also argue that it's the same for
 > xfrin/xfrout/auth, etc.  (or is this perhaps an intermediate state so
 > we preserve the current operational practice and it will eventually
 > move outside the boss spec?  if so, that's okay to me)

 Actually, there was a code to start it depending the config, but the
 config wasn't in the spec file yet.

 So, this should be all regarding the code that is there for now. I can
 give this to you and do the next incremental step on a new branch (without
 a new ticket).

 Thank you

-- 
Ticket URL: <http://bind10.isc.org/ticket/213#comment:30>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development