BIND 10 #213: Change hard-coded process startups to configuration-driven

Thu Oct 27 22:50:30 UTC 2011

#213: Change hard-coded process startups to configuration-driven
-------------------------------------+-------------------------------------
                   Reporter:  shane  |                 Owner:  vorner
                       Type:         |                Status:  reviewing
  enhancement                        |             Milestone:
                   Priority:  major  |  Sprint-20111108
                  Component:  Boss   |            Resolution:
  of BIND                            |             Sensitive:  0
                   Keywords:         |           Sub-Project:  Core
            Defect Severity:  N/A    |  Estimated Difficulty:  9
Feature Depending on Ticket:         |           Total Hours:
        Add Hours to Ticket:         |
                  Internal?:  0      |
-------------------------------------+-------------------------------------

Comment (by jinmei):

 Replying to (the rest of) [comment:19 vorner]:

 > > - It seems to make sense to allow derived class to override some of
 > >   the methods.  But I suspect the requirement to the overridden
 > >   methods should be more clarified [...]
 >
 > I added some documentation.

 Ack.  One small point in the revised doc:
 {{{
         You should also register any processes started within boss.
 }}}
 If this is something that all derived class (whether overridden or
 not) should meet, maybe we should move it to start().

 > > - A related point to the previous general issue, but not specific to
 > >   that: strange thing will happen if pid() is called before _procinfo
 > >   is set (e.g., at the "Init" state).  Also, what should happen if
 > >   this method is called without the underlying process?  Further, this
 > >   is actually not specific to PID, but also generally applicable to
 > >   _procinfo.
 >
 > The _procinfo should not be accessed if it's not running and it should
 in fact be more private than protected. I'm not sure if I access it from
 some kind of tests, or if it's leftover from some older version. Anyway,
 pid() now checks and returns None if it's not running (or derived
 component is bad).

 Maybe the accessibility assumption for _procinfo should be clearly
 documented.  As for pid(), I guess the (more) proper condition is
 `if self._procinfo is not None` here:
 {{{
         return self._procinfo.pid if self._procinfo else None
 }}}
 or revise the whole statement to:
 {{{
         return None if self._procinfo is None else self._procinfo.pid
 }}}
 (same comment applies to `kill()`).

 Also, for pid() to always work correctly, I suspect we should reset
 _procinfo to None once the component stops running.

 > > - On the specific point of the relationship between start_internal()
 > >   and _procinfo, I guess it's better to request derived method not
 > >   modify base attributes and instead return an ProcessInfo object as a
 > >   contract.  The start() method will then set _procinfo to the
 > >   returned value, and we can make sure _procinfo is always valid as
 > >   long as start() succeeds.
 >
 > I think there are none that modify it. This is actually what _start_func
 hook is for, it returns the procinfo and the rest is handled the usual
 way.
 >
 > Overriding start_internal is for bigger differences, where there might
 be no procinfo involved or no process (therefore PID) at all.
 >
 > I added some documentation there.

 Maybe my intent wasn't clearly delivered...let me rephrase it:
 Assume I wanted to completely override `_start_internal()` for some
 reason.  Then I should set `_procinfo` in the override version;
 otherwise kill() won't work correctly.  And, just like I commented
 about "You should also register..." part, if this is something all
 overridden the original implementation should do, I'd move it to
 start(), and (in this specific case) ask the implementor to return a
 `ProcessInfo` object from the overridden version of `_start_internal()`.
 Or, if you don't like that, at least we should document that
 `_procinfo` must be set in the overridden version of `_start_internal()`.

 > > - (A related point) Isn't it okay to not perform syntax check against
 > >   the given parameters? [...]
 >
 > It's not completely OK, but it's one of the TODO items (better error
 handling and checks). It currently implicitly relies on bindctl not
 allowing anything that's not allowed by spec file, but the checks should
 be definitelly added.

 Okay.

 > >   essentially specifies a specific derived Component class (if
 > >   explicitly specified).  So I'd rather simply call it something like
 > >   'component_class', specify the actual class Component class name
 > >   (e.g. 'SockCreator'), and use some reflection technique to
 > >   instantiate an object of the specified class.
 >
 > I'm not against some renaming (but the component_class is rather long
 and expects that someone has an idea what a class is). But I'd be against
 the reflection techniques, for one, they complicate tests, for another, I
 don't want the user to be able to type any class name there and expect
 things to work.

 It doesn't necessarily have to be reflection-like programming
 technique, but my point was that I'd like to separate the knowledge
 and process of how specific components should be created/killed from
 boss related modules, because it's ultimately something that only the
 author of that component can (and should) know.  So, for example, in
 my ideal world we'd write down how b10-auth should be spawned and
 killed somewhere and somehow, maybe as part of b10-config.db and/or in
 a form of plugin python module.  The boss process passes it to the
 component module mostly transparently, and eventually performs the
 required spawning steps, again, transparently.

 I'd also envision some external developers want to use the BIND 10
 framework to control a third party server process and want the boss
 process to invoke the third party server without modifying the BIND 10
 source code.  My overall point is that I'd like this framework to have
 this type of flexibility.  Whether to use the so-called "reflection"
 is implementation details.  But in any case, I don't think we need it
 today, especially not in this ticket, so I'm not insisting this should
 be solved within this task.

 As for the name, 'special' sounded too vague to me.  But naming is
 often a bikeshed matter (while sometimes it's very important), so as
 long as it's clearly documented we should probably focus on more
 important points.

 > > - _run_plan(): Like the configuration, I'd like to see doc string that
 > >   explains more details of 'plan'.
 >
 > I added a little bit, but that thing is mostly private to the
 configurator and tests.

 I know it's private.  My point is that we need some form of
 documentation that helps people who try to understand (and perhaps
 modify) the implementation.  It may not necessarily have to be doc string.

 > > - A related comment: since _old_config and _components are inter
 > >   related (and have one-to-one relationships if I understand it
 > >   correctly), I'd rather manage them in a combined way, e.g.
 > > {{{
 > >   _components = { 'msgq': (comonent_obj, componet_config), ... }
 > > }}}
 > >   That way we can be more sure, when reading the code, about the
 > >   consistency between them, and it will be easier to ensure the
 > >   consistency at least per component basis.

 > I implemented this.

 Change (18e970e1) looks okay, but using two different words
 'config(uration)' and 'spec' is confusing.  Please unify them.

 > But should we try some kind of rollback in case of failure-to-start or
 something?

 Ideally, but I suspect it's very difficult if not impossible because
 spawning/killing a process is not always a reversible operation.  If
 and when we clarify the relationship between the process status and
 component status (discussed in my previous comment message) we may be
 able to solve this in a cleaner way (for example, we may be able to
 ensure that Configurator.reconfigure() itself doesn't involve
 a non-reversible operation).  But for now, I suspect it's too
 difficult and should be remove from the scope of the ticket.

 > > - The exception message seems to be too broad:
 > > {{{
 > >                         raise NotImplementedError('Changing
 configuration of' +
 > >                                                   ' a running
 component is ' +
 > >                                                   'not yet supported.
 Remove' +
 > >                                                   ' and re-add ' +
 cname +
 > >                                                   'to get the same
 effect')
 > > }}}
 > >   because it only checks a subset of known configuration parameters
 > >   (e.g. even if 'address' is modified it doesn't complain)
 >
 > Not checking address was actually a bug (as the check was there before
 the address was added). Anyway, I'd like this one to go sometime soon and
 be able to handle the change if at all possible without bothering the
 user.

 Hardcoding the (un)expected items is quite error prone (it easily
 causes inconsistency), but considering it a short term workaround, I
 wouldn't bother further.  Also we'll need a space before 'to' here:
 {{{
                                                   'to get the same
 effect')
 }}}
 but, again, if it's short term it doesn't matter much anyway.

 > > - is it okay to allow Configurator.reconfigure() to even stop core
 > >   components?  It seems to contradict the idea that the system cannot
 > >   even start if such components fail to start.
 >
 > Yes, as described at the class's docstring, it's up to the caller to
 ensure this doesn't happen (and boss does that by making sure the three
 core components are always there).

 Okay, but do you mean this?
 {{{
     Note that this will allow you to stop (by invoking reconfigure) a core
     component. There should be some kind of layer protecting users from
 ever
     doing so (users must not stop the config manager, message queue and
 stuff
     like that or the system won't start again).
 }}}
 I fail to understand that this means that the caller shouldn't stop a
 core component.  It's probably better to describe that more
 explicitly.

 > > - what should happen if we do Configurator.shutdown() and try to
 > >   start it again?  Is it expected to be an okay behavior?  Unexpected
 > >   but not prohibited?  Or should be prohibited?
 >
 > I didn't think about that one. Boss doesn't use it so and I don't think
 there ever will be another user than a boss.
 >
 > But I guess it would work, so I wouldn't bother forbidding it explicitly
 or making sure it does unless it's really needed.

 I have no strong opinion about what it should be, but it's probably
 worth noting that it's not expected behavior but not explicitly
 prohibited (for now).

-- 
Ticket URL: <http://bind10.isc.org/ticket/213#comment:26>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development