BIND 10 #640: cfgmgr hanging and command and configurations for no-longer-running components

Fri Feb 3 17:38:14 UTC 2012

#640: cfgmgr hanging and command and configurations for no-longer-running
components
-------------------------------------+-------------------------------------
                   Reporter:  jreed  |                 Owner:  jinmei
                       Type:         |                Status:  reviewing
  defect                             |             Milestone:
                   Priority:  major  |  Sprint-20120207
                  Component:         |            Resolution:
  Unclassified                       |             Sensitive:  0
                   Keywords:         |           Sub-Project:  DNS
            Defect Severity:  N/A    |  Estimated Difficulty:  0.0
Feature Depending on Ticket:         |           Total Hours:  0
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------

Comment (by jelte):

 Replying to [comment:7 jinmei]:
 > I've not yet reviewed everything, but as I'm taking a day off tomorrow
 > I'll provide feedback on the things I've covered so far.
 >

 Ack, thanks so far :)

 > First off, it was not clear which part of the original report was
 > solved.  I see the potential that this set of changes can make bindctl
 > more responsive in the reported scenario, but was it identified why
 > cfgmgr was hanging (or did it really hang)?  And does this branch
 > solve that problem?  What about the traceback output?  (BTW I
 > understand this branch cannot solve the issue of configuring inactive
 > component).
 >

 Oh, sorry for not being clear; that ConfigManager timeout was actually
 closely related. The timeout values for communication from bindctl to
 configmanager and from configmanager to any other (in this case
 non-running) module are the same, so the 'configmanager is not responding'
 message did not mean that configmanager was hanging; it was simply waiting
 for a timeout itself. So this branch addresses the root cause, but not that
 specific problem (we probably need to make the cmdctl->cfgmgr timeouts a bit
 higher than the rest).
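
 To illustrate why equal timeouts produce that message, here is a minimal
 sketch. The function name, the timeout values and the overhead figure are
 all made up for illustration; none of them come from the actual code.

```python
def front_end_sees_timeout(front_timeout, backend_timeout, overhead=0.1):
    """Return True if the front end gives up before the manager can answer.

    The manager only replies after its own wait on the (non-running)
    back-end module has expired, plus some processing overhead.
    All numbers are hypothetical and in seconds.
    """
    time_until_manager_replies = backend_timeout + overhead
    return front_timeout < time_until_manager_replies


# Equal timeouts: the front end reports "not responding" even though the
# manager is merely waiting for its own timeout.
same = front_end_sees_timeout(front_timeout=4.0, backend_timeout=4.0)

# A somewhat higher front-end timeout lets the manager report the real error.
higher = front_end_sees_timeout(front_timeout=6.0, backend_timeout=4.0)
```

 With equal values the front end always loses the race, which is why a
 slightly higher cmdctl->cfgmgr timeout would help.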

 > A related point is that since this is an issue concerning interactions
 > of multiple processes, unittests wouldn't be enough to be sure whether
 > the problem was solved.  It may be okay to defer it to a separate
 > ticket, but I think we need a system level test for this kind of
 > thing.
 >

 Oh yeah, good point. I've added a few steps that I needed to
 terrain/bind10_control.py, and then cleaned it up a bit.
 I also added a new feature file, 'bindctl_commands.feature', which
 currently contains only this test (run basic setup, remove a number of
 modules, and check that they are no longer shown).

 > Next, I made a few trivial changes to the branch.
 >
 > '''ccsession.cc'''
 >
 > - Not a big deal, but `sendStopping()` is a private method, not very
 >   big, and used only within the ModuleCCSession destructor.  Is it
 >   worth a separate method?

 I personally prefer it in a separate method; not so much because of its
 size, but because the action it performs is not directly related to local
 resource cleanup and is kind of a side effect, so constructing the
 message and sending it seems (imho) nicer in a separate method. No strong
 opinion though. (Side note: I even considered making it a public call, with
 a check in the destructor to see whether it had already been called, so
 that the client code could decide when it should be called. But that did
 not seem worth it.)

 > - logging itself could cause an exception, so the destructor could
 >   still throw.  However, this is not the only point that has this
 >   problem, so maybe we can leave it open for now...
 >

 hmm, imo we should either make logging calls exception-free or add
 exception-free ones, since it is usually about the only thing you can do
 in a destructor.

 > '''ccsession.h'''
 >
 > - I thought we decided to use C++ style comments for doxygen

 oops, fixed

 > - In our convention we'd explicitly add 'virtual' (for clarity) to the
 >   destructor declaration.
 >

 Actually, it's even needed for technical reasons for classes that are
 subclassed, but this one isn't supposed to be subclassed afaik. I realize
 we can't stop people from doing it, but personally I tend to look at the
 virtualness of the destructor to see whether a class is meant to be
 subclassed in the first place. Of course, one could argue that it is
 better to add it when it's not necessary than to not have it when it is,
 so if you insist I'll make it virtual.

 > '''config_message.mes'''
 >
 > If not yet done, you may want to reorder the messages using
 > tools/reorder_message_file.py
 >

 done (it only moved an older one)

 > '''ccsession_unittests.cc'''
 >
 > - maybe a matter of taste, but the intermediate block just to have a
 >   separate scope looks a bit awkward.  I'd do something like this:
 > {{{#!c++
 >     scoped_ptr<ModuleCCSession> mccs(new ModuleCCSession(
 >                                          ccspecfile("spec2.spec"),
 >                                          session, NULL, NULL,
 >                                          true, false));
 >     ...
 >     mccs.reset();               // this will invoke destructor
 > }}}
 >   At least I think some comments about the intent of the scope would
 >   be necessary.

 changed

 > - it would be nicer if we could test the case where the destructor
 >   throws.
 >

 ack, added a simple throw_on_send option to fakesession, and added a test

 > '''ccsession.py'''
 >
 > - send_stopping: while the pydoc says "any errors are logged", only
 >   SessionError is caught.  Is that intentional?  Can we assume
 >   create_command() doesn't raise an exception?

 Oh, right, changed it to 'except Exception'. create_command can raise as
 well, but I don't really want to catch those; imo they can only be
 programmer errors (well, except for out of memory of course), and should
 fail as loudly as possible. Added a comment to that effect.
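
 Roughly, the shape is as in the sketch below. This is not the code from
 the branch: create_command, SessionError and FakeSession here are local
 stand-ins, and the command name, the parameter key, the group name and the
 group_sendmsg method are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("ccsession_sketch")


class SessionError(Exception):
    """Stand-in for the session error type (hypothetical)."""


def create_command(command_name, params=None):
    # Hypothetical stand-in. Bad input is a programmer error, so it raises.
    if not isinstance(command_name, str):
        raise TypeError("command name must be a string")
    cmd = [command_name]
    if params is not None:
        cmd.append(params)
    return {"command": cmd}


class FakeSession:
    """Minimal fake session with a throw_on_send switch (hypothetical)."""

    def __init__(self, throw_on_send=False):
        self.throw_on_send = throw_on_send
        self.sent = []

    def group_sendmsg(self, msg, group):
        if self.throw_on_send:
            raise SessionError("send failed")
        self.sent.append((msg, group))


def send_stopping(session, module_name):
    # Built outside the try block on purpose: a failure here is a
    # programmer error and should fail loudly.
    cmd = create_command("stopping", {"module_name": module_name})
    try:
        session.group_sendmsg(cmd, "ConfigManager")
        return True
    except Exception as ex:
        # Pass the exception object itself; the logger formats it lazily.
        logger.error("could not send stopping message: %s", ex)
        return False
```

 A send failure is logged and swallowed, while a bad call to
 create_command still propagates to the caller.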

 > - send_stopping: I believe you don't need 'str' for 'se'.  (might not
 >   be a big deal in this case, but delaying 'str' could possibly help
 >   improve performance).
 >

 ack, changed

 > '''cfgmgr.py'''
 >
 > - if `_handle_module_stopping` is private, should it better be
 >   prefixed with `__`?  Same for `_send_module_spec_to_cmdctl`.

 As you can see from the rest of the code here, I tend to use only the
 single underscore when I don't think the name mangling that comes with
 `__` is necessary. Should we start using `__` as a principle? (And if so,
 should we update all 'private' methods?)

 > - `_handle_module_stopping` returns None, saying "This command is not
 >   expected to be answered".  But returning None doesn't seem to
 >   suppress sending a response.

 ah doh, fixed.

 > - Not directly related to this ticket, but the if-else in handle_msg
 >   is getting too big to understand/manage.  I guess it's time to
 >   consider refactoring, maybe in some OO way.
 >

 depends on how; as it is now, it is indeed getting long, but is
 essentially a switch. I'm open to proposals, but some 'smart' solutions
 might actually be more complicated :p
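
 For the sake of discussion, one of the simpler options would be a dispatch
 table, sketched below. Everything in it is hypothetical: the class name,
 the command names, the answer format and the convention that a handler
 returning None means "send no reply" are assumptions for this sketch and
 not what cfgmgr.py currently does.

```python
class ConfigManagerSketch:
    """Hypothetical stand-in showing table-driven command dispatch."""

    def __init__(self):
        self.modules = {"Auth": {}, "Xfrin": {}}
        # One entry per command instead of one if/elif branch per command.
        self._handlers = {
            "get_module_spec": self._handle_get_module_spec,
            "stopping": self._handle_module_stopping,
        }

    def handle_msg(self, msg):
        parts = msg["command"]
        name = parts[0]
        arg = parts[1] if len(parts) > 1 else None
        handler = self._handlers.get(name)
        if handler is None:
            return {"result": [1, "Unknown command: " + name]}
        return handler(arg)

    def _handle_get_module_spec(self, arg):
        return {"result": [0, sorted(self.modules)]}

    def _handle_module_stopping(self, arg):
        self.modules.pop(arg["module_name"], None)
        return None  # in this sketch: no answer expected


def process(manager, msg, outbox):
    # The caller decides whether to reply, based on the handler's result.
    answer = manager.handle_msg(msg)
    if answer is not None:
        outbox.append(answer)
```

 Adding a command then means adding one method and one table entry, but it
 is still essentially the same switch, just spelled differently.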

 > '''ccsession_test.py'''
 >
 > - test_stop: not necessarily a problem, but it seems the same pattern
 >   repeats in multiple tests.  It would be better to unify the common
 >   sequence of tests for conciseness.

 Yes, actually, more generally, this is partly why I was looking at the
 callcounter thingy a few nights ago; i am working during my down-hours on
 a slightly more general version of that, and I think we can replace a lot
 of 'checking if stuff has been called' tests with such a beasty. But I
 guess you're not just talking about that. I actually think we should not
 just unify more test code; we can probably do a much better job of
 unifying the common patterns in the modules themselves as well (and
 unified test code should then kind of roll out as part of that work).
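
 To give an idea of the kind of helper I mean, here is a minimal sketch.
 It is not the version I am working on; the class name, its attributes and
 the Dummy class are made up for illustration.

```python
class CallCounter:
    """Records calls; optionally forwards them to a wrapped callable."""

    def __init__(self, wrapped=None):
        self.calls = []
        self._wrapped = wrapped

    def __call__(self, *args, **kwargs):
        self.calls.append((args, kwargs))
        if self._wrapped is not None:
            return self._wrapped(*args, **kwargs)
        return None

    @property
    def count(self):
        return len(self.calls)


class Dummy:
    """Hypothetical object under test."""

    def stop(self):
        self.cleanup("now")

    def cleanup(self, when):
        return "cleaned " + when


# In a test: replace the method on the instance, run the code, then check
# how often and with what arguments it was called.
dummy = Dummy()
dummy.cleanup = CallCounter(dummy.cleanup)
dummy.stop()
```

 After the call, dummy.cleanup.count and dummy.cleanup.calls can be checked
 with ordinary assertions, which would replace most of the hand-written
 'has this been called' bookkeeping in the tests.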

-- 
Ticket URL: <http://bind10.isc.org/ticket/640#comment:9>
BIND 10 Development <http://bind10.isc.org>