BIND 10 #640: cfgmgr hanging and command and configurations for no-longer-running components

Thu Feb 2 14:52:03 UTC 2012

#640: cfgmgr hanging and command and configurations for no-longer-running
components
-------------------------------------+-------------------------------------
                   Reporter:  jreed  |                 Owner:  UnAssigned
                       Type:         |                Status:  reviewing
  defect                             |             Milestone:
                   Priority:  major  |  Sprint-20120207
                  Component:         |            Resolution:
  Unclassified                       |             Sensitive:  0
                   Keywords:         |           Sub-Project:  DNS
            Defect Severity:  N/A    |  Estimated Difficulty:  0.0
Feature Depending on Ticket:         |           Total Hours:  0
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------
Changes (by jelte):

 * owner:  jelte => UnAssigned
 * status:  assigned => reviewing

Comment:

 Notes:

 the diff has gotten quite big already, so it's about time to call this
 good enough for now; it solves the problem, though not ideally just yet.

 What it does now is inform the config manager that a module is stopping,
 which through some more signaling ends up removing it from the 'running
 modules' list in bindctl; so when you remove a component from
 Boss/components, commit it, and then try to send that component a command
 or change its configuration, bindctl will immediately report that the
 module is unknown or not running.

 So in effect, this is actually an implementation of bullet point 1 of
 #123; when a module (defined by using the ModuleCCSession class) quits
 correctly, it causes a message 'stopping' to be sent to the channel
 ConfigManager (when it is killed, currently there is no such message, we
 can consider that as well, and implement it in Boss, but as we expect the
 module to be restarted anyway, I think we can live without that for now).

 Reviewing notes:
 We currently have separate implementations for ModuleCCsession in C++ and
 python, which in a way is a small advantage; the C++ version automatically
 handles this when the destructor is called, but for python this is not
 practical; the moment when __del__() will be called is not predictable
 enough, and some quick tests have shown that is is currently not called
 when the object goes out of scope in a number of our modules.

 So for the python version, I added an explicit send_stopping() method,
 which will send the message. If you have a suggestion for a better name,
 please let me know :) Originally called it stop(), but it's not to be
 confused with close(), which actually closes the communication channel,
 but I'd like to keep them separate.

 This does mean that I needed to add a call to every python module, which
 is the main cause of the patch growing. And since our modules use the
 ModuleCCSession in a different way, it's not as straightforward everywhere
 as one may hope. In fact, I think that we should have a followup task to
 make the ModuleCCSession a Context (the pythonic way to do RAII), and make
 our modules more consistent in this regard. I considered doing this right
 now but for some (of not all) modules this requires some big changes in
 the way they work. As for the followup, I recommend we make
 ModuleCCSession a context-usable object, implement it in a relatively
 clean module (ddns comes to mind), then use that as the template to clean
 up the others.

 Another thing, closely related to this, is that our tests have a similar
 problem; we have a growing number of local Mock classes (and I had to add
 a few more). In a lot of cases, these tend to check the same thing. For
 some we can move them to testutil, so at least there is a common class for
 them; I added a very minimal MockModuleCCSession to testutils, and used
 that one in a number of tests (but not all; some mockclasses provide extra
 things only needed for those specific tests). But I have been playing
 around a bit and I think that we can do something way better in python; a
 general mock/proxy call-counting class, I'll send a proposal to the list
 separately. This would not be direct followup for this, but since this
 branch inspired me to look into that, I thought i'd mention it anyway.

 So the actual changes are these:
 - updated ccsession.h ccsession.cc and ccsession.py to allow for the
 sending of the 'stopping' informational message
 - updated cfgmgr to act on this message and send cmdctl another message
 (in fact a form of spec_update, but the content signaling 'remove this
 spec from your known list).
 - fixed bindctl to clear its list of known modules before asking the new
 ones (it originally only overwrote them)
 - added calls to the python modules to send said stopping message (for the
 C++ modules, it is automatically called upon destruction of the
 ModuleCCSession)
 - added MockCCSession test and check whether stop() is called

 Proposed changelog:
 [func] jelte
 modules now inform the system when they are stopping. As a result, they
 are removed from the 'active modules' list in bindctl, which can then
 inform the user directly when it tries to send them a command or
 configuration update. Previously this would result in a 'not responding'
 error instead of 'not running'.

 (i personally think this is more func than bug, but no strong opinion)

-- 
Ticket URL: <http://bind10.isc.org/ticket/640#comment:6>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development