BIND 10 #2676: Provide ccsession::rpc_call and replace usual calls with it

Mon Feb 18 22:05:05 UTC 2013

#2676: Provide ccsession::rpc_call and replace usual calls with it
-------------------------------------+-------------------------------------
            Reporter:  vorner        |                        Owner:
                Type:  enhancement   |  jinmei
            Priority:  medium        |                       Status:
           Component:  Inter-module  |  reviewing
  communication                      |                    Milestone:
            Keywords:                |  Sprint-20130219
           Sensitive:  0             |                   Resolution:
         Sub-Project:  Core          |                 CVSS Scoring:
Estimated Difficulty:  2.5           |              Defect Severity:  N/A
         Total Hours:  0             |  Feature Depending on Ticket:
                                     |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------

Comment (by jinmei):

 Replying to [comment:4 vorner]:

 > I'd like to discuss what to do with these, I have several ideas:
 >  * Move the `create_command` and `parse_reply` (and the others) to
 `isc.cc`.
 >    This, however, has the drawback that a lot of code would need to be
 changed
 >    (because, for example, some of them throw `ModuleCCSessionError`).
 >  * Create two `ModuleCCSession`s for the modules (and don't start the
 others to
 >    listen for commands, and provide empty command handlers for them).
 Use them
 >    for the rpc calls.
 >  * Solve the problems with threads.
 >  * Leave it as it is, with the FIXME as in the branch and manually
 specifying
 >    `want_answer`.

 I didn't yet understand why we needed two sessions in these cases, but
 if they are about xfrin and zonemgr, I guess it's just another
 incident that xfrs and zonemgr only have experimental quality,
 especially about the use of threads.  I'm not sure when we can make
 them production ready, but I'd generally defer tricky cases related to
 the buggy use of threads in these modules to fundamental redesign
 work.

 Comments on the branch follow:

 '''general'''

 - a lettuce test (xfrin_notify_handling.feature) often, but not
   always, fails.  I'm not sure if it's related to this branch.
 - not really about this branch, but the rpc_call() implementation made
   me wonder whether we should add a constant of "CC_REPLY_SUCCESS"
   instead of the magic number of 0:
 {{{#!python
         if code == CC_REPLY_NO_RECPT:
             raise RPCRecipientMissing(value)
         elif code != 0:
             raise RPCError(code, value)
 }}}

 '''unittest_fakesession.py'''

 - this comment seems to have to be updated:
 {{{#!python
         # each entry is of the form [ channel, instance, message ]
         self.message_queue = []
 }}}

 '''cmdctl.py.in'''

 - totally a nit, but did you intentionally use "ourself" instead of
   "ourselves"?
 {{{
             # rpc-call method, we use the low-level way and create the
             # command ourself.
 }}}
   According to my dictionary: "Ourself is sometimes used instead of
   'ourselves' when it clearly refers to a singular subject."  I'm not
   sure if this "ourself" clearly refers to a singular thing.

 '''ddns.py.in'''

 - some log messages seem to have to be updated.  For example, the
   description of DDNS_START_FORWARDER_ERROR is not very accurate any
   more because it includes the case that auth doesn't exist (or at
   least isn't listening to the commands).  I suggest you go through
   all relevant log messages (including detailed description) and
   revise them appropriately.
 - `__notify_start_forwarder`: also not related to this branch, but
   this case is one example where just getting a response from one
   instance (of possibly multiple ones) of the component isn't
   sufficient.  We'll have to clarify this in the protocol doc (and
   refinement) task and revise the implementation so it'll work
   robustly.
 - `__notify_auth`: I see why auth_loadzone_command() wasn't used, but
   I don't think it a good practice that each individual module has to
   construct this level of objects from the scratch and with hardcoding
   magic keywords.
 - `__notify_update`: on this comment,
 {{{
             # FIXME? Do we really need to wait for the answer here?
 Shouldn't
             # this be some kind of notification instead?
 }}}
   see the detailed description of DDNS_UPDATE_NOTIFY_FAIL.  And, in
   fact, IMO if we want to make production quality system, not a toy of
   a student project, we shouldn't just "notify and forget" in many
   more cases.  IMO our implementation is rather too loose in this
   sense.  In this particular case, the discussion is quite subtle
   though...first, if the error is `RPCRecipientMissing`, it can be
   less serious: while the fact that auth/xfrout has likely terminated
   may indicate something wrong to auth/xfrout themselves, the entire
   system can recover from that situation as they were restarted by the
   init process.  So we could handle this specific case separately,
   with a lower level of log.  But then we face the fundamental
   question of the inter-module communication model.  This is another
   case where if not all recipients receives the message the system can
   be in an inconsistent state.

 '''stats.py.in'''

 - do_polling: doesn't it change the behavior?  Previously if the
   recipient doesn't exist it should have resulted in a timeout and an
   exception.  It's now reported as `RPCError` and ignored.
   Furthermore, there now seems to be a possibility that `modules` can
   be used without set (even if detailed conditions prevent that from
   happening, it's not obvious from the code, and it means the quality
   of this code is low - although it's not a fault of your change).
 - do_polling: the big `try` block reduces readability and it
   actually shouldn't have to be that big because `RPCError` could only
   happen in rpc_call.
 - I don't understand this comment:
 {{{
             # Not using rpc_call here, we query a lot of modules in
 parallel
             # here
 }}}
 - update_modules: likewise, the behavior seems to be slightly changed.

 '''stats_httpd.py.in'''

 - get_stats_data: raising `StatsHttpdDataError` in case of
   `RPCRecipientMissing` doesn't seem to be correct (at least it
   doesn't preserve the previous behavior).

 '''xfrin.py.in'''

 - about the FIXME comment: as stated above, IMO it's another
   incident that xfr's are a student's hobby project, not a
   production-quality system, that it simply ignores the response or
   timeout.  (But this specific case may not have to be an "RPC" anyway
   - all xfrin should know and handle if the message was surely
   delivered to zonemgr).

 '''zonemgr.py.in'''

 - If `ZonemgrRefresh._cc` isn't used anymore, it should be removed
   (while it's defined to be "proected", I suspect the intent was it's
   private).
 - What's the intent of the comment?
 {{{#!python
         except socket.error: # FIXME: WTF?
             logger.error(ZONEMGR_SEND_FAIL, module_name)
 }}}
   That it's not clear in which case this exception happens?  Please be
   more specific in any case.

-- 
Ticket URL: <https://bind10.isc.org/ticket/2676#comment:7>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development