BIND 10 #2676: Provide ccsession::rpc_call and replace usual calls with it
BIND 10 Development
do-not-reply at isc.org
Mon Feb 18 22:05:05 UTC 2013
#2676: Provide ccsession::rpc_call and replace usual calls with it
-------------------------------------+-------------------------------------
Reporter: vorner | Owner:
Type: enhancement | jinmei
Priority: medium | Status:
Component: Inter-module | reviewing
communication | Milestone:
Keywords: | Sprint-20130219
Sensitive: 0 | Resolution:
Sub-Project: Core | CVSS Scoring:
Estimated Difficulty: 2.5 | Defect Severity: N/A
Total Hours: 0 | Feature Depending on Ticket:
| Add Hours to Ticket: 0
| Internal?: 0
-------------------------------------+-------------------------------------
Comment (by jinmei):
Replying to [comment:4 vorner]:
> I'd like to discuss what to do with these, I have several ideas:
> * Move the `create_command` and `parse_reply` (and the others) to
`isc.cc`.
> This, however, has the drawback that a lot of code would need to be
changed
> (because, for example, some of them throw `ModuleCCSessionError`).
> * Create two `ModuleCCSession`s for the modules (and don't start the
others to
> listen for commands, and provide empty command handlers for them).
Use them
> for the rpc calls.
> * Solve the problems with threads.
> * Leave it as it is, with the FIXME as in the branch and manually
specifying
> `want_answer`.
I didn't yet understand why we needed two sessions in these cases, but
if they are about xfrin and zonemgr, I guess it's just another
incident that xfrs and zonemgr only have experimental quality,
especially about the use of threads. I'm not sure when we can make
them production ready, but I'd generally defer tricky cases related to
the buggy use of threads in these modules to fundamental redesign
work.
Comments on the branch follow:
'''general'''
- a lettuce test (xfrin_notify_handling.feature) often, but not
always, fails. I'm not sure if it's related to this branch.
- not really about this branch, but the rpc_call() implementation made
me wonder whether we should add a constant of "CC_REPLY_SUCCESS"
instead of the magic number of 0:
{{{#!python
if code == CC_REPLY_NO_RECPT:
raise RPCRecipientMissing(value)
elif code != 0:
raise RPCError(code, value)
}}}
'''unittest_fakesession.py'''
- this comment seems to have to be updated:
{{{#!python
# each entry is of the form [ channel, instance, message ]
self.message_queue = []
}}}
'''cmdctl.py.in'''
- totally a nit, but did you intentionally use "ourself" instead of
"ourselves"?
{{{
# rpc-call method, we use the low-level way and create the
# command ourself.
}}}
According to my dictionary: "Ourself is sometimes used instead of
'ourselves' when it clearly refers to a singular subject." I'm not
sure if this "ourself" clearly refers to a singular thing.
'''ddns.py.in'''
- some log messages seem to have to be updated. For example, the
description of DDNS_START_FORWARDER_ERROR is not very accurate any
more because it includes the case that auth doesn't exist (or at
least isn't listening to the commands). I suggest you go through
all relevant log messages (including detailed description) and
revise them appropriately.
- `__notify_start_forwarder`: also not related to this branch, but
this case is one example where just getting a response from one
instance (of possibly multiple ones) of the component isn't
sufficient. We'll have to clarify this in the protocol doc (and
refinement) task and revise the implementation so it'll work
robustly.
- `__notify_auth`: I see why auth_loadzone_command() wasn't used, but
I don't think it a good practice that each individual module has to
construct this level of objects from the scratch and with hardcoding
magic keywords.
- `__notify_update`: on this comment,
{{{
# FIXME? Do we really need to wait for the answer here?
Shouldn't
# this be some kind of notification instead?
}}}
see the detailed description of DDNS_UPDATE_NOTIFY_FAIL. And, in
fact, IMO if we want to make production quality system, not a toy of
a student project, we shouldn't just "notify and forget" in many
more cases. IMO our implementation is rather too loose in this
sense. In this particular case, the discussion is quite subtle
though...first, if the error is `RPCRecipientMissing`, it can be
less serious: while the fact that auth/xfrout has likely terminated
may indicate something wrong to auth/xfrout themselves, the entire
system can recover from that situation as they were restarted by the
init process. So we could handle this specific case separately,
with a lower level of log. But then we face the fundamental
question of the inter-module communication model. This is another
case where if not all recipients receives the message the system can
be in an inconsistent state.
'''stats.py.in'''
- do_polling: doesn't it change the behavior? Previously if the
recipient doesn't exist it should have resulted in a timeout and an
exception. It's now reported as `RPCError` and ignored.
Furthermore, there now seems to be a possibility that `modules` can
be used without set (even if detailed conditions prevent that from
happening, it's not obvious from the code, and it means the quality
of this code is low - although it's not a fault of your change).
- do_polling: the big `try` block reduces readability and it
actually shouldn't have to be that big because `RPCError` could only
happen in rpc_call.
- I don't understand this comment:
{{{
# Not using rpc_call here, we query a lot of modules in
parallel
# here
}}}
- update_modules: likewise, the behavior seems to be slightly changed.
'''stats_httpd.py.in'''
- get_stats_data: raising `StatsHttpdDataError` in case of
`RPCRecipientMissing` doesn't seem to be correct (at least it
doesn't preserve the previous behavior).
'''xfrin.py.in'''
- about the FIXME comment: as stated above, IMO it's another
incident that xfr's are a student's hobby project, not a
production-quality system, that it simply ignores the response or
timeout. (But this specific case may not have to be an "RPC" anyway
- all xfrin should know and handle if the message was surely
delivered to zonemgr).
'''zonemgr.py.in'''
- If `ZonemgrRefresh._cc` isn't used anymore, it should be removed
(while it's defined to be "proected", I suspect the intent was it's
private).
- What's the intent of the comment?
{{{#!python
except socket.error: # FIXME: WTF?
logger.error(ZONEMGR_SEND_FAIL, module_name)
}}}
That it's not clear in which case this exception happens? Please be
more specific in any case.
--
Ticket URL: <https://bind10.isc.org/ticket/2676#comment:7>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list