BIND 10 #2676: Provide ccsession::rpc_call and replace usual calls with it
From: BIND 10 Development <do-not-reply at isc.org>
Date: Tue Feb 19 18:33:39 UTC 2013
#2676: Provide ccsession::rpc_call and replace usual calls with it
-------------------------------------+-------------------------------------
            Reporter:  vorner        |  Owner:  jinmei
                Type:  enhancement   |  Status:  reviewing
            Priority:  medium        |  Milestone:  Sprint-20130305
           Component:  Inter-module  |  Resolution:
                       communication |  CVSS Scoring:
            Keywords:                |  Defect Severity:  N/A
           Sensitive:  0             |  Feature Depending on Ticket:
         Sub-Project:  Core          |  Add Hours to Ticket:  0
Estimated Difficulty:  2.5           |  Internal?:  0
         Total Hours:  0             |
-------------------------------------+-------------------------------------
Comment (by jinmei):
Replying to [comment:9 vorner]:
> It seems the ModuleCCSession is busy locked in one thread in the loop
> checking commands. It doesn't cope with another request to wait for
> messages there. We may want to refactor the libraries around there a
> lot, because we had some similar problems in the C++ library as well
> (not with threads though, during the startup).
>
> Anyway, thinking about this, it might be solved by some kind of
> asynchronous wait on the session. Or, just by creating a ModuleCCSession
> instead of a CCSession as the second one. That could be easy. Should I
> try whether it helps?
It depends on the amount of work, which I'm not really sure about.
But my personal impression at the moment is that it's better to
complete this task and create another ticket for that work.
> > - a lettuce test (xfrin_notify_handling.feature) often, but not
> > always, fails. I'm not sure if it's related to this branch.
[...]
> I'd propose not trying to hunt down the lettuce timing here, but to
> create a ticket for it.
That's fine.
> > - `__notify_start_forwarder`: also not related to this branch, but
> > this case is one example where just getting a response from one
> > instance (of possibly multiple ones) of the component isn't
> > sufficient. We'll have to clarify this in the protocol doc (and
> > refinement) task and revise the implementation so it'll work
> > robustly.
>
> That is true, but just logging here seems like not the way to go
> either. I believe it should just crash the whole DDNS in such a case.
Probably.
> Anyway, it is not
> related to this ticket.
Right, this can be left open as far as this ticket is concerned.
> > see the detailed description of DDNS_UPDATE_NOTIFY_FAIL.
[...]
> Actually, the current behaviour of waiting for one (random) answer is
> wrong too, because:
>  * There may be no recipients on purpose. We can have no XfrOut running
>    (because the server is not primary). We can have a hidden master
>    updated by DDNS, not running Auth. We can have neither; for example,
>    a user might have added their own tool to write blog articles
>    automatically about the changes being made, and the server just gets
>    copies of the updates sent to the main server.
>  * What if the first of the Auths succeeds and the others fail?
At least ddns confirmed auth was running at the time of its startup,
so it's reasonable that it expects to get a response from auth. DDNS
also confirms the updated zone is primary (in an ad hoc way though,
and it's still true that xfrout may not be running). Also, although
not the point of this discussion, a "hidden master" can't work without
auth anyway; xfrout doesn't accept xfr requests directly, at least
right now.
All that said, your point is valid: it's wrong for DDNS to be
satisfied with getting only one answer.
> What I think should be done is:
>  * Make sure the sender (DDNS in this case) does not need to know what
>    the recipients will be. In that case, we can run no Auth without any
>    problem, and the customer can also run a custom module that wants to
>    know about updates the developers are not aware of.
>  * Leave the responsibility of guaranteed delivery to all on the msgq
>    or the libraries. Modules should not babysit the communication
>    system and check if it was delivered.
>  * If something fails, we currently only log. Let the logging be done
>    on the recipient side.
>  * Maybe introduce advanced response tracking and collate it into one
>    answer of the form:
>    - All OK
>    - or: these modules failed
I don't fully understand all of them, but I agree that these kinds of
solutions are generally what we should do for this type of issue.
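As a toy illustration of the "collate into one answer" item in the
list above (only the answer format follows the proposal; the
(rcode, message) convention and everything else are made up for the
example):
{{{#!python
# Toy sketch of collating per-module answers into one result of the
# proposed form: "All OK" or "These modules failed: ...".
# The (rcode, message) convention is assumed, not BIND 10's actual API.
def collate(answers):
    """answers: dict mapping module name -> (rcode, message); rcode 0 == OK."""
    failed = sorted(name for name, (rcode, _) in answers.items()
                    if rcode != 0)
    if not failed:
        return (0, 'All OK')
    return (1, 'These modules failed: ' + ', '.join(failed))

print(collate({'Auth': (0, ''), 'Xfrout': (0, '')}))
print(collate({'Auth': (0, ''), 'Xfrout': (1, 'no such zone')}))
}}}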
> > - do_polling: doesn't it change the behavior? Previously if the
> > recipient doesn't exist it should have resulted in a timeout and an
> > exception. It's now reported as `RPCError` and ignored.
> > Furthermore, there now seems to be a possibility that `modules` can
> > be used without set (even if detailed conditions prevent that from
> > happening, it's not obvious from the code, and it means the quality
> > of this code is low - although it's not a fault of your change).
>
> I changed it to mimic the original. But I don't understand what you
> mean by the „without set“ part.
I guess the revised code highlights the concern more clearly.
Consider what happens at `type(value)` after `rpc_call` raises
`RPCError`: the `pass` leaves `value` unassigned, so the `if` line
fails with a `NameError`.
{{{#!python
try:
    value = self.mccs.rpc_call('show_processes', 'Init')
except isc.config.RPCRecipientMissing:
    # This has been SessionTimeout before, so we keep the original
    # behavior.
    raise
except isc.config.RPCError:
    # TODO: Is it OK to just pass? As part of refactoring, preserving
    # the original behaviour.
    pass
if type(value) is list:
}}}
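One way to make the failure case explicit while keeping the original
names (just a sketch, not necessarily the right refactoring):
{{{#!python
value = None   # bind the name up front so the "no answer" case is explicit
try:
    value = self.mccs.rpc_call('show_processes', 'Init')
except isc.config.RPCRecipientMissing:
    raise      # was SessionTimeout before; keep propagating
except isc.config.RPCError:
    pass       # value stays None instead of being unbound
if type(value) is list:
    ...
}}}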
> > - update_modules: likewise, the behavior seems to be slightly changed.
>
> It is changed only slightly. If the module doesn't exist, it raises
> StatsError instead of SessionTimeout. But I don't think either is
> handled on the levels above. I can restore the original behaviour, but
> I don't know if it is important enough to complicate the code because
> of that.
I'm okay with keeping the code without converting the exception per
se. This should have been caught by tests (and made us think about
it), though; it suggests stats are only poorly tested (not an issue of
this branch itself).
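For reference, converting the exception back would be roughly a
one-line wrapper like this (a hypothetical sketch; the names are taken
from the discussion, and whether SessionTimeout is even the right type
to restore is exactly the judgment call above):
{{{#!python
# Hypothetical sketch: re-raise RPCError under the exception type the
# pre-rpc_call callers expected, preserving the old contract.
try:
    answer = self.mccs.rpc_call('show', module_name)
except isc.config.RPCError as err:
    raise isc.cc.session.SessionTimeout(str(err))
}}}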
> > - about the FIXME comment: as stated above, IMO it's another
> > incident that xfr's are a student's hobby project, not a
> > production-quality system, that it simply ignores the response or
> > timeout. (But this specific case may not have to be an "RPC" anyway
> > - all xfrin should know and handle if the message was surely
> > delivered to zonemgr).
>
> Do you propose to fix it in this branch? I don't think it is in scope.
> The errors were ignored before.
No, it's just a comment about the FIXME as it contained a question.
It should be fixed so it won't be a student toy, but that's outside
the scope of this task.
On the revised branch:
'''other changes'''
I made a few unrelated cleanups.
'''ddns.py.in'''
- apparently I forgot to note this before, but see (and confirm)
3bec0f1. We should now be able to remove that comment.
'''stats.py.in'''
- s/round-time trips/round-trip time(s)/?
{{{
# collect all the answers. This eliminates some of the round-time
# trips. Unfortunately, rpc_call is not flexible enough to allow
}}}
--
Ticket URL: <http://bind10.isc.org/ticket/2676#comment:10>
BIND 10 Development <http://bind10.isc.org>