BIND 10 #2738: Clarify high-level design of the CC protocol
BIND 10 Development
do-not-reply at isc.org
Wed Apr 3 08:55:28 UTC 2013
#2738: Clarify high-level design of the CC protocol
-------------------------------------+-------------------------------------
  Reporter:  vorner                  |  Owner:  jinmei
  Type:  task                        |  Status:  reviewing
  Priority:  medium                  |  Milestone:  Sprint-20130423
  Component:  Inter-module           |  Keywords:
              communication          |
  Sensitive:  0                      |  Resolution:
  Sub-Project:  DNS                  |  CVSS Scoring:
  Estimated Difficulty:  5           |  Defect Severity:  N/A
  Total Hours:  0                    |  Feature Depending on Ticket:
  Add Hours to Ticket:  0            |  Internal?:  0
-------------------------------------+-------------------------------------
Changes (by vorner):
* owner: vorner => jinmei
Comment:
Hello
Replying to [comment:7 jinmei]:
> - I think the high level design should use a higher level abstraction
> of message "bus" (or "system" or whatever), and should be described
> without the concept of msgq (which is just a specific implementation
> of the bus).
OK, I'm calling it "the daemon" now. But I doubt there'll ever be another
implementation.
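To sketch what I mean by the abstraction, in Python (the names here are made
up for illustration, they are not the real isc.cc API):

    # Hypothetical sketch only: the high-level design talks about this kind
    # of interface, and msgq is just one daemon that implements it.
    import abc

    class MessageBus(abc.ABC):
        @abc.abstractmethod
        def send(self, recipient, message, want_answer=False):
            """Send a message to a single recipient or to a group name."""

        @abc.abstractmethod
        def receive(self, blocking=True, timeout=None):
            """Receive the next message or answer, possibly blocking."""

    class MsgqBus(MessageBus):
        """One concrete implementation, backed by the msgq daemon."""
        def send(self, recipient, message, want_answer=False):
            ...  # talk to the msgq daemon over its socket
        def receive(self, blocking=True, timeout=None):
            ...  # read the next message from the daemon's socket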
> - I'd like to clarify the response (answer) semantics for broadcasting
> (i.e. "addressing by group"). If we expect it to work in some way,
> we should describe how it should work in more detail. But, from
> what I've read from this document, I guess an implicit assumption is
> that we actually didn't expect it to work; if the sender needs a
> response to the same single message from multiple recipients, it
> should first get a list of individual recipients and send a direct
> message to each of them. I'm okay with that model, but then I
> suggest explicitly prohibiting (or at least discouraging, saying
> "the behavior is undefined and you shouldn't do it) broadcasting
> with expecting an answer(s).
I added something less scary than undefined behaviour: I described what
the problem actually is. But I did say it is discouraged.
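To make the discouraged case concrete, this is the pattern you quoted
instead of broadcasting with an expected answer (same made-up bus interface
as above; group_members() is hypothetical too):

    # Instead of bus.send("SomeGroup", msg, want_answer=True) (discouraged),
    # address each current member directly, so every expected answer maps to
    # exactly one direct message.
    def ask_whole_group(bus, group, message, timeout=60):
        members = bus.group_members(group)  # hypothetical "who is subscribed" call
        for member in members:
            bus.send(member, message, want_answer=True)
        # One answer (or undeliverable notification) per member is expected.
        return [bus.receive(blocking=True, timeout=timeout) for _ in members]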
> - "Undeliverable notification": I guess we should revisit this concept
> at a higher level, while I see it addresses some real issues due to
> our current implementation details. First, it only matters when the
> sender requests an answer. And, since an answer can be delayed due
> to a reason at the recipient side, which the sender cannot control,
> if the sender cannot do blocking wait for the answer it must wait
> for it asynchronously anyway; in the case the sender can do blocking
> wait, it might be a good optimization to tell the sender the failure
> sooner, but it seems to be an implementation-specific bonus feature,
> rather than a matter of higher level design. So my personal
> suggestion is to remove it from this high level design (or if we
> mention it, clarify it's an implementation level optimization and
> modules that can't block should need asynchronous read anyway).
I don't really agree that it's only an optimisation here. There are modules
that are not expected to take long to answer. For example, the statistics
daemon doesn't do anything but collect and answer statistics. But it doesn't
have to be running at all.
Let's say the user requests statistics over bindctl. We can do a blocking
wait there, because the user won't do anything in the one second before the
answer arrives (and that's considering a very slow system; it would usually
take less). If the stats daemon is not there, the undeliverable notification
is delivered right away.

But if the notifications didn't exist, there would be a timeout (which I'd
like to be on the order of minutes rather than the current 10 seconds or so;
generally the timer firing would mean a serious bug somewhere and there
should be a very loud error message in the logs about it), and the user
would have to wait a long time to discover there'll be no answer. Such an
interface would be considered broken.
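In the same made-up Python, the bindctl side of that scenario would look
something like this (the "Stats" name, the message shapes and the
TimeoutError behaviour are all assumptions for the sketch):

    import logging

    def show_statistics(bus):
        bus.send("Stats", {"command": "show"}, want_answer=True)
        try:
            # Blocking wait; the timeout is only the last-resort safety net,
            # and firing it should mean a serious bug somewhere.
            reply = bus.receive(blocking=True, timeout=120)
        except TimeoutError:
            logging.error("no answer from Stats within the timeout")
            return None
        if reply.get("undeliverable"):
            # The stats daemon is not running: we learn that right away
            # instead of sitting in the blocking wait until the timeout fires.
            logging.error("the statistics daemon is not running")
            return None
        return reply.get("statistics")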
Another case would be: imagine a model of xfrin where we fork for each
transfer. The transfer is done in the child, and the child then reports over
msgq that it finished and asks whether there's another transfer it should do
or whether it should terminate (OK, we probably won't have this model of
xfrin, but we could; it's not that crazy). Now, because the child has
nothing to do before it gets the answer anyway, it may just block waiting
for whatever short time it would take the xfrin master (which wouldn't do
any long-term task itself, so if it took long, it would be a bug) to answer.
But if it got a "there's no master" notification, it knows it should
terminate, with a corresponding error in the logs.
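Sketched in the same hypothetical Python (do_transfer() and the message
shapes are invented for the example):

    import logging

    def do_transfer(zone):
        """Placeholder for the actual zone transfer work."""

    def transfer_loop(bus, first_zone):
        zone = first_zone
        while zone is not None:
            do_transfer(zone)
            bus.send("XfrinMaster", {"finished": zone}, want_answer=True)
            # Nothing else to do until the master answers, so block for it.
            answer = bus.receive(blocking=True, timeout=60)
            if answer.get("undeliverable"):
                # "There's no master": terminate, with an error in the logs.
                logging.error("xfrin master disappeared, terminating")
                return
            zone = answer.get("next_zone")  # None means no more work, terminate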
Obviously, there are cases where 1 second is too long (the auth server) or
where the task might take a longer time, and there we would want an
asynchronous read. But I believe there are places for a blocking wait for
the answer. So I added a note warning that a blocking wait may be wrong and
left it to the consideration of whoever implements it.
--
Ticket URL: <http://bind10.isc.org/ticket/2738#comment:10>
BIND 10 Development <http://bind10.isc.org>