BIND 10 #2738: Clarify high-level design of the CC protocol
BIND 10 Development
do-not-reply at isc.org
Wed Apr 3 08:55:28 UTC 2013
#2738: Clarify high-level design of the CC protocol
-------------------------------------+-------------------------------------
  Reporter:  vorner                  |  Owner:  jinmei
  Type:  task                        |  Status:  reviewing
  Priority:  medium                  |  Milestone:  Sprint-20130423
  Component:  Inter-module           |  Keywords:
              communication          |
  Sensitive:  0                      |  Resolution:
  Sub-Project:  DNS                  |  CVSS Scoring:
  Estimated Difficulty:  5           |  Defect Severity:  N/A
  Total Hours:  0                    |  Feature Depending on Ticket:
  Add Hours to Ticket:  0            |  Internal?:  0
-------------------------------------+-------------------------------------
Changes (by vorner):
* owner: vorner => jinmei
Comment:
Hello
Replying to [comment:7 jinmei]:
> - I think the high level design should use a higher level abstraction
> of message "bus" (or "system" or whatever), and should be described
> without the concept of msgq (which is just a specific implementation
> of the bus).
OK, I'm calling it "the daemon" now. But I doubt there'll ever be another
implementation.
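To sketch what I mean by the abstraction, in Python (the names here are made
up for illustration, they are not the real isc.cc API):

    # Hypothetical sketch only: the high-level design talks about this kind
    # of interface, and msgq is just one daemon that implements it.
    import abc

    class MessageBus(abc.ABC):
        @abc.abstractmethod
        def send(self, recipient, message, want_answer=False):
            """Send a message to a single recipient or to a group name."""

        @abc.abstractmethod
        def receive(self, blocking=True, timeout=None):
            """Receive the next message or answer, possibly blocking."""

    class MsgqBus(MessageBus):
        """One concrete implementation, backed by the msgq daemon."""
        def send(self, recipient, message, want_answer=False):
            ...  # talk to the msgq daemon over its socket
        def receive(self, blocking=True, timeout=None):
            ...  # read the next message from the daemon's socket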
> - I'd like to clarify the response (answer) semantics for broadcasting
> (i.e. "addressing by group"). If we expect it to work in some way,
> we should describe how it should work in more detail. But, from
> what I've read from this document, I guess an implicit assumption is
> that we actually didn't expect it to work; if the sender needs a
> response to the same single message from multiple recipients, it
> should first get a list of individual recipients and send a direct
> message to each of them. I'm okay with that model, but then I
> suggest explicitly prohibiting (or at least discouraging, saying
> "the behavior is undefined and you shouldn't do it) broadcasting
> with expecting an answer(s).
I added something less scary than undefined behaviour: I described what
the problem actually is. But I did say it is discouraged.
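To make the discouraged case concrete, this is the pattern you quoted
instead of broadcasting with an expected answer (same made-up bus interface
as above; group_members() is hypothetical too):

    # Instead of bus.send("SomeGroup", msg, want_answer=True) (discouraged),
    # address each current member directly, so every expected answer maps to
    # exactly one direct message.
    def ask_whole_group(bus, group, message, timeout=60):
        members = bus.group_members(group)  # hypothetical "who is subscribed" call
        for member in members:
            bus.send(member, message, want_answer=True)
        # One answer (or undeliverable notification) per member is expected.
        return [bus.receive(blocking=True, timeout=timeout) for _ in members]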
> - "Undeliverable notification": I guess we should revisit this concept
> at a higher level, while I see it addresses some real issues due to
> our current implementation details. First, it only matters when the
> sender requests an answer. And, since an answer can be delayed due
> to a reason at the recipient side, which the sender cannot control,
> if the sender cannot do blocking wait for the answer it must wait
> for it asynchronously anyway; in the case the sender can do blocking
> wait, it might be a good optimization to tell the sender the failure
> sooner, but it seems to be an implementation-specific bonus feature,
> rather than a matter of higher level design. So my personal
> suggestion is to remove it from this high level design (or if we
> mention it, clarify it's an implementation level optimization and
> modules that can't block should need asynchronous read anyway).
I don't really agree that it's only an optimisation here. There are modules
that are not expected to take long to answer. For example, the statistics
daemon doesn't do anything but collect and answer statistics. But it doesn't
have to be running at all.
Let's say the user requests statistics over bindctl. We can do a blocking
wait there, because the user won't do anything in the one second before the
answer arrives (and that's considering a very slow system; it would usually
take less). If the stats daemon is not there, the undeliverable notification
is delivered right away.

But if the notifications didn't exist, there would be a timeout (which I'd
like to be on the order of minutes rather than the current 10 seconds or so;
generally the timer firing would mean a serious bug somewhere and there
should be a very loud error message in the logs about it), and the user
would have to wait a long time to discover there'll be no answer. Such an
interface would be considered broken.
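In the same made-up Python, the bindctl side of that scenario would look
something like this (the "Stats" name, the message shapes and the
TimeoutError behaviour are all assumptions for the sketch):

    import logging

    def show_statistics(bus):
        bus.send("Stats", {"command": "show"}, want_answer=True)
        try:
            # Blocking wait; the timeout is only the last-resort safety net,
            # and firing it should mean a serious bug somewhere.
            reply = bus.receive(blocking=True, timeout=120)
        except TimeoutError:
            logging.error("no answer from Stats within the timeout")
            return None
        if reply.get("undeliverable"):
            # The stats daemon is not running: we learn that right away
            # instead of sitting in the blocking wait until the timeout fires.
            logging.error("the statistics daemon is not running")
            return None
        return reply.get("statistics")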
Another case would be: imagine a model of xfrin where we fork for each
transfer. The transfer is done in the child, and the child then reports over
msgq that it finished and asks whether there's another transfer it should do
or whether it should terminate (OK, we probably won't have this model of
xfrin, but we could; it's not that crazy). Now, because the child has
nothing to do before it gets the answer anyway, it may just block waiting
for whatever short time it would take the xfrin master (which wouldn't do
any long-term task itself, so if it took long, it would be a bug) to answer.
But if it got a "there's no master" notification, it knows it should
terminate, with a corresponding error in the logs.
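Sketched in the same hypothetical Python (do_transfer() and the message
shapes are invented for the example):

    import logging

    def do_transfer(zone):
        """Placeholder for the actual zone transfer work."""

    def transfer_loop(bus, first_zone):
        zone = first_zone
        while zone is not None:
            do_transfer(zone)
            bus.send("XfrinMaster", {"finished": zone}, want_answer=True)
            # Nothing else to do until the master answers, so block for it.
            answer = bus.receive(blocking=True, timeout=60)
            if answer.get("undeliverable"):
                # "There's no master": terminate, with an error in the logs.
                logging.error("xfrin master disappeared, terminating")
                return
            zone = answer.get("next_zone")  # None means no more work, terminate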
Obviously, there are cases where 1 second is too long (the auth server) or
where the task might take a longer time, and there we would want an
asynchronous read. But I believe there are places for a blocking wait for
the answer. So I added a note warning that a blocking wait may be wrong and
left it to the consideration of whoever implements it.
--
Ticket URL: <http://bind10.isc.org/ticket/2738#comment:10>
BIND 10 Development <http://bind10.isc.org>