BIND 10 #2738: Clarify high-level design of the CC protocol
BIND 10 Development
do-not-reply at isc.org
Mon Apr 8 08:53:32 UTC 2013
#2738: Clarify high-level design of the CC protocol
-------------------------------------+-------------------------------------
            Reporter:  vorner        |                Owner:  jinmei
                Type:  task          |               Status:  reviewing
            Priority:  medium        |            Milestone:  Sprint-20130423
           Component:  Inter-module  |           Resolution:
                       communication |         CVSS Scoring:
            Keywords:                |      Defect Severity:  N/A
           Sensitive:  0             |  Feature Depending on Ticket:
         Sub-Project:  DNS           |  Add Hours to Ticket:  0
Estimated Difficulty:  5             |            Internal?:  0
         Total Hours:  0             |
-------------------------------------+-------------------------------------
Changes (by vorner):
* owner: vorner => jinmei
Comment:
Hello
Replying to [comment:11 jinmei]:
> First, please be careful about the status of the branch: I
> accidentally merged a different branch to trac2738, then reverted it,
> and restored the original commits by cherry-pick (apparently I also
> did something wrong in the revert). I hope I've recovered the
> original state, but please check if it doesn't break anything.
I examined the branch and concluded the history doesn't look nice (the
branch included some non-reverted changes; I believe you reverted the
other side of the merge than the one you wanted), so I rebuilt it
completely. Update the branch with:
{{{
git fetch
git reset --hard origin/trac2738
}}}
instead of `git pull`.
> Secondly, after thinking over the details I realized I had a higher
> level concern and/or perhaps I really didn't understand what was
> expected in this task. The current documentation seems to be a
> mixture of high-level concepts (e.g., the concept of RPC - which
> seems to be a convenient wrapper on top of the IPC system, not an
> essential part of the system itself) and low-level ones (e.g., the
> mention of EINTR), while still not defining basic notions (client,
> lname, group, etc).
Actually, I believe it was me who didn't understand the goal properly;
after all, the ticket was created because you asked for it in some
review. On the other hand, I don't really understand the purpose of
this kind of document. Is it a clarification for us of how we should
use the system (which was my original impression), or documentation
for others? If the first, do we really need to formally define what a
session is? All of us know. If the second, how low-level do we want to
go? Anyone above the cc.session library will never come into contact
with the getlname message, for example. And people who go digging
inside the library will need to read the code anyway (since there's
not enough detail for that here), or at least the low-level CC
protocol.
Anyway, I'd like to discuss the goal and also the details of your
version first, before trying to merge them, so I didn't update
anything on the branch.
Is the content meant as an example only, or is it really what you
envision? Because I don't think I can agree with everything there:
* I don't think the session establishment is non-blocking.
* Is it OK to consider the synchronous read non-blocking, even in the
case of talking to msgq? I'm OK with calling it fast, but I don't think we
can call it non-blocking.
* What error can be reported by an asynchronous read (I mean a
low-level error, not a payload signifying an error response)? If there
were an error reading the message, it would be impossible to decide
which callback to pick, so it would have to be handled somewhere other
than in the callback.
* Also, I believe the asynchronous read currently has no timeout.
Since it is asynchronous, a timeout can be implemented on the client
side; with the synchronous one that is not possible.
* The concept of watch you describe seems different from what I
thought, on several levels:
- You seem to combine the one-shot command to get the list of sessions
in a group with subscribing to the notifications for that group. I
don't think these should be combined; you might want one without the
other (e.g. you don't need to be subscribed to the notifications if
you want to send a command to the whole group, you just need the
current list of subscribers).
- I envisioned the notifications would be done by subscribing to a
group. By combining it with the answer, you make this impossible, so
you actually increase the amount of special-case code in both msgq and
the libraries.
- After thinking about how the command-to-group would work, a client
would constantly be subscribing to and unsubscribing from the
notifications of various groups. So I think it might be better to just
subscribe to all these notifications (not to a specific group, but to
all the groups at once) and pass both the lname and the group in
question with the notification.
* As mentioned above, the message types seem very low level. On the
other hand, I think we should document the higher-level (JSON) format
somewhere too ‒ the format of a command, a reply, etc.
* The unique ID is in all messages, and it is returned from
sendGroupMsg at all times, not only for the messages that need a
response. It's just that the ID is ignored if no response is needed.
* The ID is unique only per sender, so the sender can track responses.
At the recipient (or whole-system) level the IDs are not unique. Also,
the per-sender uniqueness is not actually mandated by the protocol; it
would work even if IDs were reused, as long as the sender no longer
expects responses to the original message with the same ID.
* How can the system know the difference between close and termination?
It just gets EOF in both cases.
* I don't think we should forbid sending a command (i.e. a message
that wants a response) to a group. As I mentioned, there are
„singleton groups“ ‒ more like aliases than actual groups. Examples
are msgq itself, cfgmgr, or the stats daemon. There's no sense in
having multiple of them and it probably wouldn't even work properly.
Requiring the whole round of getting the list with the single
subscriber, collecting the responses, etc., seems suboptimal. Also, I
expect there'd be two interfaces: a simpler one (for single-recipient
RPC) that just provides the response (rpcCall) or calls a callback
with it (rpcCallAsync), and one for collecting all the responses
(rpcCallMulti). The interface of the latter must be more complex,
because it needs to return multiple answers/errors at once. I don't
think it makes sense to disallow the simpler interface in the (much
more usual) case where at most one subscriber is expected. A sketch of
these interfaces follows this list.
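To make that concrete, here is a minimal sketch of how the two
interfaces could look, assuming a Session object with group_sendmsg
and group_recvmsg behaving roughly like the current isc.cc.session
ones. The rpcCall/rpcCallMulti names are the ones proposed above; the
class name, payload layout and error handling are illustrative only:
{{{
class RpcError(Exception):
    """Raised when the remote side answers with an error result."""

class RpcClient:
    def __init__(self, session):
        self._session = session

    def rpcCall(self, command, group):
        # Single-recipient RPC, e.g. for "singleton groups" such as
        # cfgmgr: send one command, wait for the one answer. The
        # sequence number returned by group_sendmsg is unique per
        # sender only, which is all we need to match the reply.
        seq = self._session.group_sendmsg(command, group,
                                          want_answer=True)
        answer, env = self._session.group_recvmsg(nonblock=False,
                                                  seq=seq)
        result = answer.get('result', [1, "malformed answer"])
        if result[0] != 0:
            raise RpcError(result)
        return result[1] if len(result) > 1 else None

    def rpcCallMulti(self, command, group, members):
        # Group RPC: 'members' is the current subscriber list,
        # obtained beforehand from the system. Returns
        # {lname: result}, so multiple answers/errors are reported at
        # once (timeouts and dead recipients are ignored for brevity).
        pending = {}
        for lname in members:
            seq = self._session.group_sendmsg(command, group,
                                              to=lname,
                                              want_answer=True)
            pending[seq] = lname
        results = {}
        while pending:
            answer, env = self._session.group_recvmsg(nonblock=False)
            lname = pending.pop(env.get('reply'), None)
            if lname is not None:
                results[lname] = answer.get('result')
        return results
}}}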
> At the high level, I thought "daemon" is also too implementation
> specific. Even though we currently implement the "bus" as a separate
> daemon process, I've believed we conceptually regard it as an abstract
> messaging system at this level. In ipc-high-2, I simply called it the
> "system".
I can't imagine any way this would reasonably work without a routing
daemon, given all these assumptions. But I don't mind „system“ either.
> What's the point of keeping it open? If we leave it open (not
> prohibiting it), it would simply open up the possibility of arbitrary
> understanding and use based on it, just like we currently have it in
> the implementation. Now that we are introducing the more reliable
> membership (subscribers) management/notification framework, it seems
> to me more helpful at the design level to just prohibit it (while
> noting until we fully migrate to the membership notification we need
> to keep using this model of group communication).
I'm not keeping it open, I'm quite clearly saying „Don't do it“, with
an explanation of why it is wrong. OK, it might be a more British
„Don't do it“ than an American one.
If you write „undefined behaviour“, you are not explaining why it is
wrong, and you also scare a reader who finds an occurrence of exactly
this thing in our code (and there are such occurrences), because they
may think „Hmm. Now anything can happen, including, for example,
termination of msgq“.
> > I don't really agree here it's only optimisation. There are
> > modules that are not expected to take long to answer. For example
> > the statistics daemon doesn't do anything but collect and answer
> > statistics. But it doesn't have to be there.
>
> On working on another version of the doc, I now actually feel more
> strongly that it is optional. Regarding the statistics example
> above, it would be implemented as by-group communication, right (the
> cmdctl, on receipt of the command from bindctl, sends a message to
> the "Stats" group)? If so, design-wise cmdctl should get the
> subscribers of the group first, because direct group communication
> with a response is either at best undefined and discouraged (in the
> current doc) or prohibited (my suggestion). But then cmdctl doesn't
> have to rely on the "undeliverable" result; it can simply avoid
> sending the hopeless message if there's no subscriber. There's still
> a subtle case where the recipient dies during the message exchange,
> which could lead to a longer timeout, but that also applies to the
> "undeliverable" case.
Only if you really insist on forbidding the singleton groups (e.g.
groups expected to contain at most one client).
> - I still don't understand how these would be used:
> {{{
> * Client connected (sent with the lname of the client)
> * Client disconnected (sent with the lname of the client)
> }}}
Well, I imagined these four kinds of notifications could happen
(examples):
* Notification: A new client with lname = '12345' connected to the system.
* Notification: Client with lname = '12345' subscribed to group named
'Group'.
* Notification: Client with lname = '12345' unsubscribed from group named
'Group'.
* Notification: Client with lname = '12345' disconnected from the system.
These would be sent to whoever is subscribed to a group
'SessionManagement' (or any other well-known name); a sketch of
possible payloads follows.
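Purely as an illustration, the payloads might look something like
this; the field names and layout below are not a proposed wire format,
just an example of what information would be carried:
{{{
# Hypothetical notification payloads (JSON objects), delivered to
# subscribers of a well-known group such as 'SessionManagement'.
{"notification": "connected",    "lname": "12345"}
{"notification": "subscribed",   "lname": "12345", "group": "Group"}
{"notification": "unsubscribed", "lname": "12345", "group": "Group"}
{"notification": "disconnected", "lname": "12345"}
}}}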
> - On writing my own version, I realized the RPC call is better
> considered application-level API sugar on top of the IPC system,
> rather than part of the system itself. That is, it's an
> encapsulation of send-and-receive operations, where the message data
> somehow means executing something at the receiver's side (and
> returning its result). But the semantics of the data is basically a
> matter between the two users (clients). The IPC system itself
> doesn't have to care about that level.
Well, RPC is one part of IPC, seen from the application level. It's
true the „system“ itself (msgq, the isc.cc.session library) doesn't
have to care about it, but the higher levels (from isc.config.Session
up) and the applications do care. It's part of what we do with the
system, and it makes sense to mention it. I think it is worth at least
noting that such functions exist, so clients don't reinvent the wheel
(preferably with a description of how each type of communication
works). A sketch of that split follows.
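To illustrate where the RPC convention would live, here is a sketch of
the two endpoints; msgq just routes opaque JSON between them. The
command/result layout mirrors the one isc.config uses today, while the
helper names, the exact signatures and the COMMAND_HANDLERS registry
are hypothetical, and the Session methods are assumed to behave
roughly like the current isc.cc.session ones:
{{{
# Client side: wrap the call in the agreed JSON convention and send
# it; everything below this level sees only an opaque message.
def call_remote(session, group, name, args):
    seq = session.group_sendmsg({'command': [name, args]}, group,
                                want_answer=True)
    answer, env = session.group_recvmsg(nonblock=False, seq=seq)
    return answer            # e.g. {'result': [0, value]} on success

# Hypothetical registry mapping command names to handler functions;
# the application fills this in.
COMMAND_HANDLERS = {}

# Server side: unwrap, run the application logic, answer in the same
# convention.
def handle_message(session, msg, env):
    name, args = msg['command']
    try:
        value = COMMAND_HANDLERS[name](args)
        reply = {'result': [0, value]}
    except Exception as exc:
        reply = {'result': [1, str(exc)]}   # error code + description
    session.group_reply(env, reply)
}}}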
And, I believe `lname` means „link name“.
--
Ticket URL: <http://bind10.isc.org/ticket/2738#comment:13>
BIND 10 Development <http://bind10.isc.org>