BIND 10 trac2738, updated. 379721d9d1280cc517b95a867e199403b6f1cd13 [2738] another version of high level IPC document

BIND 10 source code commits bind10-changes at lists.isc.org
Fri Apr 5 07:54:02 UTC 2013


The branch, trac2738 has been updated
       via  379721d9d1280cc517b95a867e199403b6f1cd13 (commit)
       via  5003aa13afef26cd5d48ac3b06618ccfd9b09226 (commit)
       via  872d42b8d8bb2e22e7724543fcccedf6fa1a683f (commit)
       via  c73f2dec43af11df5da861ff30d44febb7dfc940 (commit)
       via  d10244b603153dfae3327eb98a4cdc185dba3c53 (commit)
       via  1c476d60b788d109854946f96641ee47358dd9ae (commit)
       via  8c75eee71f36c0ce159aa1958f368cd7d8b780ab (commit)
      from  a0a2ac527fb36f1dcf9762f4e659c3446fe902e2 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit 379721d9d1280cc517b95a867e199403b6f1cd13
Author: JINMEI Tatuya <jinmei at isc.org>
Date:   Fri Apr 5 00:31:09 2013 -0700

    [2738] another version of high level IPC document

commit 5003aa13afef26cd5d48ac3b06618ccfd9b09226
Author: Michal 'vorner' Vaner <michal.vaner at nic.cz>
Date:   Wed Apr 3 10:41:32 2013 +0200

    [2738] [2738] Write considerations

commit 872d42b8d8bb2e22e7724543fcccedf6fa1a683f
Author: Michal 'vorner' Vaner <michal.vaner at nic.cz>
Date:   Wed Apr 3 10:24:18 2013 +0200

    [2738] [2738] Clarifications to the text
    
    Many smaller clarifications and corrections. No change to the described
    behaviour.

commit c73f2dec43af11df5da861ff30d44febb7dfc940
Author: JINMEI Tatuya <jinmei at isc.org>
Date:   Tue Apr 2 12:54:45 2013 -0700

    [2738] [2738] corrected a trivial typo

commit d10244b603153dfae3327eb98a4cdc185dba3c53
Author: Michal 'vorner' Vaner <michal.vaner at nic.cz>
Date:   Thu Mar 28 13:06:09 2013 +0100

    [2738] [2738] Limitations

commit 1c476d60b788d109854946f96641ee47358dd9ae
Author: Michal 'vorner' Vaner <michal.vaner at nic.cz>
Date:   Thu Mar 28 12:57:32 2013 +0100

    [2738] [2738] Ways to communicate

commit 8c75eee71f36c0ce159aa1958f368cd7d8b780ab
Author: Michal 'vorner' Vaner <michal.vaner at nic.cz>
Date:   Mon Mar 25 16:50:47 2013 +0100

    [2738] [2738] Primitives of the low-level IPC

-----------------------------------------------------------------------

Summary of changes:
 doc/design/ipc-high-2.txt |  225 +++++++++++++++++++++++++++++++++++++++++++++
 doc/design/ipc-high.txt   |  185 +++++++++++++++++++++++++++++++++++++
 2 files changed, 410 insertions(+)
 create mode 100644 doc/design/ipc-high-2.txt
 create mode 100644 doc/design/ipc-high.txt

-----------------------------------------------------------------------
diff --git a/doc/design/ipc-high-2.txt b/doc/design/ipc-high-2.txt
new file mode 100644
index 0000000..6c34554
--- /dev/null
+++ b/doc/design/ipc-high-2.txt
@@ -0,0 +1,225 @@
+The communication system consists of
+  - the "bus" (or "msgq", as a concept name that does not necessarily
+    mean a daemon process).  Below we simply call it "(the)
+    system".
+  - users: a user of the system, which would mean some application
+    process in practice, but in this high level design it's a
+    conceptual entity.
+  - sessions: a session is an interface for a user of the
+    system, through which the user can communicate with other users or
+    with the system itself.  A single user can have multiple
+    sessions (but a session only belongs to one user).
+  - messages: a message is a data blob to be exchanged between users
+    or between a user and the system.  each message has the
+    destination, which is either: the system, a specific session, or a
+    group of sessions.  some types of messages expect a response, and
+    some others are expected to be one-way.  a message sent to a group
+    must always be one-way.
+  - groups: a named set of sessions.  a message can be destined to a
+    specific group, in which case the same copy of the message will be
+    delivered to these sessions.
+
+'''Session Interface'''
+
+The session interface is a conceptual programming interface for a user
+of the system.  It encapsulates one single session to the system, and
+provides methods of sending and receiving messages through the session
+(in practice of our implementation, this basically means the `Session`
+class interface).
+
+- Session establishment: any operation on the session interface begins
+  with session establishment.  It must succeed and is considered a
+  non-blocking operation.  If there's any error the user must consider
+  it fatal and terminate.
+
+- Send operation: a user can send a message over the session.  This
+  interface ensures it always succeeds and is non blocking (note:
+  internally, this could be just based on a (naive) assumption, via
+  internal buffering, or whatever).  If any error is reported the user
+  must consider it fatal and terminate.  If the message expects a
+  response, a unique ID is given for identifying the response.
+
+- Synchronous read operation: a user can receive a message from the
+  system or another user (or possibly even itself).  The message could
+  be either a response to a specific message (called a "request") it
+  has sent before, or any new incoming message (maybe a request from
+  another user or a one-way message).  In the former case, the user
+  specifies the ID of the request message.  If this is a response from
+  the system, it must succeed and can (effectively) be considered non
+  blocking; in all other cases it can block.  This operation can fail
+  due to "timeout", which means either the receiver of the request is
+  not compliant and didn't respond, or it's extremely busy and non
+  responsive, or simply isn't running.  If any other error is
+  reported, including a timeout for a response from the system, the
+  user must consider it fatal and terminate.
+
+- Asynchronous read operation: the session interface should also
+  support asynchronous read operation, where the user registers an
+  interest on any new incoming message or a response to a specific
+  request message with a callback.  The registration itself must
+  always succeed and is non blocking.  When the awaited message is
+  delivered, the interface triggers the registered callback.  The
+  callback may also report a timeout or another error (any error other
+  than a timeout must be considered fatal, as in the synchronous case).
+
+- Group membership management: this interface supports the concept of
+  session "groups".  a group consists of a set of sessions (called
+  "subscribers") that are interested in receiving messages destined to
+  that group.  the following operations are available to manage the
+  subscription:
+  - subscription operation: a user can indicate an interest in
+    receiving messages for the group on the session.  this would
+    actually be realized via a message exchange with the system,
+    and it must succeed and be non blocking.
+  - unsubscription operation: a user that has subscribed to a group
+    can indicate it's no longer interested in subscribing to the
+    group.  this would actually be realized via a message exchange
+    with the system, and it must succeed and be non blocking.
+  - watch operation: a user can indicate an interest in the list of
+    current subscribers of a given group.  a watch operation would
+    actually be realized via a message exchange with the system,
+    and it must succeed and be non blocking.  it will request a
+    response, and the response contains the list of subscriber
+    sessions.  after that, the system sends a message whenever
+    there's a change in the group subscribers.
+  - unwatch operation: a user that has watched a group can indicate
+    it's no longer interested in watching the group.  this would
+    actually be realized via a message exchange with the system, and
+    it must succeed and be non blocking.  the system will stop
+    sending the change messages for the group on receiving this
+    message.
+
+- Group communication: there are two ways for a user to communicate
+  with sessions of a group:
+  - sending a message to a group: a user can send a message to the
+    sessions of a group as a single operation.  this interface and the
+    system ensure the message is delivered to all the sessions
+    subscribing (see the "Reliability" bullet below), but the user
+    cannot get a response from the destination sessions.  in fact,
+    it's technically quite difficult, if not impossible, to
+    effectively get responses in this case, and it would make corner
+    cases such as a subset of the sessions being non responsive trickier.
+  - get a list of subscribers of the group using the "watch"
+    operation, and send the same message to each of the subscriber
+    sessions.  In this case the user can request a response, and make
+    a higher level decision on corner cases for the convenience of the
+    user.  in general, this way of communication is preferred unless
+    the user doesn't have to care who actually receives the message
+    (e.g., when the message is purely informational and could even be
+    lost).
+
+- Reliability: this interface, with the help of the system, ensures
+  message delivery is reliable.  When a user sends messages from
+  session A to session B (or to the system), all of the messages will
+  be delivered in the sent order without any modification, as long as
+  the destination session (B) exists.
+
+'''Additional/supplemental concepts'''
+- session: a session is established by the owner user with the
+  system.  it's given a unique (and never reused) ID (throughout the
+  entire system) by the system at the time of establishment.
+  This ID is called "lname" (but we might revisit this naming at this
+  opportunity - no one knows what 'l' means and it could cause
+  unnecessary confusion).
+
+- group: a "group" consists of a set of "subscriber" sessions and a
+  set of "watcher" sessions.  at least one of the two sets must be non
+  empty, but one of them can be empty.  A group is given a name that
+  is unique throughout the system by the users of these sessions.
+  Implementations of users are assumed to have a consistent naming
+  policy for the groups.
+
+- message: a blob of data exchanged between users or between a user
+  and the system.  messages are categorized by their "types".  Known
+  types include: "SEND" used for message exchanges between users;
+  "GET LNAME" used between a user and the system so the user gets
+  the ID (lname) of a session; "SUBSCRIBE" used between a user and
+  the system so the user tells the system it wants to subscribe to a
+  group.  Each message also has a "need response" flag.  If it's on,
+  the sender needs a response to the message.  In that case the
+  message contains a unique sequence ID by the sender (unique per the
+  session through which the message is sent).  that sequence ID should
+  be copied into the corresponding response.  If the "need response"
+  flag is off, the receiver of the message shouldn't respond to it; if
+  it does, that response should be ignored by the original sender.
+
+- message destination: each message of type "SEND" is associated with
+  its intended destination.  it's either a group name or a session ID
+  (lname).  If it's a group name, the message is intended to be
+  delivered to all its subscriber sessions.  If it's a session ID,
+  it's intended to be delivered to that specific session.  The
+  destination of a response cannot be a group.
+
+'''System behavior'''
+
+The following may be too detailed for the purpose of the high level
+design doc.  But hopefully it helps understand it more concretely,
+and, in any event, we'll need this level of specification too.
+
+== Session Management ==
+- the system maintains a list of active sessions established by users
+  with their lnames.
+- when a user establishes a session with the system, the system
+  gives a unique ID ("lname") to the session, and adds the pair of the
+  ID and session to the list.
+- if the system receives a "GET LNAME" message that asks for the ID of
+  the session through which the message is delivered, it returns a
+  response containing the requested lname in the data.
+- when a user explicitly closes (one of) its session(s), the system
+  immediately knows the corresponding session is now unusable and
+  updates the list accordingly.
+- when a user terminates, the system immediately knows any unclosed
+  sessions of the user are now unusable and updates the list
+  accordingly.
+
+== Group Management ==
+- the system maintains a list of active groups.
+- if the system receives a "SUBSCRIBE" type message for a group, it
+  adds the receiving session as a subscriber session of the group in
+  the list.  If the group did not exist in the list, it creates a new
+  one.  It also sends a "NEW SUBSCRIBER" (or something) message,
+  containing the newly added session lname, to each of the current
+  watchers of the group.
+- if the system receives a "WATCH" type message for a group, it
+  adds the receiving session as a watcher session of the group in
+  the list.  If the group did not exist in the list, it creates a new
+  one.  This message must have a "need response" flag on, and the
+  system sends a response containing a list of the current subscribers.
+- if the system receives an "UNSUBSCRIBE" type message for a group, it
+  removes the receiving session as a subscriber session of the group
+  from the list.  If both subscribers and watchers for the group become
+  empty, it removes the group from the list.  It also sends a "LEAVING
+  SUBSCRIBER" (or something) message, containing the lname of the
+  leaving session, to each of the watchers of the group.
+- if the system receives an "UNWATCH" type message for a group, it
+  removes the receiving session as a watcher session of the group
+  from the list.  If both subscribers and watchers for the group become
+  empty, it removes the group from the list.  this message should not
+  have a response flag on.
+- when a user explicitly closes (one of) its session(s), the system
+  goes through the group list.  For each group that has the closing
+  session as a subscriber, it handles the session as if it had received
+  an "UNSUBSCRIBE" message over that session for that group.  Likewise,
+  for each group that has the closing session as a watcher, it handles
+  the session as if it had received an "UNWATCH" message over that
+  session for that group.
+- when a user terminates, the system identifies any unclosed
+  sessions of the user, and performs the action of the previous
+  bullet for each of these sessions.
+
+== Message Routing ==
+- if the system receives a "SEND" message from a session destined to
+  another session (specified as its lname), it identifies the
+  corresponding destination session and delivers the message to it.
+- if the system receives a "SEND" message from a session destined to
+  a session group, it identifies the subscriber sessions of the group,
+  and delivers the message to each of these sessions.
+
+'''User behavior'''
+- A user must have at least one session with the system.  It can
+  have multiple sessions.
+- For each established session, the user must first send the "GET
+  LNAME" type of message and wait for a response.  Any subsequent
+  messages sent from that session should carry the given lname to
+  indicate the sender.
+- ... and so on
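
As a reading aid for the "Group Management" and "Message Routing"
rules in ipc-high-2.txt above, here is a minimal in-memory sketch in
Python.  It is not the BIND 10 implementation: the class `SystemModel`,
its method names, the "session-N" lname format and the use of plain
lists as inboxes are all assumptions made for illustration.  Only the
message names "NEW SUBSCRIBER", "LEAVING SUBSCRIBER" and "SEND" come
from the text.  Session lnames and group names share one namespace
here, which is a simplification.

```python
from collections import defaultdict
from itertools import count


class SystemModel:
    """Toy model of the "system"; every name here is hypothetical."""

    def __init__(self):
        self._ids = count(1)
        self.sessions = {}                    # lname -> inbox (list)
        self.subscribers = defaultdict(set)   # group -> subscriber lnames
        self.watchers = defaultdict(set)      # group -> watcher lnames

    def establish(self):
        # Each session gets a unique, never reused ID ("lname").
        lname = "session-%d" % next(self._ids)
        self.sessions[lname] = []
        return lname

    def _notify_watchers(self, kind, group, lname):
        for watcher in sorted(self.watchers.get(group, ())):
            self.sessions[watcher].append((kind, group, lname))

    def _drop_if_empty(self, group):
        # A group exists only while it has a subscriber or a watcher.
        if not self.subscribers.get(group) and not self.watchers.get(group):
            self.subscribers.pop(group, None)
            self.watchers.pop(group, None)

    def subscribe(self, lname, group):
        self.subscribers[group].add(lname)
        self._notify_watchers("NEW SUBSCRIBER", group, lname)

    def watch(self, lname, group):
        self.watchers[group].add(lname)
        # The response carries the current list of subscribers.
        return sorted(self.subscribers.get(group, ()))

    def unsubscribe(self, lname, group):
        if lname not in self.subscribers.get(group, ()):
            return
        self.subscribers[group].discard(lname)
        self._notify_watchers("LEAVING SUBSCRIBER", group, lname)
        self._drop_if_empty(group)

    def unwatch(self, lname, group):
        if lname in self.watchers.get(group, ()):
            self.watchers[group].discard(lname)
            self._drop_if_empty(group)

    def close(self, lname):
        # Closing acts like UNSUBSCRIBE / UNWATCH for each group involved.
        for group in [g for g, s in self.subscribers.items() if lname in s]:
            self.unsubscribe(lname, group)
        for group in [g for g, w in self.watchers.items() if lname in w]:
            self.unwatch(lname, group)
        del self.sessions[lname]

    def send(self, sender, dest, data):
        # dest is either a session lname or a group name.
        if dest in self.sessions:
            targets = [dest]
        else:
            targets = sorted(self.subscribers.get(dest, ()))
        for target in targets:
            self.sessions[target].append(("SEND", sender, data))
        return len(targets)
```

A watcher that calls `watch()` before anyone subscribes gets an empty
list back and then sees one "NEW SUBSCRIBER" entry per later
`subscribe()` call, which matches the watch operation described in the
session interface section.
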
diff --git a/doc/design/ipc-high.txt b/doc/design/ipc-high.txt
new file mode 100644
index 0000000..5bfeada
--- /dev/null
+++ b/doc/design/ipc-high.txt
@@ -0,0 +1,185 @@
+The IPC protocol
+================
+
+While cc-protocol.txt describes the low-level primitives, here we
+describe how the whole IPC should work and how to use it.
+
+Assumptions
+-----------
+
+We assume the low-level protocol keeps ordering of messages. That is,
+if A sends messages 1 and 2 to B, they get delivered in the same order
+as they were sent. However, if A sends message 1 to B and 2 to C, the
+order in which they get them or the order in which they answer is not
+defined.
+
+We also assume that the delivery is reliable. If B gets a message from
+A, it can be sure that all previous messages were delivered too. If A
+sends a message to B, either B gets the message, or A or B is
+disconnected during the attempt.
+
+Also, we expect the messages don't get damaged or modified on their
+way.
+
+On unrecoverable error (errors like EINTR or short read/write are
+recoverable, since there's a clear way to continue without losing
+any messages; errors like connection reset are unrecoverable), the
+client should abort completely. If it deems it better to reconnect, it
+must assume anything might have happened in the meantime and start
+communication from scratch, discarding any knowledge gathered from the
+previous connection (configuration, addresses of other clients, etc).
+
+Addressing
+----------
+
+We can specify the recipient in two different ways:
+
+ * Directly. Each connected client has a unique address. A message
+   addressed to that address is sent only to the one client.
+ * By a group. A client might subscribe to any number of groups.
+   When a message is sent to the group, all clients subscribed to the
+   group receive it. It is legal to send to an empty group.
+
+[NOTE]
+If a group may contain multiple recipients, sending a message that
+expects an answer to the group is discouraged, because it is not
+known how many answers will come. See below for details on
+one-to-many communication.
+
+Feedback from the IPC system
+----------------------------
+
+The IPC system generates some additional information to aid the
+communicating clients.
+
+Undeliverable notification::
+  If the client requests it (by a per-message flag) and the set of
+  recipients specified is empty (either because the connection
+  ID/lname is not connected or because the addressed group is empty),
+  an answer message is sent from the daemon to notify the client about
+  the situation. However, since the recipient still can take a long
+  time to answer (if it exists), clients that need high availability
+  should not wait for the answer in a blocking way.
+Notifications about connections and disconnections::
+  The system generates notifications about the following events:
+  * Client connected (sent with the lname of the client)
+  * Client disconnected (sent with the lname of the client)
+  * Client subscribed (sent with the name of group and lname of
+    client)
+  * Client unsubscribed (sent with the name of group and lname of
+    client)
+List of group members::
+  The daemon provides a command to list lnames of clients subscribed
+  to a given group, and lnames of all connections.
+
+Communication paradigms
+-----------------------
+
+Event notifications
+~~~~~~~~~~~~~~~~~~~
+
+Sometimes, an event that may be interesting to other parts of the
+system happens. The originating module may not know what other modules
+are interested in that kind of event, nor may it know whether any at
+all wants to know about it. With such an event, the originating module
+does not need any feedback.
+
+For each kind or family of notifications, there's a group. Everybody
+interested in that family of notifications subscribes to the group.
+When the event happens, it is sent (broadcast) to the group, without
+requiring an answer.
+
+[NOTE]
+Care should be taken to avoid race conditions. Imagine one module
+provides some kind of state (let's say it's the configuration manager
+and the configuration is the shared state). The other modules are
+using notifications to update their copy when the configuration
+changes (e.g. when the configuration changes, the configuration manager
+sends a notification with a description of the change).
+
+The correct order is to first subscribe to the notifications and then
+request the whole configuration. If it was done the other way around,
+there would be a short time between the request and the subscription
+when an update to the state could happen without the module noticing.
+
+When subscribing first, the notification could come before the initial
+version is known or arrive even when the initial version already
+includes the change, but these cases are possible to handle, while a
+missed update is not.
+
+One-to-one RPC call
+~~~~~~~~~~~~~~~~~~~
+
+Sometimes, a process needs to call a remote function (or command) in
+another process. An example could be asking the configuration manager
+for the current configuration or asking it to change it, asking a
+single process to terminate, etc.
+
+The recipient may be addressed by a singleton group (e.g. the command
+manager; there must be exactly one in a running system, and the group
+is used just as a stable name for the process) or by an lname received
+by means of other communication (like a previous subscribe notification).
+
+A command message (containing the parameters, name of the command,
+etc) is sent, with the want-answer flag set. The other side processes
+the command and sends a result or error back.
+
+If the recipient does not exist, the daemon sends an error right away.
+
+There are still two ways this may fail to provide an answer:
+
+ * The receiving module reads the command, but does not provide an
+   answer. Clearly, such a module is broken. There should be some
+   (long) timeout for this situation, and loud logging to get it fixed.
+ * The receiving module terminated at the exact time when the daemon
+   tried to send to it, or crashed handling the command. Therefore the
+   sender listens for disconnect or unsubscription notifications
+   (depending on if it was sent by lname or group name) and if the
+   recipient disconnects, the sender knows it should not expect the
+   answer any more.
+
+Waiting for the answer asynchronously is preferred.
+
+One-to-many RPC call
+~~~~~~~~~~~~~~~~~~~~
+
+Sometimes it is needed to send a command to a bunch of modules at once,
+usually all members of a group that can contain any number of clients.
+
+This would be done by requesting the members of the group from the
+daemon and then sending a one-to-one RPC call to each of them,
+tracking them separately.
+
+[NOTE]
+It might happen the list of group members changes between the time it
+was requested and the time the commands are sent. If a client gets
+disconnected, the sender gets an undeliverable error back from the
+daemon.  If anything else happens (the client unsubscribes, connects,
+subscribes), it must explicitly synchronise to the state anyway,
+because we could have sent the commands before the change actually
+happened and it would look the same to the client.
+
+[WARNING]
+It would look better to first request the list of group members and
+then send the command to the group, and use the list to track the
+answers only. But that is prone to race conditions ‒ if there's any
+change between the request for the member list and sending the
+command, the actual recipients don't match the list and the sender
+could get more answers than expected or could wait for an answer from
+a module that no longer exists.
+
+Known limitations
+-----------------
+
+The protocol is meant mostly for signalling. Sending millions of
+messages or messages of several tens of megabytes is probably a bad
+idea. While there's no architectural limitation with regard to the
+number of transferred messages or their sizes, the code is not
+optimised and it would probably be very slow.
+
+We currently expect the system not to be under heavy load. Therefore, we
+expect the daemon to keep up with clients sending messages. The
+libraries write in blocking mode, which is no problem if the
+expectation is true, as the write buffers will generally be empty and
+the write wouldn't block, but if it turns out it is not the case, we
+might need to reconsider.
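
The "Event notifications" note in ipc-high.txt above says that, when a
module subscribes first and requests the whole state afterwards, a
notification may arrive before the initial version is known, or may
describe a change the initial version already includes, and that both
cases can be handled.  One possible way to handle them is sketched
below in Python.  It rests on an assumption the design text does not
make: that every change and every snapshot carries an increasing
version number.  The class `ConfigMirror` and its method names are
hypothetical.

```python
class ConfigMirror:
    """Local copy of shared state, kept current via notifications.

    Hypothetical sketch: assumes versioned changes, which the design
    text does not specify.
    """

    def __init__(self):
        self.version = None   # None until the initial snapshot arrives
        self.config = {}
        self._pending = []    # notifications seen before the snapshot

    def on_notification(self, version, key, value):
        if self.version is None:
            # Initial version not known yet: keep the update for later.
            self._pending.append((version, key, value))
        elif version > self.version:
            self.config[key] = value
            self.version = version
        # Otherwise the state we hold already includes this change.

    def on_snapshot(self, version, config):
        self.version = version
        self.config = dict(config)
        # Replay buffered updates in arrival order; the low-level
        # protocol is assumed to preserve message ordering.
        pending, self._pending = self._pending, []
        for update in pending:
            self.on_notification(*update)


# Order of operations on the client: subscribe first (not shown),
# then request the whole configuration.
mirror = ConfigMirror()
mirror.on_notification(5, "port", 5300)   # arrives before the snapshot
mirror.on_snapshot(5, {"port": 5300})     # snapshot already includes it
mirror.on_notification(6, "port", 5353)   # a later change is applied
```

If the subscription came after the request instead, a change made
between the two would never reach `on_notification`, which is the
missed update the note warns about.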


