BIND 10 #1522: Implementation of isc::server_common::SocketRequestor
BIND 10 Development
do-not-reply at isc.org
Mon Jan 2 17:48:40 UTC 2012
#1522: Implementation of isc::server_common::SocketRequestor
-------------------------------------+-------------------------------------
             Reporter:  vorner       |                 Owner:  jinmei
                 Type:  task         |                Status:  reviewing
             Priority:  blocker      |             Milestone:  Sprint-20120110
            Component:  Unclassified |            Resolution:
             Keywords:               |             Sensitive:  0
      Defect Severity:  N/A          |           Sub-Project:  Core
  Add Hours to Ticket:  0            |  Estimated Difficulty:  0
            Internal?:  0            |           Total Hours:  10
  Feature Depending on Ticket:  Socket creator
-------------------------------------+-------------------------------------
Changes (by vorner):
* owner: vorner => jinmei
Comment:
Hello
Trac is back, so the answer should go here as well.
Michal 'vorner' Vaner
Replying to [comment:14 jinmei]:
> Replying to [comment:13 vorner]:
> Is there any documentation that describes the protocol between the
> boss and other clients? I cannot find it in bind10/README or in
> lib/bind10/README, so I've looked at the bind10.py implementation and
> guessed what's expected, and updated the code based on it.
No, I didn't find the time to write the documentation yet, and I doubt
anyone else did, as it was mostly me (with the exception of this
particular branch) who worked on the socket creator. I know I should
have, so I'll probably create a ticket for it sometime soon.
> This caused an additional, unexpected issue: if the remote end (in
> practice, boss) closes the connection first, requestSocket() would
> result in SIGPIPE (and the application would die). Just like we did
> for `SocketSession`, we need to at least ignore that signal. I've
> added that code too (btw I suspect this is also the case for the boss -
> an app could die after requestSocket(), which would automatically
> close the connection). The fact that we did the same trick again is
> another reason why we should unify this and `SocketSession` (but
> that's beyond the scope of our current work).
Well, this wouldn't be such a problem, because the boss is one of the
components that aren't allowed to crash, and it doesn't close the socket
anywhere in its code. But it's probably better to have it there anyway.
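
For reference, ignoring the signal on the application side could look
roughly like the sketch below; this is only an illustration with a
hypothetical helper name, called once at startup, and the real
SocketSession/SocketRequestor code may do it differently:

    #include <signal.h>
    #include <stdexcept>

    // Ignore SIGPIPE process-wide, so that writing to a connection the
    // peer (in practice the boss) has already closed makes send() fail
    // with EPIPE instead of killing the application.
    // (Hypothetical helper; not the actual library code.)
    void
    ignoreSigPipe() {
        struct sigaction sa;
        sa.sa_handler = SIG_IGN;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        if (sigaction(SIGPIPE, &sa, NULL) == -1) {
            throw std::runtime_error("failed to ignore SIGPIPE");
        }
    }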
> > Anyway, it would be bad to close the connection. That would signal
> > the boss to reclaim all the sockets transferred previously.
>
> Hmm, on further considering this and other bullet points below, I now
> think we should probably manage the states of open sockets at the boss
> and applications separately from the channel between them (i.e, the
> connection between UNIX domain sockets). We should probably introduce
> (re)synchronization mechanism about the open socket states between the
> boss and the application when the connection is (re)established, and
> then we can simply drop the connection if it's possibly in an unusable
> state (and re-establish a new one).
Well, they are bound to the connection they were sent over for a simple
reason: we need to know when an application crashed, so we can release
its sockets. And doing so much to handle a possible bug seems overkill
to me, at least at this stage of the project.
And anyway, I consider this code to be mostly temporary, for two reasons:
 * If/when we adopt dbus instead of msgq, we can register at the bus to
   tell the boss when an application that sent a given request
   disconnects.
 * If/when we adopt dbus, we might want to get rid of all this low-level,
   error-prone code and send the sockets over dbus (it doesn't allow
   sending them on Windows, but our current code doesn't either). The
   socket could then go directly in the response to the request, avoiding
   this two-phase way.
So, having a note in the code or a low-priority ticket for it is OK, but
I'd be against trying to solve it now, or in the next sprint, or the like.
> > Hmm, what do you mean? Like if the boss sent a "2\n" and the socket?
> > Yes, then it would leak, but can we really avoid that in any way?
>
> See the third bullet, for example. If we pass the pair of status (1
> byte) and FD via a single sendmsg()/recvmsg() (the status will be in
> the 1-byte "normal" data; the FD is passed as ancillary data),
> requestSocket() could examine both the FD and status and can close the
> socket (e.g.) if FD is bogus.
But if the „status“ byte were bogus, the fd could leak anyway, couldn't
it? That is, if one was sent at all?
Anyway, I do agree that sending it within the sendmsg packet, all at
once, is a good idea.
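
For illustration, sending the 1-byte status and the fd in one sendmsg()
call could look roughly like the sketch below; the receiver mirrors it
with a single recvmsg() and can close the fd if either the status or the
fd itself turns out to be bogus. The names here are made up for the
example, not taken from the actual boss/SocketRequestor code:

    #include <sys/socket.h>
    #include <sys/uio.h>
    #include <string.h>

    // Send one status byte as normal data and the file descriptor as
    // SCM_RIGHTS ancillary data, in a single message.
    // Returns 0 on success, -1 on failure (errno set by sendmsg()).
    int
    sendStatusAndFd(int conn, char status, int fd) {
        struct iovec iov;
        iov.iov_base = &status;                 // the 1-byte "normal" data
        iov.iov_len = 1;

        char control[CMSG_SPACE(sizeof(int))];  // room for one fd
        memset(control, 0, sizeof(control));

        struct msghdr msg;
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = control;
        msg.msg_controllen = sizeof(control);

        struct cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return (sendmsg(conn, &msg, 0) == 1 ? 0 : -1);
    }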
> > > - I personally don't think it the best idea to signal the case of
> > > "CREATOR_SOCKET_UNAVAILABLE" by exception. At the very least it
> > > should be specifically separated from other exceptions such as the
> > > one as a result of bogus status data; otherwise, assuming the app
> > > should generally drop the connection on the latter case, we could
> > > drop and re-establish the connection unnecessarily
> >
> > I'd like to point out that such a situation shouldn't happen in
> > practice and would be the same kind of bug as wrong data sent, etc.
> > By the time the client picks up the socket, it was already prepared
> > and reserved. It can be unavailable only in case the token is wrong.
>
> I thought "unavailable" can be possible for a reason we cannot
> control, such as unrelated third-party application already opens the
> port. Am I mistaken?
We might have misunderstood each other. The CREATOR_SOCKET_UNAVAILABLE
constant is for the case I described. If the socket can't be created for
reasons like the port being taken already, the result would be sent over
the msgq. But you're right that this case isn't signalled by a different
exception now. The problem lies within the boss, as it sends the same
error status for all errors, which should be improved.
Another cleanup ticket, maybe?
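
If/when that cleanup happens, separating the two failure modes could
look roughly like this on the library side; the class names are only
illustrative and the isc::Exception base is an assumption, not the
actual SocketRequestor interface:

    #include <exceptions/exceptions.h>

    namespace isc {
    namespace server_common {

    // The creator refused to hand out the socket (e.g. unknown token);
    // the connection itself is still usable, so the caller may just
    // report the error and carry on.
    class SocketUnavailable : public isc::Exception {
    public:
        SocketUnavailable(const char* file, size_t line, const char* what) :
            isc::Exception(file, line, what)
        {}
    };

    // The data read from the connection is malformed; the connection is
    // probably in an unusable state and should be dropped and
    // re-established.
    class SocketRequestorProtocolError : public isc::Exception {
    public:
        SocketRequestorProtocolError(const char* file, size_t line,
                                     const char* what) :
            isc::Exception(file, line, what)
        {}
    };

    } // namespace server_common
    } // namespace isc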
> > The newline is there to preserve some kind of consistency, and I
> > wasn't sure if python wouldn't want to do some kind of line based
> > buffering (well, it probably wouldn't).
>
> Hmm, I'm not sure whether you mean the newline is good or bad by the
> Python thing, but as long as we use multi-byte data that's probably
> just a matter of taste. More substantially, if we could switch to
> 1-byte status code (obviously it would exclude a newline), we'll be
> able to pass the pair of status code and FD via a single call to
> sendmsg()/recvmsg(), which will make error handling simpler.
My point was, if python did line-oriented buffering, the newline would
force it to flush. Looking back, it seems a crazy idea to think there
would be line-oriented buffering, but as you say, this may turn out to
be irrelevant when we go for the status with sendmsg.
A few notes for the new code:
 * Is there no logging when the request fails for some reason (error
   response over msgq, CREATOR_SOCKET_UNAVAILABLE, other failures…)?
 * The protocolString function could be used inside
   createRequestSocketMessage as well, to reuse some code.
 * Is it possible to make gtest produce a warning (similar to the „You
   have disabled tests“ line at the bottom of the tests) if some of the
   tests are skipped when it is impossible to do the timeout?
 * When reading the tokens in the child, the hardcoded length is 4.
   Maybe add a note to the function that it shouldn't be used with tokens
   of different lengths?
 * What is wrong with ssize_t in 85bf91d5836306a39b471567952ad1e0af91d948?
   ssize_t is signed, so the error handling should still work, and it
   seems cleaner than int, as it is 64-bit on 64-bit systems (therefore it
   allows sending more than 2GB of data at once, though it is unlikely
   anyone would do that). See the sketch after this list.
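
To illustrate the ssize_t point, a minimal sketch of a write loop under
the assumption that the code loops over send(); this is not the actual
code from the commit:

    #include <sys/types.h>
    #include <sys/socket.h>

    // ssize_t keeps the "< 0" error check valid and, being 64-bit on
    // 64-bit systems, doesn't truncate a (theoretical) single write of
    // more than 2GB, unlike int.
    bool
    writeAll(const int fd, const void* data, size_t len) {
        const char* cp = static_cast<const char*>(data);
        while (len > 0) {
            const ssize_t cc = send(fd, cp, len, 0);
            if (cc < 0) {
                return (false);   // error; errno set by send()
            }
            cp += cc;
            len -= cc;
        }
        return (true);
    }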
--
Ticket URL: <http://bind10.isc.org/ticket/1522#comment:17>
BIND 10 Development <http://bind10.isc.org>