[bind10-dev] BIND 10 #1522: Implementation of isc::server_common::SocketRequestor

Michal 'vorner' Vaner michal.vaner at nic.cz
Mon Jan 2 16:50:34 UTC 2012


Hello

As the trac is down today, I'm responding to the automatic email. I'll post an
overview once it is back up, so the record there is complete (others can
probably ignore this, but trac allows anyone to look as well, so I'm sending it
to the list).

On Fri, Dec 30, 2011 at 08:11:48AM -0000, BIND 10 Development wrote:
>  > Anyway, the code as is looks OK, but I noticed one more problem I
>  overlooked before. I don't see the code sending the token over the unix
>  domain socket to tell the boss which socket is being requested.
> 
>  Is there any documentation that describes the protocol between the
>  boss and other clients?  I cannot find it in bind10/README or in
>  lib/bind10/README, so I've looked at the bind10.py implementation and
>  guessed what's expected, and updated the code based on it.

No, I haven't found the time to write the documentation yet, and I doubt anyone
else has, as it was mostly me (with the exception of this particular branch) who
worked on the socket creator. I know I should have, so I'll probably create a
ticket to get it done sometime soon.

>  This caused an additional, unexpected issue: if the remote end (in
>  practice, boss) closes the connection first, requestSocket() would
>  result in SIGPIPE (and the application would die).  Just like we did
>  for `SocketSession`, we need to at least ignore that signal.  I've
>  added that code too (btw I suspect this is also the case for the boss -
>  an app could die after requestSocket(), which would automatically
>  close the connection).  The fact that we did the same trick again is
>  another reason why we should unify this and `SocketSession` (but
>  that's beyond the scope of our current work).

Well, this wouldn't be such a problem, because the boss is one of the components
that aren't allowed to crash, and it doesn't close the socket anywhere in its
code. But it's probably better to have it there anyway.
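
For reference, ignoring the signal would look roughly like this (a plain-POSIX
sketch, not the actual code from the branch; passing MSG_NOSIGNAL to send() is
an alternative on platforms that have it):

    #include <signal.h>

    // Ignore SIGPIPE process-wide, so a write to a connection the peer has
    // already closed fails with EPIPE instead of killing the application.
    void
    ignoreSigPipe() {
        struct sigaction sa;
        sa.sa_handler = SIG_IGN;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGPIPE, &sa, 0);
    }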

>  > Anyway, it would be bad to close the connection. That would signal the
>  boss to reclaim all the sockets transferred previously.
> 
>  Hmm, on further considering this and other bullet points below, I now
>  think we should probably manage the states of open sockets at the boss
>  and applications separately from the channel between them (i.e., the
>  connection between UNIX domain sockets).  We should probably introduce
>  a (re)synchronization mechanism for the open socket states between the
>  boss and the application when the connection is (re)established, and
>  then we can simply drop the connection if it's possibly in an unusable
>  state (and re-establish a new one).

Well, they are bound to the socket they were sent over for a simple reason: we
need to know when an application crashed, so we can release its sockets. And it
seems overkill to do so much to handle a possible bug, at least at this stage
of the project.

And anyway, I consider this code to be mostly temporary, for two reasons:
• If/when we adopt dbus instead of msgq, we can register with the bus to tell
  the boss when the application that sent a given request disconnects from the
  bus.
• If/when we adopt dbus, we might want to get rid of all this low-level,
  error-prone code and send the sockets over dbus (it doesn't allow sending
  them on Windows, but our current code doesn't either). And the socket could
  go directly in the response to the request, avoiding this two-phase exchange.

So, having a note in the code or a low-priority ticket for it is OK, but I'd be
against trying to solve it now, in the next sprint, or anything like that.

>  > Hmm, what do you mean? Like if the boss sent a "2\n" and the socket?
>  Yes, then it would leak, but can we really avoid that in any way?
> 
>  See the third bullet, for example.  If we pass the pair of status (1
>  byte) and FD via a single sendmsg()/recvmsg() (the status will be in
>  the 1-byte "normal" data; the FD is passed as ancillary data),
>  requestSocket() could examine both the FD and status and can close the
>  socket (e.g.) if FD is bogus.

But if the „status“ byte were bogus, the FD could leak anyway, couldn't it? If
there was one attached, that is.

Anyway, I do agree that sending it all at once within the sendmsg packet is a
good idea.
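
Roughly something like this is what I imagine (a plain-POSIX sketch; the
function name and the concrete status value are made up here, they would come
from the protocol):

    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    // Receive the 1-byte status and the FD in a single recvmsg() call.
    // Returns the FD on success and -1 on failure, closing any FD that
    // arrived together with a bad status so it cannot leak.
    int
    receiveSocket(int unix_fd) {
        char status = 0;
        struct iovec iov;
        iov.iov_base = &status;
        iov.iov_len = 1;

        // A union, so the control buffer has the alignment cmsghdr needs.
        union {
            struct cmsghdr align;
            char buf[CMSG_SPACE(sizeof(int))];
        } control;

        struct msghdr msg;
        memset(&msg, 0, sizeof(msg));
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = control.buf;
        msg.msg_controllen = sizeof(control.buf);

        if (recvmsg(unix_fd, &msg, 0) != 1) {
            return (-1);
        }

        int fd = -1;
        struct cmsghdr* cmsg = CMSG_FIRSTHDR(&msg);
        if (cmsg != NULL && cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SCM_RIGHTS) {
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
        }

        if (status != '1') { // '1' as "success" is made up for this sketch
            if (fd != -1) {
                close(fd); // a bogus status must not leak the FD
            }
            return (-1);
        }
        return (fd);
    }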

>  > > - I personally don't think it's the best idea to signal the case of
>  > >   "CREATOR_SOCKET_UNAVAILABLE" by exception.  At the very least it
>  > >   should be specifically separated from other exceptions such as the
>  > >   one as a result of bogus status data; otherwise, assuming the app
>  > >   should generally drop the connection on the latter case, we could
>  > >   drop and re-establish the connection unnecessarily
>  >
>  > I'd like to point out that such a situation shouldn't happen in practice
>  and would be the same kind of bug as wrong data being sent, etc. By the
>  time the client picks up the socket, it has already been prepared and
>  reserved. It can be unavailable only if the token is wrong.
> 
>  I thought "unavailable" can be possible for a reason we cannot
>  control, such as unrelated third-party application already opens the
>  port.  Am I mistaken?

We might have misunderstood each other. The CREATOR_SOCKET_UNAVAILABLE constant
is for the case I described. If the socket can't be created for reasons like
the port being taken already, the result is sent over the msgq. But you're
right that this case isn't signalled by a different exception now. The problem
lies within the boss, as it sends the same error status for all errors; that
should be improved.

Another cleanup ticket, maybe?
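
For illustration, the kind of split I mean (the names here are hypothetical,
and the real thing would hang off our usual exception hierarchy rather than
std::runtime_error):

    #include <stdexcept>
    #include <string>

    // Bug-level failure: the creator doesn't recognize the token. The
    // protocol went wrong, so dropping the connection makes sense.
    struct SocketRequestorTokenError : public std::runtime_error {
        explicit SocketRequestorTokenError(const std::string& what) :
            std::runtime_error(what)
        {}
    };

    // Expected runtime failure: the socket couldn't be created (e.g. the
    // port is already taken), as reported by the boss over msgq. The
    // caller may retry or give up, but the connection stays usable.
    struct SocketAllocationError : public std::runtime_error {
        explicit SocketAllocationError(const std::string& what) :
            std::runtime_error(what)
        {}
    };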

>  Hmm, I'm not sure whether you mean the newline is good or bad by the
>  Python thing, but as long as we use multi-byte data that's probably
>  just a matter of taste.  More substantially, if we could switch to
>  1-byte status code (obviously it would exclude a newline), we'll be
>  able to pass the pair of status code and FD via a single call to
>  sendmsg()/recvmsg(), which will make error handling simpler.

My point was that if Python did line-oriented buffering, the newline would
force it to flush. Looking back, it seems a crazy idea to think there would be
line-oriented buffering, but as you say, this may turn out to be irrelevant
once we go for the status with sendmsg.

A few notes on the new code:
• Is there no logging when the request fails for some reason (an error
  response over msgq, CREATOR_SOCKET_UNAVAILABLE, other failures…)?
• The protocolString function could be used inside createRequestSocketMessage
  as well, to reuse some code.
• Is it possible to make gtest produce a warning (similar to the „You have
  disabled tests“ line at the bottom of the test output) if some of the tests
  are skipped because it is impossible to do the timeout?
• When reading the tokens in the child, the hardcoded length is 4. Maybe add a
  note to the function that it shouldn't be used with tokens of different
  lengths?
• What is wrong with ssize_t in 85bf91d5836306a39b471567952ad1e0af91d948?
  ssize_t is signed, so the error handling should work, and it seems cleaner
  than int, as it is 64-bit on 64-bit systems (so it allows sending more than
  2GB of data at once, though it is unlikely anyone would do that). See the
  sketch below.
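
For the record, the kind of error handling I mean (a hypothetical read loop,
not the code from the commit):

    #include <unistd.h>

    // read() returns ssize_t: -1 on error, 0 on EOF, the byte count
    // otherwise. Keeping the result in ssize_t preserves the sign for the
    // error check and doesn't truncate counts above 2GB on 64-bit systems.
    bool
    readExactly(int fd, char* buffer, size_t length) {
        size_t done = 0;
        while (done < length) {
            const ssize_t n = read(fd, buffer + done, length - done);
            if (n <= 0) { // -1 means error, 0 means unexpected EOF
                return (false);
            }
            done += n;
        }
        return (true);
    }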

Thank you

With regards

-- 
Let me show you my collection of bugs.

Michal 'vorner' Vaner