BIND 10 #1705: attempt to run multiple auth servers causes FATAL [b10-auth.server_common] SRVCOMM_EXCEPTION_ALLOC exception when allocating a socket: File exists

BIND 10 Development do-not-reply at isc.org
Thu Feb 23 15:08:34 UTC 2012


#1705: attempt to run multiple auth servers causes FATAL [b10-auth.server_common]
SRVCOMM_EXCEPTION_ALLOC exception when allocating a socket: File exists
---------------------------------------------------------+----------------------------------------
                   Reporter:  jreed                      |                 Owner:  UnAssigned
                       Type:  defect                     |                Status:  reviewing
                   Priority:  major                      |             Milestone:  Sprint-20120306
                  Component:  Inter-module communication |            Resolution:
                   Keywords:                             |             Sensitive:  0
            Defect Severity:  N/A                        |           Sub-Project:  Core
Feature Depending on Ticket:                             |  Estimated Difficulty:  0
        Add Hours to Ticket:  0                          |           Total Hours:  0
                  Internal?:  0                          |
---------------------------------------------------------+----------------------------------------
Changes (by vorner):

 * owner:  vorner => UnAssigned
 * status:  accepted => reviewing
 * subproject:  DNS => Core
 * component:  b10-auth => Inter-module communication
 * milestone:  New Tasks => Sprint-20120306


Comment:

 The problem seems to fall into the category „You'll probably not believe me,
 but I'm telling the truth, even if it sounds incredible…“.

 So, the "File exists" error (`EEXIST`) comes from `epoll_ctl` when an asio
 TCP acceptor adds a new socket. It means the file descriptor being added
 to the watcher (the epoll set) is already registered there. That was
 strange, because the descriptor had just been received from the boss when
 it was added. However, it turned out that `recvmsg` really does return the
 same file descriptor multiple times (maybe something inside the kernel
 gets confused when we send the same file descriptor to multiple
 applications).

 I found a workaround that seems to help: I dup the file descriptor after
 receiving it and close the original one. Now the descriptors are no longer
 duplicated, but I fear two things:
  * There's a bug in the Linux kernel and we should report it.
  * If the kernel can hand out an FD that duplicates one `recvmsg` already
    returned, it might just as well collide with some other open
    descriptor. I don't know how to check this, but it could be quite a
    disaster if it did.

 Also, I'm not sure how to write any kind of test for this. I propose we
 include this fix now and create a ticket to investigate the kernel (or
 whatever else is behind it). I'd be very interested to know where this
 comes from.

 As the feature was not in the previous release, I don't think it needs a
 changelog entry (and I wouldn't like to describe the error there in a few
 sentences).

-- 
Ticket URL: <http://bind10.isc.org/ticket/1705#comment:3>
BIND 10 Development <http://bind10.isc.org>