BIND 10 #420: Unresponsive process can block msgq

BIND 10 Development do-not-reply at isc.org
Thu Jan 20 18:59:41 UTC 2011


-------------------------------------+-------------------------------------
                 Reporter:  shane    |                Owner:  UnAssigned
                     Type:  defect   |               Status:  reviewing
                 Priority:  major    |            Milestone:  A-Team-Sprint-20110126
                Component:  msgq     |
                 Keywords:           |           Resolution:
Estimated Number of Hours:  13.0     |            Sensitive:  0
                Billable?:  1        |  Add Hours to Ticket:  0
                Internal?:  0        |          Total Hours:  0.5
-------------------------------------+-------------------------------------
Changes (by vorner):

 * owner:  vorner => UnAssigned
 * status:  accepted => reviewing


Comment:

 The fix itself is probably straightforward: buffer the data that do not
 fit into the socket when sending in a non-blocking way, and if the socket
 does not accept anything within 0.1 seconds, just drop the connection.
 This also fixes the fact that sock.send() might send less data than was
 actually passed to it. We might want to make the timeout configurable in
 the future, and maybe something more should be done when it expires (like
 reporting it to the boss). Should we create a ticket for it?
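 A minimal sketch of the buffer-and-timeout approach described above, for
 illustration only: the class name BufferedSender, the SEND_TIMEOUT constant
 and the injectable clock are hypothetical, not msgq's actual code. It
 assumes the caller invokes flush() whenever poll reports the socket
 writable and also periodically, since the timeout is only checked there.

```python
import socket
import time

# Hypothetical sketch of the approach in the ticket; not msgq's actual code.
SEND_TIMEOUT = 0.1  # seconds a stalled socket may go without accepting data


class BufferedSender:
    """Buffers outgoing data for one non-blocking socket and drops stalled peers."""

    def __init__(self, sock, timeout=SEND_TIMEOUT, clock=time.monotonic):
        sock.setblocking(False)
        self.sock = sock
        self.timeout = timeout
        self.clock = clock          # injectable so tests can control time
        self.buffer = b""
        self.stalled_since = None   # when the socket last stopped accepting data
        self.closed = False

    def send(self, data):
        """Queue data and try to flush it; return False if the connection was dropped."""
        if self.closed:
            return False
        self.buffer += data
        return self.flush()

    def flush(self):
        """Write as much buffered data as the socket accepts without blocking.

        Assumed to be called when poll reports the socket writable and
        periodically, so a stalled peer is noticed even if it never drains.
        """
        if self.closed:
            return False
        while self.buffer:
            try:
                sent = self.sock.send(self.buffer)
            except BlockingIOError:
                break                # socket buffer is full; keep the rest
            except OSError:
                self._drop()         # peer went away; nothing more to deliver
                return False
            if sent == 0:
                break
            # send() may accept only part of the data, so keep the remainder.
            self.buffer = self.buffer[sent:]
            self.stalled_since = None  # progress was made; restart the timer
        if not self.buffer:
            return True
        now = self.clock()
        if self.stalled_since is None:
            self.stalled_since = now
        elif now - self.stalled_since >= self.timeout:
            self._drop()             # nothing accepted for too long
            return False
        return True

    def _drop(self):
        self.sock.close()
        self.closed = True
        self.buffer = b""
        self.stalled_since = None
```

 The timer restarts whenever any byte is accepted, so a slow but live
 reader is never dropped; only a peer that accepts nothing for the whole
 timeout is.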

 There are two kinds of tests now. Two tests hammer msgq with data that
 nobody reads and expect the socket to get closed, so that the next write
 raises an exception. The other two ping msgq for some time (I added a ping
 command for that reason) to check that the answers keep coming back and
 that msgq does not close the connection while it works.
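 A sketch of the shape of those two kinds of tests. It uses
 socket.socketpair() and a stand-in responder thread instead of a real
 msgq, so it only illustrates the checks; the class name, the recv_exact
 helper and the 4-byte b"ping"/b"pong" messages are hypothetical and are
 not the actual msgq protocol or test suite.

```python
import socket
import threading
import time
import unittest


def recv_exact(sock, n):
    """Read exactly n bytes or raise ConnectionError if the peer closes."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data


class ConnectionBehaviourTest(unittest.TestCase):
    def test_write_after_peer_close_raises(self):
        client, server = socket.socketpair()
        self.addCleanup(client.close)
        client.settimeout(1.0)  # fail instead of hanging if the peer stays open
        server.close()          # stands in for msgq dropping a stalled client
        with self.assertRaises(OSError) as caught:
            for _ in range(100):
                client.sendall(b"x" * 4096)
        # A timeout would mean the connection was never closed.
        self.assertNotIsInstance(caught.exception, socket.timeout)

    def test_ping_keeps_connection_open(self):
        client, server = socket.socketpair()
        # Cleanups run in reverse order: the client closes first, so the
        # responder thread sees end-of-stream and exits.
        self.addCleanup(server.close)
        self.addCleanup(client.close)
        client.settimeout(1.0)

        def responder():
            # Stand-in for msgq: answer every b"ping" with b"pong".
            try:
                while True:
                    recv_exact(server, 4)
                    server.sendall(b"pong")
            except OSError:
                return

        threading.Thread(target=responder, daemon=True).start()
        deadline = time.monotonic() + 0.3
        while time.monotonic() < deadline:
            client.sendall(b"ping")
            self.assertEqual(recv_exact(client, 4), b"pong")
```

 The ping test fails with ConnectionError if the peer closes a working
 connection, which is the regression it guards against.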

 While reading the code, I also fixed a place where None was put into a
 dictionary instead of the entry being deleted, which could potentially
 lead to a resource leak.
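 A small illustration of why storing None leaks while deleting does not.
 The table keyed by a never-reused connection ID and the function names are
 hypothetical; the ticket does not say what the dictionary in msgq is keyed
 by.

```python
# Hypothetical per-connection table keyed by a connection ID that is never reused.

def forget_buggy(table, conn_id):
    table[conn_id] = None      # value cleared, but the key stays in the dict


def forget_fixed(table, conn_id):
    table.pop(conn_id, None)   # entry removed; no error if it is already gone


def simulate(forget, connections=1000):
    """Open and close `connections` connections; return the entries left over."""
    table = {}
    for conn_id in range(connections):
        table[conn_id] = {"buffer": b""}   # connection opens
        forget(table, conn_id)             # connection closes
    return len(table)


print(simulate(forget_buggy), simulate(forget_fixed))  # 1000 0
```

 With the None assignment the table grows with every connection ever seen,
 even though none of them is open any more.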

 There are two ways of selecting sockets in msgq: the default is poll, with
 kqueue as a fallback. Is there anyone with a system that actually needs
 the fallback? Could that person test that it works there as well?
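 A minimal sketch of a poll-first, kqueue-fallback readiness check using
 Python's select module. The function name wait_readable and its backend
 parameter are hypothetical, not msgq's code; the parameter is there so
 someone on a kqueue-capable system (a BSD or macOS) could force the
 fallback path while testing.

```python
import select
import socket


def wait_readable(fd, timeout, backend=None):
    """Return True if fd becomes readable within `timeout` seconds.

    backend is "poll" or "kqueue"; by default poll is preferred and kqueue
    is used only where poll is unavailable.
    """
    if backend is None:
        backend = "poll" if hasattr(select, "poll") else "kqueue"
    if backend == "poll":
        poller = select.poll()
        poller.register(fd, select.POLLIN)
        return bool(poller.poll(int(timeout * 1000)))  # poll() takes milliseconds
    if backend == "kqueue":
        kq = select.kqueue()
        try:
            event = select.kevent(fd, filter=select.KQ_FILTER_READ,
                                  flags=select.KQ_EV_ADD)
            # Register the event and wait for at most one result.
            return bool(kq.control([event], 1, timeout))
        finally:
            kq.close()
    raise ValueError("unknown backend: %r" % (backend,))
```

 On Linux only the poll branch can run, which is why the kqueue path needs
 testing by someone on a system that has it.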

 I didn't reproduce the original bug (the system freezing), but it would be
 nice if anyone could test the fix in a real setup as well.

-- 
Ticket URL: <https://bind10.isc.org/ticket/420#comment:5>
BIND 10 Development <http://bind10.isc.org>


More information about the bind10-tickets mailing list