BIND 10 #2934: xfrout session can be broken due to EAGAIN
BIND 10 Development
do-not-reply at isc.org
Mon Apr 29 16:41:50 UTC 2013
#2934: xfrout session can be broken due to EAGAIN
-------------------------------------+-------------------------------------
Reporter: jinmei                     | Owner:
Type: defect                         | Status: new
Priority: medium                     | Milestone: Next-Sprint-Proposed
Component: xfrout                    | Keywords:
CVSS Scoring:                        | Sensitive: 0
Defect Severity: N/A                 | Sub-Project: DNS
Feature Depending on Ticket:         | Estimated Difficulty: 0
Add Hours to Ticket: 0               | Total Hours: 0
Internal?: 0                         |
-------------------------------------+-------------------------------------
I noticed that xfrout-ing a large zone from b10-xfrout can be abruptly
terminated if I dump the transferred records to a terminal using 'dig
axfr'. On a closer look, it seems `XfroutSession._send_data()` raises
an exception due to EAGAIN in this loop:
{{{#!python
while total_count < size:
    count = os.write(sock_fd, data[total_count:])
    total_count += count
}}}
(It should be reproducible even more easily by, e.g., starting an AXFR
with dig and suspending it before it completes.)
It might be system dependent, but on my system sock_fd is
non-blocking (probably derived from the original TCP socket with which
b10-auth received the AXFR query), which is the reason for the error.
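The failure mode described above can be illustrated outside of b10-xfrout
with a minimal sketch. It assumes a local socket pair stands in for the
AXFR connection and that the peer never reads (as with a suspended dig);
the function name `fill_until_eagain` is hypothetical and not part of
BIND 10.

```python
import errno
import os
import socket


def fill_until_eagain(chunk_size=65536):
    """Write to a non-blocking socket whose peer never reads.

    Returns the OSError raised once the kernel send buffer is full.
    """
    writer, reader = socket.socketpair()
    writer.setblocking(False)  # mimic the non-blocking sock_fd
    data = b"x" * chunk_size
    try:
        while True:
            # Same call as in the loop quoted above; it succeeds (possibly
            # with a short write) until the buffer fills, then raises.
            os.write(writer.fileno(), data)
    except OSError as err:
        return err
    finally:
        writer.close()
        reader.close()


if __name__ == "__main__":
    err = fill_until_eagain()
    print(errno.errorcode[err.errno])
```

On systems where EAGAIN and EWOULDBLOCK are distinct values, either may be
reported, so callers should check for both.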
While this might be relatively minor, it could also easily happen in the
real world due to a slow link, packet loss, etc. So I think we
should fix it sooner rather than later.
The cleanest solution would be to do the asynchronous write correctly,
communicating with the parent thread so it can gracefully terminate on
shutdown. But, assuming we'll redesign xfr* fundamentally, an easier
workaround is sufficient: making the FD (socket) blocking. I'm
attaching a patch to do this. I confirmed it solved the problem.
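The attached patch is not reproduced in this message. As a rough sketch of
the workaround only, and not the actual patch, the O_NONBLOCK flag can be
cleared on the descriptor before entering the write loop. The helper names
`make_blocking` and `send_all` are hypothetical.

```python
import fcntl
import os
import socket


def make_blocking(fd):
    """Clear O_NONBLOCK so that os.write() waits instead of raising EAGAIN."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags & ~os.O_NONBLOCK)


def send_all(fd, data):
    """Write all of data to fd, looping over short writes."""
    total_count = 0
    while total_count < len(data):
        total_count += os.write(fd, data[total_count:])
    return total_count


if __name__ == "__main__":
    writer, reader = socket.socketpair()
    writer.setblocking(False)        # start from the problematic state
    make_blocking(writer.fileno())   # apply the workaround
    send_all(writer.fileno(), b"example payload")
    print(reader.recv(64))
    writer.close()
    reader.close()
```

The trade-off is that a blocking write stalls the sending thread for as long
as the peer is slow or suspended, which is why this is only a workaround
compared with a proper asynchronous write.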
--
Ticket URL: <http://bind10.isc.org/ticket/2934>
BIND 10 Development <http://bind10.isc.org>