BIND 10 #988: Infinite loop on xfrout

BIND 10 Development do-not-reply at isc.org
Wed Jun 1 07:48:50 UTC 2011


#988: Infinite loop on xfrout
-------------------------------------+-------------------------------------
            Reporter:  shane         |                        Owner:
                Type:  defect        |                       Status:  new
            Priority:  major         |                    Milestone:  New Tasks
           Component:  Unclassified  |                     Keywords:
           Sensitive:  0             |              Defect Severity:  Medium
         Sub-Project:  DNS           |  Feature Depending on Ticket:
Estimated Difficulty:  0             |          Add Hours to Ticket:  0
         Total Hours:  0             |                    Internal?:  0
-------------------------------------+-------------------------------------
 There is an infinite loop in xfrout on my server.

 The process list:

 {{{
 root@h:/opt/bind10/var/bind10-devel/log# ps -eLf | grep xfrout
 root       625   621   625  0    5 May25 pts/1    00:00:01 /usr/bin/python3 /opt/bind10/libexec/bind10-devel/b10-xfrout
 root       625   621   632  0    5 May25 pts/1    00:00:00 /usr/bin/python3 /opt/bind10/libexec/bind10-devel/b10-xfrout
 root       625   621   633  0    5 May25 pts/1    00:00:52 /usr/bin/python3 /opt/bind10/libexec/bind10-devel/b10-xfrout
 root       625   621  3771 96    5 May28 pts/1    3-04:54:56 /usr/bin/python3 /opt/bind10/libexec/bind10-devel/b10-xfrout
 root       625   621  3772  0    5 May28 pts/1    00:00:00 /usr/bin/python3 /opt/bind10/libexec/bind10-devel/b10-xfrout
 root     12322 12117 12322  0    1 07:42 pts/0    00:00:00 grep xfrout
 }}}

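 Thread 3771, the one consuming nearly all of the CPU time in the listing
 above, appears to get a read event on file descriptor 13, read 0 bytes
 from it, and then take no action. Because nothing is consumed and the
 descriptor stays open, select() reports it readable again immediately: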

 {{{
 root@h:/opt/bind10/var/bind10-devel/log# strace -p 3771 2>&1 | head -10
 Process 3771 attached - interrupt to quit
 recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0", 1}], msg_controllen=0, msg_flags=0}, 0) = 0
 select(14, [9 13], [], [], NULL)        = 1 (in [13])
 recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0", 1}], msg_controllen=0, msg_flags=0}, 0) = 0
 select(14, [9 13], [], [], NULL)        = 1 (in [13])
 recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0", 1}], msg_controllen=0, msg_flags=0}, 0) = 0
 select(14, [9 13], [], [], NULL)        = 1 (in [13])
 recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0", 1}], msg_controllen=0, msg_flags=0}, 0) = 0
 select(14, [9 13], [], [], NULL)        = 1 (in [13])
 recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0", 1}], msg_controllen=0, msg_flags=0}, 0) = 0
 }}}
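
 The select/recvmsg pattern in the trace above can be reproduced outside
 of BIND 10. The following is only a minimal sketch, assuming the peer of
 a stream socket has closed its end; it uses a plain socketpair() and
 recv() rather than the actual xfrout code, and count_empty_reads is a
 made-up name for illustration.

```python
import select
import socket


def count_empty_reads(max_iterations=5):
    """Count zero-byte reads from a stream socket whose peer has closed."""
    # socketpair() returns two connected stream sockets.
    reader, writer = socket.socketpair()
    writer.close()  # the peer goes away; reader is now at end-of-file

    empty_reads = 0
    for _ in range(max_iterations):
        # End-of-file counts as "readable", so select() returns at once.
        readable, _, _ = select.select([reader], [], [], 1.0)
        if reader not in readable:
            break
        if reader.recv(1) == b"":
            # Nothing was consumed, so the next select() fires again.
            empty_reads += 1
    reader.close()
    return empty_reads


if __name__ == "__main__":
    print(count_empty_reads())
```

 Without the max_iterations bound, a loop like this never blocks and
 never makes progress, which would match the CPU usage seen for the
 thread.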

 This file descriptor is for the Unix-domain socket used to transport file
 descriptors:

 {{{
 root@h:/opt/bind10/var/bind10-devel/log# lsof -p 625 | grep 13u
 b10-xfrou 625 root   13u  unix 0xffff88001e642900       0t0   35734 /opt/bind10/var/auth_xfrout_conn
 }}}

 This is almost certainly being returned to notify_out.py:

 {{{
     def _get_notify_reply(self, sock, tgt_addr):
         try:
             msg, addr = sock.recvfrom(512)
         except socket.error:
             self._log_msg('error', "notify to %s failed: can't read notify reply" % addr_to_str(tgt_addr))
             return None

         return msg
 }}}

 The zero-byte read should be recognized as an error, and the socket
 should be re-opened.

 Ideally, logging should also be added so that we can tell when this
 happens and work out why we are getting this response.
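
 One possible shape for such handling is sketched below. This is not the
 xfrout implementation: serve_until_eof, handle_data, and the logger name
 are hypothetical, and it uses recv() on an ordinary stream socket where
 xfrout receives file descriptors with recvmsg. The point is only that a
 zero-byte read is logged and the socket is closed, so that the caller can
 re-open it instead of polling it again.

```python
import logging
import select
import socket

logger = logging.getLogger("xfrout_sketch")  # hypothetical logger name


def serve_until_eof(sock, handle_data, timeout=1.0):
    """Read from sock until the peer closes; return the bytes handled."""
    total = 0
    while True:
        readable, _, _ = select.select([sock], [], [], timeout)
        if not readable:
            continue  # timed out with nothing to read; wait again
        data = sock.recv(4096)
        if not data:
            # Zero-byte read: the peer closed the connection. Log it and
            # stop polling this descriptor so the caller can re-open it.
            logger.warning("peer closed connection on fd %d; closing socket",
                           sock.fileno())
            sock.close()
            return total
        handle_data(data)
        total += len(data)


if __name__ == "__main__":
    reader, writer = socket.socketpair()
    writer.sendall(b"abc")
    writer.close()
    chunks = []
    print(serve_until_eof(reader, chunks.append), chunks)
```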

-- 
Ticket URL: <http://bind10.isc.org/ticket/988>
BIND 10 Development <http://bind10.isc.org>