BIND 10 #988: Infinite loop on xfrout

BIND 10 Development do-not-reply at isc.org
Wed Jun 1 09:50:03 UTC 2011


#988: Infinite loop on xfrout
--------------------------------------------+-------------------------------------
                   Reporter:  shane         |                 Owner:
                       Type:  defect        |                Status:  new
                   Priority:  major         |             Milestone:  New Tasks
                  Component:  Unclassified  |            Resolution:
                   Keywords:                |             Sensitive:  0
            Defect Severity:  Medium        |           Sub-Project:  DNS
Feature Depending on Ticket:                |  Estimated Difficulty:  0.0
        Add Hours to Ticket:  0             |           Total Hours:  0
                  Internal?:  0             |
--------------------------------------------+-------------------------------------
Description changed by shane:

Old description:

> There is an infinite loop in xfrout on my server.
>
> The process list:
>
> {{{
> root@h:/opt/bind10/var/bind10-devel/log# ps -eLf | grep xfrout
> root       625   621   625  0    5 May25 pts/1    00:00:01 /usr/bin/python3 /opt/bind10/libexec/bind10-devel/b10-xfrout
> root       625   621   632  0    5 May25 pts/1    00:00:00 /usr/bin/python3 /opt/bind10/libexec/bind10-devel/b10-xfrout
> root       625   621   633  0    5 May25 pts/1    00:00:52 /usr/bin/python3 /opt/bind10/libexec/bind10-devel/b10-xfrout
> root       625   621  3771 96    5 May28 pts/1    3-04:54:56 /usr/bin/python3 /opt/bind10/libexec/bind10-devel/b10-xfrout
> root       625   621  3772  0    5 May28 pts/1    00:00:00 /usr/bin/python3 /opt/bind10/libexec/bind10-devel/b10-xfrout
> root     12322 12117 12322  0    1 07:42 pts/0    00:00:00 grep xfrout
> }}}
>
> It appears to be getting a read event on a file descriptor, and then
> getting 0 bytes when reading, and then not taking action:
>
> {{{
> root@h:/opt/bind10/var/bind10-devel/log# strace -p 3771 2>&1 | head -10
> Process 3771 attached - interrupt to quit
> recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0", 1}], msg_controllen=0, msg_flags=0}, 0) = 0
> select(14, [9 13], [], [], NULL)        = 1 (in [13])
> recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0", 1}], msg_controllen=0, msg_flags=0}, 0) = 0
> select(14, [9 13], [], [], NULL)        = 1 (in [13])
> recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0", 1}], msg_controllen=0, msg_flags=0}, 0) = 0
> select(14, [9 13], [], [], NULL)        = 1 (in [13])
> recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0", 1}], msg_controllen=0, msg_flags=0}, 0) = 0
> select(14, [9 13], [], [], NULL)        = 1 (in [13])
> recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0", 1}], msg_controllen=0, msg_flags=0}, 0) = 0
> }}}
>
> This file descriptor is for the Unix-domain socket used to transport file
> descriptors:
>
> {{{
> root@h:/opt/bind10/var/bind10-devel/log# lsof -p 625 | grep 13u
> b10-xfrou 625 root   13u  unix 0xffff88001e642900       0t0   35734 /opt/bind10/var/auth_xfrout_conn
> }}}
>
> This is almost certainly being returned to notify_out.py:
>
> {{{
>     def _get_notify_reply(self, sock, tgt_addr):
>         try:
>             msg, addr = sock.recvfrom(512)
>         except socket.error:
>             self._log_msg('error', "notify to %s failed: can't read notify reply" % addr_to_str(tgt_addr))
>             return None
>
>         return msg
> }}}
>
> The zero-byte read should be recognized as an error and the socket
> re-opened.
>
> Ideally, logging should also be added to record when this happens, so we
> can figure out why we are getting this response.

New description:

 There is an infinite loop in xfrout on my server.

 The process list:

 {{{
 root@h:/opt/bind10/var/bind10-devel/log# ps -eLf | grep xfrout
 root       625   621   625  0    5 May25 pts/1    00:00:01 /usr/bin/python3 /opt/bind10/libexec/bind10-devel/b10-xfrout
 root       625   621   632  0    5 May25 pts/1    00:00:00 /usr/bin/python3 /opt/bind10/libexec/bind10-devel/b10-xfrout
 root       625   621   633  0    5 May25 pts/1    00:00:52 /usr/bin/python3 /opt/bind10/libexec/bind10-devel/b10-xfrout
 root       625   621  3771 96    5 May28 pts/1    3-04:54:56 /usr/bin/python3 /opt/bind10/libexec/bind10-devel/b10-xfrout
 root       625   621  3772  0    5 May28 pts/1    00:00:00 /usr/bin/python3 /opt/bind10/libexec/bind10-devel/b10-xfrout
 root     12322 12117 12322  0    1 07:42 pts/0    00:00:00 grep xfrout
 }}}

 It appears to be getting a read event on a file descriptor, and then
 getting 0 bytes when reading, and then not taking action:

 {{{
 root@h:/opt/bind10/var/bind10-devel/log# strace -p 3771 2>&1 | head -10
 Process 3771 attached - interrupt to quit
 recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0", 1}], msg_controllen=0, msg_flags=0}, 0) = 0
 select(14, [9 13], [], [], NULL)        = 1 (in [13])
 recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0", 1}], msg_controllen=0, msg_flags=0}, 0) = 0
 select(14, [9 13], [], [], NULL)        = 1 (in [13])
 recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0", 1}], msg_controllen=0, msg_flags=0}, 0) = 0
 select(14, [9 13], [], [], NULL)        = 1 (in [13])
 recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0", 1}], msg_controllen=0, msg_flags=0}, 0) = 0
 select(14, [9 13], [], [], NULL)        = 1 (in [13])
 recvmsg(13, {msg_name(0)=NULL, msg_iov(1)=[{"\0", 1}], msg_controllen=0, msg_flags=0}, 0) = 0
 }}}

 This file descriptor is for the Unix-domain socket used to transport file
 descriptors:

 {{{
 root@h:/opt/bind10/var/bind10-devel/log# lsof -p 625 | grep 13u
 b10-xfrou 625 root   13u  unix 0xffff88001e642900       0t0   35734 /opt/bind10/var/auth_xfrout_conn
 }}}
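
The strace pattern above (select() reports descriptor 13 readable, the read returns 0 bytes, and the loop goes straight back to select()) is what a stream socket does once its peer has closed the connection. A minimal sketch of a loop that handles that case, assuming plain Python sockets; `serve_until_closed` and `demo` are hypothetical names used only for illustration, and this is not the actual b10-xfrout code, which is not shown in this ticket:

```python
import select
import socket


def serve_until_closed(socks, max_iterations=1000):
    """Hypothetical read loop: collect data until every peer has closed.

    max_iterations is a safety cap for the illustration only.
    """
    watched = list(socks)
    received = []
    iterations = 0
    while watched and iterations < max_iterations:
        iterations += 1
        readable, _, _ = select.select(watched, [], [])
        for sock in readable:
            data = sock.recv(1)
            if not data:
                # Zero bytes from a readable stream socket means the peer
                # closed the connection. Stop watching the socket; if it
                # stays in the set, select() reports it readable again
                # immediately and the loop spins.
                watched.remove(sock)
                sock.close()
                continue
            received.append(data)
    return received, iterations


def demo():
    # A connected pair of stream sockets stands in for the real connection.
    a, b = socket.socketpair()
    b.sendall(b"x")
    b.close()
    return serve_until_closed([a])
```

In this sketch the closed socket is simply dropped from the watched set. Whether the real process should instead re-open the connection, and what it should log when this happens, is a separate design decision.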

-- 
Ticket URL: <http://bind10.isc.org/ticket/988#comment:2>
BIND 10 Development <http://bind10.isc.org>