Recursive queries fail after bind has been running for a few hours

Mr X xproject128 at gmail.com
Tue Mar 13 16:29:49 UTC 2012


On Mon, Mar 12, 2012 at 3:37 PM, Kevin Oberman <kob6558 at gmail.com> wrote:

> On Mon, Mar 12, 2012 at 12:05 PM, Mr X <xproject128 at gmail.com> wrote:
> > Hey there
> >
> > I'm having a bizarre issue with 9.7.3-P3-RedHat-9.7.3-8.P3.el6_2.2 -
> > recursive queries stop functioning after bind has been running for a few
> > hours. It's a very low volume system (dev), maybe a few queries per hour at
> > most. It's not due to cache filling or anything like I've dealt with in the
> > past. I suspect it's related to DNSSEC and root-server validation but I
> > could use another set of eyes on my debug log. Sorry for posting from an
> > inconspicuous e-mail address. My employer asks that I'm careful about the
> > information I disclose on public mailing lists.
> >
> > You can see my debug log during a failed query
> > http://pastebin.com/5hh05WjM
> >
> > Successful query here
> > http://pastebin.com/H9qSQcyG
> >
> > If you would like to see my config, I can include portions, but it's huge so
> > please let me know exactly what parts you're looking for.
>
> You are getting timeouts for some reason. The obvious question is
> whether the queries are actually being sent, or whether they are and
> responses are not coming back. Or, perhaps the responses ARE coming back,
> but named is not picking them up.
>
> Could you try getting a packet capture? As these are UDP and assuming
> Unix, something like 'tcpdump -w badquery.bpf -s0 -p port 53'. This
> will capture all DNS traffic to/from this system, but you say it is
> not all that much, so it should be tractable.
>
> Once you have captured the data, you can use a tool like wireshark to
> look at it.
>


I had to sanitize some data, so the -vvv output of the packet capture is
here:

http://pastebin.com/GKSspL2L

Unfortunately this server is both authoritative and recursive. I have an
upcoming project to segment these two functions, but for now getting this
host operational is my priority. It's also worth mentioning that I have IO
data center nameservers as a forwarder as seen in this packet capture. When
bind is in a failed state I can query against these nameservers directly -
so I had not considered this a potential cause.
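
For anyone wanting to repeat that comparison (local named versus the
forwarder queried directly), here is a minimal sketch in Python using only
the standard library. It is an illustration, not what was actually run on
this host: the function names are made up for the example, and any server
address passed to probe() (for instance 127.0.0.1 for the local named, or
the documentation address 192.0.2.53 standing in for a forwarder) is a
placeholder to be replaced.

```python
import socket
import struct


def build_query(name, qtype=1, query_id=0x1234):
    """Build a minimal DNS query packet for `name` with the RD bit set."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, rest 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (default 1 = A) and QCLASS (1 = IN).
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_header(response):
    """Return (id, rcode, answer_count) from a DNS response header."""
    query_id, flags, _qd, ancount, _ns, _ar = struct.unpack(
        "!HHHHHH", response[:12]
    )
    return query_id, flags & 0x000F, ancount


def probe(server, name, timeout=2.0):
    """Send one UDP query to `server` on port 53; return a status string."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            data, _addr = sock.recvfrom(4096)
        except socket.timeout:
            return "timeout"
        except OSError as exc:
            return "error: %s" % exc
    if len(data) < 12:
        return "short response"
    _id, rcode, ancount = parse_header(data)
    # rcode 0 = NOERROR, 2 = SERVFAIL, 3 = NXDOMAIN
    return "rcode=%d answers=%d" % (rcode, ancount)
```

Calling probe() once against the local named and once against the forwarder
while bind is in the failed state shows whether only the recursive path
through named is affected. A timeout or rcode=2 (SERVFAIL) locally, next to
rcode=0 from the forwarder, would match the behaviour described above.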

I really appreciate everyone's help.


> --
> R. Kevin Oberman, Network Engineer
> E-mail: kob6558 at gmail.com
>