Recursive queries fail after bind has been running for a few hours

Kevin Oberman kob6558 at gmail.com
Mon Mar 12 22:37:54 UTC 2012


On Mon, Mar 12, 2012 at 12:05 PM, Mr X <xproject128 at gmail.com> wrote:
> Hey there
>
> I'm having a bizarre issue with 9.7.3-P3-RedHat-9.7.3-8.P3.el6_2.2 -
> recursive queries stop functioning after bind has been running for a few
> hours. It's a very low volume system (dev), maybe a few queries per hour at
> most. It's not due to cache filling or anything like I've dealt with in the
> past. I suspect it's related to DNSSEC and root-server validation but I
> could use another set of eyes on my debug log. Sorry for posting from an
> inconspicuous e-mail address. My employer asks that I'm careful about the
> information I disclose on public mailing lists.
>
> You can see my debug log during a failed query
> http://pastebin.com/5hh05WjM
>
> Successful query here
> http://pastebin.com/H9qSQcyG
>
> If you would like to see my config, I can include portions, but it's huge so
> please let me know exactly what parts you're looking for.

You are getting timeouts for some reason. The obvious question is
whether the queries are actually being sent, or whether they are and
the responses are not coming back. Or, perhaps the responses ARE coming
back, but named is not picking them up.

Could you try getting a packet capture? As these are UDP and assuming
Unix, something like 'tcpdump -w badquery.bpf -s0 -p port 53' should
do. This will capture all DNS traffic to/from this system, but you say
it is not all that much, so it should be tractable.

Once you have captured the data, you can use a tool like wireshark to
look at it.
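
If it helps, here is a rough sketch of how the capture and a first
pass over it might go from the shell. The function names are made up
for illustration, and the read-back filters assume the traffic of
interest is UDP, so treat this as a starting point rather than a
recipe.

```shell
#!/bin/sh
# Hypothetical helpers; the default file name is the one used above.

# Capture all port 53 traffic to/from this host. Needs root; stop it
# with Ctrl-C once a failed query has been reproduced.
#   -w  write raw packets to a file
#   -s0 capture whole packets, not just the headers
#   -p  do not put the interface into promiscuous mode
capture_dns() {
    tcpdump -w "${1:-badquery.bpf}" -s0 -p port 53
}

# UDP packets addressed to port 53: queries from clients to this
# server and queries named sent out to other servers.
show_queries() {
    tcpdump -n -r "${1:-badquery.bpf}" 'udp dst port 53'
}

# UDP packets coming from port 53: replies from remote servers and
# named's own replies to its clients.
show_replies() {
    tcpdump -n -r "${1:-badquery.bpf}" 'udp src port 53'
}
```

Run capture_dns as root while a lookup is failing, then compare the
output of show_queries and show_replies. Queries leaving for the
roots or other authoritative servers with no matching replies would
point at the network or a firewall; replies that do arrive while
named still times out would point at named itself.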
-- 
R. Kevin Oberman, Network Engineer
E-mail: kob6558 at gmail.com


