Frequent timeout

Alex mysqlstudent at
Tue Sep 11 18:19:06 UTC 2018


Here is a much more reasonable network capture, taken during a short
period of high utilization when bind logged numerous SERVFAIL errors.

This was when our 20 Mbps cable upstream link was saturated, which
resulted in DNS query timeouts and, in turn, these SERVFAIL messages.

The packet trace shows multiple TCP out-of-order and TCP Dup ACK
packets. Would these retransmits cause enough of a delay for the
queries to fail?
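
For anyone who wants to pull the same packets out of a capture, here is
a minimal sketch using tshark. The display filter fields are standard
Wireshark TCP analysis flags; the TCP_ISSUES variable, the
list_tcp_issues function and the file name in the usage line are only
illustrative names, not something from this thread.

```shell
#!/bin/sh
# Wireshark display filter matching the TCP analysis flags in question.
TCP_ISSUES="tcp.analysis.out_of_order || tcp.analysis.duplicate_ack || tcp.analysis.retransmission"

list_tcp_issues() {
    # $1: path to the capture file (hypothetical name, not from this thread)
    tshark -r "$1" -Y "$TCP_ISSUES"
}
```

Something like "list_tcp_issues capture.pcap | wc -l" then gives a rough
count of the flagged packets to compare against a quiet period.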

Would someone more knowledgeable look into these packet errors for me?

It might seem obvious that we should increase the bandwidth of our
link, since the problem occurs during periods of high utilization, but
it doesn't occur on our other 10 Mbps DIA links in the datacenter when
those links are saturated.

11-Sep-2018 11:53:25.692 query-errors: info: client @0x7fc7ef343740
( query failed
at ../../../bin/named/query.c:8580

11-Sep-2018 11:53:25.687 query-errors: debug 2: fetch completed at
../../../lib/dns/resolver.c:3927 for in 30.000084:
timed out/success
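
To line these errors up against link utilization, a rough per-minute
count of "query failed" entries can help. This is only a sketch: it
assumes each log entry sits on a single line beginning with a timestamp
in the format shown above, and the function name and log path argument
are hypothetical.

```shell
#!/bin/sh
# Count "query failed" entries per minute in a named query-errors log.
# Assumes one entry per line, starting with "DD-Mon-YYYY HH:MM:SS.mmm".
count_failures_per_minute() {
    # $1: path to the log file (hypothetical; depends on the logging config)
    awk '/query-errors:/ && /query failed/ {
             minute = $1 " " substr($2, 1, 5)   # date plus HH:MM
             count[minute]++
         }
         END { for (m in count) print m, count[m] }' "$1" | sort
}
```

The sort is plain lexical order, which is good enough within a single
day but will not order entries correctly across months.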


On Mon, Sep 10, 2018 at 12:11 PM Alex <mysqlstudent at> wrote:
> Hi,
> > >> tcpdump -s0 -n -i eth0 port domain -w /tmp/domaincapture.pcap
> > >>
> > >> You don't need all of the extra stuff because -s0 captures the full packet.
> >
> > On 06.09.18 18:42, Alex wrote:
> > >This is the command I ran to produce the pcap file I sent:
> > >
> > ># tcpdump -s0 -vv -i eth0 -nn -w domain-capture-eth0-090518.pcap udp
> > >dst port domain
> >
> > and that is the problem. "dst port domain" captures packets going to DNS
> > servers, not responses coming back.
> >
> > "-vv" and "-nn" are useless when producing a packet capture, and "-s0" has
> > been the default for some time. I often add "-U" so the file is flushed
> > with each packet.
> >
> > you can strip incoming queries by using filter
> >
> > "(src host 68.195.XXX.45 and dst port domain) or (src port domain and dst host 68.195.XXX.45)"
> I've generated a new tcpdump file using these criteria and uploaded it here:
> The SERVFAIL errors didn't really occur over the weekend. I believe it
> has something to do with mail volume, link congestion/bandwidth
> utilization.
> Thanks,
> Alex
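
For reference, the capture advice quoted above can be put together
roughly as follows. This is only a sketch: 192.0.2.45 is a placeholder
from the documentation address range (the real address is redacted
above), and the variable names and the capture_dns function are made up
for illustration.

```shell
#!/bin/sh
# Placeholder address (RFC 5737 documentation range); substitute the
# resolver's real address, which is redacted in the thread.
RESOLVER_IP="192.0.2.45"
IFACE="eth0"
OUTFILE="/tmp/domaincapture.pcap"

# Outgoing queries from the resolver plus the responses coming back to it,
# which leaves out queries arriving from clients.
FILTER="(src host $RESOLVER_IP and dst port domain) or (src port domain and dst host $RESOLVER_IP)"

capture_dns() {
    # -n: no name resolution, -U: flush the output file after each packet
    tcpdump -n -U -i "$IFACE" -w "$OUTFILE" "$FILTER"
}

echo "$FILTER"
```

Running capture_dns needs root and is stopped with Ctrl-C. Since the
filter does not restrict the protocol, it picks up both UDP and TCP
traffic on port 53.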
