mysqlstudent at gmail.com
Tue Sep 11 18:19:06 UTC 2018
Here is a much more reasonable network capture during the period where
there are numerous SERVFAIL errors from bind over a short period of
time. This is when our 20mbs cable upstream link was saturated, which
resulted in DNS query timeouts and, in turn, these SERVFAIL messages.
The packet trace shows multiple TCP out-of-order and TCP Dup ACK
packets. Would these retransmits cause enough of a delay for the
queries to fail?
Would someone more knowledgeable look into these packet errors for me?
It might seem obvious that we should increase the bandwidth of our
link, since the failures occur during periods of high utilization, but
they don't occur on our other 10mbs DIA links in the datacenter when
those links are saturated.
11-Sep-2018 11:53:25.692 query-errors: info: client @0x7fc7ef343740
(8cb54bfffc54eee06342d5619246d67166abc6cf.ebl.msbl.org): query failed
(SERVFAIL) for 8cb54bfffc54eee06342d5619246d67166abc6cf.ebl.msbl.org/IN/A
11-Sep-2018 11:53:25.687 query-errors: debug 2: fetch completed at
ac949d5d947f8f5cad13e98c68bac6f284c367fd.ebl.msbl.org/A in 30.000084:
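
To check whether the SERVFAIL bursts really line up with the periods
when the link is saturated, the failures could be tallied per minute
from the query-errors log and compared against the bandwidth graphs.
Below is a minimal sketch in Python. It assumes each log entry sits on
a single line in the format shown above (the entries here are wrapped
by the mail client); the name servfail_per_minute is made up for
illustration and is not part of bind.

```python
import re
from collections import Counter

# Assumed layout of a query-errors entry (one entry per line):
#   DD-Mon-YYYY HH:MM:SS.mmm query-errors: ... query failed (SERVFAIL) for NAME/IN/A
LINE_RE = re.compile(
    r"^(\d{2}-\w{3}-\d{4} \d{2}:\d{2}):\d{2}\.\d{3} query-errors: .*"
    r"query failed \(SERVFAIL\) for (\S+)"
)


def servfail_per_minute(lines):
    """Count SERVFAIL entries, keyed by timestamp truncated to the minute."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The two entries quoted above, joined back onto single lines.
sample = [
    "11-Sep-2018 11:53:25.692 query-errors: info: client @0x7fc7ef343740 "
    "(8cb54bfffc54eee06342d5619246d67166abc6cf.ebl.msbl.org): query failed "
    "(SERVFAIL) for 8cb54bfffc54eee06342d5619246d67166abc6cf.ebl.msbl.org/IN/A",
    "11-Sep-2018 11:53:25.687 query-errors: debug 2: fetch completed at "
    "ac949d5d947f8f5cad13e98c68bac6f284c367fd.ebl.msbl.org/A in 30.000084:",
]

# Only the "query failed (SERVFAIL)" entry is counted; the debug entry is skipped.
for minute, count in sorted(servfail_per_minute(sample).items()):
    print(minute, count)
```

For a real log, the list can be replaced with an open file object, since
the function only needs an iterable of lines.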
On Mon, Sep 10, 2018 at 12:11 PM Alex <mysqlstudent at gmail.com> wrote:
> > >> tcpdump -s0 -n -i eth0 port domain -w /tmp/domaincapture.pcap
> > >>
> > >> You don't need all of the extra stuff because -s0 captures the full packet.
> > On 06.09.18 18:42, Alex wrote:
> > >This is the command I ran to produce the pcap file I sent:
> > >
> > ># tcpdump -s0 -vv -i eth0 -nn -w domain-capture-eth0-090518.pcap udp
> > >dst port domain
> > and that is the problem. "dst port domain" captures packets going to DNS
> > servers, not responses coming back.
> > "-vv" and "-nn" are useless when producing packet capture and "-s0" is
> > default for some time. I often add "-U" so file is flushed with each packet.
> > you can strip incoming queries by using filter
> > "(src host 68.195.XXX.45 and dst port domain) or (src port domain and dst host 68.195.XXX.45)"
> I've generated a new tcpdump file using these criteria and uploaded it here:
> The SERVFAIL errors didn't really occur over the weekend. I believe it
> has something to do with mail volume, link congestion/bandwidth