retry limit exceeded / possible network problem?

Alex mysqlstudent at
Wed Mar 23 19:43:45 UTC 2016


I have a Fedora 23 system with bind-9.10.3 that's been running fine for
a long time. For some reason, queries started timing out this morning.
This is a mail server, so DNSBL lookups against spamhaus, barracuda, etc.
began failing with:

Mar 23 14:46:57 mail03 postfix/postscreen[12635]: warning: dnsblog
reply timeout 10s for

where 'mykey' is the key assigned to me for the service. (this isn't a
"query volume reached" kind of error).

It's almost as if a firewall were blocking outbound access, but
that's not the case. Sometimes queries work, sometimes they time out:

# host
;; connection timed out; no servers could be reached

Running the same command again might work. Here's an example
with messagelabs:

# host
;; connection timed out; no servers could be reached
# host
has address
has address
;; connection timed out; no servers could be reached
# host
Using domain server:
Aliases:

has address
has address
mail is handled by 10
mail is handled by 20

It does appear to work reliably when using Google's nameservers.
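
To put a number on "sometimes works, sometimes times out", a small
shell helper along these lines could repeat a lookup and count the
failures. This is only a sketch: count_failures is a name made up
here, and the resolver address and test name in the usage comment are
placeholders, not values from my setup.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]
# Runs CMD N times and prints how many of the runs exited non-zero.
# (Hypothetical helper, not part of BIND or any package.)
count_failures() {
    n=$1
    shift
    fails=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Discard the command's own output; only its exit status matters.
        "$@" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails"
}

# Example usage (placeholder resolver address and query name):
#   count_failures 20 dig +tries=1 +time=5 @127.0.0.1 example.com A
#   count_failures 20 dig +tries=1 +time=5 +tcp @127.0.0.1 example.com A
```

dig exits non-zero when no server replies, so the printed count is the
number of timed-out attempts. Comparing the plain (UDP) run with the
+tcp run, or the local resolver with a public one, might show whether
the loss is specific to UDP or to one path.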

Just running "dig" returns all the forward entries for the top-level
servers, but not the reverse. My hints file does have both, however.

Then I noticed these in the bind logs:

23-Mar-2016 15:12:10.603 general: info: zone
refresh: retry limit for master exceeded (source
23-Mar-2016 15:12:10.603 general: info: zone
Transfer started.
23-Mar-2016 15:12:10.615 xfer-in: info: transfer of
'' from connected using
23-Mar-2016 15:12:10.627 xfer-in: info: transfer of
'' from Transfer status: up to date
23-Mar-2016 15:12:10.627 xfer-in: info: transfer of
'' from Transfer completed: 0
messages, 1 records, 0 bytes, 0.012 secs (0 bytes/sec)

where '' is my domain. A little googling suggests this happens when
the refresh (SOA) query over UDP gets no answer, after which named
falls back to TCP.

This system is running on a Cablevision/Optonline business-class cable
connection. They've said the circuit is operating normally. Could this
still be some kind of network issue? There are no local errors on the
interface, and I've rebooted their modem and even replaced the network

Perhaps you know of a tcpdump option where I can look for network
retries or some type of packet retransmission/errors?

I'm really stuck, and the mail server isn't functioning while I figure
this out, so any help would be greatly appreciated.

I've included my named.conf below, although it was working fine yesterday:

acl "trusted" {
        { 127/8; };
        {; };
        {; };
        {; };
};
options {
        listen-on port 53 {;; };
        // listen-on-v6 port 53 { ::1; };
        listen-on-v6 port 53 { none; };
        directory       "/var/named";
        dump-file       "/var/named/data/cache_dump.db";
        statistics-file "/var/named/data/named.stats";         // _PATH_STATS
        memstatistics-file "/var/named/data/named.memstats";   // _PATH_MEMSTATS
        allow-query     { trusted; };
        notify master-only;
        recursive-clients 5000;
        /*
         - If you are building an AUTHORITATIVE DNS server, do NOT enable recursion.
         - If you are building a RECURSIVE (caching) DNS server, you need to enable
           recursion.
         - If your recursive DNS server has a public IP address, you MUST enable access
           control to limit queries to your legitimate users. Failing to do so will
           cause your server to become part of large scale DNS amplification
           attacks. Implementing BCP38 within your network would greatly
           reduce such attack surface
        */
        // recursion yes;
        allow-recursion { trusted; };
        dnssec-enable yes;
        dnssec-validation yes;
        dnssec-lookaside auto;
        /* Path to ISC DLV key */
        bindkeys-file "/etc/named.iscdlv.key";
        managed-keys-directory "/var/named/dynamic";
        pid-file "/run/named/";
        session-keyfile "/run/named/session.key";
};
logging {
        channel default_debug {
                file "data/";
                severity dynamic;
        };
        // Record all queries to the box for now
        channel query_info {
                severity info;
                file "/var/log/named.query.log" versions 3 size 10m;
                print-time yes;
                print-category yes;
        };
        // added for fail2ban support
        channel security_file {
                severity dynamic;
                file "/var/log/" versions 3 size 30m;
                print-time yes;
                print-category yes;
        };
        channel b_debug {
                file "/var/log/named.debug.log" versions 2 size 10m;
                print-time yes;
                print-category yes;
                print-severity yes;
                severity dynamic;
        };
        // Send the security related messages to a separate file.
        channel audit_log {
                file "/var/log/named.audit.log" versions 4 size 10m;
                severity info;
                print-time yes;
                print-category yes;
        };
        category queries { query_info; };
        category default { b_debug; };
        category config { b_debug; };
        category security { security_file; };
        category lame-servers { null; };
};
zone "." IN {
        type hint;
        file "/var/named/";
};
zone "" {
        type slave;
        file "slaves/";
        masters {; };
        allow-query { trusted; };
        allow-transfer { trusted; };
};
include "/etc/named.rfc1912.zones";
include "/etc/named.root.key";
include "/etc/rndc.key";

