SERVFAIL and peak utilization

Alex mysqlstudent at gmail.com
Thu Jul 26 19:54:01 UTC 2018


Hi,

I've made some performance adjustments, although I really don't know
whether they're correct, and they don't seem to have solved the problem.
I've also noticed that the SERVFAIL errors seem to come in bursts: they
occur for a while and then stop. They definitely seem to occur more
often during peak mail volume (this is a mail server).

These are the settings I changed:

        max-clients-per-query 4000;
        clients-per-query 4000;
        recursive-clients 4000;
        tcp-clients 4000;

Here's the named_stats.txt file from "rndc stats":

+++ Statistics Dump +++ (1532630822)
++ Incoming Requests ++
                3267 QUERY
++ Incoming Queries ++
                2345 A
                  74 NS
                  69 PTR
                 152 MX
                 569 TXT
                  58 AAAA
++ Outgoing Rcodes ++
                1356 NOERROR
                 648 SERVFAIL
                1070 NXDOMAIN
++ Outgoing Queries ++
[View: default]
                8749 A
                 139 NS
                 133 PTR
                  30 MX
                 640 TXT
                   6 AAAA
                 488 DS
                  87 DNSKEY
[View: _bind]
++ Name Server Statistics ++
                3267 IPv4 requests received
                2026 requests with EDNS(0) received
                   6 TCP requests received
                3074 responses sent
                   6 truncated responses sent
                1883 responses with EDNS(0) sent
                1134 queries resulted in successful answer
                2426 queries resulted in non authoritative answer
                 222 queries resulted in nxrrset
                 648 queries resulted in SERVFAIL
                1070 queries resulted in NXDOMAIN
                2190 queries caused recursion
                  33 duplicate queries received
                   4 queries dropped
                 156 recursing clients
                3249 UDP queries received
                   6 TCP queries received
++ Zone Maintenance Statistics ++
++ Resolver Statistics ++
[Common]
                 143 UDP queries in progress
[View: default]
               10272 IPv4 queries sent
                2503 IPv4 responses received
                 611 NXDOMAIN received
                   1 SERVFAIL received
                  16 FORMERR received
                  14 EDNS(0) query failures
                 448 truncated responses received
                7865 query retries
                7674 query timeouts
                 380 IPv4 NS address fetches
                  33 IPv4 NS address fetch failed
                1129 DNSSEC validation attempted
                 348 DNSSEC validation succeeded
                 741 DNSSEC NX validation succeeded
                   1 DNSSEC validation failed
                  78 queries with RTT < 10ms
                1394 queries with RTT 10-100ms
                 981 queries with RTT 100-500ms
                   6 queries with RTT 500-800ms
                   1 queries with RTT 800-1600ms
                 150 active fetches
                 523 bucket size
                   3 REFUSED received
                6146 COOKIE send with client cookie only
                 393 COOKIE sent with client and server cookie
                 291 COOKIE replies received
                 291 COOKIE client ok
[View: _bind]
                 523 bucket size
++ Cache Statistics ++
[View: default]
               22101 cache hits
                  13 cache misses
                5896 cache hits (from query)
                3416 cache misses (from query)
                   0 cache records deleted due to memory exhaustion
                   0 cache records deleted due to TTL expiration
                2096 cache database nodes
                1039 cache database hash buckets
             1352276 cache tree memory total
             1022492 cache tree memory in use
             1022548 cache tree highest memory in use
              393216 cache heap memory total
              132096 cache heap memory in use
              132096 cache heap highest memory in use
[View: _bind (Cache: _bind)]
                   0 cache hits
                   0 cache misses
                   0 cache hits (from query)
                   0 cache misses (from query)
                   0 cache records deleted due to memory exhaustion
                   0 cache records deleted due to TTL expiration
                   0 cache database nodes
                  64 cache database hash buckets
              287792 cache tree memory total
               29952 cache tree memory in use
               29952 cache tree highest memory in use
              262144 cache heap memory total
                1024 cache heap memory in use
                1024 cache heap highest memory in use
++ Cache DB RRsets ++
[View: default]
                 963 A
                 299 NS
                  14 CNAME
                  23 PTR
                  19 MX
                  47 TXT
                 400 AAAA
                  57 DS
                 193 RRSIG
                  33 NSEC
                  34 DNSKEY
                   3 !A
                   2 !NS
                   1 !MX
                  19 !TXT
                   1 !AAAA
                 122 !DS
                 557 NXDOMAIN
                   1 #RRSIG
                   1 #NSEC
[View: _bind (Cache: _bind)]
++ ADB stats ++
[View: default]
                1021 Address hash table size
                 916 Addresses in hash table
                1021 Name hash table size
                1035 Names in hash table
[View: _bind]
                1021 Address hash table size
                1021 Name hash table size
++ Socket I/O Statistics ++
                9861 UDP/IPv4 sockets opened
                 450 TCP/IPv4 sockets opened
                   1 Raw sockets opened
                9711 UDP/IPv4 sockets closed
                 454 TCP/IPv4 sockets closed
                  30 UDP/IPv4 socket bind failures
                9824 UDP/IPv4 connections established
                 446 TCP/IPv4 connections established
                   7 TCP/IPv4 connections accepted
                  43 UDP/IPv4 recv errors
                 150 UDP/IPv4 sockets active
                   3 TCP/IPv4 sockets active
                   1 Raw sockets active
++ Per Zone Query Statistics ++
--- Statistics Dump --- (1532630822)
+++ Statistics Dump +++ (1532634389)
++ Incoming Requests ++
               26879 QUERY
++ Incoming Queries ++
               18386 A
                 642 NS
                 351 PTR
                1186 MX
                5626 TXT
                 688 AAAA
++ Outgoing Rcodes ++
               12312 NOERROR
                3066 SERVFAIL
               11270 NXDOMAIN
++ Outgoing Queries ++
[View: default]
               57901 A
                1761 NS
                 566 PTR
                 555 MX
                4177 TXT
                  87 AAAA
                   2 DNSKEY
[View: _bind]
++ Name Server Statistics ++
               26879 IPv4 requests received
               16404 requests with EDNS(0) received
                 168 TCP requests received
               26648 responses sent
                 168 truncated responses sent
               16357 responses with EDNS(0) sent
               10556 queries resulted in successful answer
               23582 queries resulted in non authoritative answer
                1756 queries resulted in nxrrset
                3066 queries resulted in SERVFAIL
               11270 queries resulted in NXDOMAIN
               14505 queries caused recursion
                 231 duplicate queries received
               26693 UDP queries received
                 168 TCP queries received
                   2 COOKIE option received
                   2 COOKIE - client only
++ Zone Maintenance Statistics ++
++ Resolver Statistics ++
[Common]
[View: default]
               65049 IPv4 queries sent
               12813 IPv4 responses received
                7832 NXDOMAIN received
                   5 SERVFAIL received
                  32 FORMERR received
                  26 EDNS(0) query failures
                 530 truncated responses received
                   4 lame delegations received
               50747 query retries
               52327 query timeouts
                1038 IPv4 NS address fetches
                 205 IPv4 NS address fetch failed
                 706 queries with RTT < 10ms
                7423 queries with RTT 10-100ms
                4076 queries with RTT 100-500ms
                 342 queries with RTT 500-800ms
                  39 queries with RTT 800-1600ms
                   9 queries with RTT > 1600ms
                 523 bucket size
                   6 REFUSED received
               20513 COOKIE send with client cookie only
                1485 COOKIE sent with client and server cookie
                 921 COOKIE replies received
                 921 COOKIE client ok
[View: _bind]
                 523 bucket size
++ Cache Statistics ++
[View: default]
              158038 cache hits
                  13 cache misses
               62750 cache hits (from query)
               19356 cache misses (from query)
                   0 cache records deleted due to memory exhaustion
                 126 cache records deleted due to TTL expiration
               12112 cache database nodes
                4159 cache database hash buckets
             4822015 cache tree memory total
             4393804 cache tree memory in use
             4394140 cache tree highest memory in use
              393216 cache heap memory total
              132096 cache heap memory in use
              132096 cache heap highest memory in use
[View: _bind (Cache: _bind)]
                   0 cache hits
                   0 cache misses
                   0 cache hits (from query)
                   0 cache misses (from query)
                   0 cache records deleted due to memory exhaustion
                   0 cache records deleted due to TTL expiration
                   0 cache database nodes
                  64 cache database hash buckets
              293568 cache tree memory total
               29952 cache tree memory in use
               35728 cache tree highest memory in use
              262144 cache heap memory total
                1024 cache heap memory in use
                1024 cache heap highest memory in use
++ Cache DB RRsets ++
[View: default]
                3060 A
                 863 NS
                 302 CNAME
                  81 PTR
                  77 MX
                 186 TXT
                1152 AAAA
                  85 DS
                 259 RRSIG
                  80 NSEC
                   1 DNSKEY
                  28 !A
                  27 !NS
                   2 !MX
                  94 !TXT
                   5 !AAAA
                6192 NXDOMAIN
[View: _bind (Cache: _bind)]
++ ADB stats ++
[View: default]
                1021 Address hash table size
                2125 Addresses in hash table
                1021 Name hash table size
                1427 Names in hash table
[View: _bind]
                1021 Address hash table size
                1021 Name hash table size
++ Socket I/O Statistics ++
               64830 UDP/IPv4 sockets opened
                 532 TCP/IPv4 sockets opened
                   1 Raw sockets opened
               64823 UDP/IPv4 sockets closed
                 726 TCP/IPv4 sockets closed
                 304 UDP/IPv4 socket bind failures
               64519 UDP/IPv4 connections established
                 519 TCP/IPv4 connections established
                 197 TCP/IPv4 connections accepted
                 218 UDP/IPv4 recv errors
                   7 UDP/IPv4 sockets active
                   3 TCP/IPv4 sockets active
                   1 Raw sockets active
++ Per Zone Query Statistics ++
--- Statistics Dump --- (1532634389)
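
One rough way to read the dumps above is to turn a few counters into ratios. The sketch below is only an illustration, not part of BIND or rndc: the function names `parse_counters` and `ratios` are made up for this example, and the choice of which counters to compare is an assumption about what is worth looking at. The sample figures are copied from the second dump (1532634389).

```python
import re

def parse_counters(dump: str) -> dict:
    """Map each 'NNN description' line of a stats dump to an integer.

    Section headers ('++ ... ++', '[View: ...]') are skipped because they
    do not start with a number. Descriptions that repeat across sections
    (e.g. 'bucket size') overwrite each other, so only look up names that
    occur once in the dump.
    """
    counters = {}
    for line in dump.splitlines():
        m = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if m:
            counters[m.group(2)] = int(m.group(1))
    return counters

def ratios(c: dict) -> dict:
    """Share of client responses that were SERVFAIL, and share of
    outgoing resolver queries that ended in a timeout."""
    return {
        "servfail_share": c["queries resulted in SERVFAIL"] / c["responses sent"],
        "upstream_timeout_share": c["query timeouts"] / c["IPv4 queries sent"],
    }

# Counters copied from the second dump above (1532634389).
sample = """
                26648 responses sent
                 3066 queries resulted in SERVFAIL
                65049 IPv4 queries sent
                12813 IPv4 responses received
                52327 query timeouts
"""

result = ratios(parse_counters(sample))
for name, value in result.items():
    print(f"{name}: {value:.1%}")
```

With these figures, about 11.5% of responses were SERVFAIL and about 80% of the queries sent upstream timed out. That only narrows down where to look; it does not by itself say whether the cause is the link, a firewall, or the remote servers.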


On Thu, Jul 26, 2018 at 2:51 PM, Alex <mysqlstudent at gmail.com> wrote:
> Hi,
>
> On Thu, Jul 26, 2018 at 1:57 PM, John Miller <johnmill at brandeis.edu> wrote:
>> Hi Alex,
>>
>> What does your query volume look like on this server?  Depending on
>> volume, the BIND defaults for:
>>
>> - clients-per-query
>> - max-clients-per-query
>> - recursive-clients
>> - tcp-clients
>>
>> and others may not be set high enough.  Check pp. 106-108 in the
>> latest 9.11 manual for more details on each of these.
>>
>> Of course, if you're only seeing SERVFAIL for a handful of domains,
>> then they may have some sort of delegation issue, or there might be a
>> network issue between your caching servers and them.
>
> I think it's happening too frequently to be explained by a single
> misconfigured remote system. Here is my rndc status, but it doesn't
> appear to provide all the values you asked about.
>
> It's also occurring for queries to trustworthy remote sources:
> 26-Jul-2018 14:48:22.975 query-errors: debug 1: client @0x7fddb400c570
> 127.0.0.1#56094 (mail-dm3nam03on0041.outbound.protection.outlook.com):
> query failed (SERVFAIL) for
> mail-dm3nam03on0041.outbound.protection.outlook.com/IN/A at
> ../../../bin/named/query.c:8580
>
> # rndc status
> version: BIND 9.11.4-RedHat-9.11.4-1.fc28 (Extended Support Version)
> <id:2fe4344>
> running on bwimail03.guardiandigital.com: Linux x86_64
> 4.17.7-200.fc28.x86_64 #1 SMP Tue Jul 17 16:28:31 UTC 2018
> boot time: Thu, 26 Jul 2018 18:47:52 GMT
> last configured: Thu, 26 Jul 2018 18:47:52 GMT
> configuration file: /etc/named.conf (/var/named/chroot/etc/named.conf)
> CPUs found: 8
> worker threads: 8
> UDP listeners per interface: 7
> number of zones: 103 (97 automatic)
> debug level: 0
> xfers running: 0
> xfers deferred: 0
> soa queries in progress: 0
> query logging is OFF
> recursive clients: 63/900/1000
> tcp clients: 0/150
> server is up and running
>
> I've also now confirmed it's happening at times of regular network
> activity. I'm really stuck. I hope someone can help.
>
> Thanks,
> Alex
>
>
>>
>> John
>>
>>
>> On Thu, Jul 26, 2018 at 1:07 PM, Alex <mysqlstudent at gmail.com> wrote:
>>> Hi,
>>>
>>> I have a BIND 9.11.4 server on a Fedora 28 system and am frequently
>>> seeing SERVFAIL errors like this:
>>>
>>> 26-Jul-2018 12:54:04.255 query-errors: info: client @0x7f764314a5c0
>>> 127.0.0.1#50719 (223.178.102.199.cidr.bl.mcafee.com): query failed
>>> (SERVFAIL) for 223.178.102.199.cidr.bl.mcafee.com/IN/A at
>>> ../../../bin/named/query.c:4140
>>>
>>> I believe this happens more frequently at times of peak link
>>> utilization, but it also appears to happen during normal times.
>>>
>>> This is a local caching server I've set up, but the problem also
>>> appears to occur on other systems that have been set up to be
>>> authoritative for our domain.
>>>
>>> How can I troubleshoot this further?
>>>
>>> Here is the named.conf for this caching server:
>>>
>>> acl "trusted" {
>>>         { 127/8; };
>>>         { 68.195.191.40/29; };
>>>         { 192.168.1.0/24; };
>>>         { 107.155.67.2/32; };
>>> };
>>>
>>> options {
>>>         listen-on port 53 { 127.0.0.1; 68.195.191.45; };
>>>         listen-on-v6 port 53 { none; };
>>>         directory "/var/named";
>>>         dump-file "/var/named/data/cache_dump.db";
>>>         statistics-file "/var/named/data/named.stats";         // _PATH_STATS
>>>         memstatistics-file "/var/named/data/named.memstats";   // _PATH_MEMSTATS
>>>         allow-query     { trusted; };
>>>         recursion yes;
>>>         zone-statistics yes;
>>>
>>>         // dnssec-enable yes;
>>>         // dnssec-validation yes;
>>>         // dnssec-lookaside auto;
>>>
>>>         dnssec-enable no;
>>>         dnssec-validation no;
>>>         dnssec-lookaside no;
>>>
>>>         /* Path to ISC DLV key */
>>>         bindkeys-file "/etc/named.iscdlv.key";
>>>
>>>         managed-keys-directory "/var/named/dynamic";
>>>
>>> };
>>>
>>> logging {
>>>         channel default_debug {
>>>                 file "data/named.run";
>>>                 severity dynamic;
>>>         };
>>>
>>>         // Record all queries to the box for now
>>>         channel query_info {
>>>            severity info;
>>>            file "/var/log/named.query.log" versions 3 size 10m;
>>>            print-time yes;
>>>            print-category yes;
>>>          };
>>>
>>>         // added for fail2ban support
>>>         channel security_file {
>>>            severity dynamic;
>>>            file "/var/log/named.security.log" versions 3 size 30m;
>>>            print-time yes;
>>>            print-category yes;
>>>         };
>>>
>>>         channel b_debug {
>>>                 file "/var/log/named.debug.log" versions 2 size 10m;
>>>                 print-time yes;
>>>                 print-category yes;
>>>                 print-severity yes;
>>>                 severity dynamic;
>>>         };
>>>
>>>         // Send the security related messages to a separate file.
>>>         channel audit_log {
>>>                 file "/var/log/named.audit.log" versions 4 size 10m;
>>>                 severity info;
>>>                 print-time yes;
>>>                 print-category yes;
>>>         };
>>>
>>>
>>>         category queries { query_info; };
>>>         category default { b_debug; };
>>>         category config { b_debug; };
>>>         category security { security_file; };
>>>         // category lame-servers { audit_log; };
>>>         category lame-servers { null; };
>>>
>>> };
>>>
>>> zone "." IN {
>>>         type hint;
>>>         file "/var/named/named.ca";
>>> };
>>>
>>> zone "localhost.localdomain" IN {
>>>         type master;
>>>         file "named.localhost";
>>>         allow-update { none; };
>>> };
>>>
>>> zone "localhost" IN {
>>>         type master;
>>>         file "named.localhost";
>>>         allow-update { none; };
>>> };
>>>
>>> zone "1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.ip6.arpa" IN {
>>>         type master;
>>>         file "named.loopback";
>>>         allow-update { none; };
>>> };
>>>
>>> zone "1.0.0.127.in-addr.arpa" IN {
>>>         type master;
>>>         file "named.loopback";
>>>         allow-update { none; };
>>> };
>>>
>>> zone "0.in-addr.arpa" IN {
>>>         type master;
>>>         file "named.empty";
>>>         allow-update { none; };
>>> };
>>>
>>> include "/etc/named.root.key";
>>> include "/etc/rndc.key";
>> _______________________________________________
>> Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list
>>
>> bind-users mailing list
>> bind-users at lists.isc.org
>> https://lists.isc.org/mailman/listinfo/bind-users

