Bind9 overloaded, recursive clients and timeout.

Mark Andrews marka at isc.org
Mon Feb 8 21:26:19 UTC 2010


In message <4B701EC5.6060409 at arcelormittal.com>, Cedric Lejeune writes:
> 
> Hello list,
> 
> Sorry to bother you, but I really need help since I cannot figure out 
> what I am doing wrong. I am trying to set up a new DNS server: it 
> behaves as expected in a test environment, but in production it seems 
> to get overloaded. The number of recursive clients increases until it 
> reaches the recursive-clients limit, a lot of timeouts occur, and the 
> server is no longer able to answer any query. The main clients of this 
> server are spam filters (spamassassin) and mail routers. I have 
> googled for this issue, and the only explanation I have found is that 
> our firewalls may be mishandling packet fragmentation or packets 
> larger than 512 bytes. So I have checked this using this thread 
> http://groups.google.com/group/comp.protocols.dns.bind/browse_thread/thread/cfa8c63ec6bd08d6 
> and everything seems fine. So, as a last resort, I am bothering you... 
> Do you have any hints that would help me track down what is wrong?
> 
> Thank you for your help.
> 
> Kind regards,
> 
> cedric.
> 
> Possibly useful information:
> 
> System: Debian testing
> Bind version: 9.6.1.dfsg.P1-1
> 
> --------%<--------%<--------%<--------%<--------%<--------%<--------%<--------
> 
> named.conf
> 
> // This is the primary configuration file for the BIND DNS server named.
> //
> // Please read /usr/share/doc/bind9/README.Debian.gz for information on the
> // structure of BIND configuration files in Debian, *BEFORE* you customize
> // this configuration file.
> //
> // If you are just adding zones, please do that in /etc/bind/named.conf.local
> 
> include "/etc/bind/named.conf.options";
> 
> include "/etc/bind/named.conf.local";
> 
> --------%<--------%<--------%<--------%<--------%<--------%<--------%<--------
> 
> named.conf.options
> 
> logging {
>          channel debug {
>                  file "/tmp/debug";
>                  severity debug 2;
>                  print-category yes;
>                  print-time yes;
>                  print-severity yes;
>          };
> 
>          category default {
>                  debug;
>          };
> };
> 
> options {
>          directory "/var/cache/bind";
> 
>          // If there is a firewall between you and nameservers you want
>          // to talk to, you may need to fix the firewall to allow multiple
>          // ports to talk.  See http://www.kb.cert.org/vuls/id/800113
> 
>          // If your ISP provided one or more IP addresses for stable
>          // nameservers, you probably want to use them as forwarders.
>          // Uncomment the following block, and insert the addresses replacing
>          // the all-0's placeholder.
> 
>          // forwarders {
>          //      0.0.0.0;
>          // };
> 
>          auth-nxdomain no;       // conform to RFC1035
> //      listen-on-v6 { any; };
> 
>          allow-transfer {
>                  X.X.X.X;
>                  Y.Y.Y.Y;
>          };
> 
>          allow-query-cache { any; };
>          allow-recursion { any; };
> 
>          querylog yes;
> 
>          recursive-clients 2000;
> };
> 
> --------%<--------%<--------%<--------%<--------%<--------%<--------%<--------
> 
> named.conf.local
> 
> //
> // Do any local configuration here
> //
> 
> // Consider adding the 1918 zones here, if they are not used in your
> // organization
> // include "/etc/bind/zones.rfc1918";
> 
> include "/etc/bind/zone.hint";
> include "/etc/bind/zones.rfc1912";
> include "/etc/bind/zones.rfc1918";
> include "/etc/bind/zones.master";
> include "/etc/bind/zones.slave";
> 
> --------%<--------%<--------%<--------%<--------%<--------%<--------%<--------
> 
> /etc/default/bind9
> 
> # run resolvconf?
> RESOLVCONF=yes
> 
> # startup options for the server
> OPTIONS="-4 -u bind"
> 
> --------%<--------%<--------%<--------%<--------%<--------%<--------%<--------
> 
> # dig +norec +dnssec www.google.com @a.root-servers.net
> 
> ; <<>> DiG 9.6.1-P1 <<>> +norec +dnssec www.google.com @a.root-servers.net
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55758
> ;; flags: qr; QUERY: 1, ANSWER: 0, AUTHORITY: 13, ADDITIONAL: 16
> 
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags: do; udp: 512
> ;; QUESTION SECTION:
> ;www.google.com.                        IN      A
> 
> ;; AUTHORITY SECTION:
> com.                    172800  IN      NS      a.gtld-servers.net.
> com.                    172800  IN      NS      b.gtld-servers.net.
> com.                    172800  IN      NS      c.gtld-servers.net.
> com.                    172800  IN      NS      d.gtld-servers.net.
> com.                    172800  IN      NS      e.gtld-servers.net.
> com.                    172800  IN      NS      f.gtld-servers.net.
> com.                    172800  IN      NS      g.gtld-servers.net.
> com.                    172800  IN      NS      h.gtld-servers.net.
> com.                    172800  IN      NS      i.gtld-servers.net.
> com.                    172800  IN      NS      j.gtld-servers.net.
> com.                    172800  IN      NS      k.gtld-servers.net.
> com.                    172800  IN      NS      l.gtld-servers.net.
> com.                    172800  IN      NS      m.gtld-servers.net.
> 
> ;; ADDITIONAL SECTION:
> a.gtld-servers.net.     172800  IN      A       192.5.6.30
> a.gtld-servers.net.     172800  IN      AAAA    2001:503:a83e::2:30
> b.gtld-servers.net.     172800  IN      A       192.33.14.30
> b.gtld-servers.net.     172800  IN      AAAA    2001:503:231d::2:30
> c.gtld-servers.net.     172800  IN      A       192.26.92.30
> d.gtld-servers.net.     172800  IN      A       192.31.80.30
> e.gtld-servers.net.     172800  IN      A       192.12.94.30
> f.gtld-servers.net.     172800  IN      A       192.35.51.30
> g.gtld-servers.net.     172800  IN      A       192.42.93.30
> h.gtld-servers.net.     172800  IN      A       192.54.112.30
> i.gtld-servers.net.     172800  IN      A       192.43.172.30
> j.gtld-servers.net.     172800  IN      A       192.48.79.30
> k.gtld-servers.net.     172800  IN      A       192.52.178.30
> l.gtld-servers.net.     172800  IN      A       192.41.162.30
> m.gtld-servers.net.     172800  IN      A       192.55.83.30
> 
> ;; Query time: 10 msec
> ;; SERVER: 198.41.0.4#53(198.41.0.4)
> ;; WHEN: Mon Feb  8 15:03:49 2010
> ;; MSG SIZE  rcvd: 531

Good, you are not blocking packets larger than 512 bytes.
 
> --------%<--------%<--------%<--------%<--------%<--------%<--------%<--------
> 
> # dig +dnssec +norec +ignore dnskey se @A.NS.se
> 
> ;; Query time: 48 msec
> ;; SERVER: 192.36.144.107#53(192.36.144.107)
> ;; WHEN: Mon Feb  8 15:04:52 2010
> ;; MSG SIZE  rcvd: 1203

This one didn't reach the fragmentation threshold (1203 < 1500).
SE has tuned its DNSKEY response over the last two years.
Try "dig +dnssec +norec +ignore any . @l.root-servers.net" instead.
I get 1906 bytes, which is well over the threshold.

;; Query time: 229 msec
;; SERVER: 2001:500:3::42#53(2001:500:3::42)
;; WHEN: Tue Feb  9 08:20:00 2010
;; MSG SIZE  rcvd: 1906
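
The size check above can be scripted. This is only a minimal Python
sketch, assuming the dig output has been saved as text; the function
name classify_dig_size and the verdict strings are made up for
illustration. It uses the same simple 1500-byte threshold as above (the
real limit for the DNS payload is slightly lower once the IP and UDP
headers are counted).

```python
import re

# Thresholds from the discussion above: 512 bytes is the classic
# non-EDNS UDP limit, 1500 bytes is the usual Ethernet MTU.
PLAIN_UDP_LIMIT = 512
FRAGMENT_THRESHOLD = 1500


def classify_dig_size(dig_output):
    """Return (size, verdict) based on dig's 'MSG SIZE rcvd' line."""
    match = re.search(r"MSG SIZE\s+rcvd:\s*(\d+)", dig_output)
    if match is None:
        # No size line at all, e.g. the query timed out.
        return None, "no reply size found"
    size = int(match.group(1))
    if size <= PLAIN_UDP_LIMIT:
        verdict = "fits in plain DNS, says nothing about large packets"
    elif size <= FRAGMENT_THRESHOLD:
        verdict = "larger than 512 bytes but probably not fragmented"
    else:
        verdict = "large enough to be fragmented on a 1500-byte MTU path"
    return size, verdict


if __name__ == "__main__":
    # Sizes taken from the three dig runs quoted in this message.
    for line in (";; MSG SIZE  rcvd: 531",
                 ";; MSG SIZE  rcvd: 1203",
                 ";; MSG SIZE  rcvd: 1906"):
        print(classify_dig_size(line))
```

Only a reply in the last category exercises fragment handling in the
firewall, which is why the 1203-byte SE answer is not a sufficient test.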

> --------%<--------%<--------%<--------%<--------%<--------%<--------%<--------
> 
> Log extract:
> 
> ...
> 08-Feb-2010 14:39:56.391 query-errors: debug 1: client X.X.X.X#12695: query failed (SERVFAIL) for 11.94.88.195.dnsbl.sorbs.net/IN/A at query.c:4619
> 08-Feb-2010 14:39:56.391 query-errors: debug 2: fetch completed at resolver.c:3121 for 11.94.88.195.dnsbl.sorbs.net/A in 30.000143: timed out/success [domain:dnsbl.sorbs.NET,referral:0,restart:1,qrysent:13,timeout:12,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

Run "dig +trace +dnssec 11.94.88.195.dnsbl.sorbs.net" from the box the
recursive nameserver is on and see what happens.

> 08-Feb-2010 14:39:56.392 query-errors: debug 1: client X.X.X.X#48028: query failed (SERVFAIL) for euro-index.be/IN/A at query.c:4619
> 08-Feb-2010 14:39:56.392 query-errors: debug 2: fetch completed at resolver.c:3121 for euro-index.be/A in 30.000085: timed out/success [domain:.,referral:0,restart:1,qrysent:11,timeout:10,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

Run "dig +trace +dnssec euro-index.be" and see what happens.

> 08-Feb-2010 14:39:56.392 query-errors: debug 1: client X.X.X.X#48028: query failed (SERVFAIL) for euro-index.be/IN/MX at query.c:4619
> 08-Feb-2010 14:39:56.393 query-errors: debug 2: fetch completed at resolver.c:3121 for euro-index.be/MX in 30.000111: timed out/success [domain:.,referral:0,restart:1,qrysent:11,timeout:10,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]

Run "dig +trace +dnssec euro-index.be mx" and see what happens.
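
With many failing names it may be easier to generate these commands from
the log than to type them. A rough Python sketch, assuming each log
entry sits on a single line (not wrapped by the mail client as in the
quoted text); the function name trace_commands is hypothetical and the
script only prints the commands, it does not run dig.

```python
import re

# Matches the "query failed (SERVFAIL) for NAME/IN/TYPE" part of a
# query-errors log line and captures NAME and TYPE.
FAILED = re.compile(r"query failed \(SERVFAIL\) for (\S+)/IN/(\S+)")


def trace_commands(log_text):
    """Build one 'dig +trace +dnssec' command per distinct failed query."""
    seen = []
    for name, rrtype in FAILED.findall(log_text):
        if (name, rrtype) not in seen:  # keep first-seen order, no repeats
            seen.append((name, rrtype))
    commands = []
    for name, rrtype in seen:
        parts = ["dig", "+trace", "+dnssec", name]
        if rrtype != "A":  # A is dig's default type, so leave it off
            parts.append(rrtype.lower())
        commands.append(" ".join(parts))
    return commands


if __name__ == "__main__":
    sample = (
        "client X.X.X.X#12695: query failed (SERVFAIL) for "
        "11.94.88.195.dnsbl.sorbs.net/IN/A at query.c:4619\n"
        "client X.X.X.X#48028: query failed (SERVFAIL) for "
        "euro-index.be/IN/A at query.c:4619\n"
        "client X.X.X.X#48028: query failed (SERVFAIL) for "
        "euro-index.be/IN/MX at query.c:4619\n"
    )
    for command in trace_commands(sample):
        print(command)
```

Run the printed commands on the box the recursive nameserver is on, as
with the three above.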

> 08-Feb-2010 14:39:56.394 query-errors: debug 1: client X.X.X.X#48028: query failed (SERVFAIL) for 218.208.78.194.dnsbl.sorbs.net/IN/A at query.c:4619
> 08-Feb-2010 14:39:56.394 query-errors: debug 2: fetch completed at resolver.c:3121 for 218.208.78.194.dnsbl.sorbs.net/A in 30.000152: timed out/success [domain:dnsbl.sorbs.NET,referral:0,restart:1,qrysent:13,timeout:12,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
> 08-Feb-2010 14:39:56.396 query-errors: debug 1: client X.X.X.X#48028: query failed (SERVFAIL) for 218.208.78.194.zen.spamhaus.org/IN/A at query.c:4619
> 08-Feb-2010 14:39:56.396 query-errors: debug 2: fetch completed at resolver.c:3121 for 218.208.78.194.zen.spamhaus.org/A in 30.000175: timed out/success [domain:zen.spamhaus.org,referral:0,restart:1,qrysent:22,timeout:21,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
> 08-Feb-2010 14:39:56.396 query-errors: debug 1: client X.X.X.X#48028: query failed (SERVFAIL) for euro-index.be.fulldom.rfc-ignorant.org/IN/A at query.c:4619
> 08-Feb-2010 14:39:56.396 query-errors: debug 2: fetch completed at resolver.c:3121 for euro-index.be.fulldom.rfc-ignorant.org/A in 30.000098: timed out/success [domain:rfc-ignorant.org,referral:0,restart:4,qrysent:4,timeout:3,lame:0,neterr:0,badresp:0,adberr:4,findfail:0,valfail:0]
> 08-Feb-2010 14:39:56.417 query-errors: debug 1: client X.X.X.X#12695: query failed (SERVFAIL) for 11.94.88.195.zen.spamhaus.org/IN/A at query.c:4619
> 08-Feb-2010 14:39:56.417 query-errors: debug 2: fetch completed at resolver.c:3121 for 11.94.88.195.zen.spamhaus.org/A in 30.000161: timed out/success [domain:zen.spamhaus.org,referral:0,restart:1,qrysent:22,timeout:21,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
> 08-Feb-2010 14:39:56.418 query-errors: debug 1: client X.X.X.X#12695: query failed (SERVFAIL) for ukrs238770.pur3.net.fulldom.rfc-ignorant.org/IN/A at query.c:4619
> 08-Feb-2010 14:39:56.418 query-errors: debug 2: fetch completed at resolver.c:3121 for ukrs238770.pur3.net.fulldom.rfc-ignorant.org/A in 30.000102: timed out/success [domain:rfc-ignorant.org,referral:0,restart:4,qrysent:4,timeout:3,lame:0,neterr:0,badresp:0,adberr:4,findfail:0,valfail:0]
> 08-Feb-2010 14:39:56.479 query-errors: debug 1: client X.X.X.X#35810: query failed (SERVFAIL) for 227.228.181.88.combined.njabl.org/IN/A at query.c:4619
> 08-Feb-2010 14:39:56.479 query-errors: debug 2: fetch completed at resolver.c:3121 for 227.228.181.88.combined.njabl.org/A in 30.000118: timed out/success [domain:combined.njabl.org,referral:0,restart:1,qrysent:11,timeout:10,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
> 08-Feb-2010 14:39:56.479 query-errors: debug 1: client X.X.X.X#35810: query failed (SERVFAIL) for 3.42.27.212.combined.njabl.org/IN/A at query.c:4619
> 08-Feb-2010 14:39:56.479 query-errors: debug 2: fetch completed at resolver.c:3121 for 3.42.27.212.combined.njabl.org/A in 30.000156: timed out/success [domain:combined.njabl.org,referral:0,restart:1,qrysent:10,timeout:9,lame:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
> ...
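
In every "fetch completed" entry quoted above, all but one of the
queries sent timed out (for example qrysent:13, timeout:12), and that
includes the entries for domain "." as well as the blacklist zones. To
see that pattern across a whole log file, the counters can be totalled
per domain. A minimal Python sketch, assuming one log entry per line;
timeouts_by_domain is an illustrative name, not part of BIND.

```python
import re
from collections import defaultdict

# Captures the domain, qrysent and timeout fields from the bracketed
# statistics at the end of a "fetch completed" log line.
FETCH = re.compile(r"\[domain:([^,]+),.*?qrysent:(\d+),timeout:(\d+)")


def timeouts_by_domain(log_text):
    """Return {domain: (queries_sent, queries_timed_out)} summed per domain."""
    totals = defaultdict(lambda: [0, 0])
    for domain, sent, timed_out in FETCH.findall(log_text):
        entry = totals[domain.lower()]  # the log mixes .net and .NET
        entry[0] += int(sent)
        entry[1] += int(timed_out)
    return {domain: tuple(counts) for domain, counts in totals.items()}


if __name__ == "__main__":
    sample = (
        "fetch completed at resolver.c:3121 for x.dnsbl.sorbs.net/A "
        "[domain:dnsbl.sorbs.NET,referral:0,restart:1,qrysent:13,"
        "timeout:12,lame:0]\n"
        "fetch completed at resolver.c:3121 for euro-index.be/A "
        "[domain:.,referral:0,restart:1,qrysent:11,timeout:10,lame:0]\n"
    )
    for domain, (sent, timed_out) in timeouts_by_domain(sample).items():
        print(domain, sent, timed_out)
```

If the totals look like this for unrelated domains, the dig +trace runs
suggested above are the next step to find where the replies are lost.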
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: marka at isc.org
