Name resolution failure on a caching server -- many '; pending-answer' records in the cache

Wed Jan 27 20:42:46 UTC 2016

NXDOMAIN is not a "failure" response. Are you *sure* you're getting NXDOMAIN? If you're using nslookup to test, be aware that it does suffix searching by default: if the original query, e.g. www.bbc.co.uk, fails, it'll quietly (unless debug mode is in effect) start appending suffixes. Looking up those suffixed names, e.g. www.bbc.co.uk.example.com, most likely gets NXDOMAIN, so nslookup reports NXDOMAIN as the overall result of the query. So it's basically a misreporting of the error by nslookup.
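
To illustrate how a real failure on the original name can end up reported as NXDOMAIN, here is a minimal Python model of search-list expansion. It is a simplified sketch, assuming resolv.conf-style rules (try the name as given first when it has at least `ndots` dots, then each search domain), and assuming the client reports the result of the last name it tried. nslookup's exact rules may differ between versions. The search domain "example.com" and the SERVFAIL outcome are hypothetical.

```python
def candidate_queries(name: str, search: list[str], ndots: int = 1) -> list[str]:
    """Fully qualified names a stub resolver would try, in order (simplified)."""
    if name.endswith("."):
        # An absolute name is never suffixed.
        return [name]
    as_is = name + "."
    suffixed = [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [as_is] + suffixed
    # Too few dots: try the search list first and the bare name last.
    return suffixed + [as_is]


def reported_result(name: str, search: list[str], rcodes: dict[str, str],
                    ndots: int = 1) -> tuple[str, str]:
    """Stop at the first NOERROR; otherwise report the LAST name tried and its rcode."""
    last_qname, last_rcode = name, "NXDOMAIN"
    for qname in candidate_queries(name, search, ndots):
        # Names absent from the table are assumed not to exist.
        rcode = rcodes.get(qname, "NXDOMAIN")
        if rcode == "NOERROR":
            return qname, rcode
        last_qname, last_rcode = qname, rcode
    return last_qname, last_rcode
```

With `rcodes={"www.bbc.co.uk.": "SERVFAIL"}` and `search=["example.com"]`, the model reports NXDOMAIN for www.bbc.co.uk.example.com., hiding the SERVFAIL on the name that was actually asked for. Querying with a trailing dot (www.bbc.co.uk.) or using dig, which does not apply the search list by default, avoids this.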

Note that only 1 of the records in your cache dump is actually relevant -- the CNAME from www.bbc.co.uk to www.bbc.net.uk -- and the others are for a different part of the namespace (thdow.bbc.co.uk).

If you query the CNAME explicitly while the problem is occurring, does it resolve? I would expect it to, even though the cache entry is marked "pending-answer". But unless the target of the CNAME also resolves, the lookup as a whole cannot succeed.
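
The point about the CNAME target can be shown with a toy cache. This Python sketch is only a model: the cache is a hypothetical dict holding one (type, value) pair per name, and 192.0.2.10 is a documentation address standing in for www.bbc.net.uk's real address. Against the live server, the equivalent check is to query each link of the chain explicitly, e.g. `dig www.bbc.co.uk CNAME` and then `dig www.bbc.net.uk A`.

```python
from typing import Optional


def resolve_chain(qname: str, cache: dict[str, tuple[str, str]],
                  max_hops: int = 8) -> tuple[list[str], Optional[str]]:
    """Follow CNAMEs through a toy cache; return (names visited, address or None)."""
    path = [qname]
    name = qname
    for _ in range(max_hops):
        entry = cache.get(name)
        if entry is None:
            # Chain is broken here: nothing is known for this name.
            return path, None
        rtype, value = entry
        if rtype == "A":
            return path, value
        if rtype == "CNAME":
            name = value
            path.append(name)
            continue
        # Any other type cannot complete an address lookup in this model.
        return path, None
    # Too many hops, e.g. a CNAME loop.
    return path, None
```

With only the cached CNAME, `resolve_chain("www.bbc.co.uk.", {"www.bbc.co.uk.": ("CNAME", "www.bbc.net.uk.")})` gets as far as www.bbc.net.uk. and returns no address. The first link resolves, but the lookup as a whole fails.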

													- Kevin

-----Original Message-----
From: bind-users-bounces at lists.isc.org [mailto:bind-users-bounces at lists.isc.org] On Behalf Of TPCbind at mklab.ph.rhul.ac.uk
Sent: Tuesday, January 26, 2016 8:02 PM
To: bind-users at lists.isc.org
Subject: Name resolution failure on a caching server -- many '; pending-answer' records in the cache

Dear All,
     I run a caching server on a section of the departmental LAN.
Occasionally network congestion results in timeouts & name resolution failures.  Lookups performed on name servers outside my LAN section fail with NXDOMAIN.  Querying my name server for items not in its cache gets the same result.

My problem is that long after the congestion has subsided, queries to my name server still result in NXDOMAIN failures. AFAICT this situation persists indefinitely, until the cache is flushed ('rndc flush') or BIND is restarted. When it is in this state, dumping the cache with 'rndc dumpdb' shows numerous entries like this:

--------------------------------------------------------------------------------------------
; pending-additional
thdow.bbc.co.uk.        76632   NS      ns3.bbc.net.uk.
                        76632   NS      ns4.bbc.co.uk.
                        76632   NS      ns4.bbc.net.uk.
                        76632   NS      ns3.bbc.co.uk.
; pending-answer
ns0.thdow.bbc.co.uk.    2082    \-AAAA  ;-$NXRRSET
; thdow.bbc.co.uk. SOA ns.bbc.co.uk. hostmaster.bbc.co.uk. 2015122100 1800 600 864000 86400 ; pending-answer
                        76632   A       212.58.240.162
; pending-answer
www.bbc.co.uk.          30      CNAME   www.bbc.net.uk.
; glue
--------------------------------------------------------------------------------------------

and attempts to look up e.g. www.bbc.co.uk result in NXDOMAIN.
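
To see how many entries in a full dump carry each trust annotation, and which owner names they belong to, a small script over the file written by `rndc dumpdb` can help. Its location is set by the `dump-file` option. This Python sketch assumes only the layout in the excerpt above: a `; <trust>` comment line precedes the records it applies to, and indented lines reuse the previous owner name. Other comment lines, such as the negative-cache SOA line, are skipped. It is not a full zone-file parser.

```python
import re
from collections import Counter

# A comment line holding only a trust keyword, e.g. "; pending-answer".
TRUST_RE = re.compile(r"^;\s*([a-z][a-z-]*)\s*$")


def tally_trust(dump_text: str) -> tuple[Counter, dict[str, set[str]]]:
    """Count records per trust annotation and collect their owner names."""
    counts: Counter = Counter()
    owners: dict[str, set[str]] = {}
    trust = None
    owner = None
    for line in dump_text.splitlines():
        if not line.strip() or line.startswith("$"):
            continue  # blank lines and directives such as $DATE
        if line.lstrip().startswith(";"):
            match = TRUST_RE.match(line.strip())
            if match:
                trust = match.group(1)
            continue  # any other comment line is ignored
        if not line[0].isspace():
            owner = line.split()[0]  # a new owner name starts in column 0
        if trust is not None:
            counts[trust] += 1
            owners.setdefault(trust, set()).add(owner)
    return counts, owners
```

Run against the excerpt above, it reports 4 pending-additional records (the thdow.bbc.co.uk NS set) and 3 pending-answer records, owned by ns0.thdow.bbc.co.uk. and www.bbc.co.uk.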

Browsing the documentation, I noticed the parameter 'max-ncache-ttl',
which is unset in my named.conf and apparently defaults to 3 hours.
However, the problem persists long after 3 hours have elapsed following incidents of network congestion.
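
For reference, here is how the negative-caching lifetime is bounded, as a Python sketch. RFC 2308 takes the negative TTL as the smaller of the SOA record's own TTL and its MINIMUM field. BIND then caps that at max-ncache-ttl, whose default is 10800 seconds (3 hours). The SOA TTL passed in below is hypothetical; the MINIMUM of 86400 comes from the thdow.bbc.co.uk SOA in the dump above.

```python
def negative_cache_ttl(soa_ttl: int, soa_minimum: int,
                       max_ncache_ttl: int = 10800) -> int:
    """Seconds a negative (NXDOMAIN/NXRRSET) answer may stay in the cache.

    RFC 2308: min(SOA TTL, SOA MINIMUM), then capped by the resolver's
    max-ncache-ttl (BIND default 10800 s = 3 hours).
    """
    return min(soa_ttl, soa_minimum, max_ncache_ttl)
```

For example, `negative_cache_ttl(86400, 86400)` gives 10800, so the cap, not the zone's 1-day MINIMUM, is what limits the `\-AAAA ;-$NXRRSET` entry in the dump. Positive records such as the NS set (TTL 76632) are bounded by max-cache-ttl instead, so lowering max-ncache-ttl would not shorten their lifetime.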

I could set up a cron job to check name resolution on external domains and flush the cache when it fails? I am assuming there must be a better solution! Should I set max-ncache-ttl to something fairly short in my named.conf, in the hope that the default value is for some reason actually longer than 3 hours?
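
For completeness, the cron-job workaround could look roughly like this Python sketch. It masks the symptom rather than fixing the cause. It assumes the host running it uses the caching server as its resolver (via /etc/resolv.conf) and that `rndc` is configured to talk to that server. The probe names are hypothetical and should be names you expect always to resolve. The cache is flushed only when every probe fails, so that a single broken domain does not trigger it.

```python
import socket
import subprocess
from typing import Callable, Iterable


def system_resolve(name: str) -> bool:
    """True if the host's stub resolver returns at least one address for name."""
    try:
        return bool(socket.getaddrinfo(name, None))
    except socket.gaierror:
        return False


def rndc_flush() -> None:
    """Flush the whole cache of the local named via rndc."""
    subprocess.run(["rndc", "flush"], check=True)


def check_and_flush(names: Iterable[str],
                    resolve: Callable[[str], bool] = system_resolve,
                    flush: Callable[[], None] = rndc_flush) -> bool:
    """Flush only if EVERY probe name fails to resolve; return True if flushed."""
    probes = list(names)
    if probes and not any(resolve(n) for n in probes):
        flush()
        return True
    return False
```

A cron entry would then run a tiny wrapper calling `check_and_flush(["www.bbc.co.uk", "www.isc.org"])`. The `resolve` and `flush` parameters exist so the logic can be tried out with stand-in functions before it is pointed at a live server.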

BTW, is there a way to dump all the parameters from a running named,
just to see their values?

Any ideas on how to solve or further diagnose the problem?

Many thanks
Tom Crane

System details:
OS:    Scientific Linux CERN SLC release 6.7 (Carbon) [NB: SLC is a derivative of RHEL]
BIND:  bind-9.8.2-0.37.rc1.el6_7.5.x86_64

P.S. I originally posted in the Usenet newsgroup comp.protocols.dns.bind but got no follow-ups, and then noticed that all messages in that newsgroup had this mailing list's fields 'NNTP-Posting-Host: lists.isc.org' and 'X-Original-To: bind-users at lists.isc.org' etc. in their headers. Is c.p.d.b actually a moderated group now, or is it exclusively tied to this mailing list via a mail2news gateway?

-- 
Tom Crane, Dept. Physics, Royal Holloway, University of London, Egham Hill,
Egham, Surrey, TW20 0EX, England.
Email:  T dot Crane at rhul dot ac dot uk

bind-users mailing list
bind-users at lists.isc.org
https://lists.isc.org/mailman/listinfo/bind-users