On TTL expiry, BIND sends an 'ANY' query and gets back 'NOANSWER'

Chuck Anderson cra at WPI.EDU
Wed Apr 8 19:25:16 UTC 2015


I have load balancers (I know, run away now) acting as authoritative
servers for a GSLB zone.  The sub-zone is delegated properly from my
main zone which runs BIND.  All my clients are using the BIND server
as their caching resolver.

Every once in a while, my mail server gets back a 'NOANSWER' when it
looks up one of our load-balanced mail servers, which causes mail to
bounce.  I've tracked this down to the following interaction between
BIND and the load balancers:

1. On TTL expiry, BIND sends an 'ANY' query for the RR in question to
   the authoritative servers for the zone (load balancers).  This
   happens even if there is no current recursive query being processed
   by BIND for this name.  It seems that BIND does this to attempt to
   "refresh" the cache in advance of another recursive query coming
   in.

2. Unfortunately, the load balancer answers 'NOANSWER' when queried
   with the 'ANY' type ('A' queries work fine).  Is this correct
   behavior?

3. BIND caches the 'NOANSWER' response.

4. When the next recursive query for the 'A' RR for this name comes
   in, BIND responds 'NOANSWER' from cache.

5. After some time (the zone's SOA TTL?), BIND ages this 'NOANSWER'
   out of the cache and sends an 'A' query to the auth servers (load
   balancers).  Again, this happens even if there is no current
   recursive query being serviced for this name, perhaps to "refresh"
   the cache once again.

6. The load balancer answers with the correct 'A' record response.

7. BIND caches the correct 'A' response.

8. When the next recursive query for the 'A' RR for this name comes
   in, BIND responds with the correct 'A' record from cache.
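The sequence above can be replayed with a toy cache model. This is a minimal Python sketch, not BIND's code: it assumes, as steps 3 and 4 suggest, that a NODATA ('NOANSWER') reply to an 'ANY' query is cached as "this name has no records of any type" and is then used to answer later 'A' lookups. That is consistent with what an authoritative NODATA for 'ANY' asserts. The name, address, and both TTLs are hypothetical. The 'refresh_qtype' parameter compares a refresh done with 'ANY' against one done with the same type the client asks for.

```python
from dataclasses import dataclass, field

# Toy model of steps 1-8 above; NOT BIND's implementation.
# Assumption: a NODATA reply to ANY is cached per name and therefore
# also answers later A lookups for that name.

NAME = "mail.gslb.example.com"   # hypothetical load-balanced name
ADDRESS = "192.0.2.25"           # hypothetical A record (documentation range)
POSITIVE_TTL = 300               # hypothetical TTL of the A record
NEGATIVE_TTL = 60                # hypothetical negative-caching TTL


@dataclass
class Entry:
    records: list   # an empty list represents a cached NODATA ("NOANSWER")
    expires: int    # absolute time at which the entry stops being valid


@dataclass
class Cache:
    entries: dict = field(default_factory=dict)

    def store(self, name, records, ttl, now):
        self.entries[name] = Entry(list(records), now + ttl)

    def lookup(self, name, now):
        """Return cached records, [] for a cached NODATA, or None on a miss."""
        entry = self.entries.get(name)
        if entry is None or now >= entry.expires:
            return None
        return entry.records


def load_balancer(qtype):
    # Stand-in for the authoritative load balancer (step 2):
    # A queries work, ANY queries get NODATA.
    return [ADDRESS] if qtype == "A" else []


def refresh(cache, name, qtype, now):
    # Unsolicited refresh on expiry (steps 1 and 5), cached as in steps 3 and 7.
    records = load_balancer(qtype)
    cache.store(name, records, POSITIVE_TTL if records else NEGATIVE_TTL, now)


def answer_client(cache, name, now):
    # A client's A query (steps 4 and 8): served from cache when possible.
    records = cache.lookup(name, now)
    if records is None:
        refresh(cache, name, "A", now)
        records = cache.lookup(name, now)
    return records if records else "NOANSWER"


def replay(refresh_qtype):
    cache = Cache()
    refresh(cache, NAME, "A", now=0)                 # original A record cached
    refresh(cache, NAME, refresh_qtype, now=300)     # steps 1-3 at TTL expiry
    step4 = answer_client(cache, NAME, now=310)
    refresh(cache, NAME, "A", now=360)               # steps 5-7
    step8 = answer_client(cache, NAME, now=370)
    return {"step 4": step4, "step 8": step8}


print(replay("ANY"))   # refresh with ANY, as observed
print(replay("A"))     # refresh with the client's own query type
```

In this model, replaying with 'ANY' reproduces the bounce window between steps 4 and 8, while replaying with 'A' never yields 'NOANSWER'.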

My questions are: what is at fault here?  Is it a BIND bug to expect
'ANY' queries to work?  Is it a load balancer bug to respond
'NOANSWER' to an 'ANY' query?  Is it a BIND bug to cache this
'NOANSWER', or should it instead have immediately issued an 'A' query
before expiring the cache entry?  Should BIND not have cached
'NOANSWER' at all, and instead just done an 'A' query as needed when
recursing to service the next query from the client?
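The load balancer question can be checked without BIND in the path by sending the same name to a load balancer as an 'A' query and as an 'ANY' query and comparing the replies. Below is a minimal, standard-library-only Python sketch of that check. It treats 'NOANSWER' as a NOERROR reply with an empty answer section (RCODE 0, ANCOUNT 0, often called NODATA), inspects only the 12-byte header, and uses UDP over IPv4; the helper names are illustrative, not from any existing tool.

```python
import socket
import struct

# RFC 1035 type codes; ANY is QTYPE 255 (written "*" in RFC 1035).
QTYPES = {"A": 1, "ANY": 255}
CLASS_IN = 1


def build_query(name, qtype, query_id=0x1234):
    """Encode a single-question DNS query with RD clear (we ask the
    authoritative server directly, not a resolver)."""
    # Header: ID, flags=0, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by the root label (0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", QTYPES[qtype], CLASS_IN)


def classify(response):
    """Summarize a reply from its 12-byte header only."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    if flags & 0x0200:                    # TC bit: answer did not fit in UDP
        return "truncated (retry over TCP)"
    rcode = flags & 0x000F
    if rcode == 3:
        return "NXDOMAIN"
    if rcode != 0:
        return f"RCODE {rcode}"
    # NOERROR with an empty answer section is NODATA, i.e. "NOANSWER"
    return "NOANSWER" if ancount == 0 else f"{ancount} answer(s)"


def ask(server, name, qtype, timeout=3.0):
    """Send one query over UDP/IPv4 port 53 and classify the reply.
    This sketch does not verify that the reply ID matches the query."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype), (server, 53))
        response, _addr = sock.recvfrom(4096)
    return classify(response)
```

Calling ask(lb, name, "A") and ask(lb, name, "ANY") against one of the load balancers (with 'lb' and 'name' filled in for your setup) should show an answer count for 'A' and, if step 2 is accurate, 'NOANSWER' for 'ANY'.  'dig @server name ANY' gives the same information from the command line.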

And finally, is there something I can tweak in BIND to avoid this
problem?

Thanks.


More information about the bind-users mailing list