Problem with BIND 9.10.1-P1 recursion limits

Mike Hoskins (michoski) michoski at cisco.com
Tue Dec 9 20:04:00 UTC 2014


Thanks for digging in so fast.  Our mitigation will be sticking to
9.9.6-P1, since we like ESV anyway.

I wanted to point out that (perhaps sadly) this isn't so crazypants... or at
least not uncommon.  The *edge* and *aka* names (edgekey.net, akamaiedge.net,
akadns.net) point to Akamai's DNS and CDN services.  From my last look, this
has gotten cleaner in the latest versions of their offerings, but many of
the largest sites on the Internet are configured this way today.

-----Original Message-----
From: Evan Hunt <each at isc.org>
Date: Tuesday, December 9, 2014 at 2:41 PM
To: Stuart Henderson <stu at spacehopper.org>
Cc: Tony Finch <dot at dotat.at>, "bind-users at lists.isc.org"
<bind-users at lists.isc.org>
Subject: Re: Problem with BIND 9.10.1-P1 recursion limits

>On Tue, Dec 09, 2014 at 05:51:58PM +0000, Evan Hunt wrote:
>> That's unexpected. I'll see if I can reproduce it.
>
>Okay, I can.
>
>Part of the problem is the somewhat crazypants DNS configuration
>of www.ibm.com:
>
>  $ dig +noall +answer www.ibm.com
>  www.ibm.com.            3600    IN      CNAME   www.ibm.com.cs186.net.
>  www.ibm.com.cs186.net.  60      IN      CNAME   china-cdn.san.ibm.com.edgekey.net.
>  china-cdn.san.ibm.com.edgekey.net. 21600 IN CNAME china-cdn.san.ibm.com.edgekey.net.globalredir.akadns.net.
>  china-cdn.san.ibm.com.edgekey.net.globalredir.akadns.net. 900 IN CNAME e7826.x.akamaiedge.net.
>  e7826.x.akamaiedge.net. 20      IN      A       23.59.201.136
>
>... like, *wow*.  A chain of four aliases with TTLs ranging from 20
>seconds to 6 hours, passing through five different zones (ibm.com,
>cs186.net, edgekey.net, akadns.net, akamaiedge.net), hosted by
>servers in three *more* zones (ihost.com, akam.net, and akadns.org,
>in addition to akadns.net and akamaiedge.net).  I had to almost
>double the maximum recursion queries to 99 to get this to work on
>an empty cache.  Yikes.
>
>Almost any non-empty cache will dodge the bullet. Preceding the
>lookup of www.ibm.com with "dig @::1 ns com" causes the query to
>succeed.  Also, as previously noted, on 9.9 it will succeed without
>a five-minute delay if you just issue the query a second time.
>
>So, possible workarounds if this issue is causing problems for you:
>
>  - Ensure that the first query sent to a newly-primed recursive
>    resolver isn't quite as spectacular as this one;
>  - Add "max-recursion-queries 100;" to your options statement;
>  - Run 9.9.6-P1 instead of 9.10.1-P1
>
>The five-minute delay is still a bit of a puzzle. It happens because
>of this code in adb.c:
>
>        /* XXXMLG Don't pound on bad servers. */
>        if (address_type == DNS_ADBFIND_INET) {
>                name->expire_v4 = ISC_MIN(name->expire_v4, now + 300);
>                name->fetch_err = FIND_ERR_FAILURE;
>                inc_stats(adb, dns_resstatscounter_gluefetchv4fail);
>        } else {
>                name->expire_v6 = ISC_MIN(name->expire_v6, now + 300);
>                name->fetch6_err = FIND_ERR_FAILURE;
>                inc_stats(adb, dns_resstatscounter_gluefetchv6fail);
>        }
>
>The "now + 300" bit is where the five minutes comes from.  That's code
>that's been around for years, and it is in 9.9, but apparently it's
>reached more easily in 9.10.  I'm looking into the reasons for this.
>
>The problem should be addressed in 9.10.2, which is likely to be
>released next month.
>
>-- 
>Evan Hunt -- each at isc.org
>Internet Systems Consortium, Inc.

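For anyone who has to stay on 9.10.1-P1, the second workaround in the quoted
message might look roughly like this in named.conf.  This is only a sketch:
the value 100 is the one suggested above, and everything else that would
normally live in the options statement is left out.

```
options {
        // Raise the cap on the number of queries named may send
        // while resolving a single recursive request.
        max-recursion-queries 100;
};
```

named has to reload its configuration before the new limit applies.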

