Strange / Frustrating Caching Problems

Fri Jul 14 14:34:38 UTC 2006

On 13 Jul 2006, at 11:43, Smith, William E. (Bill), Jr. wrote:

>
>
> -----Original Message-----
> From: Mark_Andrews at isc.org [mailto:Mark_Andrews at isc.org]
> Sent: Thursday, July 13, 2006 1:55 PM
> To: Smith, William E. (Bill), Jr.
> Cc: bind-users at isc.org
> Subject: Re: Strange / Frustrating Caching Problems
>
>
>> For the past few months, I have been trying (unsuccessfully, to this
>> point) to resolve a problem with a trio of caching only name servers
>> that we have in place.  The general nature of the problem is as
>> follows.  A DHCP client originally gets an IP address on subnet A but
>> at some point prior to lease expiration moves to subnet B, where it
>> obtains a new IP address successfully.  The problem that I am seeing
>> is that after the move to subnet B, one or more of our caching only
>> name servers are still returning the old IP address when a lookup of
>> the hostname occurs.  This behavior seems reasonable at first glance,
>> since caching only servers should retain the information they have in
>> cache until the TTL expires and/or the cache is flushed.  After digging
>> into this further, I'm finding that the TTL for the hosts whose
>> forward lookups are returning the wrong IP is set to 604800 seconds, or
>> 168 hours.  I've determined this by dumping / viewing the cache.  I've
>> also discovered that the TTL for the reverse record for the same
>> client is set to this high value.  This behavior would seem reasonable
>> if this high value was the TTL value configured for the domain, which
>> is not the case here.  We have the default TTL in our environment set
>> to 10800 seconds, or 3 hours.  Thus, I'm a little baffled as to why
>> the TTL for some of these DHCP clients is being set to such a high
>> value when other clients have their TTLs set to the 10800 value
>> configured at the domain level.  I've checked the registration at the
>> object level (in our IP management application) and the TTL field is
>> blank, thus implying the default TTL is in place.
>> Aside from the above details, I can also note that the problematic
>> lookups seem to involve the same DHCP clients.  The only reason I
>> know about these clients is that they are unable to SSH to some Unix
>> boxes in a DMZ that restrict access to hosts for which they can
>> perform both forward and reverse lookups.
>> In this scenario, the forward lookup is failing since it's returning
>> the old IP address of the client.  When this problem occurs, it tends
>> to affect one or two of the caching servers but not all three.
>> Furthermore, it is somewhat random as to which of the 3 servers are
>> affected.
>>
>> The caching servers in question are all Solaris 9 running BIND 9.3.2
>>
>> If anyone can provide some insight here, it would be much
>> appreciated.
>>
>> I can provide additional information and/or elaborate on something
>> as needed.
>>
>> Bill Smith
>> <mailto:bill.smith at jhuapl.edu>
>> ISS Server Systems Group
>> Johns Hopkins University Applied Physics Laboratory
>> 11100 Johns Hopkins Road, Laurel, MD 20723
>> Phone:  443-778-5523
>> Web:  http://www.jhuapl.edu
>
> 	Nameservers do what the DHCP servers tell them to do.  The TTL
> 	is set by the DHCP server.  Try lowering the DHCP lease time, as
> 	that influences the DNS TTL.
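Bill located the 604800-second TTLs by dumping and reading the cache. The same check can be scripted and run against a dump from each of the three servers. The sketch below is a minimal, hypothetical helper (`find_long_ttls` is not part of BIND); it assumes the dump, for example the file written by `rndc dumpdb`, holds whitespace-separated `name TTL [class] type data` lines, with `;` comments, `$` directives, and continuation lines that omit the owner name. Dump layout differs between BIND versions, so check it against a real file first. Because a cache dump shows the TTL remaining at dump time, any value above the 10800-second site default means the record arrived with a TTL larger than the default.

```python
def find_long_ttls(dump_text, threshold=10800, types=("A", "PTR")):
    """Return (name, ttl, type, data) for records whose TTL exceeds threshold.

    Assumes lines of the form: name TTL [class] type data
    """
    results = []
    owner = None
    for raw in dump_text.splitlines():
        line = raw.split(";", 1)[0]          # drop comments
        if not line.strip() or line.lstrip().startswith("$"):
            continue                         # blank line or $DATE-style directive
        fields = line.split()
        if raw[0].isspace():
            fields = [owner] + fields        # continuation line: reuse last owner
        else:
            owner = fields[0]
        if fields[0] is None or len(fields) < 4 or not fields[1].isdigit():
            continue
        rest = fields[2:]
        if rest[0].upper() == "IN":
            rest = rest[1:]                  # optional class field
        if len(rest) < 2:
            continue
        rtype = rest[0].upper()
        ttl = int(fields[1])
        if rtype in types and ttl > threshold:
            results.append((fields[0], ttl, rtype, " ".join(rest[1:])))
    return results


# Hypothetical dump excerpt; names and addresses are made up.
sample = """\
; authanswer
laptop17.example.com.    604800  IN A    10.1.2.33
desk04.example.com.      9500    A       10.1.3.40
33.2.1.10.in-addr.arpa.  604800  PTR     laptop17.example.com.
"""

for record in find_long_ttls(sample):
    print(record)
```

Running it over each server's dump shows which hosts, and which servers, are holding long-lived forward and reverse records.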

In an environment where people can wander with their laptops from  
subnet to subnet, why do you have caching only name servers?

These name servers should, at least, have the local zones defined as  
forward or stub zones to minimize the amount of erroneous data being  
returned in a volatile environment.
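As a rough illustration of the suggestion above, the hypothetical helper below renders BIND 9 `zone` statements of type `forward` or `stub` for the local forward and reverse zones, which could then be added to each caching server's named.conf. The zone names, the 10.0.0.53 server address, and the `stub.<zone>` file naming are assumptions. The caching servers still cache the answers they receive for these zones, so this works alongside a sensible TTL rather than replacing it.

```python
def render_zone(zone, kind, servers):
    """Render a BIND 9 named.conf zone statement of type 'forward' or 'stub'."""
    addrs = " ".join(f"{ip};" for ip in servers)
    if kind == "forward":
        # Send all queries for this zone to the listed servers only.
        body = ["    type forward;", "    forward only;", f"    forwarders {{ {addrs} }};"]
    elif kind == "stub":
        # Track the zone's NS records from the listed masters.
        body = ["    type stub;", f"    masters {{ {addrs} }};", f'    file "stub.{zone}";']
    else:
        raise ValueError(f"unsupported zone kind: {kind}")
    return "\n".join([f'zone "{zone}" {{'] + body + ["};"])


# Hypothetical local forward and reverse zones, one authoritative server.
for zone in ("example.com", "1.10.in-addr.arpa"):
    print(render_zone(zone, "stub", ["10.0.0.53"]))
```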

In a volatile environment, you do not want the DHCP server to set the  
TTL to the lease time.  I've yet to see a user release the system's  
IP address before picking up his laptop and going to his next  
meeting.  To minimize the impact of this behaviour, define ddns-ttl  
for each DHCP pool.  The DHCP server will use the value of ddns-ttl  
for the TTL when updating DNS.  The value of ddns-ttl should be set  
to the maximum number of seconds you are willing to accept erroneous  
DNS answers.
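To make the trade-off concrete: a caching server that fetched the record shortly before the client moved keeps returning the old address for up to one full TTL. Each of the three caching servers fetches and caches the record on its own schedule, which would fit the observation that only one or two of them misbehave at a time. The sketch below computes that window; `stale_window` is a hypothetical name, and the 900-second value merely stands in for whatever ddns-ttl a site chooses.

```python
def stale_window(ttl, cached_at, moved_at):
    """Seconds a caching server keeps answering with the old address.

    ttl:       TTL (seconds) the record carried when the server cached it
    cached_at: time (seconds) the server fetched the record
    moved_at:  time (seconds) the client obtained its new address
    """
    return max(0, cached_at + ttl - moved_at)


WEEK = 7 * 24 * 3600    # 604800 s, the TTL seen in the cache dumps
SITE_DEFAULT = 10800    # the domain's default TTL (3 hours)
DDNS_TTL = 900          # hypothetical ddns-ttl: 15 minutes of tolerated staleness

# Worst case: the record was cached just as the client moved.
for ttl in (WEEK, SITE_DEFAULT, DDNS_TTL):
    worst = stale_window(ttl, cached_at=0, moved_at=0)
    print(f"TTL {ttl:>6} s: up to {worst / 3600:.2f} h of wrong answers")

# Cached two days before the move with a one-week TTL: five days remain.
print(stale_window(WEEK, cached_at=0, moved_at=2 * 86400))
```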

For this to work correctly, you need to configure the DHCP server to  
update both forward and reverse zones and not permit DHCP clients to  
update any zone information.
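For an ISC dhcpd server, these recommendations map onto the `ddns-update-style`, `ignore client-updates`, and `ddns-ttl` statements. The sketch below renders a minimal dhcpd.conf fragment with server-side updates and a per-pool ddns-ttl. It is a hypothetical helper: the subnets, ranges, and 900-second TTL are placeholders, the zone and key declarations the server needs before it can send updates are omitted, and a vendor DHCP server (as in Bill's case) may expose different settings altogether.

```python
def render_dhcpd_conf(pools, lease_time):
    """Render a minimal ISC dhcpd.conf fragment.

    pools maps (network, netmask) -> (first_ip, last_ip, ddns_ttl).
    Zone and key declarations needed for DNS updates are not included.
    """
    lines = [
        "ddns-update-style interim;   # the server performs the DNS updates",
        "ignore client-updates;       # clients may not update DNS themselves",
        f"default-lease-time {lease_time};",
    ]
    for (network, netmask), (first, last, ddns_ttl) in pools.items():
        lines += [
            f"subnet {network} netmask {netmask} {{",
            "    pool {",
            f"        range {first} {last};",
            f"        ddns-ttl {ddns_ttl};",   # TTL used for the A and PTR updates
            "    }",
            "}",
        ]
    return "\n".join(lines)


conf = render_dhcpd_conf(
    {("10.1.2.0", "255.255.255.0"): ("10.1.2.50", "10.1.2.250", 900),
     ("10.1.3.0", "255.255.255.0"): ("10.1.3.50", "10.1.3.250", 900)},
    lease_time=14 * 24 * 3600,   # 14-day lease, as in Bill's environment
)
print(conf)
```

Keeping ddns-ttl in each pool lets a site use a short TTL where laptops roam and a longer one where hosts stay put.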

> This is interesting, then.  We have roughly 10,000 DHCP clients in
> total here, with only a small handful exhibiting this high TTL value.
> There could certainly be more that I simply don't know about, but I
> would have expected to hear of similar problems from other users.  In
> addition, the same template (of IP settings) is being applied to the
> "problematic" clients as to others whose TTLs are fine.  If the
> behavior is a by-product of the lease time, why would we not be seeing
> it on a larger number of clients?  Our standard lease time here is
> 14 days and has been for some time.  It has only been within the last
> few months that I've been made aware of the noted problem.  That said,
> best practice seems to dictate that the RR TTL for DHCP clients should
> not exceed 1/3 of the lease time, which would not be the case here
> (right around 50% in some cases).  All this aside, though, is there
> any DHCP option available to more tightly control the TTL value, or is
> this something that should be configurable at a more global level?  I
> may also follow up with the vendor of my IP management product, since
> I'm using their DHCP server.
>
> 	Mark
> --
> Mark Andrews, ISC
> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> PHONE: +61 2 9871 4742                 INTERNET: Mark_Andrews at isc.org
>
>
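The arithmetic behind Bill's 50% figure is easy to check. A 14-day lease is 1209600 seconds, and the 604800-second TTL in the cache dumps is exactly half of that, which would be consistent with a DHCP server that derives the record TTL as half the lease time; the thread does not confirm that this is what the vendor's server does. The `max_ttl` helper below is hypothetical and simply applies the 1/3 guideline Bill mentions.

```python
DAY = 24 * 3600


def max_ttl(lease_time, divisor=3):
    """Largest TTL (seconds) allowed by a 'TTL <= lease_time / divisor' guideline."""
    return lease_time // divisor


lease = 14 * DAY      # Bill's standard lease: 1209600 s
observed = 604800     # TTL seen in the cache dumps

print(observed / lease)              # fraction of the lease used as TTL
print(max_ttl(lease))                # 403200 s, about 4.7 days
print(observed <= max_ttl(lease))    # whether the 1/3 guideline is met
```

With a 14-day lease, even the 1/3 guideline allows a TTL of several days, which is why a fixed ddns-ttl rather than a lease-derived TTL matters when clients roam.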

Merton Campbell Crockett
m.c.crockett at adelphia.net