Strange / Frustrating Caching Problems

Fri Jul 14 14:59:07 UTC 2006

-----Original Message-----
From: Merton Campbell Crockett [mailto:m.c.crockett at adelphia.net] 
Sent: Friday, July 14, 2006 10:35 AM
To: Smith, William E. (Bill), Jr.
Cc: Mark_Andrews at isc.org; bind-users at isc.org
Subject: Re: Strange / Frustrating Caching Problems 

On 13 Jul 2006, at 11:43, Smith, William E. (Bill), Jr. wrote:

>
>
> -----Original Message-----
> From: Mark_Andrews at isc.org [mailto:Mark_Andrews at isc.org]
> Sent: Thursday, July 13, 2006 1:55 PM
> To: Smith, William E. (Bill), Jr.
> Cc: bind-users at isc.org
> Subject: Re: Strange / Frustrating Caching Problems
>
>
>> For the past few months, I have been trying to resolve (unsuccessfully
>> to this point) a problem with a trio of caching-only name servers that
>> we have in place.  The general nature of the problem is as follows.  A
>> DHCP client originally gets an IP address on subnet A but at some point
>> prior to lease expiration moves to subnet B, where they obtain a new IP
>> address successfully.  The problem that I am seeing is that after the
>> move to subnet B, one or more of our caching-only name servers are
>> still returning the old IP address when a lookup of the hostname
>> occurs.  This behavior seems reasonable at first glance since
>> caching-only servers should retain the information they have in cache
>> until the TTL expires and/or the cache is flushed.  After digging into
>> this further, I'm finding that the TTL for the hosts whose forward
>> lookups are returning the wrong IP is set to 604800 seconds or 168
>> hours.  I've determined this by dumping / viewing the cache.  In
>> addition, I've also discovered that the TTL for the reverse record for
>> the same client is also set to this high value.  This behavior would
>> seem reasonable if this high value were the TTL value configured for
>> the domain, which is not the case here.  We have the default TTL in our
>> environment set for 10800 seconds or 3 hours.  Thus, I'm a little
>> baffled as to why the TTLs for some of these DHCP clients are being set
>> to such a high value when other clients have their TTLs set to the
>> 10800 value configured at the domain level.  I've checked the
>> registration at the object level (in our IP management application)
>> and the TTL field is blank, thus implying the default TTL is in place.
>>
>> Aside from the above details, I can also note that the problematic
>> lookups seem to involve the same DHCP clients.  The only reason I know
>> about these clients is that they are unable to SSH to some Unix boxes
>> in a DMZ that restrict access to hosts for which they can perform both
>> forward and reverse lookups.  In this scenario, the forward lookup is
>> failing since it's returning the old IP address of the client.  When
>> this problem occurs, it tends to affect one or two of the caching
>> servers but not all three.  Furthermore, it is somewhat random as to
>> which of the 3 servers are affected.
>>
>> The caching servers in question are all Solaris 9 running BIND 9.3.2
>>
>> If anyone can provide some insight here, it would be much
>> appreciated.  I can provide additional information and/or elaborate
>> on something as needed.
>>
>> Bill Smith
>> <mailto:bill.smith at jhuapl.edu>
>> ISS Server Systems Group
>> Johns Hopkins University Applied Physics Laboratory
>> 11100 Johns Hopkins Road
>> Laurel, MD 20723
>> Phone:  443-778-5523
>> Web:  http://www.jhuapl.edu
>
> 	Nameservers do what the dhcp servers tell them to do.  The TTL
> 	is set by the DHCP server.  Try lowering the dhcp lease time as
> 	that influences the DNS TTL.

In an environment where people can wander with their laptops from subnet
to subnet, why do you have caching only name servers?

These name servers should, at least, have the local zones defined as
forward or stub zones to minimize the amount of erroneous data being
returned in a volatile environment.

	I left out some details about our environment (at least
	initially) to avoid adding too much information at the outset.
	Anyhow, the caching name servers are only in our DMZ for use by
	our DMZ clients.  We have authoritative servers running
	internally and externally (external being the ones we have
	registered with the registrar).  The reasoning for the
	caching-only name servers in the DMZ was primarily one of
	security.

In a volatile environment, you do not want the DHCP server to set the
TTL to the lease time.  I've yet to see a user release the system's IP
address before picking up his laptop and going to his next meeting.  To
minimize the impact of this behaviour, define ddns-ttl for each DHCP
pool.  The DHCP server will use the value of ddns-ttl for the TTL when
updating DNS.  The value of ddns-ttl should be set to the maximum number
of seconds you are willing to accept erroneous DNS answers.
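
As a rough illustration of picking that value, here is a minimal Python
sketch.  It assumes you want the smaller of two limits: the staleness
window you are willing to tolerate, and the one-third-of-lease guideline
mentioned elsewhere in this thread.  The function names are hypothetical
and are not part of any DHCP server; the result is only a candidate
number to put in ddns-ttl.

```python
DAY = 24 * 60 * 60  # seconds in one day


def pick_ddns_ttl(lease_seconds, max_stale_seconds):
    """Return a TTL no larger than the tolerated staleness or lease / 3."""
    if lease_seconds <= 0 or max_stale_seconds <= 0:
        raise ValueError("lease and staleness window must be positive")
    return min(max_stale_seconds, lease_seconds // 3)


def exceeds_one_third_rule(ttl_seconds, lease_seconds):
    """True when the TTL is longer than one third of the lease time."""
    return 3 * ttl_seconds > lease_seconds


if __name__ == "__main__":
    lease = 14 * DAY  # the 14-day lease mentioned in this thread
    print(pick_ddns_ttl(lease, 300))               # prints 300
    print(exceeds_one_third_rule(7 * DAY, lease))  # prints True
```

With a 14-day lease, the 7-day TTL seen in the cache is half the lease,
so it fails the one-third check, while a 5-minute staleness window
yields a TTL of 300 seconds.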

	Correct -- I do not want the DHCP server setting the TTL to the
	lease time and do not believe it is happening for the most part.
	With the one client in question, his lease was 14 days while the
	TTL was 7.  That said, I was unaware of the ddns-ttl option and
	will need to explore whether that option is available with my
	DHCP server.  I presume it is but need to check with the vendor.
	Assuming it is available, that should certainly help with this
	particular client, though I still don't understand why I'm not
	seeing or hearing about this problem more often from
	users/admins.

For this to work correctly, you need to configure the DHCP server to
update both forward and reverse zones and not permit DHCP clients to
update any zone information.
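
The reason both zones matter is the check the DMZ hosts apply: the
reverse lookup of the client's address must yield a name whose forward
lookup returns the same address.  A minimal Python sketch of that check
follows.  The function name and the injectable lookup parameters are
assumptions made for illustration; by default it uses the standard
library resolver calls, which go through whatever name servers the host
is configured to use.

```python
import socket


def forward_confirms_reverse(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if the PTR name for `ip` resolves back to `ip`."""
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, addresses)
        forward_lookup = lambda name: socket.gethostbyname_ex(name)[2]
    try:
        name = reverse_lookup(ip)
        addresses = forward_lookup(name)
    except OSError:  # socket.herror and socket.gaierror are subclasses
        return False
    return ip in addresses


if __name__ == "__main__":
    # Simulated stale cache: the PTR record is current, the A record is old.
    ptr = {"10.0.2.5": "laptop.example.org"}
    a = {"laptop.example.org": ["10.0.1.5"]}
    print(forward_confirms_reverse("10.0.2.5", ptr.__getitem__, a.__getitem__))
    # prints False
```

A stale forward record in one cache is enough to make the check fail,
which matches the SSH refusals described in this thread.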

	The DHCP server is configured to update both forward and reverse
	zones and does not permit DHCP clients to update their own
	records.

> This is interesting then.  We have roughly 10,000 DHCP clients in total
> here with only a small handful exhibiting this high TTL value.  There
> could certainly be more than the handful I know about, but I would
> have expected to hear of similar problems from other users.  In
> addition, the same template (of IP settings) is being applied to the
> "problematic" clients as to others whose TTLs are fine.  If the behavior
> is a by-product of the lease time, why would we not be seeing this
> behavior on a larger number of clients?  Our standard lease time here
> is 14 days and has been for some time.  It has only been within the
> last few months that I've been made aware of the noted problem.  That
> said, best practice seems to dictate that the RR TTL for DHCP clients
> should not exceed 1/3 the lease time, which would not be the case here
> (right at about 50% in some cases).  All this aside though, is there
> any DHCP option available to more tightly control the TTL value or is
> this something that should be configurable at a more global level?  I
> may also follow up with the vendor of my IP Management product since
> I'm using their DHCP server.
>
> 	Mark
> --
> Mark Andrews, ISC
> 1 Seymour St., Dundas Valley, NSW 2117, Australia
> PHONE: +61 2 9871 4742                 INTERNET: Mark_Andrews at isc.org
>
>
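
To find out how many clients are affected, the cache inspection
described in the original post can be automated.  The Python sketch
below is only an approximation: it assumes each record of interest sits
on one line as "owner TTL [IN] type rdata", which is a simplification of
what a real cache dump contains, and the function name is hypothetical.
Cached TTLs count down, so anything still above the zone default must
have started out higher.

```python
def find_long_ttls(dump_text, max_ttl):
    """Return (owner, ttl, type, rdata) for A/PTR records above max_ttl."""
    hits = []
    for line in dump_text.splitlines():
        line = line.split(";", 1)[0]  # drop comments
        fields = line.split()
        # Skip blanks, directives, and continuation lines without an owner.
        if len(fields) < 4 or line[:1].isspace() or fields[0].startswith("$"):
            continue
        if not fields[1].isdigit():
            continue
        ttl = int(fields[1])
        rest = fields[2:]
        if rest[0].upper() == "IN":  # the class field is optional
            rest = rest[1:]
        if len(rest) < 2:
            continue
        rtype, rdata = rest[0].upper(), " ".join(rest[1:])
        if rtype in ("A", "PTR") and ttl > max_ttl:
            hits.append((fields[0], ttl, rtype, rdata))
    return hits


if __name__ == "__main__":
    sample = (
        "; answer\n"
        "laptop1.example.org.\t604800\tIN A\t10.1.1.5\n"
        "host2.example.org.\t10800\tIN A\t10.1.2.9\n"
    )
    for hit in find_long_ttls(sample, 10800):
        print(hit)
    # prints ('laptop1.example.org.', 604800, 'A', '10.1.1.5')
```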

Merton Campbell Crockett
m.c.crockett at adelphia.net