solved! - Re: 1 hour subdomain failures

sblee at tazmania.org sblee at tazmania.org
Wed Aug 25 03:04:54 UTC 1999


Based on this situation, is it safe to deduce that any zone with
multiple NS records for the same machine (different hostnames,
same IP) would cause similar 'outages'?  I have a situation where, at
irregular intervals with no definite pattern, my nameserver stops
answering queries. No nslookups, no TCP connections.
Unfortunately, this only became a problem approx 4-6 weeks ago, so I've
been hunting a network problem.

The SOA contains:

                        3600    ; refresh     1H
                        1800    ; retry       30M
                        604800  ; expire      7D
                        3600 )  ; default TTL 1H

and
	IN	NS	xyz.com.
	IN	NS	rst.net.
where xyz.com and rst.net are the same machine/same IP.
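
For reference, Mark's rule of thumb in the quoted thread below is that
expire should always be very much greater than refresh. A minimal
Python sketch of that check, run against the timers above and against
the values Mark guessed for John's zone. The function name and the
min_ratio threshold are assumptions for illustration only; they do not
come from BIND or any RFC.

```python
# Sketch only: apply the "expire much greater than refresh" rule of thumb.
# MIN_RATIO is an assumed threshold, not a value taken from BIND or an RFC.
MIN_RATIO = 2


def check_soa_timers(refresh, expire, min_ratio=MIN_RATIO):
    """Return a list of warning strings; an empty list means no complaint."""
    warnings = []
    if expire < refresh * min_ratio:
        warnings.append(
            "expire (%d) is not well above refresh (%d)" % (expire, refresh)
        )
    return warnings


# Timers from the SOA above: refresh 1H, expire 7D.
mine = check_soa_timers(refresh=3600, expire=604800)

# Mark's guess for John's zone: refresh 2 hrs, expire 1 hr.
guess = check_soa_timers(refresh=7200, expire=3600)

print(mine)
print(guess)
```

By this check the timers above are not the problem: expire is 168
times refresh.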

The system is configured as a primary. As I have multiple nameservers
on the network, the only machine that's really affected is the one
that the whole world has a reference to. Even when the system seems to
be behaving 'normally', I periodically have problems running most ndc
commands (reload, restart): it hangs. If it's as simple as
removing one of the NS records, great.
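
Before removing a record, it may help to list which NS names share an
address. A minimal Python sketch, assuming the addresses have already
been looked up (for example with socket.gethostbyname_ex) and put into
a dict. The function names are hypothetical, and 192.0.2.1 is a
documentation address standing in for the real one.

```python
from collections import defaultdict


def group_ns_by_address(ns_addresses):
    """Map each address to the sorted list of NS names that resolve to it."""
    by_addr = defaultdict(list)
    for name, addrs in ns_addresses.items():
        for addr in addrs:
            by_addr[addr].append(name)
    return {addr: sorted(names) for addr, names in by_addr.items()}


def shared_addresses(ns_addresses):
    """Keep only the addresses that more than one NS name points at."""
    grouped = group_ns_by_address(ns_addresses)
    return {addr: names for addr, names in grouped.items() if len(names) > 1}


# Placeholder data: both NS names from the zone above, one made-up address.
ns_addresses = {
    "xyz.com.": ["192.0.2.1"],
    "rst.net.": ["192.0.2.1"],
}

print(shared_addresses(ns_addresses))
```

This only reports the overlap; it says nothing about whether the
overlap is what makes the server hang.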

Insight please?

On 24 Aug 1999 12:16:00 -0700, John Studarus <studarus at one.net> wrote:

>
>	Unfortunately the ISP never told us what 
>version of software they were running but we were
>able to determine the problem and place a fix.
>	Turns out the caching name server got 
>confused on an NS record that pointed to a name
>server in the subdomain.
>	i.e.
>	(in the mydomain.com)
>	subdomain.mydomain.com.	1H IN NS	ns.mydomain.com.
>
>	(in the mydomain.com subdomain)
>	subdomain.mydomain.com.	1H IN NS	ns.subdomain.mydomain.com.
>
>	ns.mydomain.com and ns.subdomain.mydomain.com are
>the same machines.
>
>	So - when the first record expired (after an hour) it 
>would try and use the second record to determine the name server 
>to use.  This would fail (infinite loop? - NS record for 
>subdomain.mydomain.com points to ns.subdomain.mydomain.com).
>It would fail for 1 hour while the TTL expires and then
>work again when it caches the subdomain.mydomain.com IN NS
>ns.mydomain.com record.
>	The fix was to replace the entries with:
>
>	(in the mydomain.com)
>	subdomain.mydomain.com.	1H IN NS	ns.mydomain.com.
>
>	(in the mydomain.com subdomain)
>	subdomain.mydomain.com.	1H IN NS	ns.mydomain.com.
>
>	Is this a bug in an older version of BIND (or
>some other name server software)?  It probably doesn't
>make sense to place ns records in the subdomain but
>it's interesting that it works with the latest
>BIND release.  We were not able to duplicate this 
>problem anywhere else.  
>	Thanks for everyone's help!
>
>		-John
>
>	
>
>
>
>
>Michael Voight wrote:
>> 
>> I don't think this would cause a problem on only one machine.
>> 
>> Michael
>> 
>> Mark_Andrews at isc.org wrote:
>> > 
>> >         What are the SOA counter values for the zone in question?
>> > 
>> >         My bet is that expire is set at 1 hr and refresh is set
>> >         at 2 hrs.  Expire should always be very much greater than
>> >         refresh.
>> > 
>> >         Mark
>> > 
>> > >
>> > >       I've been tracking down an intermittent
>> > > name server problem from a single caching DNS server.
>> > > This caching DNS server will oscillate between
>> > > being able to answer queries and not being able
>> > > to answer the queries for hostnames in the subdomain.
>> > > The oscillations are exactly two hours in total
>> > > length (one hour it works, for the next hour
>> > > it is broken).
>> > >       When I say it is broken I mean that when
>> > > we send a query we never get a packet in reply.
>> > > When I perform the query via tcp the socket closes
>> > > right after the query.  (I've been modifying the
>> > > code to dnsquery for these tests).
>> > >       We have been monitoring several caching
>> > > name servers and this is the only server that has
>> > > this problem!
>> > >       Some more details...  The ttl for the NS
>> > > record for this subdomain is 1 hour.  The ttl for
>> > > hosts in this subdomain is 6 minutes.
>> > >       Could it be that when the NS record
>> > > expires (after 1 hour) the caching server waits
>> > > for an hour before it contacts the authoritative server
>> > > again?  Does anyone know of a name server implementation
>> > > that exhibits this behavior?  (i.e. a 2 hour limit
>> > > before recontacting an authoritative name server?)
>> > >
>> > >               -John
>> > >
>> > > --
>> > > John Studarus <studarus at one.net>
>> > >
>> > >
>> > --
>> > Mark Andrews, Internet Software Consortium
>> > 1 Seymour St., Dundas Valley, NSW 2117, Australia
>> > PHONE: +61 2 9871 4742                 INTERNET: marka at isc.org
>> 
>
>
>-- 
>John Studarus <studarus at one.net>
>
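
In John's case above, the parent zone and the subdomain's own zone
named different servers in the NS record for subdomain.mydomain.com,
and the fix was to make them agree. A minimal Python sketch for
spotting that kind of mismatch, assuming the two NS lists have already
been collected by hand or from the zone files. The function name and
the returned keys are hypothetical.

```python
def compare_delegation(parent_ns, child_ns):
    """Compare the NS names a parent zone and a child zone give for one name.

    Names are lowercased and the trailing dot is dropped before comparing.
    """
    parent = {name.lower().rstrip(".") for name in parent_ns}
    child = {name.lower().rstrip(".") for name in child_ns}
    return {
        "only_in_parent": sorted(parent - child),
        "only_in_child": sorted(child - parent),
    }


# Records as John described them before the fix.
before = compare_delegation(
    ["ns.mydomain.com."], ["ns.subdomain.mydomain.com."]
)

# Records after the fix: both zones name ns.mydomain.com.
after = compare_delegation(["ns.mydomain.com."], ["ns.mydomain.com."])

print(before)
print(after)
```

Two empty lists mean both zones agree on the NS set. A difference is
only something to look at, not proof of a fault; John notes that the
original records worked with the latest BIND release.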
