BIND 9.2.1 Refresh Timeout Problem

Wed Jul 31 19:35:24 UTC 2002

> Barry Finkel <b19141 at achilles.ctd.anl.gov> wrote:
>>> On July 03 I posted:

>>>>When I start BIND 9.2.1, the zones are loaded, and I see the "running"
>>>>message.  Then I see messages
>>>>
>>>>     zone xxxxx/IN sending notifies (serial yyyyy)
>>>>
>>>>for each of the 293 zones.  Then I see messages like this one:
>>>>
>>>>     Jul  3 07:18:40 titania.ctd.anl.gov named[5037]: zone anl.gov/IN:
>>>>       refresh: failure trying master 146.137.96.100#53: timed out
>>>>
>>>>For some unknown reason the slave cannot get to any of its masters.
>>>>What could cause this?  The slave server works fine with BIND 8.2.5-REL.

>>> There have been no replies on this newsgroup.  I looked at the BIND 9
>>> Users newsgroup, and there was a similar posting.  I am posting my
>>> problem here (instead of to bind9-users) because I am subscribed to
>>> this list, and I assume that the same level of expertise is available
>>> here as there.  Is there a need for two different newsgroups?
>>>
>>> The responses on bind9-users were
>>>
>>>      1) Change the firewall to accept DNS packets from a high-numbered
>>>         UDP port.
>>>      2) Use the transfer-source, notify-source, and query-source options
>>>         to keep BIND from using a high-numbered UDP port.
>>>
>>> I do not have a firewall between my DNS server titania (aka 
>>> dns1.anl.gov) and some of my masters.  I ran a number of sniffer traces,
>>> and in each case I saw BIND 9.2.1 on dns1 send SOA queries from a
>>> high-numbered UDP port to port 53 on each master.  In the trace, which
>>> was taken on a router port that spanned the dns1 addresses, I saw
>>> responses for each of the SOA queries returning from port 53 on the
>>> masters to the high-numbered port on the slave dns1.  Is there any
>>> reason why BIND would not be seeing these return responses?  Do I
>>> need to change anything in the BIND configuration file?  After I have
>>> finished testing (when the initial set of refresh failure messages
>>> stops appearing in syslog), I stop 9.2.1 with rndc, edit the
>>> named.conf file to comment out the rndc key statements, copy the
>>> BIND 8.2.5-REL executable back to named, and restart 8.2.5.
>>
>>> Note that dns1 is a Solaris 5.6 machine (soon to be 5.8) with three
>>> interfaces.  Is there a problem because I have multiple interfaces?

> phn at icke-reklam.ipsec.nu replied:

>>You might have something here.  What if you explicitly state
>>"listen-on" for all your addresses?

> I tried

>      listen-on { 146.137.64.5; 146.139.254.5; 130.202.20.5; };

> this morning; it did not help.

I installed two patches from Mark Andrews (one to socket.c and one to
zone.c), as Mark said that the refresh responses were arriving back
to the slave server faster than BIND 9.2.1 could process them.  These
two patches may have helped (I do not know), but the problem still
existed.  Mark suggested the options statement

     serial-query-rate <number>;

where the default is 20 queries per second.  I chose a value of 5, and
the refresh failure messages no longer appeared (except in the "normal"
case where one master was unreachable).  Once I got BIND 9 operational,
I did not have time to experiment with rate values to find the limit
for my hardware configuration.  I still have problems with BIND's
slowness; I plan to move the server from a Sun SPARCstation 5 to a Sun
Blade within the next month.
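
For reference, the change amounts to one statement in the options block
of named.conf.  This is a minimal sketch, assuming no other options need
to change (any options already in the file would stay alongside it); 5
is simply the value that worked on this hardware, not a tuned limit:

     options {
          // Limit SOA refresh queries to 5 per second (default is 20).
          serial-query-rate 5;
     };

Assuming roughly one SOA query per zone, the first round of refresh
queries for all 293 zones takes about a minute at 5 per second (293 / 5
is about 59 seconds), compared with about 15 seconds at the default of
20.  The slower rate trades startup time for giving the server a chance
to process each response before the next batch arrives.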
----------------------------------------------------------------------
Barry S. Finkel
Electronics and Computing Technologies Division
Argonne National Laboratory          Phone:    +1 (630) 252-7277
9700 South Cass Avenue               Facsimile:+1 (630) 252-4601
Building 222, Room D209              Internet: BSFinkel at anl.gov
Argonne, IL   60439-4828             IBMMAIL:  I1004994