NOTIFY(SOA) for zone already xferring

Mark.Andrews at nominum.com
Wed Feb 28 03:25:08 UTC 2001


1152.   [bug]           ixfr processing could leave Z_XFER_RUNNING set.

Index: bin/named/ns_maint.c
===================================================================
RCS file: /proj/cvs/isc/bind8/src/bin/named/ns_maint.c,v
retrieving revision 8.118
retrieving revision 8.119
diff -u -r8.118 -r8.119
--- ns_maint.c	2001/02/04 12:49:37	8.118
+++ ns_maint.c	2001/02/08 01:27:47	8.119
@@ -1599,7 +1599,6 @@
 					break;
 
 				case XFER_SUCCESSIXFR:
-					zp->z_flags |= Z_XFER_RUNNING;
 					zp->z_xferpid = XFER_ISIXFR;
 					ns_notice(ns_log_default,
 						  "IXFR Success %s",
@@ -1624,8 +1623,6 @@
 						ns_notice(ns_log_default,
 							"IXFR Merge failed %s",
 							  zp->z_ixfr_tmp);
-					zp->z_flags &=
-						~(Z_XFER_RUNNING|Z_XFER_ABORTED|Z_XFER_GONE);
 						ns_retrytime(zp, tt.tv_sec);
 						sched_zone_maint(zp);
 					}
> 
>   We have been having periodic, spotty problems with
> zones on slave servers not synchronizing from our stealth
> master.  This first happened just after installing 8.2.3
> on one of our public slaves.  It re-occurred just a few
> days ago on a different public slave.  When it happens,
> it happens to only some of the zones carried by the slave
> even though many other zones are receiving NOTIFY messages
> and synchronizing.  As a result of grubbing through the
> logs on the master and the slave, here's what happens
> (based on the landmarks found in the log):
> 
> 1) Zone updated via dynamic DNS update to
> Stealth Master.
> 2) SM issues a NOTIFY to slave name servers
> listed for zone (FWIW SM doesn't have an NS
> record, only slaves).
> 3) Slave responds by requesting the serial
> number of the zone for which it just received
> a NOTIFY.
> **** This is where the slave appears to lose it.
>      It logs:
>        NOTIFY(SOA) for zone already xferring (xyzzy.dom.ain)
>      for every NOTIFY it receives for the zones
>      that aren't being updated.  It never
> 4) When the serial number comes in from the
> SM, slave compares with the serial number
> it has in memory.  If the zone needs to be
> updated, slave forks/execs "named-xfer" to
> retrieve the zone.
> 5) "named-xfer" queries for the serial number
> again to verify that the zone needs to be
> transferred (it has been passed the local
> serial number as a parameter).
> 6) If the zone is out of date, "named-xfer"
> initiates a zone transfer to pull down the
> new zone.
> 7) The exit status from "named-xfer" tells
> slave whether there is a new zone file
> waiting to be loaded.
> 8) Slave loads it and the serial number
> is updated.
> 
> Info from logs (SM has query logging enabled,
> slaves do not):
> 2) When notify is sent, it is seen on uncooperative
> slave as evidenced by the fact that SM logs:
>   XX /slave_ip/dom.ain/SOA/IN
> and:
>   Received NOTIFY answer (AA) from slave_ip for "dom.ain IN SOA"
> 4) "named-xfer" never started as evidenced by
> wrapping it in a script that throws in a log
> entry.  The
>   NOTIFY(SOA) for zone already xferring (xyzzy.dom.ain)
> message doesn't show up for this notify, but
> does for the next one to come in.  FWIW z_flags
> when this happens is 8043.
> 
> The zone had been successfully staying in sync:
> a NOTIFY would come in, an AXFR would be started,
> it would succeed, and the zone would load.
> 
> The zone does not carry an NS for the SM, only
> for the slaves.
> 
> We can't really run the slaves with debug enabled
> because it can take many days before the problem
> appears.  Other hints or suggestions would be
> most welcome.
> 				Thanks, Scott
> 
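For readers following the patch, here is a minimal sketch of how a transfer-in-progress flag that is left set can cause every later NOTIFY for that zone to be ignored, which matches the "already xferring" message quoted above. This is an illustrative model, not the BIND 8 source: the flag values, the struct, and the functions on_notify() and on_xfer_done() are hypothetical, and the order in which the flag is cleared and set again is an assumption made only to reproduce the symptom.

```c
#include <stdio.h>

/* Hypothetical flag values, chosen for this sketch only;
 * they are not the values used in ns_defs.h. */
#define Z_XFER_RUNNING 0x1u
#define Z_XFER_ABORTED 0x2u
#define Z_XFER_GONE    0x4u

/* Hypothetical, stripped-down stand-in for a zone entry. */
struct zone {
    const char *name;
    unsigned    flags;
};

/* Handle an incoming NOTIFY.
 * Returns 1 if a transfer was started, 0 if the NOTIFY was ignored
 * because the zone is still marked as transferring. */
static int on_notify(struct zone *zp)
{
    if (zp->flags & Z_XFER_RUNNING) {
        printf("NOTIFY(SOA) for zone already xferring (%s)\n", zp->name);
        return 0;
    }
    zp->flags |= Z_XFER_RUNNING;   /* a transfer is now in progress */
    return 1;
}

/* Handle the end of a transfer.
 * The transfer state is cleared first. When leave_flag_set is nonzero,
 * the running flag is then set again, imitating the effect of the line
 * removed from the XFER_SUCCESSIXFR case (assumed ordering). */
static void on_xfer_done(struct zone *zp, int leave_flag_set)
{
    zp->flags &= ~(Z_XFER_RUNNING | Z_XFER_ABORTED | Z_XFER_GONE);
    if (leave_flag_set)
        zp->flags |= Z_XFER_RUNNING;
}
```

In this model, calling on_notify(), then on_xfer_done() with leave_flag_set nonzero, then on_notify() again makes the second NOTIFY return 0 even though no transfer is running. With leave_flag_set zero the second NOTIFY starts a new transfer as expected.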
--
Mark Andrews, Nominum Inc.
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: Mark.Andrews at nominum.com

