NOTIFY(SOA) for zone already xferring
Mark.Andrews at nominum.com
Wed Feb 28 03:25:08 UTC 2001
1152. [bug] ixfr processing could leave Z_XFER_RUNNING set.
Index: bin/named/ns_maint.c
===================================================================
RCS file: /proj/cvs/isc/bind8/src/bin/named/ns_maint.c,v
retrieving revision 8.118
retrieving revision 8.119
diff -u -r8.118 -r8.119
--- ns_maint.c 2001/02/04 12:49:37 8.118
+++ ns_maint.c 2001/02/08 01:27:47 8.119
@@ -1599,7 +1599,6 @@
break;
case XFER_SUCCESSIXFR:
- zp->z_flags |= Z_XFER_RUNNING;
zp->z_xferpid = XFER_ISIXFR;
ns_notice(ns_log_default,
"IXFR Success %s",
@@ -1624,8 +1623,6 @@
ns_notice(ns_log_default,
"IXFR Merge failed %s",
zp->z_ixfr_tmp);
- zp->z_flags &=
- ~(Z_XFER_RUNNING|Z_XFER_ABORTED|Z_XFER_GONE);
ns_retrytime(zp, tt.tv_sec);
sched_zone_maint(zp);
}
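
For anyone reading along without the source handy, here is a toy model of the symptom named in the change entry above, not the actual named code. The struct, the flag value and the demo_* names are all made up for illustration; the real definitions live in the BIND 8 headers. It only shows why a "running" flag that is left set makes every later NOTIFY for that zone hit the "already xferring" path.

```c
#include <stdio.h>

/* Hypothetical flag value and zone struct, for illustration only. */
#define DEMO_XFER_RUNNING 0x1u

struct demo_zone {
    const char *name;
    unsigned flags;
};

/* Models NOTIFY handling: returns 1 if a refresh was started,
 * 0 if the NOTIFY was ignored because a transfer looks active. */
static int demo_notify(struct demo_zone *zp) {
    if (zp->flags & DEMO_XFER_RUNNING) {
        printf("NOTIFY(SOA) for zone already xferring (%s)\n", zp->name);
        return 0;
    }
    zp->flags |= DEMO_XFER_RUNNING;   /* transfer is now in progress */
    return 1;
}

/* Models the end of a transfer. set_again != 0 stands in for a code
 * path that turns the flag back on and never clears it afterwards. */
static void demo_xfer_done(struct demo_zone *zp, int set_again) {
    zp->flags &= ~DEMO_XFER_RUNNING;
    if (set_again)
        zp->flags |= DEMO_XFER_RUNNING;  /* the stale flag */
}
```

In this model, calling demo_xfer_done(zp, 1) after a transfer leaves the zone stuck: every following demo_notify() returns 0 until something clears the flag. With demo_xfer_done(zp, 0) the next NOTIFY starts a refresh as expected.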
>
> We have been having periodic, spotty problems with
> zones on slave servers not synchronizing from our stealth
> master. This first happened just after installing 8.2.3
> on one of our public slaves. It re-occurred just a few
> days ago on a different public slave. When it happens,
> it happens to only some of the zones carried by the slave
> even though many other zones are receiving NOTIFY messages
> and synchronizing. As a result of grubbing through the
> logs on the master and the slave, here's what happens
> (based on the landmarks found in the log):
>
> 1) Zone updated via dynamic DNS update to
> Stealth Master.
> 2) SM issues a NOTIFY to slave name servers
> listed for zone (FWIW SM doesn't have an NS
> record, only slaves).
> 3) Slave responds by requesting the serial
> number of the zone for which it just received
> a NOTIFY.
> **** This is where the slave appears to lose it.
> It logs:
> NOTIFY(SOA) for zone already xferring (xyzzy.dom.ain)
> for every NOTIFY it receives for the zones
> that aren't being updated. It never
> 4) When the serial number comes in from the
> SM, slave compares with the serial number
> it has in memory. If the zone needs to be
> updated, slave forks/execs "named-xfer" to
> retrieve the zone.
> 5) "named-xfer" queries for the serial number
> again to verify that the zone needs to be
> transferred (it has been passed the local
> serial number as a parameter).
> 6) If the zone is out of date, "named-xfer"
> initiates a zone transfer to pull down the
> new zone.
> 7) The exit status from "named-xfer" tells
> slave whether there is a new zone file
> waiting to be loaded.
> 8) Slave loads it and the serial number
> is updated.
>
> Info from logs (SM has query logging enabled,
> slaves do not):
> 2) When notify is sent, it is seen on uncooperative
> slave as evidenced by the fact that SM logs:
> XX /slave_ip/dom.ain/SOA/IN
> and:
> Received NOTIFY answer (AA) from slave_ip for "dom.ain IN SOA"
> 4) "named-xfer" never started as evidenced by
> wrapping it in a script that throws in a log
> entry. The
> NOTIFY(SOA) for zone already xferring (xyzzy.dom.ain)
> message doesn't show up for this notify, but
> does for the next one to come in. FWIW z_flags
> when this happens is 8043.
>
> The zone had been successfully staying in sync:
> a NOTIFY would come in, an AXFR would be started,
> it would succeed, and the zone would load.
>
> The zone does not carry an NS for the SM, only
> for the slaves.
>
> We can't really run the slaves with debug enabled
> because it can take many days before the problem
> appears. Other hints or suggestions would be
> most welcome.
> Thanks, Scott
>
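
On the serial comparison in step 4 of the quoted list: a minimal sketch of a wraparound-aware "is the master's serial newer" test, assuming the comparison is done in 32-bit sequence space in the style of RFC 1982. The helper name is hypothetical and this is not lifted from the named source.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero when master_serial is ahead of
 * local_serial in 32-bit sequence space. Equal serials, and serials
 * exactly 2^31 apart, are treated as "not newer". */
static int demo_serial_newer(uint32_t master_serial, uint32_t local_serial) {
    uint32_t diff = master_serial - local_serial;  /* wraps modulo 2^32 */
    return diff != 0 && diff < 0x80000000u;
}
```

With this test a serial that has wrapped from 4294967295 to 1 still counts as newer, while an unchanged serial does not trigger a transfer.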
--
Mark Andrews, Nominum Inc.
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: Mark.Andrews at nominum.com