[bind10-dev] transfer problems

Jeremy C. Reed jreed at isc.org
Tue May 3 18:47:25 UTC 2011


On Fri, 29 Apr 2011, Likun Zhang wrote:

> On Wednesday, April 27, 2011 9:37 PM, Jeremy wrote:
> 
> > I think the use is: a master contains the signed zone (it is not
> > "hidden" but it also is not part of the delegation). Four NS records.
> > One of these is the BIND 10 server. I think the three others may be
> > configured to pull from the new BIND 10 server, but I do not know how
> > they are configured.
> 
> 
> Are you talking about how to configure master address for slave bind10 
> auth server? If it is, you have to configure it in xfrin's spec file.

Okay. I forgot. And this does not make sense. It has master_addr 
defaulting to 127.0.0.1. I confirmed this by doing a zone transfer and 
nothing useful was logged.

> Xfrin retransfer zone_name=bind10.isc.org
"zone xfrin is started"

[b10-auth] received a message:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 65047
;; flags: ; QUESTION: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;bind10.isc.org. IN TYPE252

03-May-2011 18:15:47.030 Xfrout: INFO: transfer of 'bind10.isc.org./IN': AXFR started
03-May-2011 18:15:47.036 Xfrout: INFO: transfer of 'bind10.isc.org./IN': AXFR end
03-May-2011 18:15:47.076 Xfrout: INFO: zone 'bind10.isc.org/IN': receive notify others command

It would be nice to show where the transfer happens.

And why would it have a "receive notify others command" if the zone 
transfer wasn't useful?

Is it really successfully doing a zone transfer from itself?

With tcpdump, I saw no traffic on my external interface but saw the 
traffic on loopback.

That can't be useful as a default -- transfer from itself? But also how 
can we have a common master address for all zones?
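
A 127.0.0.1 default could at least be flagged before a transfer is attempted. The following is only a minimal sketch, not BIND 10 code: the function name is hypothetical, and a loopback check catches only the most obvious case of a server transferring from itself (it does not notice the server's own external addresses).

```python
import ipaddress


def is_loopback_master(master_addr):
    """Return True if master_addr is a loopback address (127.0.0.0/8 or ::1).

    Hypothetical helper for illustration; master_addr is an address string
    such as the Xfrin master_addr value discussed above.
    """
    return ipaddress.ip_address(master_addr).is_loopback


if __name__ == "__main__":
    for addr in ("127.0.0.1", "::1", "192.0.2.1"):
        if is_loopback_master(addr):
            print("warning: master %s is loopback; transfer would come from this host" % addr)
        else:
            print("master %s looks like a remote host" % addr)
```

A check like this would only warn; it does not answer the question of how one master address can serve all zones.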

I don't know how it was initially configured, but I had suggested bindctl's 
Xfrin retransfer with zone_name and a specific master.

But now I realize that the zone-specific master is not recorded in any 
configuration.

> > [b10-xfrin] transfer of 'bind10.isc.org.': AXFR started
> > [b10-xfrin] Error while loading bind10.isc.org.: receive data from
> > socket time out.
> > [b10-xfrin] transfer of 'bind10.isc.org.': AXFR failed
> > 
> > (This is logged over 80 times.)
> > 
> > I don't know when the failed is logged as I don't have the very last
> > one. It would be nice to have real logging with timestamps and PID of
> > process.
> > 
> > It doesn't indicate when or where.  A simple dig against the master for
> > AXFR does work fine on same system.
> 
> 
> I think the problem you mentioned has been recorded in ticket 761.

Is that the correct ticket number? I don't see anything related at 
http://bind10.isc.org/ticket/761

I am now running the master branch and it didn't show the "socket time out".

> > The serial number on master and the three other
> > secondaries is 2011042400. The serial on the BIND 10 server is
> > 2011041300.
> > 
> > This is the second time I have seen the xfrin not working. Early last
> > week it was serving wrong data. It was restarted since then.
> > 
> > jelte provided minor patch for xfrin.py.in to also output the
> > self._master_address to know where the timeout was from. I didn't use
> > this yet.

I adjusted slightly:

diff --git a/src/bin/xfrin/xfrin.py.in b/src/bin/xfrin/xfrin.py.in
index 10a866e..f396b10 100755
--- a/src/bin/xfrin/xfrin.py.in
+++ b/src/bin/xfrin/xfrin.py.in
@@ -151,7 +151,8 @@ class XfrinConnection(asyncore.dispatcher):
             self._need_recv_size = size - recv_size
             self._asyncore_loop()
             if self._recv_time_out:
-                raise XfrinException('receive data from socket time out.')
+                raise XfrinException('receive data from %s time out.'
+                                      % (self._master_address))
 
             recv_size += self._recvd_size
             data += self._recvd_data



But I haven't noticed the timeout again.
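
A note on the format expression in the patch above: in Python, (x) is just x, not a one-element tuple. If self._master_address were an (address, port) tuple (an assumption; nothing in this message says what type it holds), the % operator would treat it as two arguments for a single %s and raise TypeError. A minimal sketch of a form that works for either type, using a hypothetical standalone function rather than the real XfrinConnection class:

```python
def timeout_message(master_address):
    """Build the timeout text for a master given as a str or a tuple.

    The trailing comma makes (master_address,) a one-element tuple, so the
    whole value is formatted by the single %s regardless of its type.
    """
    return 'receive data from %s time out.' % (master_address,)


if __name__ == "__main__":
    print(timeout_message('192.0.2.1'))
    print(timeout_message(('192.0.2.1', 53)))
    try:
        # Without the trailing comma, a 2-tuple supplies two arguments.
        print('receive data from %s time out.' % (('192.0.2.1', 53)))
    except TypeError as err:
        print("TypeError:", err)
```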

> I'm not sure about the reason, hopefully jelte's patch can catch the problem.
> 
> 
> > The verbose output also includes:
> > 
> > --------------------------------------
> >  [b10-auth] received a message:
> > ;; ->>HEADER<<- opcode: NOTIFY, status: NOERROR, id: 13575
> > ;; flags: aa ; QUESTION: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
> > 
> > ;; QUESTION SECTION:
> > ;bind10.isc.org. IN SOA
> > 
> > ;; ANSWER SECTION:
> > bind10.isc.org. 0 IN SOA ns-int.isc.org. hostmaster.isc.org. 2011042400
> > 7200 3600 604800 3600
> > 
> > [b10-auth] received a message:
> > --------------------------------------
> > 
> > The above makes no sense. Notice the "flags: aa ;"
> > 
> > The answer serial is not the same as sent other times (2011041300). This
> > specific query doesn't have any corresponding answer sent back. (I don't
> > see any corresponding "sending a response" for same.)
> > 
> > Why would it receive and log a received message that includes the answer
> > section?
> > 
> > Maybe this corresponds to it originating a SOA check, but that original
> > query is not noted in the verbose output.
> > 
> > Note that is the correct serial that all the other auth servers (except
> > this one) know.
> > 
> > I don't know if we send any response if our xfrin fails. (I didn't do a
> > capture yet.)
> 
> 
> No, this is a notify message received from some master.


Okay.  In this case, it was itself (127.0.0.1).


> > Again the goal of this BIND 10 server is to be the master used by the
> > other three public auth servers.  But the Xfrout.log doesn't indicate
> > that at all. No notifies received or transfers out logged since was
> > restarted on April 21.
> > 
> 
> I don't think axfr is pulled from this bind10 server, since every 
> transfer-out will have one log with timestamp now, how about trying 
> axfr with dig against the bind10 server?

Yes, I verified that this worked. It would be useful to log the address of 
the remote server (the recipient of the transfer).
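
As an illustration of the kind of log line meant here (timestamp, PID, and the peer address), the following is a rough sketch using Python's standard logging module. It is not the Xfrout code: the logger name, the function names, the message layout and the address#port notation are all assumptions.

```python
import io
import logging
import sys


def make_logger(stream):
    """Return a logger that writes timestamp, PID, level and message to stream."""
    logger = logging.getLogger("xfrout-sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s [%(process)d] %(name)s: %(levelname)s: %(message)s"))
    logger.addHandler(handler)
    return logger


def log_transfer_started(logger, zone, rrclass, peer):
    """Log the start of an outgoing AXFR, including the recipient.

    peer is an (address, port) pair, e.g. as returned by socket.accept().
    """
    address, port = peer
    logger.info("transfer of '%s/%s' to %s#%d: AXFR started",
                zone, rrclass, address, port)


if __name__ == "__main__":
    log_transfer_started(make_logger(sys.stderr),
                         "bind10.isc.org.", "IN", ("192.0.2.7", 53214))
```

With the recipient in every line, it would be clear from the log alone whether the other auth servers are pulling from this one.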

As for configuring who the master is (or possibly multiple masters, 
including optional ports), maybe that should be part of 
Zonemgr/secondary_zones, and Xfrin/master_addr and Xfrin/master_port 
should be removed.
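
To make the proposal concrete, here is one possible shape for per-zone masters, written as Python data with a lookup function. This is a sketch only: the "masters", "address" and "port" keys and the masters_for() helper are hypothetical, the addresses are documentation examples, and the existing layout of secondary_zones entries is assumed to be a list of name/class items.

```python
DEFAULT_PORT = 53  # assumed default when a master entry gives no port

# Hypothetical Zonemgr/secondary_zones content with per-zone masters.
secondary_zones = [
    {
        "name": "bind10.isc.org.",
        "class": "IN",
        "masters": [
            {"address": "192.0.2.1"},
            {"address": "2001:db8::1", "port": 5300},
        ],
    },
]


def masters_for(zone_name, rrclass, zones):
    """Return the list of (address, port) masters configured for a zone.

    Zone names are compared case-insensitively. An empty list means the
    zone is unknown or has no masters configured.
    """
    for zone in zones:
        if zone["name"].lower() == zone_name.lower() and zone["class"] == rrclass:
            return [(m["address"], m.get("port", DEFAULT_PORT))
                    for m in zone.get("masters", [])]
    return []


if __name__ == "__main__":
    print(masters_for("bind10.isc.org.", "IN", secondary_zones))
```

Keeping the masters next to the zone would let a retransfer or a refresh look up the right master without any global default.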

On a related note, why is b10-zonemgr a separate daemon from b10-xfrin? 
Can they be merged? If separate is better, please let me know so I can 
document the reason. (This was brought up by another developer recently.)

For now, I can work around my problem by choosing a single master_addr 
for all zones.

I need to open tickets for these issues. But first I will wait for 
some more comments.

  Jeremy C. Reed
  ISC


