Stalling slave transfers
cathya at isc.org
Thu May 9 09:36:46 UTC 2013
On 08/05/13 19:15, Tom Sommer wrote:
> On 5/8/13 12:25 PM, Cathy Almond wrote:
>> On 08/05/13 08:26, Tom Sommer wrote:
>>> I have a problem with one of 3 slave servers, all set up the exact same
>>> way, with the exact same bind version and configuration.
>>> One slave has a problem transferring zones from the master.
>>> The logfiles are flooded with "received notify for zone" .. "refresh in
>>> progress, refresh check queued" lines and "rndc status" returns a
>>> constant high number of "soa queries in progress".
>>> After a few hours the zones are transferred, so the connection to the
>>> master is working, but there is a major delay. I tried resetting the
>>> slave and transferring ALL slave zones again, which worked fine
>>> instantly. The problem still appeared again after a few hours though.
>>> The master has three network-paths, one on external IP, one on internal
>>> IP and one on IPv6. All 3 paths work fine, because the transfers happen
>>> after an hour or so.
>>> There are no hints in the master's log.
>>> The other two slaves are running perfectly, no errors or delays whatsoever.
>>> Bind version 9.9.2-P2 (recently upgraded to).
>>> Any hints would be appreciated, as I feel like I've exhausted most
>>> Thank you.
>> Have a look at this KB article (you'll need to register to view - but
>> registration is open to all):
>> Also - and this isn't covered in that article (yet) - if you're using
>> views, then use-alt-transfer-source defaults to 'yes'. You might want
>> to set it explicitly to 'no' or to define alt-transfer-source
>> and/or alt-transfer-source-v6.
> Thank you, great resource. I think I solved it by raising
> serial-query-limit, it's just odd that it's not required on the other
> two servers.
> Another issue has arisen now though, the logfile is filled with lots of
> named: zone example.com/IN: refresh: failure trying master
> 220.127.116.11#53 (source 0.0.0.0#0): operation canceled
> But if I do a "dig example.com @18.104.22.168" it's working just fine. Same
> server as with the previous issue.
> Any thoughts? Thank you.
> // Tom
I don't think you solved the problem - I think you moved it (or made it
The refresh errors indicate that the master isn't responding to your
slave for some reason. That's what you'll need to investigate. I would
suggest auditing the differences between this slave and the others in
their named configurations as well as their configured IP interfaces and
A pair of network packet traces (slave and the non-responding auth
server) might also point you in the right direction.
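
As a starting point for that audit, here is a minimal named.conf sketch of the
settings mentioned in this thread. It is not taken from anyone's actual
configuration: the view name, addresses, file path and the rate value are
placeholders. Note also that I am not aware of an option called
"serial-query-limit"; the BIND option that matches that description is
serial-query-rate, so that is what is shown here. The "source 0.0.0.0#0" in
the log line suggests that no explicit transfer-source is set, which is why
one is pinned in the example.

```
options {
    // Maximum number of SOA refresh queries sent per second
    // (the default is 20). The value 50 is purely illustrative.
    serial-query-rate 50;
};

view "internal" {                       // hypothetical view name
    match-clients { any; };

    // With views in use this defaults to 'yes'; set it explicitly.
    use-alt-transfer-source no;

    zone "example.com" {
        type slave;
        masters { 192.0.2.1; };         // placeholder master address
        // Local address used for refresh queries and zone transfers.
        transfer-source 192.0.2.10;     // placeholder slave address
        file "slaves/example.com.db";   // placeholder path
    };
};
```

Comparing these few statements across the three slaves, together with the
addresses actually configured on each host, should show quickly whether the
problem slave differs from the other two.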