refresh: retry limit for master 10.133.253.128#53 exceeded (source 0.0.0.0#0)

Sat Nov 14 06:12:09 UTC 2015

So, the last couple of days I've been banging my head on this problem....

Here's where I'm seeing the strangeness:

13-Nov-2015 18:00:27.896 general: info: zone salina.k-state.edu/IN/internal: 
refresh: retry limit for master 10.133.253.128#53 exceeded (source 0.0.0.0#0)
13-Nov-2015 18:00:27.896 general: info: zone salina.k-state.edu/IN/internal: 
Transfer started.
13-Nov-2015 18:00:27.900 xfer-in: info: transfer of 
'salina.k-state.edu/IN/internal' from 10.133.253.128#53: connected using 
129.130.254.21#65439

Among the things I tried was setting 'transfer-source':

13-Nov-2015 23:03:42.388 general: info: zone salina.k-state.edu/IN/internal: 
refresh: retry limit for master 10.133.253.128#53 exceeded (source 
129.130.254.21#0)
13-Nov-2015 23:03:42.388 general: info: zone salina.k-state.edu/IN/internal: 
Transfer started.
13-Nov-2015 23:03:42.393 xfer-in: info: transfer of 
'salina.k-state.edu/IN/internal' from 10.133.253.128#53: connected using 
129.130.254.21#34391

No help.

I also disabled the host's firewall, though it was already wide open for 
tcp/udp on port 53....

The fuller log context is:

13-Nov-2015 23:03:03.298 notify: info: client 10.133.253.128#17589: view 
internal: received notify for zone 'salina.k-state.edu'
13-Nov-2015 23:03:03.298 notify: info: client 10.133.253.128#17589: view 
internal: received notify for zone '178.130.129.in-addr.arpa'
13-Nov-2015 23:03:03.298 general: info: zone salina.k-state.edu/IN/internal: 
notify from 10.133.253.128#17589: refresh in progress, refresh check queued
13-Nov-2015 23:03:03.298 general: info: zone 
178.130.129.in-addr.arpa/IN/internal: notify from 10.133.253.128#17589: 
refresh in progress, refresh check queued
13-Nov-2015 23:03:42.388 general: info: zone salina.k-state.edu/IN/internal: 
refresh: retry limit for master 10.133.253.128#53 exceeded (source 
129.130.254.21#0)
13-Nov-2015 23:03:42.388 general: info: zone salina.k-state.edu/IN/internal: 
Transfer started.
13-Nov-2015 23:03:42.393 xfer-in: info: transfer of 
'salina.k-state.edu/IN/internal' from 10.133.253.128#53: connected using 
129.130.254.21#34391
13-Nov-2015 23:03:42.443 general: info: zone salina.k-state.edu/IN/internal: 
transferred serial 2015113475
13-Nov-2015 23:03:42.443 xfer-in: info: transfer of 
'salina.k-state.edu/IN/internal' from 10.133.253.128#53: Transfer completed: 
9 messages, 654 records, 17889 bytes, 0.049 secs (365081 bytes/sec)
13-Nov-2015 23:03:42.443 notify: info: zone salina.k-state.edu/IN/internal: 
sending notifies (serial 2015113475)
13-Nov-2015 23:03:43.395 general: info: zone 
178.130.129.in-addr.arpa/IN/internal: refresh: retry limit for master 
10.133.253.128#53 exceeded (source 129.130.254.21#0)
13-Nov-2015 23:03:43.396 general: info: zone 
178.130.129.in-addr.arpa/IN/internal: Transfer started.
13-Nov-2015 23:03:43.400 xfer-in: info: transfer of 
'178.130.129.in-addr.arpa/IN/internal' from 10.133.253.128#53: connected 
using 129.130.254.21#34392
13-Nov-2015 23:03:43.438 general: info: zone 
178.130.129.in-addr.arpa/IN/internal: transferred serial 2015113421
13-Nov-2015 23:03:43.439 xfer-in: info: transfer of 
'178.130.129.in-addr.arpa/IN/internal' from 10.133.253.128#53: Transfer 
completed: 5 messages, 223 records, 6184 bytes, 0.038 secs (162736 bytes/sec)
13-Nov-2015 23:03:43.439 notify: info: zone 
178.130.129.in-addr.arpa/IN/internal: sending notifies (serial 2015113421)
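
A rough way to put a number on the stall: the notify arrives at 23:03:03.298 
(and finds a refresh already in progress), but the retry-limit message and the 
transfer don't show up until 23:03:42.388.  The Python below is only a sketch 
for pulling that gap out of two log lines; 'parse_timestamp' and 
'seconds_between' are made-up helper names, and the two strings are the 
wrapped entries from the log above joined back into single lines.

```python
from datetime import datetime

# Two entries from the log above, with the mail wrapping undone.
LOG_LINES = [
    "13-Nov-2015 23:03:03.298 notify: info: client 10.133.253.128#17589: "
    "view internal: received notify for zone 'salina.k-state.edu'",
    "13-Nov-2015 23:03:42.388 general: info: zone salina.k-state.edu/IN/internal: "
    "refresh: retry limit for master 10.133.253.128#53 exceeded "
    "(source 129.130.254.21#0)",
]


def parse_timestamp(line):
    """Parse the leading 'DD-Mon-YYYY HH:MM:SS.mmm' stamp of a named log line."""
    date_part, time_part = line.split(" ", 2)[:2]
    return datetime.strptime(date_part + " " + time_part, "%d-%b-%Y %H:%M:%S.%f")


def seconds_between(first_line, second_line):
    """Elapsed seconds from the first log line to the second."""
    delta = parse_timestamp(second_line) - parse_timestamp(first_line)
    return delta.total_seconds()


gap = seconds_between(LOG_LINES[0], LOG_LINES[1])
print("notify -> retry limit exceeded: %.3f s" % gap)
```

That works out to roughly 39 seconds of waiting, and since a refresh was 
already in progress when the notify came in, the real stall is at least that 
long.  The transfer itself then finishes in well under a second.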

zone "salina.k-state.edu" {
         type slave;
         file "sec/internal/zone.salina.k-state.edu";
         masters {
                 10.133.253.128;
                 10.133.253.129;
                 129.130.254.20 key "int-tsig";
         };
         also-notify { 129.130.254.20 key "int-tsig"; };
         transfer-source 129.130.254.21;
};

I have 4 nameservers...one stealth master and 3 exposed secondaries....this 
is the zone on 'ns-1.ksu.edu', and where I've just given away the IP of our 
stealth master...

The intent (temporary at the time) was that delegated zones sending to 
'ns-1.ksu.edu' would still work....by having that server pass the zone along 
to the stealth master, which then distributes it everywhere as if it had 
gotten it directly....

Of all the delegated subdomains....only the ones involving 10.133.253.128 are 
experiencing this.  So, I'm wondering if there's something about this setup 
that's causing problems, or something special that needs to be set, etc.  
I've been staring at the ARM, but everything is getting fuzzy, so time to 
crash....
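
One guess that fits the log (it is only a guess, nothing above confirms it): 
the SOA refresh queries, which normally go over UDP, never get an answer from 
10.133.253.128, while the TCP connection used for the transfer works fine.  
The sketch below is one way to check that from the secondary using only the 
Python standard library.  The function names, the 3 second timeout, and the 
fixed query ID are all assumptions made for illustration, and 'source_ip' is 
just a stand-in for what 'transfer-source' does in named.

```python
import socket
import struct


def build_soa_query(zone, query_id):
    """Build a minimal DNS query packet asking for the SOA record of `zone`."""
    # Header: ID, flags (0 = standard query, no recursion), QDCOUNT=1, rest 0.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in zone.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 6 = SOA, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 6, 1)


def answers_over_udp(server, zone, source_ip="0.0.0.0", timeout=3.0):
    """Return True if `server` sends back any reply with our query ID over UDP."""
    query = build_soa_query(zone, 0x1234)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.bind((source_ip, 0))
            sock.sendto(query, (server, 53))
            reply, _ = sock.recvfrom(4096)
        except OSError:  # covers timeouts and unreachable errors
            return False
    return reply[:2] == query[:2]


def answers_over_tcp(server, zone, source_ip="0.0.0.0", timeout=3.0):
    """Return True if `server` sends back any reply with our query ID over TCP."""
    query = build_soa_query(zone, 0x1234)
    try:
        with socket.create_connection(
            (server, 53), timeout=timeout, source_address=(source_ip, 0)
        ) as sock:
            # DNS over TCP prefixes each message with a 2-byte length.
            sock.sendall(struct.pack("!H", len(query)) + query)
            reply = sock.recv(4096)
    except OSError:
        return False
    return len(reply) >= 4 and reply[2:4] == query[:2]


# Example, to be run on the secondary (not executed here):
#   answers_over_udp("10.133.253.128", "salina.k-state.edu", "129.130.254.21")
#   answers_over_tcp("10.133.253.128", "salina.k-state.edu", "129.130.254.21")
```

If UDP comes back False and TCP comes back True, that would point at something 
between the two hosts dropping UDP port 53 traffic (or the replies to it) 
rather than at the zone configuration.  This only checks that some reply 
arrives; it does not look at the response code or the answer.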

-----

What came before this problem was months of mulling over how to redo our 
DNS to get internal transfers of zones between internal/external views (and 
getting our CFEngine 2 to deliver it).  I got rushed at the end of the 
rollout and crashed....didn't put in that I was out sick the next day, though 
I had been for a week, and was compounding it with sleep deprivation...my 
body said enough.  Unfortunately, so did DNS...(but contained to on-campus 
lookups.)  During which I failed to notice that my work cellphone had died, 
and work never thought to try contacting me by any other means....like my 
home phone(s)....such as the one they had called me on when replacing a bad 
mirror disk went south (two problems: the replacement disk wasn't partitioned 
the same way as the good disk, and it ran out of relocation sectors soon 
after resilvering was done.)

But, apparently they could only think to try work means during this 
time....voicemail (the sms notification goes where?), office jabber (the sms 
notification goes where?).  I did set up voicemail imap retrieval on my 
(personal) smartphone....

Work cellphone is the only one out of 4 I have that wasn't plugged in....it's 
a KRZR K1, which has to use its special mini-usb charger...not a mini-usb 
cable from my charging station....so it's tangled into a big ball with 
various other cords on the floor by my desk.  But, the phone had been sitting 
by the computer where I was working....

Ended up with a health check from the police, though the police didn't say 
why work had done that.  Found other voicemails saying they heard back from 
the police that I'm alive, but still can't get a hold of me about the 
emergency....

So it was a few hours later before I happened to see cacti graphs of my DNS 
servers (and saw spikes from their having been restarted a few times), and 
took a peek at my email to see what was up...

....fixed it quick...after peeling out all the weird things that other admins 
were trying.

After the dust settled, it was off to catch up on the backlog of DNS tickets 
that were somewhat dependent on this.

------

I have one split domain...which I had been handling by having the master scp 
the (signed) zone to the other servers, which all act as master for it.  That 
went along with fixing the problem caused by upgrading to 9.9.7-P2....where 
we had all the zones using the same file between internal/external views....

I had kluged a fix for that by having CFEngine copy from internal to 
external, and "if repaired" do an 'rndc reload'....
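
For anyone not fluent in CFEngine, the shape of that kluge is roughly "copy 
the file only if it differs, and reload only if a copy actually happened."  
The Python below is a sketch of that idea, not the actual CFEngine policy; 
'copy_if_changed' and 'sync_zone' are invented names, and the paths in the 
usage comment are placeholders.

```python
import filecmp
import os
import shutil


def copy_if_changed(src, dst):
    """Copy src over dst only when the contents differ.

    Returns True if a copy happened (the "repaired" case), else False.
    """
    if os.path.exists(dst) and filecmp.cmp(src, dst, shallow=False):
        return False
    shutil.copy2(src, dst)
    return True


def sync_zone(src, dst, reload):
    """Copy the internal zone file to the external one, reloading on change.

    `reload` is any callable; it is invoked only after a real change.
    """
    repaired = copy_if_changed(src, dst)
    if repaired:
        reload()
    return repaired


# Example (placeholder paths, not executed here):
#   import subprocess
#   sync_zone("internal/zone.example", "external/zone.example",
#             lambda: subprocess.run(["rndc", "reload"], check=True))
```

Passing the reload step in as a callable keeps the copy logic testable 
without a running named.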

Surprised it held together for 3 months....had figured that it would do for a 
couple of weeks....but wanted it out of the way should I end up put out on 
disability.

-- 
Who: Lawrence K. Chen, P.Eng. - W0LKC - Sr. Unix Systems Administrator
                                    with LOPSA Professional Recognition.
For: Enterprise Server Technologies (EST) -- & SafeZone Ally