do not stupidly delete ZSK files

Heiko Richter email at heikorichter.name
Fri Aug 7 14:50:49 UTC 2015


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2015-08-07 at 07:16, Lawrence K. Chen, P.Eng. wrote:
> 
> 
> On 2015-08-06 19:26, Heiko Richter wrote:
> 
>>> Though back then I was still building bind 32-bit, and the
>>> hardware was much slower.  A full signing took more than 10x
>>> longer than on our current hardware....which can get it done
>>> in just under a minute (usually).  The need for speed is that
>>> some people expect DNS changes to be near instantaneous.
>> 
>> So either you have very slow servers, or a really big zone, if
>> it takes a whole minute to sign it.
>> 
>> Just use inline-signing and the changes will be instantaneous.
>> As soon as nsupdate delivers a change to the master server, it
>> will sign it automatically and send out notifies. It doesn't
>> even take a second, as only the changes need to be signed, not
>> the whole zone.
>> 
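For anyone wanting to try this, a minimal sketch of what such an
inline-signing zone could look like in named.conf (the zone name,
file paths, and key name here are placeholders, not from the
original post):

```
zone "example.com" {
    type master;
    file "/etc/bind/zones/example.com.db";  # unsigned file stays untouched
    key-directory "/etc/bind/keys";         # ZSK/KSK pairs live here
    auto-dnssec maintain;                   # let named manage signatures
    inline-signing yes;                     # sign changes on the fly
    allow-update { key "update-key"; };     # nsupdate delivers changes here
    notify yes;                             # push notifies right after signing
};
```

named keeps the signed copy in a separate .signed file, so the
hand-edited (or nsupdate-maintained) zone file never contains
signatures.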
> 
> It's big and probably full of a lot of stuff that isn't needed
> anymore, etc.  Though there's something weird about the zones
> too.
> 
> Our ksu.edu zone will have more entries than the k-state.edu
> one, even though by policy they should be the same,

Just one addition, aside from the fact that your network seems to
be drowning in chaos:

If the two zones are mandated to be the same, just empty one of them,
put a DNAME record in it that points to the other one and make all
future changes there. That way you can be sure the two zones are
always in sync....
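
A sketch of what that could look like (the zone names are yours,
but the SOA/NS contents below are placeholders).  One caveat: a
DNAME only covers names *below* the apex, so any records at the
apex of ksu.edu itself (A, MX, ...) would still have to be kept
in sync by hand:

```
; ksu.edu, reduced to the apex plus a DNAME
$TTL 3600
@       IN SOA   ns1.k-state.edu. hostmaster.k-state.edu. (
                 2015080701 3600 900 604800 300 )
        IN NS    ns1.k-state.edu.
        IN NS    ns2.k-state.edu.
        IN DNAME k-state.edu.  ; everything under ksu.edu maps to k-state.edu
```

With that in place a query for www.ksu.edu is answered via the
DNAME as www.k-state.edu, and only one zone ever needs editing.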

> though I just fixed up a delegated subdomain that is only doing
> the .ksu.edu form (they also don't list us as secondaries or
> allow us to do transfers anymore...which they're supposed to
> according to policy, to ensure external resolution, especially
> if all their 129.130.x.y addresses become 10.42.x.y or
> something).  Internally we're probably running out of open
> blocks of IPv4, especially for anything that wants a /27 or
> bigger (such as a /21).  It caused problems when the first
> chunk from a reclaimed block was used.  The reclaimed block
> used to be our guest wireless network (which is now a growing
> number of blocks in 10.x.x.x space).  The switch to WPA2
> Enterprise versus open guest made it too tempting to take the
> easy way to get online, so it was required that campus
> resources block access from guest networks.  There was no
> notification that the old guest network wasn't one
> anymore...and it's been years now.
> 
> But I often hear that it sure would be nice if I filled these
> various network blocks with generated forwards/reverses....I'm
> rarely in the loop for what and where the blocks are.
> 
> Anyways...the odd thing I was getting at with ksu.edu vs
> k-state.edu...the raw secondary zone files end up fairly close
> in size, so I wouldn't expect a huge difference in viewing the
> zones.
> 
> but the named-compilezone to convert k-state.edu back into text
> took a few seconds, while it took minutes to do ksu.edu....same
> machine, etc.  I wonder why, and wonder to what extent I should
> investigate....
> 
> But our master server is a Sun Fire X4170 M2 (dual Xeon
> E5620s)....it's bored and a waste most of the time...until a
> full signing needs to get done.  Though it isn't as fun to
> watch as when I was using a T5120 (64 threads)....the load
> average would break 100 and set off all kinds of monitoring
> alerts....  but it chugged along fine....though the apps (and
> their admins) in other containers on it weren't as happy.
> 
> Years ago, loads exceeding 100 were often fatal and messy,
> since they used to be caused by problems between ZFS and our
> old SAN (9985)....as much as they didn't want us to, turning
> off the ZIL was often the fix to make it not happen anymore.
> The problem went away after we switched to the new SAN (which
> isn't so new anymore...as its end is nearing).
> 
> I've thought about looking for a solution that I could throw
> our zone configs at and that would just work, but I largely
> haven't had time to do that.  Or I was hoping to get more
> backing on enforcing good behavior in my zones. (Stop the
> vanity of wanting 10.x.x.x servers at the same level as your
> subdomain's public ones.)  Not sure how preprocessing zone
> files to generate internal / external (/ guest / dr) versions
> translates into a free, ready-to-go solution :)
> 
> I commented out the latter two, as the first never did what
> they wanted, and I heard that the official DR plan was
> something that got written up back in 2000 and then shelved,
> to be revisited when there's funding....  So eventually we got
> secondaries outside of our netblock (we vanished completely a
> few times when our Internet connection broke, and by the last
> major outage quite a number of sites plus our email were
> externally hosted....)
> 
> During a recent DNS outage, I couldn't send replies to
> co-workers....our Office365 tenant said I was an invalid
> sender :..(  It also apparently knocked me off of Jabber,
> stopped my deskphone forwarding to my cellphone, and kept me
> from getting SMS notifications of voicemail....
> 
> But FreeNode continued to work....before Jabber we had a
> private channel that we hung out in (while it's been a long
> time since we ran a node, we still have...well, maybe not,
> since the co-workers that had those friends have all left
> now...which is probably why ownership of the channel hasn't
> transferred to me....)
> 
> 
>>> 
>>> For those I do have a script that can run afterwards and ssh
>>> into all my caching servers to have them flush....
>> 
>> You don't need to manually sync your servers. Just activate
>> NOTIFY and your master will inform all slaves of any zone
>> changes. If you also activate IXFR transfers, the slaves will
>> only transfer the records that have changed; there's no need
>> to transfer the whole zone. Combined with inline-signing your
>> updates will propagate to all servers within a second.
>> 
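As a sketch, the relevant named.conf pieces on both sides would
look something like this (addresses and names are placeholders):

```
// on the master
zone "example.com" {
    type master;
    file "example.com.db";
    notify yes;                    // announce every zone change
    also-notify { 192.0.2.53; };   // slaves not listed in NS records
};

// on a slave
zone "example.com" {
    type slave;
    file "slaves/example.com.db";
    masters { 192.0.2.1; };
    request-ixfr yes;              // ask for incremental transfers
};
```

For IXFR to have anything incremental to serve, the master has to
keep a journal of changes, which it does automatically for
dynamically updated or inline-signed zones.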
> Well, we do have our caching servers acting as slaves for some
> zones, but frequently it's not reliable to get our busiest
> server (the server that's listed first on our DNS configuration
> page, and is what DHCP gives out first) to not continue with
> its cached answer...  I've made suggestions to try to get them
> to spread things out....there are 6 servers....not just
> two...and some areas now get the second server first, resulting
> in the second-listed server being my second busiest.  After
> that it's a split between numbers 3 and 5.  We used to list our
> datacenter DNS as 'backup', though we had an outage of our
> student information system due to the datacenter DNS getting
> swamped by a few computers across campus (that were getting
> hammered by a DDoS attack)....
> 
> Number 3 used to be the 3rd busiest, but its popularity has
> gone down....since it only has a 100M connection, while the
> others have gigabit.  All the campus servers used to be only
> 100M, but people that know say it matters...  But it's in the
> power plant and has one leg on inverter power...the batteries
> for the old phone system are there....next to a large empty
> room....
> 
> Though at the moment there are no incremental
> capabilities....so I can hit a slave a few times before the
> transfer finishes and the info updates.  (Just as I can hit the
> master server a few times after it does 'rndc reload' after the
> signing....before it reflects the change...)
> 
> But it was actually hard getting to the amount of automation
> that I have now....and occasionally people fight the automation
> (some more than others).
> 
> 
> 
>>> 
>>> Now if only I could figure out how to do that to the rest of
>>> the world to satisfy those other requests.
>> 
>> It's just a matter of lowering your ttl. Resolvers all over
>> the world will cache your records according to your ttl. If
>> you really have 86400 set as the ttl, any given record will be
>> queried only once per day by each resolver.
>> 
>> Just lower the default ttl to a reasonable number and your
>> updates will propagate faster to the resolvers. It's just a
>> question of how much bandwidth and how many resources you are
>> willing/able to give to DNS. Lower it step-by-step until you
>> either hit the limit of your bandwidth or the system resources
>> of your servers.
>> 
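As a sketch of the idea in a zone file (values illustrative):
keep a moderate default, and give records you expect to change a
shorter explicit TTL.  At 86400 a mistake can linger in caches
for a full day; at 300 it is gone in five minutes:

```
$TTL 3600                           ; zone default, down from 86400
www       300 IN A 192.0.2.10       ; due to change soon: 5 minutes
archive 86400 IN A 192.0.2.20       ; truly static data can keep a day
```

The per-record TTL column overrides the $TTL default, so the
step-down can be done selectively instead of zone-wide.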
>>> 
>>> Recently saw an incident....a department that has full
>>> control of their subdomain made a typo on an entry with TTL
>>> 86400.  They had fixed the typo, but the world still wasn't
>>> seeing the correction. They asked us if we could lower the
>>> TTL for it, to maybe 300.
>>> 
>>> Hmmm... no.
>> 
>> If they have full control of their subdomain, why don't they
>> just change the ttl themselves?
>> 
> that's basically what my co-worker said....in responding to the
> ticket.  But what they're asking is that we lower the TTL of
> the already-cached value.
> 
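Right — once a record is in a cache, its TTL can't be lowered
remotely; it simply has to expire.  The only caches you can help
along are your own: on a BIND resolver you run yourself, the
stale entry can be dropped with rndc (the name below is a
placeholder):

```
rndc flushname typoed-host.dept.example.edu
```

That clears the name from that one resolver's cache; every other
resolver in the world still waits out the original TTL.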
>> Setting a ttl of 1 day seems a bit high, but of course it
>> always depends on your zone. If the data is static, 1 day is
>> fine, but for dynamic zones it is a bit high.
>> 
> 
> There are lots of people that seem to feel that 1 day is what
> things need to be at, except for temporary reasons....though
> people often forget to have it lowered in advance of a server
> upgrade or something.  And in this case they had made a typo
> on where the new server was...so instead of traffic shifting
> from old to new as their update spread out....it all
> disappeared....
> 
> All my domains are static, and I just have forwarding set to
> the servers that have dynamic subdomains (though I'm a slave to
> them...which this new bind has me a bit stumped on what the
> correct way to go is).
> 
>> When you use inline-signing, your updates will be signed
>> on-the-fly, as they come in, so you can lower the ttl to a few
>> minutes without any problems. This helps much in keeping outdated
>> data out of any resolver's cache.
>> 
> 
> Hopefully a solution will suddenly appear that can replace the
> scripts I've mashed together over the years to do what we do
> now....
> 
> I had thought I'd have a solution to our current DNS problem in
> place by now....
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (MingW32)

iQIcBAEBAgAGBQJVxMXHAAoJECKEz6pWghIm2ycP/iKL/4GVDdk+J7POP4pELE5K
Po3J2/jXddBUQr+vfdilHqxMSjsolkr+eAkCDAjDAt2HoyM21wMIZBQmLeJEouhJ
OfD1tLx9T9RFS5T5C4fuMG5FramnxoAfIeANQznOzKGIFXe8E11fGz38SNoj3Jgb
gCsKQbqPhnSXoK2/StS+E3QslBdqesw3dVke21uSJqMyN+kdZJvwTF26ZovJsLfK
kzTCxbmnLM97bzpvhob+BrRPQwarpzcL/y+5mWv6fhHxCC2+iJjckLvpkconww0k
sTvNPLYbmNNqV3YjbAjIf2FjA08dRV4319nI+lkRcmpRgNhp9d3reEpKIHa+PJVe
7lR+k7F3H7IGQ1XpfcW2G/HZXdvY0LuY7dI3yGo8+e/EFVzVFZ38hDLQSBbkylJU
h1OCgfBSLamsSYfWgvGp7vlbEJkQgVpl1sdVsMl3Of5VkP2gmVGZgqfxqcylbzAo
lya0UhfE2PZlTGpVJA3xqopLTVz9YRJk4D2iapTxECTtiKVYuvO9X5D9Zhqd0YIy
WwNkka6RknOLtAPrB9K6CRaB7uFWPCifuIt3a+pz5vttzOy6OfJZ+wsHcy4QaLoz
rTM1VJ+ujhHOcykGNaHOcjssrjLzcu6za3pOcFydsaNmCUqrK9A5iZ6V3EIKaOtJ
E4H+fEwbqzErxouaI3D2
=KBQl
-----END PGP SIGNATURE-----


More information about the bind-users mailing list