Catastrophic failure and recovery

Mon Jun 25 17:29:59 UTC 2018

So, in the case I'm interested in here, I've got a pair of peers [failover].
[ISC/We really should pick a different name than failover, because it's essentially load-balancing with redundancy, but I digress :) ]

Now while I'm using two peers, I think the question I'm asking about will be the same regardless of peers or a single server...

So, let's assume the DHCP server [or a peer] dies. Assume we lost a disk.
Assume I've got configs, but no leases file.

What's the best recovery method?

---
I assume we'll simply put the configurations back on a "new" server. [or peer]
Turn it on and bring it up. [In the peer setup, let it communicate with the other peer.]

Since it won't have a record of any leases [that the dead-peer/old-server actually leased] we'll have a bit of a mess.
But, we'd hope that most machines would already have a lease, and would ask for renewal of that lease.
The server, I think, would generally grant that lease renewal on the same IP. [Even though it has no record of it initially.]

"New" machines that have just powered up may/will ask for new addresses, and may "steal" a lease from an active client.
However, if the DHCP server can ping-check [and is set to] AND the station isn't firewalled or otherwise prevented from receiving/responding to the ping-check, then the DHCP server will realize there's an active client using the address and will avoid leasing that address.
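
To make the ping-check reasoning concrete, here is a toy model in Python. It is only a sketch of the idea, not how dhcpd actually allocates addresses; the function name, the pool, and the `answers_ping` callback are all made up for illustration.

```python
def pick_address(pool, known_leases, answers_ping):
    """Toy model: return the first address the server has no lease record
    for and that does not answer a ping, or None if the pool is exhausted.

    pool          -- ordered list of candidate addresses (hypothetical)
    known_leases  -- set of addresses the server has a lease record for
    answers_ping  -- callable(addr) -> bool, stands in for the ping-check
    """
    for addr in pool:
        if addr in known_leases:
            continue  # server knows this one is taken
        if answers_ping(addr):
            continue  # someone is alive on it, so skip it
        return addr
    return None


# Rebuilt server: no lease records at all.
pool = ["10.0.0.10", "10.0.0.11", "10.0.0.12"]
known_leases = set()

# .10 is an active client that answers pings; .11 is an active client
# behind a firewall, so it never answers.
pingable = {"10.0.0.10"}
offer = pick_address(pool, known_leases, lambda a: a in pingable)
```

In this model the active client on .10 is protected, but .11 gets offered to the new machine even though a firewalled client is still using it, which is exactly the case the ping-check can't catch.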

If the active lease is on a machine that's off and returns to the network [before the end of the lease] I'm not sure of the result. I *think* it will attempt to confirm the lease when it comes back on, will get a NAK and be forced to get a new lease.

Thus, generally, using best practices, the result of a catastrophic loss of a DHCP server shouldn't be too disruptive. 
[Provided it can be replaced fairly quickly before too many machines lose their current lease.]

The above will be a lot cleaner if there's little or no IP address churn, i.e. if each pool has enough addresses to give every machine an address simultaneously. If there's a lot of churn it will be substantially messier, and machines will see far less stability in IP address assignment. [But there wasn't a lot of stability to start with, so we've probably only increased the churn rate some.]

Does that sound about right?
I'm sure there are use cases I'm not considering because I don't have those configurations - but am I missing anything serious?

---
On a side note - is it worth capturing [backing up] the leases file, say at an interval of 0.5 times the lease length? [The idea would be to have a reasonably current leases file that might be 80%+ right. Or is this likely to cause more problems than no leases file at all?]
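
A minimal sketch of that backup idea in Python, assuming a timestamped copy is all that's wanted. The helper names and the example paths are hypothetical (the leases file location varies by distribution), and whether restoring such a copy is actually safe is the open question above.

```python
import shutil
import time
from pathlib import Path


def backup_interval(lease_seconds, factor=0.5):
    """Seconds between backups: a fraction (default half) of the lease length."""
    return max(1, int(lease_seconds * factor))


def backup_leases(leases_file, backup_dir):
    """Copy the leases file into backup_dir under a UTC-timestamped name
    and return the path of the copy."""
    src = Path(leases_file)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    dest = dest_dir / f"{src.name}.{stamp}"
    shutil.copy2(src, dest)  # copy2 also preserves the modification time
    return dest
```

With a one-day lease, `backup_interval(86400)` gives 43200 seconds, so something like cron would call `backup_leases("/var/lib/dhcp/dhcpd.leases", "/backup/dhcp")` twice a day [both paths are assumptions, adjust for your install].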

Pointers to FAQ/Docs etc gladly accepted!

TIA
-Greg.