Procedure for failover partner replacement.

Thu May 11 14:38:48 UTC 2017

On May 11, 2017, at 6:35 AM, Bob McDonald <bmcdonaldjr at gmail.com> wrote:
> 
> I've got a failing DHCP failover partner. (The partner is an HA cluster and both nodes are being RMAed. Long story.)
> 
> My question is this. Is the following procedure ok for the replacement? (I've already confirmed the new version of DHCP is exactly the same as the old one)
> 
> 1) Before shutting down the failing partner cluster, stop dhcpd and save the dhcpd.leases file and the dhcpd.conf file.
> 2) Shut down the failing partner cluster completely.
> 3) Bring up the replacement partner cluster while leaving dhcpd turned off.
> 4) Restore the dhcpd.leases and dhcpd.conf files.
> 5) Restart dhcpd on the replacement partner cluster.
> 
> My contention is that this will result in the failover pair going into partner-interrupted state for about 5 or 10 minutes while the HA cluster is replaced and then should restart communications as if nothing happened when the replacement partner comes live. Thoughts?

Here is what I would do:

1. On both failover peers (both clusters), set 'max-unacked-updates 1000;'.
2. Save the old dhcpd.conf and any included files from the failing peer cluster. Do not save the leases file.
3. Shut down the failing cluster completely.
4. Put the remaining failover peer into partner-down state.
5. Bring up the replacement cluster with dhcpd not running.
6. Restore the dhcpd.conf (including the 'max-unacked-updates' statement).
7. Start dhcpd on the replacement cluster.
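
For steps 1 and 6, the statement goes inside the failover peer declaration in dhcpd.conf on each side. A minimal sketch of what that might look like on the primary follows. The peer name, addresses, ports, and every value other than 'max-unacked-updates' are placeholders I made up for illustration, so keep whatever your existing declaration already uses.

```
# Hypothetical failover declaration; only max-unacked-updates comes from
# the procedure above. All other names and values are placeholders.
failover peer "dhcp-failover" {
  primary;                     # the other peer declares "secondary;" instead
  address 192.0.2.1;           # this server's address (placeholder)
  port 647;
  peer address 192.0.2.2;      # the partner's address (placeholder)
  peer port 647;
  max-response-delay 60;
  max-unacked-updates 1000;    # the value recommended in step 1
  mclt 3600;                   # mclt and split are set on the primary only
  split 128;
  load balance max seconds 3;
}
```

The secondary's declaration would mirror this with the addresses swapped and without the 'mclt' and 'split' statements. dhcpd has to be restarted to pick up the change.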

At step 3, the remaining peer will move to communications-interrupted. But step 4 will change this, so that you don't have to worry about pool exhaustion during steps 5 and 6. At step 7, the new peer will move to recover state, sync with the master, and then move to normal state. At that point, the other peer will automatically move from partner-down to normal state.

Regards,
Chris