General questions about failover, config changes and restarting

James Dore james.dore at new.ox.ac.uk
Wed Mar 2 12:36:06 UTC 2016


Hi all,

I’ve had a pair of DHCP servers running in a load balance/failover cluster for about 9 months, but haven’t really got my head round what happens when I make a change to the configuration. 

I have a bunch of config files called from the main config file thus:

##########################
#                        #
# Failover configuration #
#                        #
##########################
failover peer "newc-dhcp" {
    primary;
    address 129.67.111.199; # address of this server
    port 519;
    peer address 129.67.111.243; # address of the secondary dhcpd
    peer port 519;
    max-response-delay 60;
    max-unacked-updates 10;
    mclt 600;
    split 128;
    load balance max seconds 3;
}

key primaryhost {
    algorithm hmac-md5;
    secret <ssshhh!>;
};

omapi-key primaryhost;
omapi-port 7911;


###########################
#                         #
# Load the global options #
#                         #
###########################

include "/etc/dhcpd.d/master.conf"; # (Rarely!) Edit this file to set global options

########################
#                      #  
# Subnet config files  #
#                      #
########################

include "/etc/dhcpd.d/vlan1.conf"; # 129.67.108.0/22 Main subnet and static assignments
include "/etc/dhcpd.d/vlan3.conf"; # 10.30.0.0/22 Devices subnet config and static assignments
include "/etc/dhcpd.d/vlan4.conf"; # 10.4.0.0/16 NAT Vlan4 Subnet config and static assignments
include "/etc/dhcpd.d/annexe.conf"; # 163.1.173.0/24 Annexe subnet config and static assignments

Both peers have very similar config files; the only differences are the secret and the address/peer address settings. Everything else is the same. (Should it be?)
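
For comparison, this is roughly what I would expect the matching declaration on the other peer to look like. It is a sketch based on my reading of the dhcpd.conf man page, not a copy of the live file: it assumes the peer is declared "secondary", with the two addresses swapped, and it leaves out mclt and split because the man page says those are set on the primary only.

failover peer "newc-dhcp" {
    secondary;
    address 129.67.111.243; # address of this server (the secondary)
    port 519;
    peer address 129.67.111.199; # address of the primary dhcpd
    peer port 519;
    max-response-delay 60;
    max-unacked-updates 10;
    load balance max seconds 3;
    # no mclt or split here: those belong in the primary's declaration
}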

What I’m curious about is what happens when I change one of the subnet config files, for instance to add a new static assignment. My usual method has been to edit the file on one peer and then scp it over to the other peer. After that, I seem to need several restarts of each peer before they both return to Normal status. They get stuck in Partner-down, Recover, or Recover Wait status for a while.
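
To show the kind of change I mean, a new static assignment is just a host declaration added to one of the subnet files, something like the following in vlan1.conf. The host name, MAC address and IP address here are made up for illustration; only the subnet (129.67.108.0/22) is real.

# hypothetical example entry, not a real host
host example-printer {
    hardware ethernet 00:11:22:33:44:55; # placeholder MAC address
    fixed-address 129.67.108.50;         # placeholder address inside 129.67.108.0/22
}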

If I can get them both in Recover Wait, then they will synchronise, but it seems to be difficult to get them there. 

Is there anything I can do to smooth the process? 

I can’t find much info about troubleshooting failover or load balancing; all my googling has turned up is instructions for initial setup. Does anyone have some useful pointers or links?

Cheers,
James



