Frustrated DHCP failover not working.. :(

Rob Morin rmorin at datavalet.com
Wed Feb 10 14:28:37 UTC 2016


Hello all. I recently upgraded our two DHCP servers; they now run
Ubuntu 14.04 on quad-core servers with 8 GB of RAM.

Before I get into everything: we DID have a working failover pair for
4 years before the upgrades were done, just on crappy, failing hardware.

What was done is the following:

I put both dhcp-1 (primary) and dhcp-2 (secondary) into standalone
mode (no failover). I know this might not have been the correct way to
do this, but at the time it seemed practical.

We then configured our client controllers so that half went to dhcp-1
and half to dhcp-2.

This worked fine.

We then gradually moved, over the course of a couple days, all the 
client controllers to go only to dhcp-2 server, so at that point all 
controllers were going to dhcp-2 only.

This was working fine.

I then swapped out dhcp-1 server for a more updated one, with the above 
mentioned specs.

Last night I attempted to bring them back into a failover setup. This
did not go well.

What I did was the following:

With dhcp-1 dhcpd daemon stopped, but configured to do failover, I then 
stopped dhcp-2 server.

During this time period, leases were obviously not given out :)

I then proceeded to reconfigure dhcp-2 as a failover peer once again,
using the same method that had previously worked: I added the secondary
server conf include statement back into the dhcpd.conf file and made
sure everything was as it had been before we changed anything.

I started up dhcp-2 with its dhcpd.leases file the same as it was
before I started all this, and then in the syslog I saw thousands of
lines like the one below:

DHCPDISCOVER from 8c:2d:aa:21:10:91 via 10.37.22.1: peer holds all free 
leases

Now the peer, dhcp-1, was not even up, so I am not sure why dhcp-2 was
saying that.
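
To get a sense of how widespread those refusals are, the syslog can be
summarised per relay. What follows is only a minimal sketch: the
function name and the sample list are made up for illustration, and it
assumes every affected line contains the message text exactly as quoted
above.

```python
import re
from collections import Counter

# Matches the message quoted above. Any syslog prefix (timestamp, host,
# "dhcpd:") in front of "DHCPDISCOVER" is simply skipped by search().
PATTERN = re.compile(
    r"DHCPDISCOVER from (?P<mac>[0-9a-f:]{17}) via (?P<relay>\S+): "
    r"peer holds all free leases",
    re.IGNORECASE,
)

def count_refusals_by_relay(lines):
    """Count 'peer holds all free leases' refusals per relay address."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group("relay")] += 1
    return counts

# Hypothetical input: the quoted line twice, plus one unrelated line.
sample = [
    "DHCPDISCOVER from 8c:2d:aa:21:10:91 via 10.37.22.1: peer holds all free leases",
    "DHCPDISCOVER from 8c:2d:aa:21:10:91 via 10.37.22.1: peer holds all free leases",
    "Server starting service.",
]
print(count_refusals_by_relay(sample))
```

If every relay shows up in the counts, the refusal is not specific to
one subnet, which matches the "thousands of lines" seen here.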

I then proceeded to tell dhcp-2 that dhcp-1 was down via an omshell
command, and dhcp-2 started giving out leases again.

I then went back to dhcp-1 and started it, and it went into the recover
state:

Feb 10 05:45:45 dhcp-1 dhcpd: Internet Systems Consortium DHCP Server 
4.3.3-P1
Feb 10 05:45:45 dhcp-1 dhcpd: Copyright 2004-2016 Internet Systems 
Consortium.
Feb 10 05:45:45 dhcp-1 dhcpd: All rights reserved.
Feb 10 05:45:45 dhcp-1 dhcpd: For info, please visit 
https://www.isc.org/software/dhcp/
Feb 10 05:45:45 dhcp-1 dhcpd: Wrote 0 leases to leases file.
Feb 10 05:45:45 dhcp-1 dhcpd: Host HW hash:   No table.
Feb 10 05:45:45 dhcp-1 dhcpd: Host UID hash:  No table.
Feb 10 05:45:45 dhcp-1 dhcpd: Lease IP hash:  Contents/Size (%): 
1664000/1800017 (92%). Min/max: 0/1
Feb 10 05:45:45 dhcp-1 dhcpd: Lease UID hash: Contents/Size (%): 
0/1800017 (0%). Min/max: 0/0
Feb 10 05:45:45 dhcp-1 dhcpd: Lease HW hash:  Contents/Size (%): 
0/1800017 (0%). Min/max: 0/0
Feb 10 05:45:45 dhcp-1 dhcpd: failover peer tdl-dhcp-failover: I move 
from recover to startup
Feb 10 05:45:45 dhcp-1 dhcpd: Server starting service.
Feb 10 05:45:45 dhcp-1 dhcpd: failover peer tdl-dhcp-failover: peer 
moves from unknown-state to partner-down
Feb 10 05:45:45 dhcp-1 dhcpd: failover peer tdl-dhcp-failover: I move 
from startup to recover
Feb 10 05:45:45 dhcp-1 dhcpd: Sent update request all message to 
tdl-dhcp-failover
Feb 10 05:46:54 dhcp-1 dhcpd: bind update on 10.54.147.229 from 
tdl-dhcp-failover rejected: 10.54.147.229: invalid state transition: 
active to expired
Feb 10 05:46:56 dhcp-1 dhcpd: failover peer tdl-dhcp-failover: peer 
update completed.
Feb 10 05:46:56 dhcp-1 dhcpd: failover peer tdl-dhcp-failover: I move 
from recover to recover-wait


And stayed that way....
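
The state changes are easier to follow when pulled out of the
surrounding log noise. The sketch below is illustrative only: the
function name is invented, and it assumes each syslog entry is on a
single line (the entries above are wrapped by the mail client) and uses
the "I move from ... to ..." / "peer moves from ... to ..." wording
shown in the log.

```python
import re

# Wording taken from the log above; "I move" is the local server,
# "peer moves" is what the local server believes about its partner.
TRANSITION = re.compile(
    r"failover peer (?P<peer>\S+): (?P<who>I move|peer moves) "
    r"from (?P<old>\S+) to (?P<new>\S+)"
)

def failover_transitions(lines):
    """Yield (who, old_state, new_state) for each failover state change."""
    for line in lines:
        m = TRANSITION.search(line)
        if m:
            who = "local" if m.group("who") == "I move" else "peer"
            yield who, m.group("old"), m.group("new")

# The relevant entries from the log above, rejoined onto single lines.
log = [
    "Feb 10 05:45:45 dhcp-1 dhcpd: failover peer tdl-dhcp-failover: I move from recover to startup",
    "Feb 10 05:45:45 dhcp-1 dhcpd: Server starting service.",
    "Feb 10 05:45:45 dhcp-1 dhcpd: failover peer tdl-dhcp-failover: peer moves from unknown-state to partner-down",
    "Feb 10 05:45:45 dhcp-1 dhcpd: failover peer tdl-dhcp-failover: I move from startup to recover",
    "Feb 10 05:46:56 dhcp-1 dhcpd: failover peer tdl-dhcp-failover: peer update completed.",
    "Feb 10 05:46:56 dhcp-1 dhcpd: failover peer tdl-dhcp-failover: I move from recover to recover-wait",
]

transitions = list(failover_transitions(log))
local_states = [new for who, _, new in transitions if who == "local"]
print(local_states[-1])  # last state dhcp-1 reported for itself
```

For this log the last local state is recover-wait, and the peer is
recorded as partner-down throughout.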

So I put dhcp-2 back into standalone mode to keep the clients happy.


So what would be the proper procedure to get these two back into
failover mode while dhcp-2 is still serving leases?


P.S. I just ran a test on some dev servers and had not realized that
recover-wait stays that way until the MCLT has elapsed. Is this
correct? I set mclt to 30 seconds on the dev servers, and after almost
45 seconds the two dev servers saw each other.
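
If that reading is right, the production wait can be estimated from the
log. This is a rough sketch under two assumptions: that the server
leaves recover-wait no earlier than MCLT seconds after entering it, and
that the year is 2016 (taken from the date of this post, since syslog
omits it). The function name is made up for illustration.

```python
from datetime import datetime, timedelta

def earliest_recover_wait_exit(entered_at, mclt_seconds):
    """Lower bound on when recover-wait can end, assuming a full MCLT wait."""
    return entered_at + timedelta(seconds=mclt_seconds)

# dhcp-1 logged "I move from recover to recover-wait" at 05:46:56,
# and the primary's failover declaration sets mclt 1800.
entered = datetime(2016, 2, 10, 5, 46, 56)
print(earliest_recover_wait_exit(entered, 1800))  # 2016-02-10 06:16:56
```

So with mclt 1800, dhcp-1 would sit in recover-wait for about 30
minutes. The dev observation (mclt 30, peers seeing each other after
almost 45 seconds) would fit the same lower bound.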

Please see below for my conf files


Thanks...


--------------
Primary
-------------


     dhcpd.conf


authoritative;
log-facility local7;
option domain-name "tmp";
option domain-name-servers 172.30.64.210, 172.30.64.220;
default-lease-time 1200;
max-lease-time 3600;


# Include EITHER the primary configuration
include "/usr/local/etc/dhcp/dhcpd_primary.conf";
# OR the secondary configuration
#include "/etc/dhcp/dhcpd_secondary.conf";

# No service for the local networks
subnet 172.30.0.0 netmask 255.255.255.0 { }
subnet 172.30.128.0 netmask 255.255.255.0 { }
subnet 172.30.129.0 netmask 255.255.255.0 { }

# Non-standard IP ranges (i.e. big stores)
include "/usr/local/etc/dhcp/dhcpd_special_pools.conf";
pid-file-name "/run/dhcp-server/dhcpd.pid";
ddns-update-style none;
omapi-port 7911;
omapi-key omapi_key;
key omapi_key {
      algorithm hmac-md5;
      secret xxxxxxxxxxxxxxxx==;
}


     dhcpd_primary.conf


## PRIMARY
failover peer "dhcp-failover" {
   primary; # declare this to be the primary server
   address 172.30.128.9;
   port 647;
   peer address 172.30.128.11;
   peer port 647;
   max-response-delay 30;
   max-unacked-updates 10;
   load balance max seconds 3;
   mclt 1800;
   split 128;
}

     dhcpd_pools.conf


subnet 10.32.0.0 netmask 255.255.255.0 {
   option routers 10.32.0.1;
   pool {
         failover peer "tdl-dhcp-failover";
         range 10.32.0.5 10.32.0.254;
   }
}

subnet 10.32.1.0 netmask 255.255.255.0 {
   option routers 10.32.1.1;
   pool {
         failover peer "tdl-dhcp-failover";
         range 10.32.1.5 10.32.1.254;
   }
}

............................
and another 6000 subnets like above in this whole dhcpd_pools.conf file
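
One detail in the excerpts above: the pools reference a peer named
"tdl-dhcp-failover", while the declaration shown in dhcpd_primary.conf
is named "dhcp-failover". That may only be a side effect of sanitising
the files for posting, but with 6000 subnets it is cheap to check
mechanically. The sketch below is a hypothetical helper, not part of
ISC DHCP; it works on a single string of config text and does not
follow include statements.

```python
import re

# A declaration opens a block:   failover peer "name" {
# A reference ends a statement:  failover peer "name";
DECLARED = re.compile(r'^\s*failover\s+peer\s+"([^"]+)"\s*\{', re.MULTILINE)
REFERENCED = re.compile(r'^\s*failover\s+peer\s+"([^"]+)"\s*;', re.MULTILINE)

def undeclared_peers(config_text):
    """Return peer names used in pools but never declared in config_text."""
    declared = set(DECLARED.findall(config_text))
    referenced = set(REFERENCED.findall(config_text))
    return referenced - declared

# Abbreviated from the files posted above.
config = '''
failover peer "dhcp-failover" {
   primary;
   mclt 1800;
}
subnet 10.32.0.0 netmask 255.255.255.0 {
   pool {
         failover peer "tdl-dhcp-failover";
         range 10.32.0.5 10.32.0.254;
   }
}
'''
print(undeclared_peers(config))  # {'tdl-dhcp-failover'}
```

To use it on real files, concatenate dhcpd.conf and everything it
includes before passing the text in; an empty result means every
referenced peer has a matching declaration.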



--------------

Secondary
--------------


dhcpd.conf


authoritative;
log-facility local7;
option domain-name "tmp";
option domain-name-servers 172.30.64.210, 172.30.64.220;
default-lease-time 1200;
max-lease-time 3600;


# Include EITHER the primary configuration
#include "/usr/local/etc/dhcp/dhcpd_primary.conf";
# OR the secondary configuration
include "/etc/dhcp/dhcpd_secondary.conf";

# No service for the local networks
subnet 172.30.0.0 netmask 255.255.255.0 { }
subnet 172.30.128.0 netmask 255.255.255.0 { }
subnet 172.30.129.0 netmask 255.255.255.0 { }

# Non-standard IP ranges (i.e. big stores)
include "/usr/local/etc/dhcp/dhcpd_special_pools.conf";
pid-file-name "/run/dhcp-server/dhcpd.pid";
ddns-update-style none;
omapi-port 7911;
omapi-key omapi_key;
key omapi_key {
      algorithm hmac-md5;
      secret xxxxxxxxxxxxxxxx==;
}


     dhcpd_secondary.conf


## SECONDARY
failover peer "dhcp-failover" {
  secondary;
  address 172.30.128.11;
  port 647;
  peer address 172.30.128.9;
  peer port 647;
  max-response-delay 30;
  max-unacked-updates 10;
  load balance max seconds 3;
}


dhcpd_pools.conf file is same as for dhcp-1 server


Rob Morin

Montreal, Canada
