Secondary server in failover fails to come out of recover state

Wed Apr 24 22:50:12 UTC 2013

I should note that while it was recovering, the primary reported:

partner-state = 00:00:00:06
local-state = 00:00:00:04

and the secondary reported:

partner-state = 00:00:00:04
local-state = 00:00:00:06
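
For reference, the colon-separated hex values above can be decoded into state names. Here is a minimal Python sketch. The number-to-name table follows the failover state list in the dhcpd(8) man page as I understand it, so treat it as an assumption and check it against your version. The names `FAILOVER_STATES` and `decode_state` are made up for this example.

```python
# Failover state numbers; assumed to match the list in dhcpd(8).
FAILOVER_STATES = {
    1: "startup",
    2: "normal",
    3: "communications-interrupted",
    4: "partner-down",
    5: "potential-conflict",
    6: "recover",
    7: "paused",
    8: "shutdown",
    9: "recover-done",
    10: "resolution-interrupted",
    11: "conflict-done",
    254: "recover-wait",
}


def decode_state(raw: str) -> str:
    """Turn a value such as '00:00:00:06' into a failover state name."""
    # Drop the colons and read the remaining digits as one hex integer.
    value = int(raw.replace(":", ""), 16)
    return FAILOVER_STATES.get(value, f"unknown ({value})")


if __name__ == "__main__":
    for raw in ("00:00:00:04", "00:00:00:06"):
        print(raw, "=", decode_state(raw))
```

By that table, the primary sees itself in partner-down and its peer in recover, while the secondary sees itself in recover and its peer in partner-down. That agrees with the log entries quoted below.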

Following another suggestion (recreating an empty dhcpd.leases file), I
shut down the secondary, but the primary still reported:

partner-state = 00:00:00:06
local-state = 00:00:00:04

The change was the addition of these two scopes:

subnet 192.168.75.128 netmask 255.255.255.128 {
	pool {
		range 192.168.75.130 192.168.75.254;
		deny dynamic bootp clients ;
		failover peer "dhcp" ;
	}
	option domain-name "dept.utexas.edu";
	option subnet-mask 255.255.255.128;
	option broadcast-address 255.255.255.255;
	option routers 192.168.75.129;
}

subnet 192.168.228.32 netmask 255.255.255.224 {
	pool {
		range 192.168.228.34 192.168.228.62;
		deny dynamic bootp clients ;
		failover peer "dhcp" ;
	}
	default-lease-time 7200;
	max-lease-time 7200;
	option domain-name "dept.utexas.edu";
	option subnet-mask 255.255.255.224;
	option broadcast-address 255.255.255.255;
	option routers 192.168.228.33;
}
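
As a sanity check that the addresses in the two scopes above fit inside their subnets, here is a minimal Python sketch using the standard `ipaddress` module. The `SCOPES` list and the `scope_problems` helper are hypothetical names for this example, with the values copied by hand from the scopes. It only checks address arithmetic and says nothing about failover behaviour.

```python
import ipaddress

# (subnet/netmask, range start, range end, router), copied from the scopes.
SCOPES = [
    ("192.168.75.128/255.255.255.128",
     "192.168.75.130", "192.168.75.254", "192.168.75.129"),
    ("192.168.228.32/255.255.255.224",
     "192.168.228.34", "192.168.228.62", "192.168.228.33"),
]


def scope_problems(subnet, first, last, router):
    """Return a list of human-readable problems; empty if none are found."""
    net = ipaddress.ip_network(subnet)  # raises ValueError if host bits are set
    lo, hi, gw = (ipaddress.ip_address(a) for a in (first, last, router))
    problems = []
    if lo > hi:
        problems.append("range start is above range end")
    for label, addr in (("range start", lo), ("range end", hi), ("router", gw)):
        if addr not in net:
            problems.append(f"{label} {addr} is outside {net}")
        elif addr in (net.network_address, net.broadcast_address):
            problems.append(f"{label} {addr} is the network or broadcast address")
    if lo <= gw <= hi:
        problems.append(f"router {gw} falls inside the dynamic range")
    return problems


if __name__ == "__main__":
    for scope in SCOPES:
        print(scope[0], scope_problems(*scope) or "ok")
```

Neither scope trips any of these checks, so the ranges and router addresses are at least consistent with the subnet declarations.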

The new scopes were first added to the primary, which was then reloaded.
After both servers were in a "normal" state, the corresponding change
was made on the secondary and it was reloaded.

Per Stephen Carr's suggestion, I have increased the MCLT to 300, but both
servers are still in the same state.

On 04/24/2013 04:40 PM, Oscar Ricardo Silva wrote:
> We have two servers in a failover relationship, both running 4.1-ESV-R7.
>   After a reload of dhcpd on the secondary, it has not come out of the
> recover state after almost an hour.  We've had this happen with 3.1.3
> and recently upgraded to this version.  The only thing we've been able
> to do is stop both instances of dhcpd and remove "my state" and "partner
> state" from dhcpd.leases.
>
>
> Here's the timeline of what happened.
>
> 1.  A change was made to the configuration of the primary and dhcpd
> reloaded at 15:39:14.
> 2. The primary moved back to a "normal" state at 15:43:42
>
> Apr 24 15:39:14 primary-dhcp dhcpd: failover peer dhcp: I move from
> normal to shutdown
> Apr 24 15:39:15 primary-dhcp dhcpd: failover peer dhcp: peer moves from
> normal to partner-down
> Apr 24 15:39:15 primary-dhcp dhcpd: failover peer dhcp: I move from
> shutdown to recover
> Apr 24 15:40:18 primary-dhcp dhcpd: failover peer dhcp: I move from
> recover to startup
> Apr 24 15:40:18 primary-dhcp dhcpd: failover peer dhcp: I move from
> startup to recover
> Apr 24 15:43:42 primary-dhcp dhcpd: failover peer dhcp: peer update
> completed.
> Apr 24 15:43:42 primary-dhcp dhcpd: failover peer dhcp: I move from
> recover to recover-done
> Apr 24 15:43:42 primary-dhcp dhcpd: failover peer dhcp: peer moves from
> partner-down to normal
> Apr 24 15:43:42 primary-dhcp dhcpd: failover peer dhcp: I move from
> recover-done to normal
> Apr 24 15:44:53 primary-dhcp dhcpd: failover peer dhcp: peer moves from
> normal to shutdown
> Apr 24 15:44:53 primary-dhcp dhcpd: failover peer dhcp: I move from
> normal to partner-down
> Apr 24 15:44:54 primary-dhcp dhcpd: peer dhcp: disconnected
> Apr 24 15:45:59 primary-dhcp dhcpd: failover peer dhcp: peer moves from
> shutdown to recover
> Apr 24 15:45:59 primary-dhcp dhcpd: failover peer dhcp: peer moves from
> recover to recover
>
>
>
> 3.  The corresponding change was made on the secondary and it was
> reloaded at 15:44:53
>
> 4.  At 15:44:54 it came back up into recover, then moved from recover to
> startup, then from startup to recover.  That's where it's been ever since.
>
> Apr 24 15:44:53 secondary-dhcp dhcpd: failover peer dhcp: I move from
> normal to shutdown
> Apr 24 15:44:53 secondary-dhcp dhcpd: failover peer dhcp: peer moves
> from normal to partner-down
> Apr 24 15:44:54 secondary-dhcp dhcpd: failover peer dhcp: I move from
> shutdown to recover
> Apr 24 15:45:56 secondary-dhcp dhcpd: failover peer dhcp: I move from
> recover to startup
> Apr 24 15:45:59 secondary-dhcp dhcpd: failover peer dhcp: I move from
> startup to recover
>
>
>
> Here's dhcpd.conf for the primary:
>
> option domain-name-servers 192.168.50.41, 192.168.50.40 ;
> option ntp-servers 192.168.50.40, 192.168.50.41;
> default-lease-time 86400;
> max-lease-time 86400;
> one-lease-per-client true;
> ddns-update-style ad-hoc;
> ddns-updates off;
> authoritative;
> if substring (option dhcp-client-identifier, 0, 5) = 01:52:41:53:20 {
>          deny booting;
> }
> option voip-tftp-server-address code 150 = array of ip-address ;
> set vendor-string = option vendor-class-identifier;
> failover peer "dhcp" {
>           primary;
>           address 192.168.100.2;
>           port 520;
>           peer address 192.168.101.2;
>           peer port 520;
>           max-response-delay 60;
>           max-unacked-updates 10;
>           mclt 120;
>           split 255;
>           load balance max seconds 5;
>         }
> subnet 192.168.100.0 netmask 255.255.255.224 {
>          }
> include "/dhcpd/dhcpd.network.conf";
>
>
> and the /dhcpd/dhcpd.network.conf file holds the scope definitions. Both
> servers sync time through ntp and have the same exact time.
>
>
> Any information would be appreciated.
>
>
>