<div dir="ltr">Can you crank up the logging level to debug (IIRC this needs to be done via syslog) so that it logs exactly what it is doing when it goes into the RECOVER state? That may give some extra pointers.<br></div><div class="gmail_extra">
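<div>A minimal sketch of what that could look like, assuming a classic syslogd (or rsyslog) and that the local7 facility is otherwise unused. The facility choice and the log file path are placeholders of mine, not something taken from this setup. By default dhcpd logs to the daemon facility; the log-facility statement in dhcpd.conf moves it elsewhere:</div>

```
# dhcpd.conf: send dhcpd's messages to a dedicated syslog facility
# (local7 is an assumption; pick any facility that is free on your hosts)
log-facility local7;

# /etc/syslog.conf: keep everything from that facility, down to debug priority
# (hypothetical path; some older syslogd versions want tabs between the columns)
local7.debug    /var/log/dhcpd-debug.log
```

<div>Both syslogd and dhcpd need to be restarted or reloaded for this to take effect. How much extra failover detail actually shows up at debug priority may depend on how dhcpd was built.</div>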

<br><br><div class="gmail_quote">On 24 April 2013 23:50, Oscar Ricardo Silva <span dir="ltr">&lt;<a href="mailto:oscars@mail.utexas.edu" target="_blank">oscars@mail.utexas.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

I should note that while it was recovering, the primary reported:<br>

<br>

partner-state = 00:00:00:06<br>

local-state = 00:00:00:04<br>

<br>

<br>

and the secondary reported:<br>

<br>

partner-state = 00:00:00:04<br>

local-state = 00:00:00:06<br>
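<br>

For anyone reading along: as far as I can tell from dhcpd(8), these values are the failover state number printed as a 32-bit integer in colon-separated hex bytes, so 06 would be recover and 04 would be partner-down, which is consistent with the log messages quoted further down. A minimal Python sketch of that lookup follows. The table is transcribed from the man page's description of the failover-state object as I recall it, so check it against your version; decode_state and FAILOVER_STATES are just illustrative names:<br>

```python
# State numbers as listed for the failover-state object in dhcpd(8).
# Assumption: transcribed from memory of the man page; verify for your release.
FAILOVER_STATES = {
    1: "startup",
    2: "normal",
    3: "communications-interrupted",
    4: "partner-down",
    5: "potential-conflict",
    6: "recover",
    7: "paused",
    8: "shutdown",
    9: "recover-done",
    10: "resolution-interrupted",
    11: "conflict-done",
    254: "recover-wait",
}


def decode_state(value):
    """Turn a value such as '00:00:00:06' into a failover state name."""
    # The colon-separated bytes form one big-endian hex integer.
    number = int(value.replace(":", ""), 16)
    return FAILOVER_STATES.get(number, "unknown ({})".format(number))


# The values reported by the primary while the secondary was recovering.
for label, raw in [("partner-state", "00:00:00:06"), ("local-state", "00:00:00:04")]:
    print(label, "=", raw, "(" + decode_state(raw) + ")")
```

Read this way, the primary sees itself in partner-down and its peer in recover, and the secondary reports the mirror image.<br>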

<br>

<br>

<br>

Following another suggestion (recreate an empty dhcpd.leases file), I shut down the secondary, but the primary still reported:<br>

<br>

partner-state = 00:00:00:06<br>

local-state = 00:00:00:04<br>

<br>

<br>

<br>

<br>

The change that was made was the addition of these two scopes:<br>

<br>

<br>

subnet 192.168.75.128 netmask 255.255.255.128 {<br>

               pool {<br>

                       range 192.168.75.130 192.168.75.254;<br>

                       deny dynamic bootp clients ;<br>

                       failover peer "dhcp" ;<br>

                }<br>

       option domain-name "<a href="http://dept.utexas.edu" target="_blank">dept.utexas.edu</a>";<br>

       option subnet-mask 255.255.255.128;<br>

       option broadcast-address 255.255.255.255;<br>

       option routers 192.168.75.129;<br>

}<br>

<br>

<br>

subnet 192.168.228.32 netmask 255.255.255.224 {<br>

        pool {<br>

                range 192.168.228.34 192.168.228.62;<br>

                deny dynamic bootp clients ;<br>

                failover peer "dhcp" ;<br>

        }<br>

        default-lease-time 7200;<br>

        max-lease-time 7200;<br>

        option domain-name "<a href="http://dept.utexas.edu" target="_blank">dept.utexas.edu</a>";<br>

        option subnet-mask 255.255.255.224;<br>

        option broadcast-address 255.255.255.255;<br>

        option routers 192.168.228.33;<br>

}<br>

<br>

<br>

The new scopes were first added to the primary, which was then reloaded. After both servers were in a "normal" state, the corresponding change was made on the secondary and it was reloaded.<br>

<br>

Per Stephen Carr's suggestion, I have increased the MCLT to 300, but both servers are still in the same state.<div class="HOEnZb"><div class="h5"><br>

<br>

<br>

<br>

<br>

On 04/24/2013 04:40 PM, Oscar Ricardo Silva wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

We have two servers in a failover relationship, both running 4.1-ESV-R7.<br>

  After a reload of dhcpd on the secondary, it has not come out of the<br>

recover state after almost an hour.  We've had this happen with 3.1.3<br>

and recently upgraded to this version.  The only thing we've been able<br>

to do is stop both instances of dhcpd and remove "my state" and "partner<br>

state" from dhcpd.leases.<br>

<br>

<br>

Here's the timeline of what happened.<br>

<br>

1.  A change was made to the configuration of the primary and dhcpd<br>

reloaded at 15:39:14.<br>

2. The primary moved back to a "normal" state at 15:43:42<br>

<br>

Apr 24 15:39:14 primary-dhcp dhcpd: failover peer dhcp: I move from normal to shutdown<br>

Apr 24 15:39:15 primary-dhcp dhcpd: failover peer dhcp: peer moves from normal to partner-down<br>

Apr 24 15:39:15 primary-dhcp dhcpd: failover peer dhcp: I move from shutdown to recover<br>

Apr 24 15:40:18 primary-dhcp dhcpd: failover peer dhcp: I move from recover to startup<br>

Apr 24 15:40:18 primary-dhcp dhcpd: failover peer dhcp: I move from startup to recover<br>

Apr 24 15:43:42 primary-dhcp dhcpd: failover peer dhcp: peer update completed.<br>

Apr 24 15:43:42 primary-dhcp dhcpd: failover peer dhcp: I move from recover to recover-done<br>

Apr 24 15:43:42 primary-dhcp dhcpd: failover peer dhcp: peer moves from partner-down to normal<br>

Apr 24 15:43:42 primary-dhcp dhcpd: failover peer dhcp: I move from recover-done to normal<br>

Apr 24 15:44:53 primary-dhcp dhcpd: failover peer dhcp: peer moves from normal to shutdown<br>

Apr 24 15:44:53 primary-dhcp dhcpd: failover peer dhcp: I move from normal to partner-down<br>

Apr 24 15:44:54 primary-dhcp dhcpd: peer dhcp: disconnected<br>

Apr 24 15:45:59 primary-dhcp dhcpd: failover peer dhcp: peer moves from shutdown to recover<br>

Apr 24 15:45:59 primary-dhcp dhcpd: failover peer dhcp: peer moves from recover to recover<br>

<br>

<br>

<br>

3.  The corresponding change was made on the secondary and it was<br>

reloaded at 15:44:53<br>

<br>

4.  At 15:44:54 it came back up into recover, then moved from recover to<br>

startup, then from startup to recover.  That's where it's been ever since.<br>

<br>

Apr 24 15:44:53 secondary-dhcp dhcpd: failover peer dhcp: I move from normal to shutdown<br>

Apr 24 15:44:53 secondary-dhcp dhcpd: failover peer dhcp: peer moves from normal to partner-down<br>

Apr 24 15:44:54 secondary-dhcp dhcpd: failover peer dhcp: I move from shutdown to recover<br>

Apr 24 15:45:56 secondary-dhcp dhcpd: failover peer dhcp: I move from recover to startup<br>

Apr 24 15:45:59 secondary-dhcp dhcpd: failover peer dhcp: I move from startup to recover<br>

<br>

<br>

<br>

Here's dhcpd.conf for the primary:<br>

<br>

option domain-name-servers 192.168.50.41, 192.168.50.40 ;<br>

option ntp-servers 192.168.50.40, 192.168.50.41;<br>

default-lease-time 86400;<br>

max-lease-time 86400;<br>

one-lease-per-client true;<br>

ddns-update-style ad-hoc;<br>

ddns-updates off;<br>

authoritative;<br>

if substring (option dhcp-client-identifier, 0, 5) = 01:52:41:53:20 {<br>

         deny booting;<br>

}<br>

option voip-tftp-server-address code 150 = array of ip-address ;<br>

set vendor-string = option vendor-class-identifier;<br>

failover peer "dhcp" {<br>

          primary;<br>

          address 192.168.100.2;<br>

          port 520;<br>

          peer address 192.168.101.2;<br>

          peer port 520;<br>

          max-response-delay 60;<br>

          max-unacked-updates 10;<br>

          mclt 120;<br>

          split 255;<br>

          load balance max seconds 5;<br>

        }<br>

subnet 192.168.100.0 netmask 255.255.255.224 {<br>

         }<br>

include "/dhcpd/dhcpd.network.conf";<br>

<br>

<br>

and the /dhcpd/dhcpd.network.conf file holds the scope definitions. Both<br>

servers sync time through ntp and have the same exact time.<br>

<br>

<br>

Any information would be appreciated.<br>

<br>

<br>

<br>

</blockquote>

<br>


</div></div></blockquote></div><br></div>