Primary on failover pair gets "stuck" in partner state

Oscar Ricardo Silva oscars at mail.utexas.edu
Wed Aug 31 15:39:23 UTC 2011


Here are the logs, and nothing really jumps out:

Primary:

Aug 21 12:02:48 DHCP-SERVER-1 dhcpd: failover peer failover-dhcp: I move 
from recover-wait to recover-done
Aug 21 12:02:48 DHCP-SERVER-1 dhcpd: failover peer failover-dhcp: peer 
moves from partner-down to normal
Aug 21 12:02:48 DHCP-SERVER-1 dhcpd: failover peer failover-dhcp: I move 
from recover-done to normal
Aug 21 12:02:48 DHCP-SERVER-1 dhcpd: balancing pool 8ca9138 SJH109 total 
442  free 258  backup 172  lts 43  max-own (+/-)43
.....
.....  At this point the primary has been restarted and both servers
.....  have re-established a failover relationship
.....
.....  The secondary is about to be restarted
.....
Aug 21 12:02:52 DHCP-SERVER-1 dhcpd: failover peer failover-dhcp: I move 
from normal to partner-down
Aug 21 12:02:52 DHCP-SERVER-1 dhcpd: DHCPREQUEST for 128.62.6.19 
(172.16.200.34) from f0:b4:79:61:45:54 (Josephs-iPod) via 172.17.33.58
Aug 21 12:02:52 DHCP-SERVER-1 dhcpd: DHCPACK on 128.62.6.19 to 
f0:b4:79:61:45:54 (Josephs-iPod) via 172.17.33.58
Aug 21 12:02:52 DHCP-SERVER-1 dhcpd: DHCPDISCOVER from 00:23:6c:6d:ac:2a 
via 172.17.32.54
Aug 21 12:02:52 DHCP-SERVER-1 dhcpd: bind update on 198.213.236.6 got 
ack from failover-dhcp: xid mismatch.
.....
Aug 21 12:03:01 DHCP-SERVER-1 dhcpd: failover peer failover-dhcp: peer 
moves from shutdown to recover
Aug 21 12:03:01 DHCP-SERVER-1 dhcpd: failover peer failover-dhcp: peer 
moves from recover to recover
Aug 21 12:03:01 DHCP-SERVER-1 dhcpd: Update request all from 
failover-dhcp: sending update
.....
.....
.....  It's now more than 12 minutes since the secondary was restarted.
.....  The MCLT is 120 seconds, so the two servers should've
.....  re-established a failover relationship.  The same process
.....  occurred several days later with the exact same log entries.
.....  We attempted to down the secondary and start it again.
.....
.....
.....
Aug 21 12:15:17 DHCP-SERVER-1 dhcpd: failover peer failover-dhcp: peer 
moves from recover to recover
Aug 21 12:15:17 DHCP-SERVER-1 dhcpd: failover peer failover-dhcp: peer 
moves from recover to recover
Aug 21 12:15:17 DHCP-SERVER-1 dhcpd: Received update request while old 
update still flying!  Silently discarding old request.
Aug 21 12:15:17 DHCP-SERVER-1 dhcpd: Update request all from 
failover-dhcp: sending update





Secondary:

Aug 21 12:02:48 DHCP-SERVER-2 dhcpd: DHCPOFFER on 198.213.203.45 to 
f0:b4:79:18:92:a2 via 172.17.32.4
Aug 21 12:02:48 DHCP-SERVER-2 dhcpd: failover peer failover-dhcp: peer 
moves from recover-wait to recover-done
Aug 21 12:02:48 DHCP-SERVER-2 dhcpd: failover peer failover-dhcp: I move 
from partner-down to normal
Aug 21 12:02:48 DHCP-SERVER-2 dhcpd: balancing pool a7f10f0 SJH109 total 
442  free 258  backup 172  lts -43  max-own (+/-)43
Aug 21 12:02:48 DHCP-SERVER-2 dhcpd: failover peer failover-dhcp: peer 
moves from recover-done to normal
.....
.....  At this point the primary has been restarted and both servers
.....  have re-established a failover relationship
.....
.....  The secondary is about to be restarted
.....
Aug 21 12:02:52 DHCP-SERVER-2 dhcpd: failover peer failover-dhcp: I move 
from normal to shutdown
Aug 21 12:02:52 DHCP-SERVER-2 dhcpd: failover peer failover-dhcp: peer 
moves from normal to partner-down
Aug 21 12:02:53 DHCP-SERVER-2 dhcpd: failover peer failover-dhcp: I move 
from shutdown to recover
Aug 21 12:02:54 DHCP-SERVER-2 dhcpd: Internet Systems Consortium DHCP 
Server V3.1.3 [UTnet patch v2.3]
Aug 21 12:02:54 DHCP-SERVER-2 dhcpd: Copyright 2004-2009 Internet 
Systems Consortium.
Aug 21 12:02:54 DHCP-SERVER-2 dhcpd: All rights reserved.
Aug 21 12:02:54 DHCP-SERVER-2 dhcpd: For info, please visit 
https://www.isc.org/software/dhcp/
Aug 21 12:02:54 DHCP-SERVER-2 dhcpd:   PATCHED GLOBAL CONFIGS FOUND:
Aug 21 12:02:54 DHCP-SERVER-2 dhcpd:    + Enabled keying off MAC-ADDRESS 
instead of UID
Aug 21 12:02:54 DHCP-SERVER-2 dhcpd:    + Enabled logging of active 
lease expiration
Aug 21 12:02:54 DHCP-SERVER-2 dhcpd:
Aug 21 12:03:00 DHCP-SERVER-2 dhcpd: Wrote 0 deleted host decls to 
leases file.
Aug 21 12:03:00 DHCP-SERVER-2 dhcpd: Wrote 0 new dynamic host decls to 
leases file.
Aug 21 12:03:01 DHCP-SERVER-2 dhcpd: Wrote 72468 leases to leases file.
Aug 21 12:03:01 DHCP-SERVER-2 dhcpd: Listening on 
LPF/eth0/00:19:b9:df:24:3b/172.16.201.32/27
Aug 21 12:03:01 DHCP-SERVER-2 dhcpd: Sending on 
LPF/eth0/00:19:b9:df:24:3b/172.16.201.32/27
Aug 21 12:03:01 DHCP-SERVER-2 dhcpd: Sending on Socket/fallback/fallback-net
Aug 21 12:03:01 DHCP-SERVER-2 dhcpd: failover peer failover-dhcp: I move 
from recover to startup
Aug 21 12:03:01 DHCP-SERVER-2 dhcpd: failover peer failover-dhcp: I move 
from startup to recover
Aug 21 12:03:01 DHCP-SERVER-2 dhcpd: Sent update request all message to 
failover-dhcp
.....
.....
.....  It's now more than 12 minutes since the secondary was restarted.
.....  The MCLT is 120 seconds, so the two servers should've
.....  re-established a failover relationship.  The same process
.....  occurred several days later with the exact same log entries.
.....  We attempted to down the secondary and start it again.
.....
.....
.....
Aug 21 12:15:14 DHCP-SERVER-2 dhcpd: Internet Systems Consortium DHCP 
Server V3.1.3 [UTnet patch v2.3]
Aug 21 12:15:14 DHCP-SERVER-2 dhcpd: Copyright 2004-2009 Internet 
Systems Consortium.
Aug 21 12:15:14 DHCP-SERVER-2 dhcpd: All rights reserved.
Aug 21 12:15:14 DHCP-SERVER-2 dhcpd: For info, please visit 
https://www.isc.org/software/dhcp/
Aug 21 12:15:14 DHCP-SERVER-2 dhcpd:   PATCHED GLOBAL CONFIGS FOUND:
Aug 21 12:15:14 DHCP-SERVER-2 dhcpd:    + Enabled keying off MAC-ADDRESS 
instead of UID
Aug 21 12:15:14 DHCP-SERVER-2 dhcpd:    + Enabled logging of active 
lease expiration
Aug 21 12:15:14 DHCP-SERVER-2 dhcpd:
Aug 21 12:15:16 DHCP-SERVER-2 dhcpd: Wrote 0 deleted host decls to 
leases file.
Aug 21 12:15:16 DHCP-SERVER-2 dhcpd: Wrote 0 new dynamic host decls to 
leases file.
Aug 21 12:15:17 DHCP-SERVER-2 dhcpd: Wrote 72468 leases to leases file.
Aug 21 12:15:17 DHCP-SERVER-2 dhcpd: Listening on 
LPF/eth0/00:19:b9:df:24:3b/172.16.201.32/27
Aug 21 12:15:17 DHCP-SERVER-2 dhcpd: Sending on 
LPF/eth0/00:19:b9:df:24:3b/172.16.201.32/27
Aug 21 12:15:17 DHCP-SERVER-2 dhcpd: Sending on Socket/fallback/fallback-net
Aug 21 12:15:17 DHCP-SERVER-2 dhcpd: failover peer failover-dhcp: I move 
from recover to startup
Aug 21 12:15:17 DHCP-SERVER-2 dhcpd: failover peer failover-dhcp: I move 
from startup to recover
Aug 21 12:15:17 DHCP-SERVER-2 dhcpd: Sent update request all message to 
failover-dhcp
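
In case it helps anyone compare the two sides, here is a rough sketch of how the failover state changes could be pulled out of logs like the ones above. It is only an illustration, not anything that ships with dhcpd: the function names (unwrap, transitions) are made up, and it assumes the syslog format shown here, including entries that the mail client wrapped onto a second line and the "....." annotation lines I added.

```python
import re

# Matches the failover state-change messages shown in the logs above, e.g.
#   "failover peer failover-dhcp: I move from normal to partner-down"
#   "failover peer failover-dhcp: peer moves from recover to recover"
TRANSITION = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+dhcpd:\s+"
    r"failover peer (?P<peer>\S+): (?P<who>I move|peer moves) "
    r"from (?P<old>\S+) to (?P<new>\S+)"
)

# A line that begins a new syslog entry ("Aug 21 12:02:52 ...").
SYSLOG_START = re.compile(r"^\w{3}\s+\d+\s+\d\d:\d\d:\d\d\s")


def unwrap(lines):
    """Rejoin syslog entries that were wrapped onto a following line."""
    entry = None
    for line in lines:
        line = line.rstrip("\n")
        if SYSLOG_START.match(line):
            if entry is not None:
                yield entry
            entry = line
        elif line.startswith("....."):
            # Hand-written annotation: flush the pending entry and skip it.
            if entry is not None:
                yield entry
            entry = None
        elif entry is not None and line.strip():
            # Continuation of the previous (wrapped) entry.
            entry = entry.rstrip() + " " + line.strip()
    if entry is not None:
        yield entry


def transitions(lines):
    """Yield (timestamp, host, 'local' or 'peer', old_state, new_state)."""
    for entry in unwrap(lines):
        m = TRANSITION.match(entry)
        if m:
            who = "local" if m["who"] == "I move" else "peer"
            yield m["ts"], m["host"], who, m["old"], m["new"]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python transitions.py < primary.log
    for ts, host, who, old, new in transitions(sys.stdin):
        print(f"{ts} {host} {who:5s} {old} -> {new}")
```

Run against each server's log, this would give two short timelines that can be read side by side, which makes it easier to see that the primary stays in partner-down while the secondary keeps reporting recover.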





On 08/30/2011 07:00 AM, dhcp-users-request at lists.isc.org wrote:
> Message: 4 Date: Mon, 29 Aug 2011 21:45:17 -0600 From: Jason Gerfen
> <jason.gerfen at gmail.com> Subject: Re: Primary on failover pair gets
> "stuck" in partner state To: Users of ISC DHCP
> <dhcp-users at lists.isc.org> Message-ID:
> <CAMU2igsjFEZ771SQpRQxd=DrXvbVQwserV9rxMQDuS-jn8x+VA at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1 Does anything show up in
> the logs? Ideally this would be the first place to start looking for odd
> behavior. On Mon, Aug 29, 2011 at 9:23 PM, Oscar Ricardo Silva
> <oscars at mail.utexas.edu> wrote:
>> >  I have a pair of servers running 3.1.3 in a failover relationship.
>> >  Recently we added new scopes and were testing changes to the
>> >  configuration, requiring multiple starts/restarts.  Each time, we
>> >  restarted the primary, waited for both servers to move to normal,
>> >  then restarted the secondary.  After one of these cycles, the primary
>> >  came up fine and moved to a "normal" state.  The secondary came up and
>> >  reported that it was recovering (as is normal).  The problem is that the
>> >  secondary never recovered, even after several hours (6 hours of
>> >  waiting).  There was communication between the two and the only change
>> >  was the addition of new scopes.
>> >
>> >  At one point, when the secondary was completely down, the primary was
>> >  still reporting that it had a local state of "Partner Down" but a
>> >  partner state of "Recovering".  Nothing done to the secondary caused the
>> >  states on the primary to change.
>> >
>> >  Bringing both servers down, then starting the primary, then the secondary
>> >  fixed the problem.  It's the second time we've seen this and we're not
>> >  exactly sure why one server gets stuck in a particular state.  Time
>> >  isn't an issue as both servers are updated via NTP and they have the
>> >  same time down to several microseconds.
>> >
>> >  We don't think it's a firewall or other communication problem (yes,
>> >  everyone says that, right?).  After both dhcpd processes were shut down
>> >  then restarted, the two servers established a failover relationship with
>> >  no changes to the underlying network or iptables.
>> >
>> >
>> >
>> >
>> >  Oscar
>> >
>> >
>> >  Primary:
>> >
>> >  failover peer "failover-dhcp" {
>> >          primary;
>> >          address 192.168.100.34;
>> >          port 520;
>> >          peer address 192.168.101.34;
>> >          peer port 520;
>> >          max-response-delay 60;
>> >          max-unacked-updates 10;
>> >          mclt 120;
>> >          split 255;
>> >          load balance max seconds 5;
>> >  }
>> >
>> >
>> >  Secondary:
>> >
>> >  failover peer "failover-dhcp" {
>> >          secondary;
>> >          address 192.168.101.34;
>> >          port 520;
>> >          peer address 192.168.100.34;
>> >          peer port 520;
>> >          max-response-delay 60;
>> >          max-unacked-updates 10;
>> >          load balance max seconds 5;
>> >  }
>> >
>> >
>> >
>> >  Yes, I know the "split 255" statement is a little weird but we do this
>> >  to try and "prefer" the primary to facilitate troubleshooting.  The
>> >  servers are started/restarted several times each month as new networks
>> >  are defined and the split statement doesn't cause any issues.
>> >



