Failover mechanism problem
Hans Liss
Hans at Liss.pp.se
Mon Sep 28 10:06:28 UTC 2009
Hi all,
I am trying to help investigate a strange failover breakdown and I
wonder if anyone else has encountered this particular error. The
sequence of events was as follows:
1) The network connection between two peers in a failover pair went down.
2) Server #1 discovered that the link was down and tried to reestablish
the link.
3) Server #2 received the "connect" message before it had noticed that
the link was down, so it declined the new connection with an "already
connected" message.
4) Even after the link was back up, no further attempts to reestablish
it were made.
What I am wondering about most is this: If peer #1 discovers a lost
connection and tries to establish a new one *before* peer #2 discovers
that the link is down, what is really supposed to happen? Judging from
the code in server/failover.c, the receiving peer only checks whether a
link structure is present, not whether the link is actually up.
A comment before this code says "If we already have a link to the peer,
it must be dead, so drop it." with a followup saying "Is this the right
thing to do? Probably not - what if both peers start at the same time?"
The comment has read that way since version dhcp-3.0b2pl1, when the
relevant code first appeared.
Instead of dropping the old connection, it unconditionally sends an
FTR_DUP_CONNECTION message and refuses the new connection. Shouldn't it
check whether the current connection is still alive before deciding what
to do? Or am I barking up the wrong tree here?
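
To make the question concrete, here is a minimal sketch of the two
policies, written in Python rather than taken from server/failover.c.
Everything in it is an assumption for illustration: the Link class, the
function names, the returned strings, and the max_silence threshold are
hypothetical and do not correspond to identifiers in the ISC code. It
only models the decision the accepting peer makes when a connect arrives
while it still holds a link structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    """Hypothetical stand-in for the acceptor's existing link state."""
    last_heard: float  # time in seconds when a message was last received


def current_behaviour(existing: Optional[Link]) -> str:
    # Models the behaviour described above: only the presence of a link
    # structure is checked, never whether the link is still alive.
    if existing is not None:
        return "reject-duplicate"
    return "accept"


def liveness_checked(existing: Optional[Link], now: float,
                     max_silence: float) -> str:
    # Alternative policy: a link that has been silent for longer than
    # max_silence seconds is treated as dead and dropped, and the new
    # connection is accepted in its place.
    if existing is None:
        return "accept"
    if now - existing.last_heard > max_silence:
        return "drop-old-and-accept"
    return "reject-duplicate"


# Made-up numbers: the acceptor last heard from its peer at t=0 and the
# new connect arrives at t=40, with a 30 second silence threshold.
stale = Link(last_heard=0.0)
print(current_behaviour(stale))                             # reject-duplicate
print(liveness_checked(stale, now=40.0, max_silence=30.0))  # drop-old-and-accept
```

With the first policy the stale link always wins; with the second, the
outcome depends on how recently the old link carried traffic. Whether a
silence threshold is the right liveness test for failover is exactly the
open question.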
(The comment mentioned above is followed by another one: )
Here are some excerpts from the logfiles:
Sep 20 12:57:53 dns01 dhcpd: peer dhcp: disconnected
Sep 20 12:57:53 dns01 dhcpd: failover peer dhcp: I move from normal to communications-interrupted
Sep 20 12:58:42 dns02 dhcpd: Failover CONNECTACK from dhcp: already connected
Sep 20 12:58:42 dns02 dhcpd: failover peer dhcp: peer moves from normal to communications-interrupted
Sep 20 12:58:43 dns01 dhcpd: Failover DISCONNECT from dhcp: Connection rejected, duplicate connection.
Sep 20 12:58:43 dns01 dhcpd: peer dhcp: disconnected
Sep 20 12:58:45 dns02 dhcpd: peer dhcp: disconnected
Sep 20 12:58:45 dns02 dhcpd: failover peer dhcp: I move from normal to communications-interrupted
/Hans