Failover mechanism problem
Hans at Liss.pp.se
Mon Sep 28 10:06:28 UTC 2009
I am trying to help investigate a strange failover breakdown and I
wonder if anyone else has encountered this particular error.
1) The network connection between two peers in a failover pair went down.
2) Server #1 discovered that the link was down and tried to reestablish it.
3) Server #2 received the "connect" message but had not yet noticed that
the link was down, so it rejected the new connection with an "already
connected" message. Even after the link was back up, no further
attempts to reestablish the link were made.
What I am wondering about most is this: If peer #1 discovers a lost
connection and tries to establish a new one *before* peer #2 discovers
that the link is down, what is really supposed to happen? Judging from
the code in server/failover.c, it only checks whether the link
structure is present, not whether the link is actually up.
A comment before this code says "If we already have a link to the peer,
it must be dead, so drop it.", followed by "Is this the right
thing to do? Probably not - what if both peers start at the same time?"
Both comments have been there since version dhcp-3.0b2pl1, when the
relevant code first appeared.
Instead of dropping the old connection, it unconditionally sends an
FTR_DUP_CONNECTION message and refuses the new connection. Shouldn't it
check whether the current connection is still alive before deciding what
to do? Or am I barking up the wrong tree here?
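One possible alternative, sketched under similar simplifying assumptions: treat an existing link as a genuine duplicate only if a message has arrived on it recently, and otherwise drop it and accept the new connection. The names `struct link_state`, `last_received` and `decide_on_connect`, and the `max_silence` threshold, are hypothetical; in a real setup the threshold would presumably be tied to something like the peer's configured max-response-delay, but that is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical simplified link state; not the real dhcpd structures. */
struct link_state {
    bool present;          /* a link structure exists */
    time_t last_received;  /* when the last message arrived on it */
};

enum connect_decision { ACCEPT_NEW, REJECT_DUPLICATE };

/* Refuse the new connection only if the existing link has carried
   traffic within max_silence seconds; otherwise assume it is dead,
   replace it with the new connection and accept. */
static enum connect_decision
decide_on_connect(struct link_state *link, time_t now, time_t max_silence)
{
    assert(link != NULL);
    if (link->present && now - link->last_received <= max_silence)
        return REJECT_DUPLICATE;  /* existing link looks healthy */

    /* No link, or a stale one: take over with the new connection. */
    link->present = true;
    link->last_received = now;
    return ACCEPT_NEW;
}
```

This keeps the "drop a dead link" behavior the original comment describes while still refusing a second connection when the first one is clearly working. It does not by itself address the simultaneous-start case the second comment worries about.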
(The comment mentioned above is followed by another one: )
Here are some excerpts from the logfiles:
Sep 20 12:57:53 dns01 dhcpd: peer dhcp: disconnected
Sep 20 12:57:53 dns01 dhcpd: failover peer dhcp: I move from normal to
Sep 20 12:58:42 dns02 dhcpd: Failover CONNECTACK from dhcp: already
Sep 20 12:58:42 dns02 dhcpd: failover peer dhcp: peer moves from normal
Sep 20 12:58:43 dns01 dhcpd: Failover DISCONNECT from dhcp: Connection rejected, duplicate connection.
Sep 20 12:58:43 dns01 dhcpd: peer dhcp: disconnected
Sep 20 12:58:45 dns02 dhcpd: peer dhcp: disconnected
Sep 20 12:58:45 dns02 dhcpd: failover peer dhcp: I move from normal to
More information about the dhcp-users