Failover mechanism problem

Hans Liss Hans at Liss.pp.se
Mon Sep 28 10:06:28 UTC 2009


Hi all,

I am trying to help investigate a strange failover breakdown and I 
wonder if anyone else has encountered this particular error.

1)  The network connection between two peers in a failover pair went down.
2)  Server #1 discovered that the link was down and tried to reestablish 
the link.
3)  Server #2 received the "connect" message but had not yet noticed that
the link was down, so it declined the new connection with an "already
connected" message. Even after the link came back up, no further
attempts to reestablish it were made.

What I am wondering about most is this: If peer #1 discovers a lost 
connection and tries to establish a new one *before* peer #2 discovers 
that the link is down, what is really supposed to happen? Judging from 
the code in server/failover.c, it will only check whether the link 
structure is present, not whether the link is actually up.
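The distinction being asked about can be sketched in C. This is a
minimal model under stated assumptions, not the dhcpd code: the types
`struct peer` and `struct peer_link`, the enum, and the function
`handle_connect_presence_only` are hypothetical names invented for
illustration, and a non-NULL `link` pointer stands in for "the link
structure is present".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; these are NOT the identifiers used in server/failover.c. */
struct peer_link {
    int fd;                   /* placeholder for per-connection state */
};

struct peer {
    struct peer_link *link;   /* non-NULL while a link structure exists */
};

enum connect_result {
    CONNECT_ACCEPTED,
    CONNECT_REJECTED_DUP      /* stands in for an FTR_DUP_CONNECTION reply */
};

/* Presence-only check: if a link structure exists, the incoming connect is
 * refused, even when that link is in fact dead but this peer has not yet
 * noticed. Nothing about the link's liveness is consulted. */
enum connect_result handle_connect_presence_only(const struct peer *p)
{
    assert(p != NULL);
    if (p->link != NULL)
        return CONNECT_REJECTED_DUP;
    return CONNECT_ACCEPTED;
}
```

Because peer #2 still holds its link structure when the new connect
arrives, a check of this shape refuses the connection regardless of
whether that link is still carrying traffic.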

A comment before this code says "If we already have a link to the peer,
it must be dead, so drop it.", followed by "Is this the right thing to
do? Probably not - what if both peers start at the same time?" Both
comments have been there since version dhcp-3.0b2pl1, when the relevant
code first appeared.

Instead of dropping the old connection, it unconditionally sends an 
FTR_DUP_CONNECTION message and refuses the new connection. Shouldn't it 
check whether the current connection is still alive before deciding what 
to do? Or am I barking up the wrong tree here?
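One way to make that decision liveness-aware is sketched here in C,
again with hypothetical names (`decide_on_connect`, `last_rx`,
`response_timeout`) rather than anything from server/failover.c. It
assumes each peer records when it last received a message on the link
and treats a link that has been silent longer than a timeout as dead.
The timeout plays a role similar to dhcpd.conf's `max-response-delay`,
but how dhcpd actually tracks link activity is an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-peer state; NOT the real dhcpd structures. */
struct peer_link {
    time_t last_rx;           /* assumed: when we last received any message on this link */
};

struct peer {
    struct peer_link *link;   /* NULL when no link structure exists */
    time_t response_timeout;  /* assumed: seconds of silence before a link counts as dead */
};

enum connect_action {
    ACCEPT_NEW,               /* no existing link: accept the connect */
    REPLACE_STALE,            /* existing link silent too long: drop it, accept the new one */
    REJECT_DUPLICATE          /* existing link recently active: refuse (cf. FTR_DUP_CONNECTION) */
};

/* Decide what to do with an incoming connect by checking whether the
 * existing link has carried traffic recently, not merely whether it exists. */
enum connect_action decide_on_connect(const struct peer *p, time_t now)
{
    assert(p != NULL);
    if (p->link == NULL)
        return ACCEPT_NEW;
    if (difftime(now, p->link->last_rx) > (double)p->response_timeout)
        return REPLACE_STALE;
    return REJECT_DUPLICATE;
}
```

A connect that arrives before the timeout expires is still refused, so
this narrows the race rather than removing it; the side whose attempt
was refused would still need to retry later.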

(The comment mentioned above is followed by another one: )

Here are some excerpts from the logfiles:

Sep 20 12:57:53 dns01 dhcpd: peer dhcp: disconnected
Sep 20 12:57:53 dns01 dhcpd: failover peer dhcp: I move from normal to communications-interrupted

Sep 20 12:58:42 dns02 dhcpd: Failover CONNECTACK from dhcp: already connected
Sep 20 12:58:42 dns02 dhcpd: failover peer dhcp: peer moves from normal to communications-interrupted

Sep 20 12:58:43 dns01 dhcpd: Failover DISCONNECT from dhcp: Connection rejected, duplicate connection.
Sep 20 12:58:43 dns01 dhcpd: peer dhcp: disconnected

Sep 20 12:58:45 dns02 dhcpd: peer dhcp: disconnected
Sep 20 12:58:45 dns02 dhcpd: failover peer dhcp: I move from normal to communications-interrupted


/Hans
 


More information about the dhcp-users mailing list