[BUG] dhcp-failover: Both servers remain POTENTIAL-CONFLICT forever.

Toyo Abe tabe at miraclelinux.com
Mon Nov 30 03:59:31 UTC 2009


Hello,

I already reported this problem to dhcp-bugs at isc.org (bug #20352),
but I have not received any response in two months,
so I am re-posting the same problem report to dhcp-workers/hackers.

I've encountered a problem in the DHCP failover feature of isc-dhcp.
I wonder if you could help me resolve it.

The following message was logged, and both the primary and the secondary
appeared to be out of service. They never recovered.

DHCPDISCOVER from 00:0c:29:e9:39:0c via eth1: not responding (resolving conflicts)

I found two cases that cause the problem.
The first scenario is as follows.
(The isc-dhcp version is CVS HEAD.)

*SCENARIO 1*

    Primary                                  Secondary

       |                                         |
POTENTIAL-CONFLICT                        POTENTIAL-CONFLICT
       |                                         |
       | >-- UPDREQ ----------------->           |
       |                                         |
       |         <------------------- BNDUPD --< |
       | >-- BNDACK ----------------->           |
      ...                                       ...
       |                                         |
       |         <------------------- BNDUPD --< |
       | <<========= Connection lost =========>> |
       |                                         |
      ...                                       ...
       |                                         |
RESOLUTION-INTERRUPTED                    RESOLUTION-INTERRUPTED
       |                                         |
       | >-- CONNECT ---------------->           |
       |         <--------------- CONNECTACK --< |
       |                                         |
       | <<===== Connection established ======>> |
       |                                         |
POTENTIAL-CONFLICT                        POTENTIAL-CONFLICT
       |                                         |
       | >-- (Doesn't send UPDREQ) --> X         |
       |                                         |
       | >-- CONTACT ---------------->           |
       |         <------------------ CONTACT --< |
       |                                         |
       | >-- CONTACT ---------------->           |
       |         <------------------ CONTACT --< |
       |                                         |
       | >-- CONTACT ---------------->           |
       |         <------------------ CONTACT --< |
       |                                         |
      ...                                       ...

The primary seems to keep waiting for a BNDUPD or UPDDONE from the secondary.
Meanwhile, the secondary keeps waiting for an UPDREQ from the primary.

The detailed log is inlined below.
I wrote the partner-down state into dhcpd.leases on both servers
to trigger the update of binding information.

--------------------------------------------------
Internet Systems Consortium DHCP Server 4.2.0
Copyright 2004-2009 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Wrote 2 leases to leases file.
Listening on LPF/eth1/00:0c:29:9b:e3:83/192.168.10.0/24
Sending on   LPF/eth1/00:0c:29:9b:e3:83/192.168.10.0/24

No subnet declaration for eth0 (10.2.102.74).
** Ignoring requests on eth0.  If this is not what
  you want, please write a subnet declaration
  in your dhcpd.conf file for the network segment
  to which interface eth0 is attached. **

Sending on   Socket/fallback/fallback-net
failover peer adapter: I move from partner-down to startup
failover peer adapter: peer moves from normal to partner-down
failover peer adapter: I move from startup to partner-down
failover peer adapter: peer moves from partner-down to partner-down
failover peer adapter: I move from partner-down to potential-conflict
Sent update request message to adapter
failover peer adapter: peer moves from partner-down to potential-conflict
receive_packet failed on eth1: Network is down
timeout waiting for failover peer adapter
peer adapter: disconnected
failover peer adapter: I move from potential-conflict to resolution-interrupted
failover peer adapter: peer moves from potential-conflict to resolution-interrupted
failover peer adapter: I move from resolution-interrupted to potential-conflict
failover peer adapter: peer moves from resolution-interrupted to potential-conflict
DHCPDISCOVER from 00:0c:29:e9:39:0c via eth1: not responding (resolving conflicts)
DHCPDISCOVER from 00:0c:29:e9:39:0c via eth1: not responding (resolving conflicts)
DHCPDISCOVER from 00:0c:29:e9:39:0c via eth1: not responding (resolving conflicts)
DHCPDISCOVER from 00:0c:29:e9:39:0c via eth1: not responding (resolving conflicts)
DHCPDISCOVER from 00:0c:29:e9:39:0c via eth1: not responding (resolving conflicts)
DHCPDISCOVER from 00:0c:29:e9:39:0c via eth1: not responding (resolving conflicts)
--------------------------------------------------

I tracked down the cause and found that the following code in
dhcp_failover_send_update_request() triggers the problem.

  if (state -> curUPD)
          return ISC_R_ALREADYRUNNING;

I found the following note in RELNOTES,
from 'Changes since 3.0.2':

- In the case where a secondary server lost its stable storage while the
 primary was still in communications-interrupted, and came back online,
 the lease databases would not be fully transferred to the secondary.
 This was due to the secondary errantly sending an extra UPDREQ message
 when the primary made its state transition to PARTNER-DOWN known.

I think curUPD was introduced to fix the problem described above,
but I do not yet understand what the original problem was.
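
To make the effect of the guard quoted above concrete, here is a minimal,
self-contained sketch. It is not the isc-dhcp code. Only the curUPD field
name and the early-return idea come from the snippet above; the struct, the
result codes and every toy_* function are hypothetical names made up for
illustration. It also assumes, as this report suggests, that the flag is
still set after the connection is lost.

```c
#include <stdbool.h>

/* Hypothetical, simplified model; not the real isc-dhcp structures. */
enum toy_result { TOY_SUCCESS, TOY_ALREADYRUNNING };

struct toy_failover_state {
    bool connected;    /* is the failover link up? */
    int  curUPD;       /* nonzero while an update request is outstanding */
    int  updreq_sent;  /* how many UPDREQs actually went out */
};

/* Mirrors the guard: refuse to send while curUPD is still set. */
static enum toy_result toy_send_update_request(struct toy_failover_state *s)
{
    if (s->curUPD)
        return TOY_ALREADYRUNNING;
    s->curUPD = 1;
    s->updreq_sent++;
    return TOY_SUCCESS;
}

/* Connection loss that leaves curUPD untouched (assumed current behaviour). */
static void toy_disconnect_keep_flag(struct toy_failover_state *s)
{
    s->connected = false;
}

/* Connection loss that also forgets the outstanding request. */
static void toy_disconnect_clear_flag(struct toy_failover_state *s)
{
    s->connected = false;
    s->curUPD = 0;
}

/*
 * Replays the primary's side of SCENARIO 1: send UPDREQ, lose the link,
 * reconnect, try to send UPDREQ again. Returns the number of UPDREQs
 * that were really sent.
 */
static int toy_replay_scenario1(bool clear_on_disconnect)
{
    struct toy_failover_state s = { true, 0, 0 };

    toy_send_update_request(&s);          /* first UPDREQ goes out */
    if (clear_on_disconnect)
        toy_disconnect_clear_flag(&s);
    else
        toy_disconnect_keep_flag(&s);
    s.connected = true;                   /* CONNECT / CONNECTACK */
    toy_send_update_request(&s);          /* second attempt */
    return s.updreq_sent;
}
```

In this model, toy_replay_scenario1(false) sends only one UPDREQ because the
second attempt hits the guard, while toy_replay_scenario1(true) sends two.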


The second problematic scenario is as follows.

*SCENARIO 2*

NOTE: To be exact, SCENARIO 2 results in the servers shutting down,
      which means the symptom itself is not exactly the same
      as in SCENARIO 1. But the root cause is similar.

 Primary                                  Secondary

    |                                         |
POTENTIAL-CONFLICT                        POTENTIAL-CONFLICT
    |                                         |
    | >-- UPDREQ ----------------->           |
    |                                         |
    |         <------------------- BNDUPD --< |
    | >-- BNDACK ----------------->           |
   ...                                       ...
    |                                         |
    |         <------------------ UPDDONE --< |
    |                                         |
CONFLICT-DONE                                 |
    |                                         |
    |         <------------------- UPDREQ --< |
    |                                         |
    | >-- BNDUPD ----------------->           |
    |         <------------------- BNDACK --< |
   ...                                       ...
    |                                         |
    | <<========= Connection lost =========>> |
    |                                         |
   ...                                       ...
    |                                         |
    |                                   RESOLUTION-INTERRUPTED
    |                                         |
    | >-- CONNECT ---------------->           |
    |         <--------------- CONNECTACK --< |
    |                                         |
    | <<===== Connection established ======>> |
    |                                         |
    |                                   POTENTIAL-CONFLICT
    |                                         |
    |         X <-- (Doesn't send UPDREQ) --< |
    |                                         |
    | >-- CONTACT ---------------->           |
    |         <------------------ CONTACT --< |
    |                                         |
    | >-- CONTACT ---------------->           |
    |         <------------------ CONTACT --< |
    |                                         |
    | >-- CONTACT ---------------->           |
    |         <------------------ CONTACT --< |
    |                                         |
   ...                                       ...

Furthermore, I saw the following error message on the primary
when I tested the case above.

Peer adapter: Invalid attempt to move from potential-conflict to resolution-interrupted while local state is conflict-done.

dhcp_failover_peer_state_changed() considers the peer's move from
potential-conflict to resolution-interrupted invalid.
However, according to the original spec (draft-ietf-dhc-failover-12.txt),
the primary has to remain in conflict-done in the above case
and accept a later UPDREQ from the secondary.
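
The check described above can be pictured with the following small sketch.
It is only an illustration of the single transition discussed here, not the
logic of dhcp_failover_peer_state_changed(). The enum, the function name and
the "relaxed" switch are hypothetical, and all other state combinations are
deliberately left out of scope.

```c
#include <stdbool.h>

/* Hypothetical subset of failover states, for illustration only. */
enum toy_state {
    TOY_POTENTIAL_CONFLICT,
    TOY_RESOLUTION_INTERRUPTED,
    TOY_CONFLICT_DONE
};

/*
 * Decide whether the local server accepts the peer's announced state.
 * relaxed == false models the behaviour seen in the error message:
 * while the local state is conflict-done, a peer moving to
 * resolution-interrupted is rejected as invalid.
 * relaxed == true models the behaviour asked for in this report:
 * the move is accepted and the local state stays conflict-done.
 */
static bool toy_peer_move_allowed(enum toy_state local,
                                  enum toy_state peer_new,
                                  bool relaxed)
{
    if (local == TOY_CONFLICT_DONE &&
        peer_new == TOY_RESOLUTION_INTERRUPTED)
        return relaxed;

    /* Every other combination is outside the scope of this sketch. */
    return true;
}
```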



I fixed this problem locally (the patch is attached to this email).
The patch comprises three changes.
- Allow the secondary to change its state from potential-conflict to
  resolution-interrupted while the primary is in conflict-done.
- Allow the secondary to send UPDREQ when its previous state is
  potential-conflict and the primary's is conflict-done.
- Add a retry mechanism for UPDREQ.
  In SCENARIO 1, for example, it allows the primary to resend UPDREQ,
  and only when that scenario happens.
  I chose the "disconnected" event to detect the case.
  If my state is partner-down when I get the "disconnected" event, the
  primary clears state->curUPD so that it can start the update process
  again. The secondary, on the other hand, purges its ack_queue because
  the BNDACKs corresponding to previously sent BNDUPDs might not be on
  the primary's toack_queue. I think the queue purge is required
  because dhcp_failover_generate_update_queue() no longer dequeues all
  pending updates.
  The same is true in SCENARIO 2.
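
As a rough illustration of the third change, the sketch below shows the
role-dependent cleanup on the "disconnected" event. It is not the attached
patch. The names curUPD and ack_queue are borrowed from the description
above, but the structs, the queue layout and all toy_* functions are
hypothetical, and the state check that precedes the cleanup is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lease record that can sit on a pending-ACK queue. */
struct toy_lease {
    int id;
    struct toy_lease *next_pending;
};

/* Hypothetical per-peer failover bookkeeping. */
struct toy_peer {
    bool is_primary;
    int  curUPD;                       /* outstanding update request flag */
    struct toy_lease *ack_queue_head;  /* BNDUPDs sent, BNDACK not yet seen */
    struct toy_lease *ack_queue_tail;
    int  ack_queue_len;
};

/* Append a lease whose BNDUPD has been sent and awaits a BNDACK. */
static void toy_ack_queue_push(struct toy_peer *p, struct toy_lease *l)
{
    l->next_pending = NULL;
    if (p->ack_queue_tail)
        p->ack_queue_tail->next_pending = l;
    else
        p->ack_queue_head = l;
    p->ack_queue_tail = l;
    p->ack_queue_len++;
}

/* Unlink every entry; the leases themselves are owned elsewhere. */
static void toy_ack_queue_purge(struct toy_peer *p)
{
    struct toy_lease *l = p->ack_queue_head;

    while (l) {
        struct toy_lease *next = l->next_pending;
        l->next_pending = NULL;
        l = next;
    }
    p->ack_queue_head = NULL;
    p->ack_queue_tail = NULL;
    p->ack_queue_len = 0;
}

/*
 * "disconnected" event: the primary forgets the outstanding UPDREQ so it
 * can ask again after reconnecting; the secondary drops ACKs it can no
 * longer expect to receive.
 */
static void toy_on_disconnected(struct toy_peer *p)
{
    if (p->is_primary)
        p->curUPD = 0;
    else
        toy_ack_queue_purge(p);
}
```

Purging only unlinks the entries in this sketch, on the assumption that the
leases are owned by another data structure and will be queued again when the
update process restarts.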

But I'm not a DHCP expert, so I'm not confident in my fix.
Could you advise me on how this problem should be solved?

Thanks in advance,
-Toyo Abe



-------------- next part --------------
A non-text attachment was scrubbed...
Name: fix-resolving-conflicts-error.patch
Type: text/x-patch
Size: 2606 bytes
Desc: not available
URL: <https://lists.isc.org/pipermail/dhcp-workers/attachments/20091130/5df68002/attachment.bin>

