Once upon a time, ISC DHCP allowed the primary to send POOLREQ to the
secondary. The secondary would respond by moving BACKUP leases to the
FREE state to balance the pool. I don't have a lot of documentation to
understand this move, except that it was done as a trial, to float it
before the draft authors (and boy howdy are there a lot of you, hi and
sorry for the bother).

Some time later, the changes were reverted, and the reason for that has
also missed my gaze. Currently only the primary will initiate changing
the state of FREE or BACKUP leases between those two states. These
things were done before my time, so they may confuse me. Any
clarification would be most welcome.

Fast forward to today: I just fixed a bug in ISC DHCP 3.0.2 release
engineering where the server (either one of them), once it was done
processing a POOLREQ, would reply with another POOLREQ carrying a count
of the number of leases allocated to the peer, rather than the POOLRESP
dictated in the draft. I have to believe that would have resulted in an
infinite loop in the original code, back in the day when both the
primary and secondary responded to POOLREQ messages. If problems were
being encountered (or if this was even ever attempted), that might have
been the root cause.

An upcoming feature release will seek to sync ISC DHCP to the most
recent failover draft (-12), work which hasn't begun yet. So this seems
like an excellent time to revisit this, at least from my point of view,
as I can easily put a change like this in.

As it stands, for ISC DHCP to attempt to allocate BACKUP leases to the
FREE state, on the primary, seems glaringly hackish when a 'request for
leases' failover message already exists. It seems at the surface far
cleaner for the secondary to make the allocation to the primary, as
this avoids all question of errant attempts to change lease states.

For example, ISC's implementation tries half-heartedly to avoid
conflicts by skipping down the backup lease list to get the last (n)
leases on the list.
The comment says that this helps avoid leases that are likely to be
allocated soon, and it's partially right. In most environments,
however, most leases exit the free/backup state because the client they
were originally allocated to has returned to the network, after a brief
hiatus rather than a long one. So neither traversing from the start of
the list (leases expired or released longest ago) nor from the end of
the list (most recently expired or released) seems at all reasonable:
your chances of conflict are roughly equal either way, depending
entirely on the site's clients' behaviour pattern.

Now, take that above behaviour, and try to implement the draft's
recommended client-lease affinity (prefer to move to BACKUP state those
FREE leases whose previous client was handled, as determined by the
load-balance-algorithm, by the secondary). How many leases should be
skipped? One? None? Messy. Needlessly so if we could ask the secondary
to make the allocation instead.

So, why is having primaries send POOLREQs a bad thing?

So long as I am writing a long-winded email, I shall make good use of
that time and double its size in order to ask related questions, to
invite discussion. Sorry.

Upon the subject of 'client-lease affinity' (sorry, lack of better
words), the paragraph in section 5.4 of draft revision -12:

   An IP address will not become owned by the server which allocated
   it initially when it is released or the lease expires because, in
   general, that server will have had to replenish its pool of
   available addresses well in advance of any likely lease
   expirations. Thus, having a particular IP address cycle back to
   the secondary might well put the secondary more out of balance
   with respect to the primary instead of enhancing the balance of
   available addresses between them.

I don't agree.
Any reasonably busy system, reasonably busy enough to require a
failover partnership to govern the availability of the service, will
have a significant number of leases ebb and flow between free states
and ACTIVE. If the load-balance-algorithm is even marginally effective
(and operational experience seems to indicate that it is highly
effective), then the number of leases returning to free states will be
approximately split between servers.

What the draft's paragraph is saying is that since pushing out some
leases in BACKUP state after exiting the EXPIRED placeholder-state
"might" misbalance the pools in the secondary's direction, we should
choose instead to misbalance the pools in the primary's direction and
provide a window in which clients may get a different lease.

I think if we allowed the server to allocate leases to FREE and BACKUP
states as applicable upon expiry, the pools would grow at roughly equal
rates, and in all likelihood would remain more balanced throughout.

In particular, unless someone can argue with me effectively to the
contrary, I plan to implement changes to ISC DHCP's handling of leases
returning to FREE state upon expiry: they will be returned to BACKUP
state instead by the primary if and only if the load-balance-algorithm
hashes out to indicate the client is served best by the secondary
server. I might be convinced to add a clause that the assignment would
not misbalance the pool, but I'm still debating that (it might be
better to misbalance the pool and move the primary's leases which the
secondary owns to the FREE state).

Upon POOLREQ, or at startup, a 'greater or less than 10%' scheme may be
used.
That is, in the language easiest for me to enumerate my thoughts:

    leases_to_send = (nbr_free_leases - nbr_backup_leases) / 2;

    while (leases_to_send > (total_leases / -10) && leases_remain) {
        [assign a FREE lease 'that once belonged to peer' to BACKUP]
        leases_to_send--;
    }

    while (leases_to_send > (total_leases / 10) && leases_remain) {
        [assign any FREE lease to BACKUP]
        leases_to_send--;
    }

And similar. The second loop would not be entered if the by-hash loop
had at least gotten the misbalance within 10%.

A second threshold for whether or not to engage in pool rebalancing at
all (possibly defaulting to 20%) will be introduced enclosing both of
the above loops, to limit the frequency of recurring rebalance events
due to a client load misbalance (transferring 1 lease every time 1
lease is allocated, due to stepping 1 lease out of the 10% margin). Or
I might go with 5/10 defaults instead of 10/20.

Finally, upon the subject of times that are wise to make rebalance
checks: it is currently a bug we plan to repair in ISC DHCP that it
only checks for pool misbalances after successfully allocating a new
address to a client, or at startup. Successfully allocating a new
address is impossible if there are no addresses left to allocate, which
leads to a bad deadlock condition.

In evaluating how to fix this bug permanently, I have considered the
draft's advice to examine pool misbalance "whenever the number of
available addresses for either the primary or secondary changes", and I
am not enamoured with it. In my mind, to follow this advice upon
receiving a BNDUPD or BNDACK that changed the size of these pools may
cause one server or the other to send repetitious POOLREQ messages. The
above paragraph's "buggy" behaviour is preferable to this in limiting
the number of times a POOLREQ might be sent, but is still not optimal
(it's still possible that two leases might be allocated in rapid
succession before any POOLRESP or any BNDUPDs are received).
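As a rough illustration of the two-threshold rebalancing arithmetic described earlier in this message (not ISC DHCP code), the FREE-to-BACKUP direction can be sketched in Python. The function name `plan_rebalance` and its parameters are hypothetical. The sketch also assumes the outer "engage" threshold is compared against the same leases-to-send figure as the inner 10% margin, which the text does not spell out.

```python
def plan_rebalance(free, backup, total, affinity_free,
                   move_pct=10, engage_pct=20):
    """Return (by_affinity, arbitrary): how many FREE leases to move to
    BACKUP, first those whose last client hashes to the peer, then any.

    Hypothetical helper; only the primary-to-secondary direction is shown.
    """
    # Moving half the difference would equalize the two pools.
    to_send = (free - backup) // 2

    # Outer threshold (assumed reading): skip rebalancing entirely unless
    # the surplus exceeds engage_pct percent of all leases.
    if to_send * 100 <= total * engage_pct:
        return 0, 0

    margin = total * move_pct // 100

    # First loop: peer-affinity leases. As in the pseudocode, this may
    # overshoot the balance point by up to the margin.
    by_affinity = 0
    while to_send > -margin and by_affinity < affinity_free:
        by_affinity += 1
        to_send -= 1

    # Second loop: any remaining FREE lease, entered only if the first
    # loop left the pools more than the margin out of balance.
    arbitrary = 0
    remaining_free = free - by_affinity
    while to_send > margin and arbitrary < remaining_free:
        arbitrary += 1
        to_send -= 1

    return by_affinity, arbitrary
```

With 1000 leases, 800 FREE and 200 BACKUP, `plan_rebalance(800, 200, 1000, 50)` yields `(50, 150)`: all 50 affinity leases move, then 150 arbitrary ones bring the surplus down to the 10% margin. With 500 affinity leases available, the same pools yield `(400, 0)`, showing how the by-hash loop alone can carry the pools past the balance point.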
So, I'm presently of a mind to implement pool balance checks upon a
timer, defaulting to several hours, and moved in a downward direction
only in the case where pool depletion activity (as measured from
timestamps from the last time a lease was allocated from the pool for a
client) indicates it may be depleted sooner. Say 1/4th the time at
which the pool is estimated to be depleted, but not to fall below a
reasonable absolute minimum (probably defaulting to 15 seconds), and
further limited to the lowest time across all pools for one failover
partner (since POOLREQ does not identify one pool).

I seek the authors' wisdom in the above, or anyone else's if they know
the answer, if I've missed the point.

BTW, anyone know if this is something I should have asked dhcwg
instead? I kinda wonder if this is still a DHC WG item.

-- 
David W. Hankins                    "If you don't do it right the first time,
Software Engineer                        you'll just have to do it again."
Internet Systems Consortium, Inc.        -- Jack T. Hankins