Once upon a time, ISC DHCP allowed the primary to send POOLREQ to the
secondary.  It would respond by moving BACKUP leases to the FREE state
to balance the pool.  I don't have a lot of documentation to understand
this move except that it was done as a trial, to float it before the
draft authors (and boy howdy are there a lot of you, hi and sorry for
the bother).
Some time later, the changes were reverted, and the reason for that
has also missed my gaze.  Currently only the primary will initiate
changing the state of FREE or BACKUP leases between those two states.

These things were done before my time, so they may confuse me.  Any
clarification would be most welcome.

Fast forward to today, I just fixed a bug in ISC DHCP 3.0.2 release
engineering where the server (either one of them) would send a POOLREQ
with a count of the number of leases allocated to the peer when it was
done processing the POOLREQ request, rather than the POOLRESP dictated
in the draft.

I have to believe that would have resulted in an infinite loop in the
original code in the day when both the primary and secondary responded
to POOLREQ messages, and if problems were being encountered (or if this
was even ever attempted), might have been the root cause.

An upcoming feature release will seek to sync ISC DHCP to the most
recent failover draft (-12), work which hasn't begun yet.

So this seems like an excellent time to revisit this, at least from
my point of view, as I can easily put a change like this in.


As it stands, for ISC DHCP to attempt to allocate BACKUP leases to the
FREE state, on the primary, seems glaringly hackish when a 'request for
leases' failover message already exists.  It seems at the surface
far cleaner for the secondary to make the allocation to the primary,
as this avoids all question of errant attempts to change lease states.

For example, ISC's implementation tries half-heartedly to avoid
conflicts by skipping down the backup lease list to get the last (n)
leases on the list.  The comment says that this helps avoid leases that
are likely to be allocated soon, and it's partially right.  In most
environments, however, most leases exit the free/backup state because
the client they were originally allocated to has returned to the
network, after a brief hiatus rather than a long one, so neither
traversing from the start of the list (leases expired or released
longest ago) nor the end of the list (most recently expired or
released) seems to be at all reasonable...your chances of conflict
are roughly equal either way, depending entirely on the site's clients'
behaviour pattern.

Now, take that above behaviour, and try to implement the draft's
recommended client-lease affinity (prefer to move to BACKUP state those
FREE leases whose previous client would, according to the
load-balance-algorithm, be handled by the secondary).  How many leases
should be skipped?  One?  None?

Messy.  Needlessly so if we could ask the secondary to make the
allocation instead.

So, why is having primaries send POOLREQs a bad thing?


So long as I am writing a long-winded email, I shall make good use
of that time and double its size in order to ask related questions,
to invite discussion.

Sorry.

Upon the subject of 'client-lease affinity' (sorry, lack of better
words), consider this paragraph in section 5.4 of draft revision -12:

   An IP address will not become owned by the server which allocated it
   initially when it is released or the lease expires because, in gen-
   eral, that server will have had to replenish its pool of available
   addresses well in advance of any likely lease expirations.  Thus,
   having a particular IP address cycle back to the secondary might well
   put the secondary more out of balance with respect to the primary
   instead of enhancing the balance of available addresses between them.

I don't agree.  Any system busy enough to require a failover
partnership to govern the availability of the service will have a
significant number of leases ebb and flow between free states and
ACTIVE.  If the load-balance-algorithm is even marginally effective
(and operational experience seems to indicate that it is highly
effective), then the number of leases returning to free states will
be approximately split between servers.

What this paragraph is saying is that since pushing out some leases
in BACKUP state after exiting EXPIRED placeholder-state "might"
misbalance the pools in the secondary's direction, we should choose
instead to misbalance the pools in the primary's direction and
provide a window in which clients may get a different lease.

I think if we allowed the server to allocate leases to FREE and BACKUP
states as applicable upon expiry, the pools would grow at roughly equal
rates, and in all likelihood would remain more balanced throughout.

In particular, unless someone can argue with me effectively to the
contrary, I plan to implement changes to ISC DHCP's handling of
leases returning to FREE state upon expiry - they will be returned to
BACKUP state instead by the primary if and only if the load-balance-
algorithm hashes out to indicate the client is served best by the
secondary server.  I might be convinced to add a clause that the
assignment would not misbalance the pool, but I'm still debating
that (it might be better to misbalance the pool and move the
primary's leases which the secondary owns to the FREE state).
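
To make the expiry rule above concrete, here is a minimal sketch of the
decision.  Everything named here is an assumption for illustration:
lb_hash() is a trivial stand-in and NOT the real load-balance hash,
'split' is a hypothetical boundary between the buckets each server
handles, and state_on_expiry() is not a function in ISC DHCP.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum lease_state { LEASE_FREE, LEASE_BACKUP };

/* Hypothetical stand-in hash over the client identifier.  It is only
 * here so the sketch compiles; it is not the load-balance algorithm. */
static uint8_t lb_hash(const uint8_t *id, size_t len)
{
    uint8_t h = 0;
    for (size_t i = 0; i < len; i++)
        h = (uint8_t)(h * 31u + id[i]);
    return h;
}

/* Assumption: the primary serves hash buckets below 'split' and the
 * secondary serves the rest. */
static bool secondary_serves(const uint8_t *id, size_t len, unsigned split)
{
    return lb_hash(id, len) >= split;
}

/* State the primary gives an expiring lease: BACKUP if and only if the
 * last client hashes to the secondary, FREE otherwise. */
static enum lease_state state_on_expiry(const uint8_t *id, size_t len,
                                        unsigned split)
{
    return secondary_serves(id, len, split) ? LEASE_BACKUP : LEASE_FREE;
}
```

The optional 'would not misbalance the pool' clause I am still debating
would be one more condition ahead of the return.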

Upon POOLREQ, or at startup, a 'greater or less than 10%' scheme may
be used.  That is, in the language easiest for me to enumerate my
thoughts:

	leases_to_send = (nbr_free_leases - nbr_backup_leases) / 2;
	/* by-hash loop: may overshoot as far as 10% the other way */
	while (leases_to_send > (total_leases / -10) && leases_remain) {
		[assign a FREE lease 'that once belonged to peer' to BACKUP]
		leases_to_send--;
	}

	/* fallback loop: only while still more than 10% out of balance */
	while (leases_to_send > (total_leases / 10) && leases_remain) {
		[assign any FREE lease to BACKUP]
		leases_to_send--;
	}

And similar.  The second loop would not be entered if the by-hash loop
had at least gotten the misbalance within 10%.  A second threshold for
whether or not to engage in pool rebalancing at all (possibly defaulting
to 20%) will be introduced enclosing both of the above loops, to limit
the frequency of recurring rebalance events due to a client load
misbalance (transferring 1 lease every time 1 lease is allocated due to
stepping 1 lease out of the 10% margin).  Or I might go with 5/10
defaults instead of 10/20.
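
For anyone who would rather read counts than brackets, here is a rough
sketch of the two loops with the outer threshold wrapped around them.
It only covers the primary moving FREE leases to BACKUP.  The names
(plan_rebalance, struct rebalance_plan, affinity_avail) are invented
for illustration, and I am assuming the outer threshold is measured the
same way as the inner one, against the total lease count.

```c
struct rebalance_plan {
    long by_affinity; /* FREE leases moved whose last client hashes to peer */
    long any_free;    /* other FREE leases moved to BACKUP */
};

/* Decide how many FREE leases to move to BACKUP.
 *   affinity_avail: FREE leases whose previous client belongs to the peer
 *   trigger_pct:    outer threshold, e.g. 20 (no rebalance at or below it)
 *   target_pct:     inner margin, e.g. 10 */
static struct rebalance_plan
plan_rebalance(long nfree, long nbackup, long total, long affinity_avail,
               long trigger_pct, long target_pct)
{
    struct rebalance_plan plan = { 0, 0 };
    long to_send = (nfree - nbackup) / 2;
    long margin = total * target_pct / 100;
    long other_avail = nfree - affinity_avail;

    /* Outer threshold: leave small misbalances alone entirely. */
    if (to_send <= total * trigger_pct / 100)
        return plan;

    /* By-hash loop: allowed to overshoot as far as the margin in the
     * peer's favour, as in the pseudocode above. */
    while (to_send > -margin && plan.by_affinity < affinity_avail) {
        plan.by_affinity++;
        to_send--;
    }

    /* Fallback loop: only entered if still outside the margin. */
    while (to_send > margin && plan.any_free < other_avail) {
        plan.any_free++;
        to_send--;
    }
    return plan;
}
```

With 80 FREE, 20 BACKUP, 100 total and only 5 affinity candidates, this
moves 5 by hash and 15 more from the general FREE list, ending exactly
on the 10% margin.  With 60 FREE and 40 BACKUP it does nothing, since
that is inside the 20% trigger.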


Finally,

Upon the subject of times that are wise to make rebalance checks, it
is currently a bug we plan to repair in ISC DHCP that it only checks
for pool misbalances after successfully allocating a new address to
a client, or at startup.  Successfully allocating a new address is
impossible if there are no addresses left to allocate, which leads
to a bad deadlock condition.

In evaluating how to fix this bug permanently, I have considered the
draft's advice to examine pool misbalance "whenever the number of
available addresses for either the primary or secondary changes", and
I am not enamoured with it.  In my mind, to follow this advice upon
receiving a BNDUPD or BNDACK that changed the size of these pools may
cause one server or the other to send repetitious POOLREQ messages.
The above paragraph's "buggy" behaviour is preferable to this in
limiting the number of times a POOLREQ might be sent, but is still not
optimal (it's still possible that two leases might be allocated in
rapid succession before any POOLRESP or any BNDUPDs are received).

So, I'm presently of a mind to implement pool balance checks upon a
timer, defaulting to several hours, and moved in a downward direction
only in the case where pool depletion activity (as measured from
timestamps from the last time a lease was allocated from the pool for
a client) indicates it may be depleted sooner.  Say 1/4th the time at
which the pool is estimated to be depleted, but not to fall below a
reasonable absolute minimum (probably defaulting to 15 seconds), and
further limited to the lowest time across all pools for one failover
partner (since POOLREQ does not identify one pool).
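
Roughly, the timer calculation I have in mind looks like the sketch
below.  The specifics are assumptions for illustration only: 3 hours
stands in for "several hours", secs_per_alloc is a hypothetical
per-pool estimate derived from allocation timestamps, and none of
these names exist in ISC DHCP.

```c
#include <stddef.h>

#define DEFAULT_INTERVAL (3L * 60 * 60) /* "several hours"; 3h is a guess */
#define MIN_INTERVAL     15L            /* absolute floor, in seconds */

struct pool_stats {
    long   free_leases;    /* leases this server can still hand out */
    double secs_per_alloc; /* hypothetical: observed seconds between
                            * allocations; <= 0 means no recent activity */
};

/* Seconds until the next balance check for a single pool: one quarter
 * of the estimated time to depletion, clamped to [MIN, DEFAULT]. */
static long pool_interval(const struct pool_stats *p)
{
    double interval;

    if (p->secs_per_alloc <= 0.0)
        return DEFAULT_INTERVAL;

    interval = (double)p->free_leases * p->secs_per_alloc / 4.0;
    if (interval > (double)DEFAULT_INTERVAL)
        return DEFAULT_INTERVAL;
    if (interval < (double)MIN_INTERVAL)
        return MIN_INTERVAL;
    return (long)interval;
}

/* POOLREQ does not name a pool, so one failover partner gets the
 * lowest interval found across all of its pools. */
static long partner_interval(const struct pool_stats *pools, size_t n)
{
    long best = DEFAULT_INTERVAL;

    for (size_t i = 0; i < n; i++) {
        long t = pool_interval(&pools[i]);
        if (t < best)
            best = t;
    }
    return best;
}
```

A pool with 100 free leases going at one every 2 seconds would be
checked again in 50 seconds; a pool with 10 free leases going at one
per second would be pinned to the 15 second floor.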


I seek the authors' wisdom in the above, or anyone else's if they know
the answer, if I've missed the point.

BTW, anyone know if this is something I should have asked dhcwg
instead?  I kinda wonder if this is still a DHC WG item.

-- 
David W. Hankins		"If you don't do it right the first time,
Software Engineer			you'll just have to do it again."
Internet Systems Consortium, Inc.		-- Jack T. Hankins
