recover-wait period question
perl-list at network1.net
Wed Dec 13 14:43:43 UTC 2006
The example I gave was an actual real-world occurrence. The secondary
server was rebooted for maintenance and failed to boot. It was going to
be several hours before onsite personnel would be able to investigate.
Hence, we set the primary to partner-down state by modifying the leases
file (we haven't tackled OMAPI yet).
Later that day, onsite personnel managed to boot the secondary server.
At this point, the secondary server entered recover, and then quickly
entered recover-wait mode. Unfortunately, we were not able to set the
primary back to communications-interrupted before onsite personnel
booted the secondary. The primary entered a mode I had never seen
before, potential-conflict, and then flipped to another mode I hadn't
seen before: shutdown (or something similar to that).
It then would hand out no addresses, and the secondary wouldn't hand any
out either. I was able to get the primary to hand out addresses, but I
Later, the primary started saying "peer holds all free leases" even
though there were plenty of free addresses left. The secondary was
still in the recover-wait period. This was service affecting, so I
stopped the secondary, deleted the leases file, set MCLT to 5 and
started the server. This repaired the problem, although I'm sure it was
a very bad idea. :)
I have since set MCLT back to the same number as our min/default/max
lease definitions (28800 seconds, or 8 hours, on that particular server
group).
Perhaps the trouble would have been minimized had MCLT been set to 600
or so, although I suspect these problems would still have occurred;
they just would perhaps have resolved themselves without us
It seems that the server that is NOT in recover-wait should be able to
hand out the entire pool and merely notify the server that IS in
recover-wait that it has done so. Is that not the case?
David W. Hankins wrote:
> On Tue, Dec 12, 2006 at 11:14:51AM -0500, Darren wrote:
>> however, respond to inform. The primary will not hand out addresses
>> that should be handed out by the secondary while the secondary is in
>> recover-wait mode. This means it is possible to run out of addresses
> If the secondary is in recover-wait, hopefully your primary is in
> partner-down state (in the events you describe, anything else is
> either a bug or you're missing some events).
> In which case it will respond to all clients, and hand out free or
> backup leases alike (so long as STOS+MCLT has expired).
> So it's possible to run out of addresses, but only if all addresses
> are actively assigned, or if you run out of free addresses before
> STOS+MCLT expires (which may be before the secondary entered recover
> state, or may be approximately the same time).
> STOS is "Start Time Of Service" by the way.
>> What is the purpose of this recover-wait period?
> There's a danger of duplicate allocation of any leases the secondary,
> in your example, handed out just prior to going down. It's possible
> that neither the primary nor secondary would retain any information
> about these allocations.
> Since the surviving server (your primary) is the only one that has
> all the relevant state to avoid these duplicate allocations
> (STOS+MCLT), the secondary stays out of it. Hence, recover-wait.
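
To make the STOS+MCLT timing described above concrete, here is a rough
Python sketch. It is only a toy calculation and not ISC dhcpd code: the
function names and the example STOS timestamp are made up for
illustration, and it models nothing beyond the rule quoted above (a
partner-down server may hand out the peer's free or backup leases once
STOS+MCLT has passed).

```python
from datetime import datetime, timedelta, timezone


def reclaim_time(stos: datetime, mclt_seconds: int) -> datetime:
    """Earliest moment STOS+MCLT has expired (hypothetical helper)."""
    return stos + timedelta(seconds=mclt_seconds)


def may_use_peer_leases(now: datetime, stos: datetime, mclt_seconds: int) -> bool:
    """True once the partner-down server is past STOS+MCLT (hypothetical helper)."""
    return now >= reclaim_time(stos, mclt_seconds)


# Assumed example value: the moment the primary was put into partner-down.
stos = datetime(2006, 12, 13, 9, 0, 0, tzinfo=timezone.utc)

# Compare the MCLT values mentioned in this thread: 28800, 600 and 5 seconds.
for mclt in (28800, 600, 5):
    print(f"MCLT={mclt:>5}s -> peer's leases usable from "
          f"{reclaim_time(stos, mclt).isoformat()}")
```

With an MCLT of 28800 the wait is a full 8 hours after STOS, while 600
shortens it to 10 minutes, which is why the MCLT setting matters so much
for how long the surviving server is limited to its own free leases.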