DHCP peer failure and pool exhaustion...

Mon Sep 9 18:10:56 UTC 2013

That sounds right. The DHCP failover protocol goes to a lot of effort so that each server has a sense of where its peer is at the time of a failure... but then panics at the thought of actually taking over in an automated way. Perhaps this is a good time to put in a plug for more debate about this feature as DHCP gets merged into the next version of BIND.

Randall Grimshaw rgrimsha at syr.edu
________________________________________
From: dhcp-users-bounces+rgrimsha=syr.edu at lists.isc.org [dhcp-users-bounces+rgrimsha=syr.edu at lists.isc.org] on behalf of Gregory Sloop [gregs at sloop.net]
Sent: Monday, September 09, 2013 1:50 PM
To: Users of ISC DHCP
Subject: DHCP peer failure and pool exhaustion...

I just had a case like the following and I'm trying to understand it
and resolve it long-term.

I think I know what happened, but want to see if it makes sense.

---
Two DHCP servers, peered.
[4.1-R4 under Ubuntu 12.04 x64 if it matters]

The relevant pool is quite heavily utilized. [Only a few free
addresses remain in the whole pool of, say, 150 when at full load,
which happens often.]

In this case, one of the peers went down and we didn't realize it.

[I understand this is a primary problem, but I'm using peer/fail-over
to help resolve a situation where a DHCP server fails and I can't fix
the down box, or am unavailable etc.]

Monday AM, the peer eventually gets to this point:
"...peer holds all free leases" and stations can't get an IP address.

So, I think what's happened is:

 1) That the peer is "communications interrupted" [not peer down -
 which I think would "fix" it.]

2) The peer went down during a "low" use period and many of the
"normal" clients had NO assigned address when the peer went down.

3) The remaining peer held only half of the free addresses, a large
number at the time. When we hit a high use period Monday AM [which
would deplete the *whole* pool to nearly zero free addresses], the
remaining peer ran out of free addresses.

4) It couldn't re-balance because the peer was down, and it wasn't set
in partner down state.

[The "up" peer had, say 60 addresses (half the pool) at low use, when
the other peer went down. At high use, it exhausted its 60 addresses
in the pool, but couldn't re-balance to get the other 60, and we ran
out of addresses with 60 still free, but tied up in the down peer.]
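
To make the arithmetic above concrete, here is a rough sketch in Python.
It is only a toy model, not how dhcpd actually accounts for leases: the
function name, the state strings, and the figure of 115 Monday-morning
clients are assumptions made up for illustration, and it ignores timing
details such as the maximum client lead time.

```python
def leases_granted(own_free, peer_free, new_clients, state):
    """Toy model: how many new clients the surviving server can serve.

    own_free    -- free addresses held by the surviving server
    peer_free   -- free addresses held by the unreachable peer
    new_clients -- clients asking for an address during the busy period
    state       -- "communications-interrupted" or "partner-down"
    """
    if state == "communications-interrupted":
        # The survivor may only hand out its own share of the free pool.
        usable = own_free
    elif state == "partner-down":
        # The survivor is allowed to take over the peer's free addresses.
        usable = own_free + peer_free
    else:
        raise ValueError("unknown state: " + state)
    granted = min(new_clients, usable)
    refused = new_clients - granted
    return granted, refused


# 120 free addresses split 60/60 at low use, then a hypothetical 115
# clients arrive Monday morning.
stuck = leases_granted(60, 60, 115, "communications-interrupted")
fixed = leases_granted(60, 60, 115, "partner-down")
print("communications-interrupted:", stuck)
print("partner-down:", fixed)
```

In the first case 55 clients are refused even though 60 addresses are
still free on the down peer; in the second case everyone is served.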

Having the peer come back up will fix things, or having the down peer
set in partner down state would also fix things as the "up" peer could
grab all the leases.

---
So, does that sound right?

If so, is there any way to "automagically" put the peer in partner
down state if it's not able to be contacted in X amount of time?
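
What I have in mind is roughly the decision below, written as a Python
sketch. The function name and the one-hour threshold are hypothetical,
and this is not an existing dhcpd feature as far as this message is
concerned. It only shows the "X amount of time" rule, and it assumes
the peer really is down rather than merely unreachable; if both servers
are up but cannot talk to each other, taking over like this could lead
to both handing out the same addresses.

```python
def should_enter_partner_down(state, seconds_since_contact, threshold_seconds):
    """Return True when the surviving server should be moved to partner-down.

    state                 -- current failover state of the surviving server
    seconds_since_contact -- time since the peer was last heard from
    threshold_seconds     -- the "X amount of time" (hypothetical setting)
    """
    # Only act when we are already out of contact with the peer,
    # and only after the outage has lasted at least the threshold.
    return (state == "communications-interrupted"
            and seconds_since_contact >= threshold_seconds)


ONE_HOUR = 3600  # hypothetical threshold
print(should_enter_partner_down("communications-interrupted", 7200, ONE_HOUR))
print(should_enter_partner_down("communications-interrupted", 600, ONE_HOUR))
print(should_enter_partner_down("normal", 7200, ONE_HOUR))
```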

...
Or, perhaps I should simply say - I don't want the DHCP server to end
up without available addresses to lease, if the peer goes off-line and
I'm not able to do some manual process to intervene. That's my goal.

With that goal in mind, what's the best way to accomplish it?

-Greg

--
Gregory Sloop, Principal: Sloop Network & Computer Consulting
503.251.0452 x121 Voice | 503.251.0452 Fax
www.sloop.net
mailto:gregs at sloop.net
