No Free Addresses?

Tue Dec 14 21:12:51 UTC 2010

Hi Bob,
No reason to get snarky with Simon. He's trying to help, and he does seem to be offering quality ideas for your situation, while you seem certain that the way dhcpd delivers failover is insufficient (or doesn't deliver "redundancy"). You are welcome to your opinion, but as you've stated, failover is understood, accepted, and working for a great many others. Several options have been presented for what you're after, which is true redundancy without manual intervention. I'll list a few:

1) Stop using the failover protocol. We did. We have 3 dhcpd boxes that get pointed to by relay agents. It works fantastically but comes with the overhead of not knowing which dhcpd will answer (not that we care), so we have to collect logs and look at a central server instead of an individual log. Config changes also have to be pushed to all the servers, but there are plenty of methods for syncing configs. If you're using private addresses then you should have no trouble offering up plenty of addresses in multiple pools on each of your servers, which gives you the ability to suffer multiple failures. We made the move from failover to standalone duplicate host configuration because that's what I consider redundancy. Any two servers can suffer a failure and all of the clients are still handled by the remaining box.
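
To make the standalone layout concrete, here is a minimal sketch of one way to carve it up, assuming each server gets its own non-overlapping pool that is big enough for every client on its own. The 10.0.0.0/16 subnet, the server names and the client count of 500 are made up for illustration, and the helper only computes ranges; it doesn't write or touch any dhcpd config.

```python
import ipaddress

def carve_pools(network, servers, max_clients):
    """Give each server its own disjoint block of max_clients addresses."""
    net = ipaddress.ip_network(network)
    first = int(net.network_address) + 1      # skip the network address
    last = int(net.broadcast_address) - 1     # skip the broadcast address
    needed = len(servers) * max_clients
    if needed > last - first + 1:
        raise ValueError("subnet too small for %d addresses" % needed)
    pools = {}
    for i, name in enumerate(servers):
        start = first + i * max_clients
        end = start + max_clients - 1
        pools[name] = (ipaddress.ip_address(start), ipaddress.ip_address(end))
    return pools

# Hypothetical values: three servers, 500 clients max, private /16.
pools = carve_pools("10.0.0.0/16", ["dhcp1", "dhcp2", "dhcp3"], 500)
for name, (low, high) in pools.items():
    print(name, "range", low, high)
```

Because every pool alone covers the full client count, any one surviving server can still hand out an address to everybody, which is the property I'm calling redundancy here.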

You said this option isn't attractive to you because it would present the same problem as you experienced - no free leases. I disagree with that assessment entirely. Just because you ran into a problem with failover, don't assume dhcpd can't hand out addresses on its own. The error you posted was "peer holds all free leases", not "no free leases".

2) Allocate 2x the maximum number of leases required for each subnet. It sounds like you did this and something didn't go right. The failover protocol has had many bugs over the last few years and has made leaps and bounds in reliability and features, but it's quite possible you ran into a bug. You've asked many times how many addresses are enough. The answer is 2x max. If it didn't work with that configuration, packet sniffing and logs are the only places you can determine why. Again, as you've stated, failover configured with 2x max is working for others.

3) Develop a method (Perl script perhaps?) whereby when a member of the failover pair cannot reach its peer, it puts itself into partner-down state "automatically". The fact that dhcpd doesn't do this by default when a peer goes down is neither a flaw in the protocol nor a weakness of dhcpd. Perhaps I'm hearing wrong, but it sounds like you are unhappy with this aspect of failover design. My understanding is that it was built that way intentionally, and I can see why for a variety of possible network configurations and situations. ISC software is used by much larger and more complex implementations than I (and probably most) use it for, and was designed to support complex configurations where failover probably provides the most value. All of my company's past failures with dhcpd were related to the failover protocol and a failure of our engineers/admins to understand the protocol and implement appropriate processes (and scripts) to make what they wanted to happen, happen automatically.
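
As a rough sketch of that idea (in Python rather than Perl): the decision logic below counts consecutive failed reachability checks and fires a hook once after a threshold. Everything here is an assumption for illustration. PeerWatch, peer_reachable and the on_down hook are hypothetical names, and the hook is deliberately left empty of any real dhcpd interaction; how you actually place the server in partner-down is up to your own process.

```python
import socket

def peer_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to the peer's failover port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

class PeerWatch:
    """Count consecutive failed checks; call on_down once at the threshold."""
    def __init__(self, threshold, on_down):
        self.threshold = threshold
        self.on_down = on_down   # hypothetical hook: whatever sets partner-down
        self.failures = 0
        self.fired = False

    def record(self, reachable):
        if reachable:
            # Peer is back: reset so a later outage can fire again.
            self.failures = 0
            self.fired = False
            return
        self.failures += 1
        if self.failures >= self.threshold and not self.fired:
            self.fired = True
            self.on_down()

# Simulated check results instead of real network probes.
events = []
watch = PeerWatch(threshold=3, on_down=lambda: events.append("partner-down"))
for ok in [True, False, False, True, False, False, False, False]:
    watch.record(ok)
print(events)
```

In real use you would feed it something like watch.record(peer_reachable(peer_host, peer_port)) from cron or a loop, with the host and port taken from your own failover config. Keep in mind that a failed check only proves this box can't reach the peer, not that the peer is actually dead, which is probably part of why the choice is left to the admin.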

The statement "Putting a server into partner-down requires manual interaction." isn't true. It does require that you put some other process in place if you want partner-down to occur without manual intervention, or if you haven't tweaked your timers appropriately. It seems the protocol designers decided that a network admin would know if his servers went down, and wanted him to have the choice of scripting that process, manually placing a server in partner-down, or doing nothing. It seems you are in the do-nothing boat, which is fine, but that doesn't mean that failover doesn't offer redundancy. It means that it isn't turnkey, yes, I'll admit, but that is far from making it useless. I firmly believe that if a script (or you, manually) had placed your working server in partner-down, things would have worked. If that's true, then redundancy is only a script away!

"Needing partner-down is an indication that you don't actually have redundancy." I'm guessing the designers imagined that a network admin would know whether he needed to put a server in partner-down or not, based on many factors including timers, pool size, configuration, network layout, etc. Placing a server in partner-down only affects its ability to handle peer-owned leases, and is a sideline to redundancy depending on many factors.

This list, myself included, would like to help, and perhaps we're running into miscommunications that are raising tensions. Maybe it would be best to offer your definition of redundancy and state what it is you'd like dhcpd to do? For many, the ability for a daemon on a server to serve addresses to clients with or without peers available is redundancy, albeit sometimes dependent on external scripting for monitoring network health. Those that don't have double the max client IP addresses in a pool, perhaps because their addresses are public and a finite resource, adjust timers and either manually intervene when their servers go down or develop methods to periodically test network connectivity and adjust the servers on the fly as their policy deems appropriate. That can be automatic or manual, and is redundant either way by my definition of redundancy.

As to why your dhcpd did what it did, I'd bet that one server ended up servicing more than its fair share of leases and then tanked. At that time, the pool was still 500 addresses in size, 147 of which appeared to be free. But those 147 may well have been owned by the down peer (this must be the case, as indicated by your log error). Did you ever post your failover config for review? At one point you noted that the config directive split 128 would allocate half a pool to each server. Did you set that directive only on the primary, or also on the secondary? Failover could have acted inappropriately, or perhaps a setting was missed or overlooked?
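
To put numbers on that guess, here is a deliberately oversimplified model, not dhcpd's real allocation logic. It assumes a server can only offer free leases it owns itself unless it is in partner-down, and it ignores the timers that govern when peer-owned leases actually become usable. The 500 and 147 are from your report; the 0/147 ownership split is my assumption based on the log error.

```python
def usable_free(free_mine, free_peer, partner_down):
    """Free leases the surviving server could offer in this simplified model."""
    return free_mine + free_peer if partner_down else free_mine

pool_size = 500
free_total = 147
# Assumption: every free lease belongs to the down peer,
# which is what "peer holds all free leases" suggests.
free_mine, free_peer = 0, free_total

print("in use:", pool_size - free_total)
print("offerable while waiting on the peer:",
      usable_free(free_mine, free_peer, partner_down=False))
print("offerable in partner-down:",
      usable_free(free_mine, free_peer, partner_down=True))
```

So the pool can look like it has plenty of room while the one server still running has nothing it is allowed to give out.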

Really, based on your discussions on this list, using the failover protocol doesn't sound ideal for you. I'm not advocating that you only run one server, but redundancy can be had without invoking failover.

--Marc