config help - scaling problem

sthaug at nethelp.no
Wed Mar 3 19:16:10 UTC 2010


> For those of you using the failover protocol, have you had an outage because of it in the past? Do you consider it reliable now? Does the benefit of failover outweigh the risk of outage for you? We've had so few hardware failures that protocol problems, albeit on versions that are very old compared to the current ones, accounted for at least 5 times as much downtime. Perhaps this is one of those YMMV cases, and mine did vary?

We started using dhcpd in a failover pair around version 3.1.0, in March
2008. We have had some failures where communication between the servers
was interrupted, and the servers didn't reconnect after connectivity was
restored. I reported these problems on the dhcp-users list, and in
November 2008 I had a reproducible case of this problem:

https://lists.isc.org/pipermail/dhcp-users/2008-November/007433.html

Along the way we picked up various patches. In May 2009 we installed the
last important patch, and since then our failover pair has been rock
solid. As of now we are running 4.1.1 (release version) with some minor
local patches. The failover/reconnect problems we've seen have all been
solved, as has the memory leak problem that plagued us for a while. I
can wholeheartedly recommend 4.1.1 as an extremely stable version (even
if you're running failover).

Note that we're using the delayed fsync patch that has been mentioned on
this list. It works extremely well for us; YMMV.

Even when your DHCP servers are stable and solid: Monitor them! A basic
NMS which regularly pings your servers, and also checks DHCP INFORM on
each server to verify basic DHCP functionality, is vital.
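
As a rough illustration of the DHCP INFORM check mentioned above, here is
a minimal Python sketch. It is not what our NMS runs; the function names
and addresses are made up for the example. It assumes the monitoring host
has a static address on a subnet the server is authoritative for (so that
the server answers DHCPINFORM with a DHCPACK), and that the script may
bind UDP port 68, which normally requires root.

```python
import os
import socket
import struct
import time

MAGIC_COOKIE = b"\x63\x82\x53\x63"  # marks the start of the DHCP options
DHCPACK = 5
DHCPINFORM = 8


def build_inform(client_ip, mac, xid):
    """Build a minimal DHCPINFORM: BOOTP header, cookie, option 53, end."""
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1, 1, 6, 0,                    # op=BOOTREQUEST, htype=Ethernet, hlen, hops
        xid, 0, 0,                     # transaction id, secs, flags
        socket.inet_aton(client_ip),   # ciaddr: our own, already configured address
        b"\0" * 4, b"\0" * 4, b"\0" * 4,  # yiaddr, siaddr, giaddr
        mac, b"", b"",                 # chaddr, sname, file (null padded by struct)
    )
    options = bytes([53, 1, DHCPINFORM, 255])  # message type, then end option
    return header + MAGIC_COOKIE + options


def message_type(packet, xid):
    """Return the option 53 value of a reply matching xid, else None."""
    if len(packet) < 240 or packet[236:240] != MAGIC_COOKIE:
        return None
    if packet[0] != 2 or struct.unpack("!I", packet[4:8])[0] != xid:
        return None                    # not a BOOTREPLY, or not our transaction
    i = 240
    while i < len(packet):
        code = packet[i]
        if code == 255:                # end option
            break
        if code == 0:                  # pad option has no length byte
            i += 1
            continue
        if i + 1 >= len(packet):
            break
        length = packet[i + 1]
        if code == 53 and length == 1 and i + 2 < len(packet):
            return packet[i + 2]
        i += 2 + length
    return None


def check_server(server_ip, client_ip, mac, timeout=3.0):
    """Send one DHCPINFORM to server_ip; True if a DHCPACK arrives in time."""
    xid = struct.unpack("!I", os.urandom(4))[0]
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind((client_ip, 68))        # replies go to ciaddr, port 68
        s.sendto(build_inform(client_ip, mac, xid), (server_ip, 67))
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            s.settimeout(remaining)
            try:
                data, addr = s.recvfrom(4096)
            except socket.timeout:
                return False
            if addr[0] == server_ip and message_type(data, xid) == DHCPACK:
                return True
```

A monitoring job could call, for example,
check_server("192.0.2.1", "192.0.2.10", b"\x02\x00\x00\x00\x00\x01")
once per server in the failover pair (both addresses and the MAC are
placeholders) and raise an alarm when it returns False. Checking each
server separately matters, since the partner can keep answering clients
while one server is down.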

Steinar Haug, Nethelp consulting, sthaug at nethelp.no


