Failover woes with restarts of 4.0.0

Dean, Barry B.Dean at liverpool.ac.uk
Tue Jan 6 11:14:53 UTC 2009


I run two pairs of DHCP servers using version 4.0.0 on Solaris 10 x86. One pair is called dhcp-pair1 and the other dhcp-pair2 (not exciting names, I know!).

I have been migrating subnets to these new pairs for a while, and over half the IP ranges are now served by them.

What I am seeing now is a problem when we make a config change.

The procedure we use is to edit dhcpd.conf, check it with "-t", restart the daemon on the master, copy the config over to the slave (scp), then restart the slave (via ssh).

When we used to do this with version 3 (something), we sometimes ended up with the servers out of step and problems with leases (I am not sure of the details, as this happened before I started!).

My logic was that the failover communication was being interrupted by restarting the servers in such quick succession, so with the new 4.0.0 set-up I introduced a 10-second pause between the restarts of master and slave.
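
As a rough illustration, the update procedure with the pause could be scripted along these lines. This is a minimal sketch rather than the actual script in use: the config path, the slave host name and the restart command are hypothetical placeholders, and placing the pause after the copy (rather than before it) is an assumption.

```python
import subprocess


def rollout_steps(conf, slave_host, restart_cmd, pause_seconds=10):
    """Return the ordered commands for one config rollout, as argv lists.

    conf, slave_host and restart_cmd are placeholders supplied by the
    caller; none of them are taken from the real set-up.
    """
    return [
        ["dhcpd", "-t", "-cf", conf],            # syntax-check the edited config
        list(restart_cmd),                       # restart the daemon on the master
        ["scp", conf, f"{slave_host}:{conf}"],   # copy the config to the slave
        ["sleep", str(pause_seconds)],           # pause between the two restarts
        ["ssh", slave_host, *restart_cmd],       # restart the daemon on the slave
    ]


def run_rollout(steps, dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for argv in steps:
        print(" ".join(argv))
        if not dry_run:
            # check=True raises on a non-zero exit status, so a failed
            # "-t" check stops the rollout before anything is restarted.
            subprocess.run(argv, check=True)
```

Calling run_rollout(rollout_steps(...)) with the default dry_run=True only prints the commands, which is a safe way to review the sequence before running it for real.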

All had been working fine until 3pm on the last day I was in work before Christmas (typical!).

Soon after doing an update, both servers stopped answering requests. They were doing housekeeping and processing incoming packets (seen with truss), but not sending any ACKs etc.

Why?

Is it that the 10 seconds between restarts sets the servers up for a fall by leaving them on different configs for a short period?

Ideally we should be using OMAPI, I know, but as we don't, what is the best procedure for dhcpd.conf changes (like adding fixed allocations) that causes the least disturbance?

If there is one thing I have learned about the ISC DHCP server, it is that it works best if you leave it alone!

---------------
Barry Dean
Networks Team
Computing Services Department
Liverpool, L69 3BX
Email: B.Dean < at > liverpool.ac.uk, Web: http://pcwww.liv.ac.uk/~bvd/



