Secondary server in failover fails to come out of recover state

Tue Apr 30 20:37:50 UTC 2013

To be honest, I can't see anything suspect in the config.

I assume you have a 'failover peer "dhcp";' statement inside each pool
declaration? (That's why I asked for the full config.)
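For reference, this is roughly what I mean. It's a minimal sketch with made-up addresses (192.0.2.0/24 is a documentation range), so substitute your own subnet and range:

```
subnet 192.0.2.0 netmask 255.255.255.0 {
  pool {
    # must match the name in the failover peer declaration exactly
    failover peer "dhcp";
    # hypothetical range, use your own
    range 192.0.2.100 192.0.2.200;
  }
}
```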

Personally I would change mclt to 3600 and split to 128 (there are
only a handful of situations where I would set split to 0 or 255,
the main one being branch networks with a local DHCP server that need
a centralised "backup" DHCP server in case the branch server fails).
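On the primary that would look something like the sketch below. The addresses are placeholders, and the max-response-delay and max-unacked-updates values are just common examples rather than anything taken from your config. Note that mclt and split belong on the primary only; the dhcpd.conf man page says they may not be specified on the secondary.

```
failover peer "dhcp" {
  primary;
  address 192.0.2.10;        # this server (placeholder)
  peer address 192.0.2.11;   # the secondary (placeholder)
  max-response-delay 60;     # example value
  max-unacked-updates 10;    # example value
  mclt 3600;                 # maximum client lead time, in seconds
  split 128;                 # primary serves roughly half of the clients
}
```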

You could also try changing the port and peer port numbers (maybe
something above 1024?), just on the off chance that the connection is
being blocked or terminated by something else. It would also be worth
running packet captures on each system to see exactly what traffic
passes between the two during startup.
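If you try the port change, remember that each server's port has to match the peer port configured on the other one. A sketch with arbitrary example numbers above 1024 (pick any pair that isn't already in use on either box):

```
# primary
failover peer "dhcp" {
  primary;
  port 1519;         # the primary listens here (example number)
  peer port 1520;    # must equal the secondary's "port"
  # other statements unchanged
}

# secondary
failover peer "dhcp" {
  secondary;
  port 1520;         # the secondary listens here (example number)
  peer port 1519;    # must equal the primary's "port"
  # other statements unchanged
}
```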

The only other thought I have is that it could be something to do with
the patch you have written. I'm not sure what impact it has had on the
data being written out to the leases file or being synchronised (you
might see this in a packet capture), but the server could be choking
on something in that data that wasn't originally meant to be there.

If you do change the split value then I would also flip the order of
domain-name-servers on the secondary server to load balance across the
two DNS servers, rather than dumping all queries on the first DNS
server.
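For example, with two DNS servers at hypothetical addresses:

```
# primary dhcpd.conf
option domain-name-servers 192.0.2.53, 192.0.2.54;

# secondary dhcpd.conf
option domain-name-servers 192.0.2.54, 192.0.2.53;
```

With split at 128 each DHCP server hands out leases to roughly half the clients, so, assuming clients query the first listed server first, each DNS server ends up as the first choice for about half of them.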

Steve