General questions about failover, config changes and restarting

James Dore james.dore at new.ox.ac.uk
Fri Mar 4 10:58:22 UTC 2016


> On 3 Mar 2016, at 21:15, sthaug at nethelp.no wrote:
> 
>>> Sync is finished when both peers return to NORMAL mode. You need to
>>> restart both servers (just kill dhcpd and restart it) one after
>>> another or you're likely to run into issues with the pools not
>>> matching, and then you'll run into issues with not leasing IPs.
> ...
>> Any ideas why my servers need so many restarts? I try to leave them to settle for five or six minutes before trying again, but they just seem to stick with things like 
> 
> I am somewhat mystified about why you would need many minutes for a
> restart. Here's what we see on our failover pair, with around 100k
> leases and a couple of hundred pools:
> 
> (master restarts, log on slave):
> Mar  3 08:50:00 slam dhcpd: peer dhcp1-dhcp2: disconnected
> Mar  3 08:50:00 slam dhcpd: failover peer dhcp1-dhcp2: I move from normal to communications-interrupted
> ...
> (a few seconds pass, and then)
> Mar  3 08:50:11 slam dhcpd: failover peer dhcp1-dhcp2: peer moves from normal to normal
> Mar  3 08:50:11 slam dhcpd: failover peer dhcp1-dhcp2: I move from communications-interrupted to normal
> 
> So a restart for us takes around 11 seconds.
> 
> It should be noted that
> - The servers have plenty of memory, and hardware RAID with battery
> backup for the disks.
> - We use the "delayed ACK" facility.
> 
> Steinar Haug, Nethelp consulting, sthaug at nethelp.no


Sorry, I clearly didn’t explain myself well enough: it’s not the restart *itself* that takes minutes (that takes a couple of seconds). It’s the period *after* the restart, during which the peers sit there not synchronising, or stuck in the partner-down, recover or recover-wait state, and *before* I do another restart that kicks them back into sync.
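
One way to put a number on that window is to scan the syslog for the "I move from X to Y" lines of the kind quoted above and time how long the local server spends outside the normal state. This is only a rough sketch: the regular expression assumes the log layout shown in Steinar's excerpt, the function name out_of_normal_windows is made up for illustration, and the year has to be supplied because syslog timestamps do not carry one.

```python
import re
from datetime import datetime

# Matches lines such as:
#   Mar  3 08:50:00 slam dhcpd: failover peer dhcp1-dhcp2: I move from normal to communications-interrupted
# The layout is an assumption based on the excerpt quoted in this thread.
MOVE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+dhcpd(?:\[\d+\])?:\s+"
    r"failover peer (?P<peer>[^:\s]+): I move from (?P<old>\S+) to (?P<new>\S+)"
)


def out_of_normal_windows(lines, year):
    """Return (left_normal_at, back_to_normal_at, seconds) for each excursion.

    Hypothetical helper: 'year' is passed in because syslog omits it.
    """
    windows = []
    left_at = None  # when the local server last left the normal state
    for line in lines:
        m = MOVE_RE.match(line)
        if not m:
            continue  # ignores "peer moves ..." and unrelated lines
        stamp = " ".join(m.group("ts").split())  # collapse the padded day field
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if m.group("new") != "normal":
            if left_at is None:
                left_at = when  # start of the window; later hops keep it open
        elif left_at is not None:
            windows.append((left_at, when, (when - left_at).total_seconds()))
            left_at = None
    return windows


if __name__ == "__main__":
    sample = [
        "Mar  3 08:50:00 slam dhcpd: peer dhcp1-dhcp2: disconnected",
        "Mar  3 08:50:00 slam dhcpd: failover peer dhcp1-dhcp2: I move from normal to communications-interrupted",
        "Mar  3 08:50:11 slam dhcpd: failover peer dhcp1-dhcp2: peer moves from normal to normal",
        "Mar  3 08:50:11 slam dhcpd: failover peer dhcp1-dhcp2: I move from communications-interrupted to normal",
    ]
    for start, end, seconds in out_of_normal_windows(sample, 2016):
        print(start, end, seconds)
```

Run against the four sample lines above, it reports a single window of 11 seconds, matching Steinar's figure. A window that never closes (the server never logs a move back to normal) is simply not reported, which is the case that would need a closer look.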

Hope that makes sense!

Cheers,
James


