General questions about failover, config changes and restarting
james.dore at new.ox.ac.uk
Fri Mar 4 10:58:22 UTC 2016
> On 3 Mar 2016, at 21:15, sthaug at nethelp.no wrote:
>>> Sync is finished when both peers return to NORMAL mode. You need to
>>> restart both servers (just kill dhcpd and restart it) one after
>>> another or you're likely to run into issues with the pools not
>>> matching, and then you'll run into issues with not leasing IPs.
>> Any ideas why my servers need so many restarts? I try to leave them to settle for five or six minutes before trying again, but they just seem to stick with things like
> I am somewhat mystified about why you would need many minutes for a
> restart. Here's what we see on our failover pair, with around 100k
> leases and a couple of hundred pools:
> (master restarts, log on slave):
> Mar 3 08:50:00 slam dhcpd: peer dhcp1-dhcp2: disconnected
> Mar 3 08:50:00 slam dhcpd: failover peer dhcp1-dhcp2: I move from normal to communications-interrupted
> (a few seconds pass, and then)
> Mar 3 08:50:11 slam dhcpd: failover peer dhcp1-dhcp2: peer moves from normal to normal
> Mar 3 08:50:11 slam dhcpd: failover peer dhcp1-dhcp2: I move from communications-interrupted to normal
> So a restart for us takes around 11 seconds.
> It should be noted that
> - The servers have plenty of memory, and hardware RAID with battery
> backup for the disks.
> - We use the "delayed ACK" facility.
> Steinar Haug, Nethelp consulting, sthaug at nethelp.no
Sorry, I clearly didn’t explain myself well enough: it’s not the restart *itself* that takes minutes (that takes a couple of seconds). It’s the period of time *after* the restart, during which the peers sit there not synchronising or stuck in partner-down/recover/recover-wait state, and *before* I do another restart that kicks them back into sync.
Hope that makes sense!
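
To put a number on that "out of sync" period rather than eyeballing it, something like the following could be run over the syslog output. This is only a rough sketch: it assumes the log lines look like the ones Steinar quoted above, the function name out_of_normal_periods is made up for illustration, and because syslog timestamps carry no year, the year is passed in as an assumption.

```python
import re
from datetime import datetime

# Matches this server's own transitions, in the format quoted in this thread, e.g.
# "Mar 3 08:50:00 slam dhcpd: failover peer dhcp1-dhcp2: I move from normal to communications-interrupted"
TRANSITION = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+dhcpd: "
    r"failover peer (?P<peer>\S+): I move from (?P<old>\S+) to (?P<new>\S+)$"
)


def out_of_normal_periods(lines, year=2016):
    """Yield (peer, seconds) for each stretch this server spent outside 'normal'.

    `year` is an assumption: classic syslog timestamps do not include one.
    """
    left_normal = {}  # failover peer name -> time we last left 'normal'
    for line in lines:
        m = TRANSITION.match(line.strip())
        if not m:
            continue  # not one of our own state transitions
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        peer, old, new = m["peer"], m["old"], m["new"]
        if old == "normal" and new != "normal":
            left_normal[peer] = ts
        elif new == "normal" and peer in left_normal:
            yield peer, (ts - left_normal.pop(peer)).total_seconds()


# The slave's log lines from the quoted message, without the "> " prefix.
sample = [
    "Mar 3 08:50:00 slam dhcpd: peer dhcp1-dhcp2: disconnected",
    "Mar 3 08:50:00 slam dhcpd: failover peer dhcp1-dhcp2: I move from normal to communications-interrupted",
    "Mar 3 08:50:11 slam dhcpd: failover peer dhcp1-dhcp2: peer moves from normal to normal",
    "Mar 3 08:50:11 slam dhcpd: failover peer dhcp1-dhcp2: I move from communications-interrupted to normal",
]

print(list(out_of_normal_periods(sample)))  # [('dhcp1-dhcp2', 11.0)]
```

On Steinar's log this gives the 11 seconds he mentions. On a pair that sits in recover or partner-down for minutes, only the final return to normal closes the interval, so the figure covers the whole stuck period, intermediate states included.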