Restarting DHCP safely whilst avoiding partner-down state

Chuck Anderson cra at WPI.EDU
Fri May 13 13:22:34 UTC 2016


FWIW, we've been using the "kill" method for over a decade without any
noticeable side effects (the default init.d scripts from the RHEL 6
(actually Scientific Linux 6) dhcp package).  We've never had to
manually clean up a corrupted lease file.  We restart the services
automatically on a 20-minute cycle, as needed.  We do one, then
immediately do the other.  We do not wait before restarting the second,
and we do not check whether failover has reconnected and rebalanced
first, but since we SSH into each server to do the restart, there might
be enough of a built-in delay between the two restarts.

I don't know if a corrupted lease file would cause a failure to start
the dhcp server, or if it would just go unnoticed, perhaps with a log
message.  But like I said, we've never had a failure to start the
server that was caused by a lease file issue.

Our script does test the config file before doing the restart:

#!/bin/bash
echo -n "Testing DHCP configuration: "
if sudo /etc/rc.d/init.d/dhcpd configtest; then
        echo "Restarting DHCP"
        sudo /etc/rc.d/init.d/dhcpd restart
else
        echo "FAIL: Not restarting DHCP"
fi

which in CentOS 6 does the following:

exec=/usr/sbin/dhcpd
configtest() {
    [ -x $exec ] || return 5
    [ -f $config ] || return 6
    $exec -q -t -cf $config
    RETVAL=$?
    if [ $RETVAL -eq 1 ]; then
        $exec -t -cf $config
    else
        echo "Syntax: OK" >&2
    fi
    return $RETVAL
}
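
A minimal sketch of the sequence described above (config test, then
restart, on one server and immediately on the other over SSH) might look
like the following.  The host names, the SERVERS and REMOTE variables,
and the function names are hypothetical and not taken from the actual
script; stopping at the first failed config test is likewise an
assumption.

```shell
#!/bin/bash
# Hypothetical host names; substitute the real failover pair.
SERVERS=${SERVERS:-"dhcp1.example.org dhcp2.example.org"}
# Command used to reach each server; overridable for dry runs.
REMOTE=${REMOTE:-ssh}
INIT=/etc/rc.d/init.d/dhcpd

# Test the configuration on one host and restart dhcpd only if it parses.
restart_one() {
    local host=$1
    if $REMOTE "$host" sudo "$INIT" configtest; then
        $REMOTE "$host" sudo "$INIT" restart
    else
        echo "FAIL: Not restarting DHCP on $host" >&2
        return 1
    fi
}

# Restart each server in turn, with no wait for failover to rebalance.
# Assumption: stop at the first failure so a bad config never takes
# down both peers.
restart_pair() {
    local host
    for host in $SERVERS; do
        restart_one "$host" || return 1
    done
}
```

Calling restart_pair at the end of the script (or from cron) would run
the whole cycle; setting REMOTE=echo gives a dry run that only prints
the commands.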


On Fri, May 13, 2016 at 02:00:03PM +0100, Terry Burton wrote:
> Hi,
> 
> I'm attempting to write a systemd .service file for my own uses of ISC
> DHCP. However, if it can be made sufficiently generic then I would
> intend to push this upstream or at least into distributions.
> 
> It needs to be suitable for managing failover pairs and I'm struggling
> with the age-old problem of restarting a dhcpd instance. From reading
> around there does not currently appear to be a method for restarting
> dhcpd that is both *safe* and *useful* in such a setup.
> 
> 
> Restarting with signals:
> 
> From AA-01043 (Last Updated: 2015-03-18): "kill is the recommended
> option, except where there is a high turnover of leases and the
> production environment requires a high degree of reliability from
> DHCP. In that case, we'd suggest that administrators consider using
> OMAPI to control the daemon instead and to request a graceful
> shutdown. The reason for this is that there is the slight possibility
> that by using kill, administrators may stop dhcpd in the middle of
> appending a lease to the leases file (in which case it may become
> corrupted). This risk, while tiny, may be significant enough for some
> administrators to prefer to use OMAPI instead."
> 
> In other words this is recommending that casual users take the risk
> that their service might not recover after restarting. This may be
> unlikely but it's still dangerous advice! The documentation does
> indicate that a feature for "gentle shutdown" in response to a signal
> was added in the 4.2 time frame and then subsequently removed:
> 
> "Added support for gentle shutdown after signal is received. [ISC-Bugs
> #32692] [ISC-Bugs 34945]"
> "Disable the gentle shutdown functionality until we can determine the
> best way to present it to remove or reduce the side effects. [ISC-Bugs
> #36066]"
> 
> Is it still the case that kill isn't suitable for production purposes?
> 
> 
> With OMAPI:
> 
> You can cleanly shut down via OMAPI "set state=2, etc."; however, the
> effect on the failover protocol is less ideal than with signals.
> 
> OMAPI shutdown will place the partner into "partner-down" state, making
> it become active for all leases in the failover pools, which isn't
> ideal when briefly restarting an instance. Contrast this with the effect
> of restarting an instance with kill, which is to briefly place the
> partner into "communications-interrupted" state, from which it
> immediately reverts to "normal" once the restarted instance is available
> (with auto-partner-down taking care of things if the instance does
> not recover).
> 
> 
> Is there a safe way to restart DHCP that has minimal impact on the
> failover protocol?
> 
> 
> Thanks,
> 
> Terry

