Restarting DHCP safely whilst avoiding partner-down state

Fri May 13 14:10:48 UTC 2016

Here we push out new configs to a partner pair from a central server.  The config for one of the partners (the secondary) contains an extra file (dhcpd.i.am.secondary).  Each partner runs a Perl script every minute, and it starts with this check:

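  # $spath (set earlier in the script) is where the pushed config files land
  if ( -e "$spath/dhcpd.i.am.secondary" ) {
     exit if (localtime)[1] % 2 == 0;   # secondary: only act on odd minutes
  }
  else {
     exit if (localtime)[1] % 2 == 1;   # other partner: only act on even minutes
  }

  # ... continue (test new config, kill running server, start new one, etc.)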

So the config change, stop, start, etc., can only happen on odd minutes for the secondary and on even minutes for the other partner, which means the two are never restarted in the same minute.  As long as startup time is less than a minute (and it's much, much less than that) it all works smoothly.
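
For anyone who would rather do the gating in shell than Perl, the same odd/even check could be sketched roughly as below.  This is only an illustration, not what we run: the function names role_for and may_restart and the /etc/dhcp directory in the usage comment are made up for the example.  Only the marker file name comes from our setup.

```shell
#!/bin/sh
# Hypothetical helper names; only the marker file name comes from the post.

# Print "secondary" if the marker file exists in directory $1, else "primary".
role_for() {
    if [ -e "$1/dhcpd.i.am.secondary" ]; then
        echo secondary
    else
        echo primary
    fi
}

# Succeed (exit status 0) if role $1 may act during minute $2.
# The secondary acts on odd minutes, the other partner on even minutes.
# Parity is read from the last digit, which avoids "08"/"09" being
# rejected as invalid octal in shell arithmetic.
may_restart() {
    case $2 in
        *[13579]) parity=odd ;;
        *)        parity=even ;;
    esac
    if [ "$1" = secondary ]; then
        [ "$parity" = odd ]
    else
        [ "$parity" = even ]
    fi
}

# Example use from a once-a-minute cron job (the directory is an assumption):
#   may_restart "$(role_for /etc/dhcp)" "$(date +%M)" || exit 0
#   ... test new config, stop the running server, start the new one ...
```

Keeping the parity test separate from the marker-file lookup means the schedule can be checked by hand with fixed minute values before it goes anywhere near a production pair.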

...Steve

-- 
Steve van der Burg
Information Technology Services
London Health Sciences Centre
& St. Joseph's Health Care London
(519) 685-8500 ext 35559
steve.vanderburg at lhsc.on.ca

Chuck Anderson <cra at WPI.EDU> wrote:
> FWIW, we've been using the "kill" method for over a decade without any
> noticeable side-effects (the default init.d scripts from RHEL 6
> (actually Scientific Linux 6) dhcp package).  We've never had to
> manually clean up a corrupted lease file.  We restart the services
> automatically on a 20 minute cycle, as needed.  We do one, then
> immediately do the other.  We do not wait to restart the other, and we
> do not monitor to see if failover has reconnected and rebalanced
> before restarting the other, but since we are SSH-ing into each server
> to do the restart, there might be enough of a built-in delay between
> restarting each server.
> 
> I don't know if a corrupted lease file would cause a failure to start
> the dhcp server, or if it would just go unnoticed, perhaps with a log
> message.  But like I said, we've never had a failure to start the
> server that was caused by a lease file issue.
> 
> Our script does test the config file before doing the restart:
> 
> #!/bin/bash
> echo -n "Testing DHCP configuration: "
> if sudo /etc/rc.d/init.d/dhcpd configtest; then
>         echo "Restarting DHCP"
>         sudo /etc/rc.d/init.d/dhcpd restart
> else
>         echo "FAIL: Not restarting DHCP"
> fi
> 
> which in CentOS 6 does the following:
> 
> exec=/usr/sbin/dhcpd
> configtest() {
>     [ -x $exec ] || return 5
>     [ -f $config ] || return 6
>     $exec -q -t -cf $config
>     RETVAL=$?
>     if [ $RETVAL -eq 1 ]; then
>         $exec -t -cf $config
>     else
>         echo "Syntax: OK" >&2
>     fi
>     return $RETVAL
> }
> 
> 
> On Fri, May 13, 2016 at 02:00:03PM +0100, Terry Burton wrote:
>> Hi,
>> 
>> I'm attempting to write a systemd .service file for my own uses of ISC
>> DHCP. However, if it can be made sufficiently generic then I would
>> intend to push this upstream or at least into distributions.
>> 
>> It needs to be suitable for managing failover pairs and I'm struggling
>> with the age-old problem of restarting a dhcpd instance. From reading
>> around there does not currently appear to be a method for restarting
>> dhcpd that is both *safe* and *useful* in such a setup.
>> 
>> 
>> Restarting with signals:
>> 
>> From AA-01043 (Last Updated: 2015-03-18): "kill is the recommended
>> option, except where there is a high turnover of leases and the
>> production environment requires a high degree of reliability from
>> DHCP. In that case, we'd suggest that administrators consider using
>> OMAPI to control the daemon instead and to request a graceful
>> shutdown. The reason for this is that there is the slight possibility
>> that by using kill, administrators may stop dhcpd in the middle of
>> appending a lease to the leases file (in which case it may become
>> corrupted). This risk, while tiny, may be significant enough for some
>> administrators to prefer to use OMAPI instead."
>> 
>> In other words this is recommending that casual users take the risk
>> that their service might not recover after restarting. This may be
>> unlikely but it's still dangerous advice! The documentation does
>> indicate that a feature for "gentle shutdown" in response to a signal
>> was added in the 4.2 time frame and then subsequently removed:
>> 
>> "Added support for gentle shutdown after signal is received. [ISC-Bugs
>> #32692] [ISC-Bugs 34945]"
>> "Disable the gentle shutdown functionality until we can determine the
>> best way to present it to remove or reduce the side effects. [ISC-Bugs
>> #36066]"
>> 
>> Is it still the case that kill isn't suitable for production purposes?
>> 
>> 
>> With OMAPI:
>> 
>> You can cleanly shut down via OMAPI "set state=2, etc."; however, the
>> effect on the failover protocol is less ideal than with signals.
>> 
>> OMAPI shutdown will place the partner into "partner-down" state, making
>> it become active for all leases in the failover pools, which isn't
>> ideal when briefly restarting an instance. Contrast this with the effect
>> of restarting an instance with kill, which is to briefly place the
>> partner into "communications-interrupted" state, from which it
>> immediately reverts to "normal" once the restarted instance is available
>> (with auto-partner-down taking care of things if the instance does
>> not recover).
>> 
>> 
>> Is there a safe way to restart DHCP that has minimal impact on the
>> failover protocol?
>> 
>> 
>> Thanks,
>> 
>> Terry