Restarting DHCP safely whilst avoiding partner-down state

Fri May 13 14:48:32 UTC 2016

I only tested it on a standalone box that I use for testing, to see whether it brought dhcpd down cleanly.  Give it a whirl in a test environment and let us know how it goes.

Matt Pallissard

On 05/13/2016 09:42 AM, Terry Burton wrote:
> On 13 May 2016 at 15:37, Pallissard, Matthew
> <matthew.paul at pallissard.net> wrote:
>> I just tested this and it seemed to work for me.
>
> Do you not find, if you tail the log on the partner, that it transitions
> to "partner-down" rather than "communications-interrupted"?
>
> Thanks!
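
One way to check this on the partner is to pull the failover state
transitions out of its log. The following is a minimal sketch, not a
tested tool. It assumes dhcpd logs transitions in the form
"failover peer NAME: I move from OLD to NEW"; verify that against your
own logs before relying on it. The function name last_failover_state
is hypothetical.

```sh
#!/bin/sh
# last_failover_state: read dhcpd log lines on stdin and print the most
# recent failover state this server reported moving to.
# ASSUMPTION: transitions are logged as
#   "failover peer NAME: I move from OLD to NEW"
# Prints nothing if no such line is present.
last_failover_state() {
    sed -n 's/.*failover peer [^:]*: I move from [^ ]* to \([^ ]*\).*/\1/p' |
        tail -n 1
}
```

Run on the partner, for example as "last_failover_state < LOGFILE"
(wherever your syslog puts dhcpd messages), right after shutting down
the other server. "partner-down" versus "communications-interrupted"
is the distinction being asked about.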
>
>
>> #dhcpd4.service
>> [Unit]
>> Description=IPv4 DHCP server
>> After=network.target
>>
>> [Service]
>> Type=forking
>> PIDFile=/run/dhcpd4.pid
>> ExecStart=/usr/bin/dhcpd -4 -q -cf /etc/dhcpd.conf -pf /run/dhcpd4.pid
>> ExecStop=/path/to/shutdown/script.sh
>>
>> [Install]
>> WantedBy=multi-user.target
>>
>> #/path/to/shutdown/script.sh
>> #!/bin/sh
>> #
>> #copy-pasted from
>> #https://kb.isc.org/article/AA-00475/0/Sending-a-Server-Shutdown-Message-Via-OMAPI.html
>> #
>> #(the #!/bin/sh line must be the first line of the actual script file)
>>
>> #  uses omshell to connect to a dhcp server on the
>> #  local machine, create a control object, set the
>> #  state of the control object, and update the
>> #  running server to cause that server to shut down
>> #  gracefully.
>> #
>> #  per dhcpd man page, server shutdown can take
>> #  several seconds as the server waits for close
>> #  on all OMAPI connections.  Watching log files
>> #  for shutdown messages is recommended.
>>
>> omshell << END_OF_INPUT > /dev/null 2> /dev/null
>> server localhost
>> port 7911
>> key omapi_key Ofakekeyfakekeyfakekey==
>> connect
>> new control
>> open
>> set state=2
>> update
>> END_OF_INPUT
>>
>> echo "done sending shutdown instruction to dhcp server.."
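
Since the script's own comments say shutdown can take several seconds,
an ExecStop script may want to wait for the daemon to go away before
returning. A minimal sketch of such a wait, assuming the PID is read
from the pid file named in the unit above; the function name
wait_for_exit and the 30 second default are hypothetical choices, not
anything from the ISC article.

```sh
#!/bin/sh
# wait_for_exit PID [TIMEOUT_SECONDS]
# Poll once per second until the process is gone.
# Returns 0 if the process has exited, 1 if it is still there when the
# timeout (default 30 seconds, an arbitrary choice) runs out.
# NOTE: "kill -0" also fails when we lack permission to signal the
# process, so run this as the same user as dhcpd or as root.
wait_for_exit() {
    pid=$1
    remaining=${2:-30}
    while kill -0 "$pid" 2>/dev/null; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

It could be called after the omshell block, for example as
wait_for_exit "$(cat /run/dhcpd4.pid)" 30, with the pid file read
before the shutdown instruction is sent in case dhcpd removes it on
exit.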
>>
>> Matt Pallissard
>>
>>
>> On 05/13/2016 09:33 AM, Terry Burton wrote:
>>>
>>> On 13 May 2016 at 15:10, Steve van der Burg <steve.vanderburg at lhsc.on.ca>
>>> wrote:
>>>>
>>>> Here we push out new configs to a partner pair from a central server.
>>>> The config for one of the partners contains an extra file
>>>> (dhcpd.i.am.secondary).  Each of the partners runs this every minute (perl
>>>> script):
>>>>
>>>>   if ( -e "$spath/dhcpd.i.am.secondary" ) {
>>>>      exit if (localtime)[1] % 2 == 0;
>>>>   }
>>>>   else {
>>>>      exit if (localtime)[1] % 2 == 1;
>>>>   }
>>>>
>>>>   ... continue (test new config, kill running server, start new one, etc)
>>>>
>>>> So the config change, stop, start, etc, can only happen on odd minutes
>>>> for one server and even minutes for the other.  As long as startup time is
>>>> less than a minute (and it's much, much less than that) it all works
>>>> smoothly.
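
For anyone who wants the same odd/even gating without Perl, here is a
minimal shell sketch of the check described above. The function name
may_restart_now and the "yes"/"no" role argument are hypothetical;
only the marker file dhcpd.i.am.secondary and the odd/even rule come
from the description.

```sh
#!/bin/sh
# may_restart_now MINUTE IS_SECONDARY
# Returns 0 when this server is allowed to act in the given minute:
# the secondary (IS_SECONDARY=yes) acts on odd minutes, the primary on
# even minutes, so the two partners never restart in the same minute.
may_restart_now() {
    minute=$1
    is_secondary=$2
    # Strip one leading zero so "08" and "09" are not read as octal.
    minute=${minute#0}
    parity=$(( ${minute:-0} % 2 ))
    if [ "$is_secondary" = "yes" ]; then
        [ "$parity" -eq 1 ]
    else
        [ "$parity" -eq 0 ]
    fi
}
```

A cron job run every minute would set the role to "yes" when
"$spath/dhcpd.i.am.secondary" exists and "no" otherwise, then do
may_restart_now "$(date +%M)" "$role" || exit 0 before testing the
new config and restarting.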
>>>
>>>
>>> Thanks Steve. We've also been pushing configs around then
>>> synchronously restarting servers back-to-back (without sleeping) for
>>> several years without incident.
>>>
>>> It makes me a little suspicious about whether just killing the process
>>> is indeed unsafe... But then maybe we've been lucky.
>>>
>>> As mentioned I want to improve on what distributions are currently
>>> doing so I'm deliberately setting the bar high and it would be great
>>> if ISC could provide a single, approved, safe shutdown/restart
>>> mechanism or describe what is required to develop such a mechanism.
>>> Unfortunately the detail of Bug #36066 (retracting support for gentle
>>> shutdown) isn't available as it would be interesting to see what
>>> issues were encountered with the previous approach.
>>>
>>>
>>>> Chuck Anderson <cra at WPI.EDU> wrote:
>>>>>
>>>>> FWIW, we've been using the "kill" method for over a decade without any
>>>>> noticeable side-effects (the default init.d scripts from RHEL 6
>>>>> (actually Scientific Linux 6) dhcp package).  We've never had to
>>>>> manually clean up a corrupted lease file.  We restart the services
>>>>> automatically on a 20 minute cycle, as needed.  We do one, then
>>>>> immediately do the other.  We do not wait to restart the other, and we
>>>>> do not monitor to see if failover has reconnected and rebalanced
>>>>> before restarting the other, but since we are SSH-ing into each server
>>>>> to do the restart, there might be enough of a built-in delay between
>>>>> restarting each server.
>>>>>
>>>>> I don't know if a corrupted lease file would cause a failure to start
>>>>> the dhcp server, or if it would just go unnoticed, perhaps with a log
>>>>> message.  But like I said, we've never had a failure to start the
>>>>> server that was caused by a lease file issue.
>>>>>
>>>>> Our script does test the config file before doing the restart:
>>>>>
>>>>> #!/bin/bash
>>>>> echo -n "Testing DHCP configuration: "
>>>>> if sudo /etc/rc.d/init.d/dhcpd configtest; then
>>>>>         echo "Restarting DHCP"
>>>>>         sudo /etc/rc.d/init.d/dhcpd restart
>>>>> else
>>>>>         echo "FAIL: Not restarting DHCP"
>>>>> fi
>>>>>
>>>>> which in CentOS 6 does the following:
>>>>>
>>>>> exec=/usr/sbin/dhcpd
>>>>> configtest() {
>>>>>     [ -x $exec ] || return 5
>>>>>     [ -f $config ] || return 6
>>>>>     $exec -q -t -cf $config
>>>>>     RETVAL=$?
>>>>>     if [ $RETVAL -eq 1 ]; then
>>>>>         $exec -t -cf $config
>>>>>     else
>>>>>         echo "Syntax: OK" >&2
>>>>>     fi
>>>>>     return $RETVAL
>>>>> }
>>>>>
>>>>>
>>>>> On Fri, May 13, 2016 at 02:00:03PM +0100, Terry Burton wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm attempting to write a systemd .service file for my own uses of ISC
>>>>>> DHCP. However, if it can be made sufficiently generic then I would
>>>>>> intend to push this upstream or at least into distributions.
>>>>>>
>>>>>> It needs to be suitable for managing failover pairs and I'm struggling
>>>>>> with the age-old problem of restarting a dhcpd instance. From reading
>>>>>> around there does not currently appear to be a method for restarting
>>>>>> dhcpd that is both *safe* and *useful* in such a setup.
>>>>>>
>>>>>>
>>>>>> Restarting with signals:
>>>>>>
>>>>>> From AA-01043 (Last Updated: 2015-03-18): "kill is the recommended
>>>>>> option, except where there is a high turnover of leases and the
>>>>>> production environment requires a high degree of reliability from
>>>>>> DHCP. In that case, we'd suggest that administrators consider using
>>>>>> OMAPI to control the daemon instead and to request a graceful
>>>>>> shutdown. The reason for this is that there is the slight possibility
>>>>>> that by using kill, administrators may stop dhcpd in the middle of
>>>>>> appending a lease to the leases file (in which case it may become
>>>>>> corrupted). This risk, while tiny, may be significant enough for some
>>>>>> administrators to prefer to use OMAPI instead."
>>>>>>
>>>>>> In other words this is recommending that casual users take the risk
>>>>>> that their service might not recover after restarting. This may be
>>>>>> unlikely but it's still dangerous advice! The documentation does
>>>>>> indicate that a feature for "gentle shutdown" in response to a signal
>>>>>> was added in the 4.2 time frame and then subsequently removed:
>>>>>>
>>>>>> "Added support for gentle shutdown after signal is received. [ISC-Bugs
>>>>>> #32692] [ISC-Bugs 34945]"
>>>>>> "Disable the gentle shutdown functionality until we can determine the
>>>>>> best way to present it to remove or reduce the side effects. [ISC-Bugs
>>>>>> #36066]"
>>>>>>
>>>>>> Is it still the case that kill isn't suitable for production purposes?
>>>>>>
>>>>>>
>>>>>> With OMAPI:
>>>>>>
>>>>>> You can cleanly shut down via OMAPI ("set state=2", etc.); however, the
>>>>>> effect on the failover protocol is less ideal than with signals.
>>>>>>
>>>>>> OMAPI shutdown will place the partner into "partner-down" state, making
>>>>>> it become active for all leases in the failover pools, which isn't
>>>>>> ideal when briefly restarting an instance. Contrast this with the effect
>>>>>> of restarting an instance with kill, which is to briefly place the
>>>>>> partner into "communications-interrupted" state, from which it
>>>>>> immediately reverts to "normal" once the restarted instance is available
>>>>>> (with auto-partner-down taking care of things if the instance does
>>>>>> not recover).
>>>>>>
>>>>>>
>>>>>> Is there a safe way to restart DHCP that has minimal impact on the
>>>>>> failover protocol?
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Terry
> _______________________________________________
> dhcp-users mailing list
> dhcp-users at lists.isc.org
> https://lists.isc.org/mailman/listinfo/dhcp-users
>