Restarting DHCP safely whilst avoiding partner-down state

dave c dhcp at gvtc.drakkar.org
Fri May 13 18:25:05 UTC 2016


Are folks forgetting that the default action of the kill command is to send the TERM signal? 
That signal should tell the daemon to do an orderly shutdown, close the leases file cleanly, 
send whatever signals to the partner that are required and then exit when everything is ready.

All the concern I am seeing below would be justified if folks were issuing a kill -9 to stop the 
service. KILL cannot be caught, so the daemon gets no chance to clean up, and at that point the 
leases file could be left truncated or corrupted.
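
To make the TERM-versus-KILL distinction concrete, here is a minimal Python sketch of a "polite" stop: send SIGTERM, then wait for the process to go away instead of reaching for kill -9. The function names and the timeout are made up for illustration, and it assumes you already know the daemon's pid (e.g. from its pid file). It says nothing about what dhcpd does internally on TERM.

```python
import os
import signal
import time


def pid_is_running(pid):
    """Return True if a process with this pid still exists."""
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def stop_gracefully(pid, timeout=30.0, poll_interval=0.2):
    """Send SIGTERM (what plain `kill` sends) and wait for the process to exit.

    Returns True if the process exited within `timeout` seconds, else False.
    It never escalates to SIGKILL; that decision is left to the operator.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not pid_is_running(pid):
            return True
        time.sleep(poll_interval)
    return False
```

A wrapper script could call stop_gracefully(pid) and only start the new daemon once it returns True, so the old process has finished writing before anything else touches the leases file.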

As for a journal for the leases file, that could be created, but it would break the methods 
currently used to monitor and process the leases file. Today, dhcpd seems to append each new 
lease to the end of the file. Then, once an hour, it saves the file it was appending to by 
renaming it and writes out a brand new file from scratch containing all active leases from 
memory. I learned the hard way what happens when dhcpd has read/write access to the leases file 
but no permission to create files in the enclosing directory... the leases file will grow 
forever and never be rewritten :( Or at least it grows until the next restart, since the leases 
file gets rewritten as part of the startup process while the daemon is still running as root, 
before it sheds its privileges to the dhcp user. I had a cron job restarting my daemon until I 
realized what I had allowed to happen :)
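
The append-then-rewrite behaviour described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not dhcpd's actual code: the LeaseLog class, the one-line record format and the ".new" and "~" file names are all assumptions made for the example.

```python
import os


class LeaseLog:
    """Toy append-only lease log with a periodic full rewrite."""

    def __init__(self, path):
        self.path = path
        self.active = {}            # ip -> latest state, held in memory
        self.fh = open(path, "a")   # needs write access to the file only

    def record(self, ip, state):
        """Append one lease record; the file only ever grows here."""
        self.active[ip] = state
        self.fh.write(f"{ip} {state}\n")
        self.fh.flush()

    def rewrite(self):
        """Write a fresh file from memory and swap it into place.

        Creating the temporary file and renaming both files needs write
        permission on the enclosing DIRECTORY, not just on the leases file.
        """
        tmp = self.path + ".new"
        with open(tmp, "w") as out:
            for ip, state in sorted(self.active.items()):
                out.write(f"{ip} {state}\n")
            out.flush()
            os.fsync(out.fileno())
        self.fh.close()
        os.replace(self.path, self.path + "~")  # keep the old file as a backup
        os.replace(tmp, self.path)               # new compact file takes over
        self.fh = open(self.path, "a")
```

In this sketch, if the process cannot create files in the directory, rewrite() fails at the first open() and the append-only file is all that is left, which matches the "grows forever" symptom.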

So it sounds like a lot of angst over nothing... a TERM signal is, by convention, a request to 
shut down cleanly: finish off processes and threads, write out the last bits of data and stop 
things in an orderly fashion. So it seems that issuing kill {dhcpd pid} would be perfectly 
acceptable to close things down, even in a partner scenario.

What I don't yet have a clear handle on is the timing considerations when a partner system is 
being manipulated by external command and control processes, e.g. adding a new vlan definition 
to both servers and restarting them at the same time or within seconds of each other.

Do I need to incorporate a delay as was done by one of the earlier posters on this thread or is 
that precaution an unneeded complication? What happens when both partners are restarted at the 
same time? Does it delay the startup and cause DHCP responses to be ignored until they work 
things out among themselves?

I am seeing reports in this thread from both extremes... one who forces a delay with even/odd 
minute detection and another who seems to not care how closely in time the two restart.
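
For reference, one way an even/odd minute scheme could look is sketched below in Python. The details of the scheme mentioned in this thread are not spelled out, so this is a guess: the role names and the "primary on even minutes, secondary on odd minutes" mapping are assumptions. Whether such a delay is needed at all is exactly the open question.

```python
from datetime import datetime, timedelta


def in_restart_slot(role, now):
    """True if `role` may restart during the minute containing `now`.

    Assumed mapping: primary restarts on even minutes, secondary on odd ones,
    so the two partners never restart within the same minute.
    """
    parity = 0 if role == "primary" else 1
    return now.minute % 2 == parity


def seconds_until_slot(role, now):
    """Seconds to wait before `role` is allowed to restart (0.0 if allowed now)."""
    if in_restart_slot(role, now):
        return 0.0
    next_minute = now.replace(second=0, microsecond=0) + timedelta(minutes=1)
    return (next_minute - now).total_seconds()
```

A restart wrapper on each server would sleep for seconds_until_slot(role, datetime.now()) before stopping the daemon. Note that this only separates the two restarts by up to a minute and relies on both clocks being in sync.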

That's the question I believe we should be caring about here...

Thanks,
Dave

On 5/13/16 13:02, Chuck Anderson wrote:
> On Fri, May 13, 2016 at 04:02:23PM +0100, Terry Burton wrote:
>> On 13 May 2016 at 15:57, Chuck Anderson <cra at wpi.edu> wrote:
>>> On Fri, May 13, 2016 at 03:23:25PM +0100, Terry Burton wrote:
>>>> On 13 May 2016 at 14:22, Chuck Anderson <cra at wpi.edu> wrote:
>>>>> I don't know if a corrupted lease file would cause a failure to start
>>>>> the dhcp server, or if it would just go unnoticed, perhaps with a log
>>>>> message.  But like I said, we've never had a failure to start the
>>>>> server that was caused by a lease file issue.
>>>>
>>>> In our experience leases files corrupted by other means can cause a
>>>> failure to start. I don't recall whether that was due to mere
>>>> truncation though...
>>>
>>> There is also the -T parameter to test the lease file:
>>>
>>>        The -T flag can be used to test the lease database file in a similar way.
>>>
>>> It might be a good idea to also use this test before restarting.
>>> While it won't fix a corrupted lease file, it may prevent you from
>>> losing all DHCP service due to a failure to restart.
>>
>> I think this will require the leases file to be closed at the point of
>> testing, i.e. the daemon has already exited.
>>
>> For the more general issue with systemd verifying the configuration
>> see: https://lists.freedesktop.org/archives/systemd-devel/2016-May/036481.html
>
> Is there a way to signal dhcpd to write out the lease file so it can
> be checked?
>
> It seems that dhcpd needs a journaling mechanism similar to named,
> where it writes the changes to a .jnl file and periodically
> incorporates those changes into the main zone file.

-- 
Dave Calafrancesco

