[kea-dev] On persistent lease file clean-up mechanism
nicolas.chaigneau at capgemini.com
Thu Nov 27 09:57:55 UTC 2014
Thanks again for your input.
That's true, there would be limitations [to my initial proposal] and it would not be suitable for most configurations. In my specific case, I need all leases to have the same duration so that's not an issue.
But maybe it's not really needed at all.
My initial concern was that the lease file clean-up implementation should not be detrimental to the server's ability to handle new requests.
In my experience with dhcpd, we saw a very significant impact each time the lease file clean-up occurred (once per hour, I believe).
From a previous discussion (in which Tomek explained that Kea was currently single-threaded, single-process), I assumed that it would not be possible to run the lease file clean-up in the background. Obviously I assumed wrongly, which is good news. :)
In that case, what you describe sounds solid and my initial proposal should indeed be unnecessary.
I also like option 2 you described.
It's not entirely accurate that it results in the use of a single lease file, though, because the server may be forcefully stopped while clean-up is in progress. If that happens, the server must be able to recover from both the main lease file and the renamed lease file.
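To make the recovery case concrete, here is a minimal sketch in Python (not Kea code). It assumes a hypothetical one-record-per-line format of "address,hwaddr,expire" and hypothetical file names; the only point is the replay order: the renamed file holds the older data, so it is read first and records from the main file override it.

```python
import os

def load_leases(paths):
    """Replay lease files in order; a later record for an address wins."""
    leases = {}
    for path in paths:
        if not os.path.exists(path):
            continue  # no renamed file means no clean-up was interrupted
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                # Hypothetical record layout: address,hwaddr,expire
                address, hwaddr, expire = line.split(",")
                leases[address] = (hwaddr, int(expire))
    return leases

def recover(main_path, renamed_path):
    # Oldest data first: the renamed file predates the current main file.
    return load_leases([renamed_path, main_path])
```

A partially written clean-up output file, if one exists after a forced stop, would simply be ignored in this scheme, since the renamed file still contains everything it was derived from.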
Regarding broken clients (some of which can send *much* more than 1 request/s) and deliberately malicious attacks on the DHCP server: in my experience, by the time such traffic reaches the server, it's already too late to do anything meaningful. Protection mechanisms have to be applied upstream. IMO it's better to temporarily block a faulty device (which can be done through firewall rules) than to attempt to be nice and risk impairing the service for everyone.
But those considerations are beyond the scope of Kea, really :)
> Thanks for the example. It clarified a lot.
> IMO, this approach has a major flaw: the clean-up timer is effectively driven by the lifetime of the longest possible lease. If the DHCP server is serving leases to clients in multiple subnets and one of the subnets has long valid-lifetimes (in cable networks customers could get leases for 7 days so that their IP address at home doesn't vary), the lease file would rotate every ~7 days and could grow to a significant size if the valid-lifetimes for other subnets were significantly lower.
> There is also some complexity in how the server determines the longest lease time. Checking valid lifetimes for all configured subnets is not sufficient because clients may have already got leases with different valid lifetimes and later the server could have been reconfigured. So, the configuration mechanism would need to iterate over all leases in the lease database and check their valid lifetimes. If any of the lifetimes is greater than the highest valid-lifetime across all configured subnets, this lifetime would need to be used as a timer interval. But, this causes an operational problem where the server's administrator doesn't have a clue what the rotation timer is going to be, because it depends on dynamic data in the lease database. Additionally, if the lease with highest valid lifetime expires or is released, the timer should be recalculated which has its own problems and complexity.
> One more thing to keep in mind is that the client can renew at any time before the lease expires, e.g. as a result of a machine reboot. A broken client can even send a Request every 1s, even though T1 is set to 1800s. Each lease update would be recorded in the lease file, and would effectively cause the lease file to grow. This could even be used as an attack vector to quickly fill the disk space reserved for the lease file (though I admit that there are many similar ways to hammer a DHCP server anyway). Having said that, the lease file compression (clean-up) is something that must be done anyway, so that the redundant information (e.g. dozens of Renews from the same client) is removed from the lease file. Relying on the repeating pattern of clients sending Renews at a predictable time may turn against us pretty quickly, as clients tend to do odd things.
> Regarding option 1 and option 2, which I described: the server doesn't have to wait for the lease file to be re-written. The only time when the server needs to wait is when the file it is using is renamed and a new file is created for the server to use. The rest of the operation should be performed in the background (separate thread or process). Note that the server doesn't need lease information from the file being cleaned up as long as it is not restarted. This is because the server has all the information in memory. So, the server can use an empty lease file while the clean-up is performed.
> When the clean-up is being performed, the server still serves clients and writes lease updates to the newly created (empty) file. In the case of option 2, the server will additionally wait when the clean-up is done: it will wait for the data from the currently used lease file to be *appended* to the cleaned-up lease file. The append operation (e.g. file1 >> file2) should be pretty fast as there is no data interpretation, just a simple IO operation.
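The option 2 sequence described in the quoted text can be sketched as follows. This is an illustration in Python under stated assumptions, not how Kea implements it: the function names, the file paths, and the "address,..." record layout are all hypothetical, and "clean-up" here only drops superseded records for the same address.

```python
import os
import shutil

def rotate(main_path, renamed_path):
    """The only step the server must wait for: rename, then start an empty file."""
    os.rename(main_path, renamed_path)
    open(main_path, "w").close()

def clean_up(renamed_path, cleaned_path):
    """Background step: keep only the last record seen for each address."""
    latest = {}
    with open(renamed_path) as src:
        for line in src:
            line = line.strip()
            if line:
                # Hypothetical layout: the address is the first comma-separated field.
                latest[line.split(",")[0]] = line
    with open(cleaned_path, "w") as dst:
        for line in latest.values():
            dst.write(line + "\n")

def merge(main_path, cleaned_path, renamed_path):
    """Option 2 finish: append updates written during clean-up, then swap files."""
    with open(main_path, "rb") as src, open(cleaned_path, "ab") as dst:
        shutil.copyfileobj(src, dst)  # raw byte copy, no parsing of records
    os.replace(cleaned_path, main_path)  # cleaned + appended file becomes the main file
    os.remove(renamed_path)              # the old data is now redundant
```

The server would call rotate() and merge() itself, pausing lease writes only for those two short steps, while clean_up() runs in a separate thread or process. Removing the renamed file last means that a forced stop at any point still leaves a renamed file plus a main file to recover from.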