[kea-dev] On persistent lease file clean-up mechanism
nicolas.chaigneau at capgemini.com
Mon Nov 24 17:20:42 UTC 2014
Thanks for your feedback!
Both options 1 and 2 that you describe have, in my opinion, one common issue: the server must stop processing DHCP packets while it rewrites the lease file, which can take some time when dealing with very large lease files.
The main point of the algorithm I proposed was to eliminate the file clean-up entirely, while ensuring data integrity (no loss of data).
This is achieved through a time-triggered rotation of the lease file.
I'll take an example, hopefully making myself clearer :)
With a lease time of 30 minutes, and a rotation of the lease file triggered every 30 min:
Server is started at t0.
At t0 + 1 min, lease L1 is allocated, and written in <lease file>.
At t0 + 29 min, lease L2 is allocated, and written in <lease file>.
At t0 + 30 min, the clean-up mechanism is triggered:
<lease file> is renamed to <old lease file> (which now contains leases L1 + L2).
<lease file> is recreated, empty.
At t0 + 31 min, lease L1 (not renewed) expires.
At t0 + 44 min (considering a renew-timer of 15 min), lease L2 is renewed.
At t0 + 60 min, the clean-up mechanism is triggered again:
<lease file> is renamed to <old lease file>. Previous <old lease file> is simply overwritten.
This is safe because, with a lease time of 30 min, any lease still valid at t0 + 60 min must have been allocated or renewed after t0 + 30 min, and is therefore in <lease file>.
This property holds as long as we ensure that the clean-up interval is at least equal to the maximum lease time value.
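To make the rotation step concrete, here is a minimal sketch in Python (not Kea's C++). The class and method names are hypothetical, the "~" suffix is just one possible naming convention, and a lease record is treated as an opaque line of text. It only illustrates the rename, recreate and reopen sequence, and refuses a rotation interval shorter than the maximum lease time.

```python
import os


class RotatingLeaseFile:
    """Hypothetical append-only lease file with rename-based rotation."""

    def __init__(self, path, max_lease_time, rotation_interval):
        # The no-data-loss argument only holds if the rotation interval
        # is at least equal to the maximum lease time.
        if rotation_interval < max_lease_time:
            raise ValueError("rotation interval must be >= maximum lease time")
        self.path = path
        self.old_path = path + "~"  # assumed naming convention
        self._fh = open(path, "a")

    def write_lease(self, record):
        # Append one lease record and push it out of Python's buffer.
        self._fh.write(record + "\n")
        self._fh.flush()

    def rotate(self):
        # Close the handle, rename the current file over the previous
        # old file, then reopen (and thereby recreate) an empty file.
        self._fh.close()
        os.replace(self.path, self.old_path)
        self._fh = open(self.path, "a")

    def close(self):
        self._fh.close()
```

Calling rotate() from a timer every 30 min would reproduce the timeline above: after the second rotation, <old lease file> only holds what was written during the previous interval.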
At any time if the server is restarted, all non-expired leases can be retrieved from the aggregation of <lease file> and <old lease file>.
Hence, no data loss, at the cost of a slightly more complicated start-up processing.
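The start-up processing could look roughly like the sketch below, again in Python and with hypothetical names. It assumes a simplified two-column record, "address,expire" (this is not Kea's actual lease file format), and that each file is append-only, so a later record for an address supersedes an earlier one.

```python
import csv
import os


def load_leases(lease_path, old_lease_path, now):
    """Aggregate <old lease file> and <lease file>, dropping expired leases."""
    leases = {}
    # Read the old file first so that records in the current file,
    # which are more recent, override it.
    for path in (old_lease_path, lease_path):
        if not os.path.exists(path):
            continue  # e.g. no rotation has happened yet
        with open(path, newline="") as fh:
            for row in csv.reader(fh):
                if not row:
                    continue  # skip blank lines
                address, expire = row
                leases[address] = int(expire)
    # Keep only the leases that have not expired yet.
    return {addr: exp for addr, exp in leases.items() if exp > now}
```

With the example above (times in minutes from t0), <old lease file> would hold L1 expiring at 31 and L2 expiring at 59, and <lease file> would hold L2 expiring at 74; a restart at t0 + 45 min would then recover only L2, with its renewed expiration time.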
This is a simple scenario where the lease file is cleaned up only on a regular time trigger.
I understand that combining this with other clean-up triggers, such as those you describe, will make things much more complicated... possibly with multiple lease files, and the need to decide when an older lease file can be safely deleted. I really haven't given this much thought, I only wanted to propose something that would be sufficient for my needs.
Anyway, thanks for reading and considering this :)
> Thanks for the write-up. In general, I feel it is a right direction. But I also agree it is not a trivial matter as the solution should not impair the DHCP service while the clean-up is in place. Also, the solution should prevent race conditions and data loss.
> As a result of the discussions on the mail list we will have to create a page on the trac wiki which will contain requirements for this feature as well as some little design. I will make sure such page is created once we have first thoughts exchanged.
> We haven't yet gone through the phase of discussing this feature because it was somewhat out of scope for the 0.9.1 release. So, all my thoughts here are preliminary and I may be wrong.
> I think we should consider other triggers for the clean-up apart from the timer, though the timer could be the first one to implement and the others could wait. We could consider triggering the clean-up when the lease expiration counter reaches X, or the number of renewed leases reaches Y, etc.
> I agree that the clean-up should be performed on a renamed file and the server should use a different/empty file for new lease updates while the clean-up is in place. However, there is a question how the data is combined when the clean-up is done. Keeping the compressed (cleaned-up) file separately from the currently used file is an option, but it means that each clean-up operation results in creation of one new file holding partial information about the leases. I believe that in many cases administrators would rather want lease information be stored in a single file, not multiple. Also, the subsequent clean-up operations would need to process all saved lease files, since each of them potentially holds some lease information which may have expired.
> I tend to think that the lease data aggregation should rather take place at the end of the clean-up phase and should result in having at most two files: one currently used by the server to write lease updates until the next clean-up is triggered. Another one, holding all remaining (historical) and cleaned-up lease information. Ideally, they should be combined into a single file, but appending the contents of the currently used file may have some impact on the service availability for a period of time when the append is being done.
> So, the option 1 would be...
> When the clean-up is triggered, the lease file used currently by the server is renamed. The new file is created for the server to use. The renamed file contents are appended to the lease file holding historical data. Then, clean-up is performed on this file with appended information. The server keeps using the same lease file (other than the one on which the clean-up was performed) until next clean-up comes.
> And option 2 ...
> There is only one lease file. When the clean-up is triggered this file is renamed and the new file is created for the server to write new lease updates. When the clean-up is completed on the renamed lease file, the DHCP service is ceased for a short period of time when the most recent lease updates (gathered during clean-up) are appended to the cleaned-up file and the server switches to use this file.
> Both solutions are similar, but second option results in use of a single lease file (which I personally prefer). The first option has an advantage that there is no need to copy over the most recent information to the single lease file.
> I am not sure I fully understand the proposal of keeping the clean-up interval "at least equal to maximum lease time value". Are you proposing that clean-ups are triggered no more frequently than the maximum valid-lifetime that may occur for any lease in a lease file? Why is the server restart associated with the lease file clean-up in this context? The clean-up should not be triggered until the server starts up and loads lease information from the existing lease files, at which point the server records all updates to leases in an available lease file. And the clean-up should ensure that the server always has a lease file to write to.
> On 11/19/14 15:27, Chaigneau, Nicolas wrote:
> > Hello,
> > I'd like to discuss the topic of cleaning-up the lease file in the case of a "memfile" back-end.
> > That's probably something you've already thought of, but I didn't find any specific implementation discussed in the mailing list archive.
> > Since it's still yet to be implemented, I'd like to share my 2 cents on the subject.
> > This is not a trivial matter: in the context of high availability with very large lease files, a server being unresponsive for several seconds while it rewrites the lease file is not something we can afford.
> > So here's how I would do it:
> > - define a lease file clean-up interval; for instance 1H.
> > - every 1H, trigger the clean-up mechanism:
> > - rename <lease file> to <old lease file> (<lease file>~ or any other naming convention)
> > - recreate an empty <lease file>
> > - close and reopen Kea's file handle on <lease file>
> > - at server startup, load in memory both <lease file> and <old lease file> (aggregating their content)
> > If we ensure that the clean-up interval value is *at least* equal to the maximum lease time value, this guarantees that no data can be lost during a server restart: any lease not expired is necessarily either in the current or old lease file (possibly both).
> > Both <lease file> and <old lease file> being in the same directory, hence on the same filesystem, the rename operation is immediate. The file's inode and content are unchanged. Only the file name is modified.
> > This probably would not be suitable for everyone's needs, so maybe this could be an optional mechanism.
> > What do you think ?
> > Regards,
> > Nicolas.