On lease expiration

Tue Jul 22 17:58:16 UTC 2014

On 22/07/14 10:37, Stephen Morris wrote:
> On 08/07/14 21:36, Tomek Mrugalski wrote:
> 
> Taking the points one by one:
> 
> 1. "in fact it is quite complex, as it will need to implement its own
> configuration parser."
> 
> I don't know exactly how the standalone configuration parser has been
> implemented, but if adding configuration items is a big hurdle, we
> need to schedule some time to re-engineer it: it will provide a
> significant impediment to extending Kea in the future.
You won't get any objections from me on this one. The current parser is
a nefarious contraption that should be eradicated. It was just something
I was told to implement "just like auth", and I didn't have enough
assertiveness to point out the radical differences between configuration
data in auth and dhcp servers.

> 2. "First, it would be inherently incompatible with memfile, our
> default backend."
> 
> This is probably the main drawback. Unless memfile is significantly
> re-engineered, it is incompatible with a multi-process Kea.  (This is
> discussed below.)
No, I think memfile should be left as is. What we should do instead is
provide a way for a DB backend to report its capabilities. For now, the
only one would be "multi-process capable", but in the short term there
may be other capabilities (e.g. "context-capable").
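
Something along these lines, as a minimal sketch only. All names here
(BackendCapability, LeaseBackend, supports() and so on) are made up for
illustration and do not exist in Kea; the two flags are just the two
examples mentioned above.

```cpp
// Hypothetical capability flags; none of these names exist in Kea.
enum BackendCapability : unsigned int {
    CAP_NONE          = 0,
    CAP_MULTI_PROCESS = 1u << 0,  // backend may be shared by several processes
    CAP_CONTEXT       = 1u << 1   // backend can store extra per-lease context
};

// Hypothetical stand-in for the lease backend interface.
class LeaseBackend {
public:
    virtual ~LeaseBackend() {}

    // Bitwise OR of the BackendCapability flags this backend provides.
    virtual unsigned int capabilities() const = 0;

    // True if the backend reports the given capability.
    bool supports(BackendCapability cap) const {
        return (capabilities() & cap) != 0;
    }
};

// Memfile stays as it is and simply reports no multi-process support.
class MemfileBackend : public LeaseBackend {
public:
    unsigned int capabilities() const override { return CAP_NONE; }
};

// An SQL-style backend would report that it is multi-process capable.
class SqlBackend : public LeaseBackend {
public:
    unsigned int capabilities() const override { return CAP_MULTI_PROCESS; }
};

// A stand-alone house keeper only makes sense with a capable backend.
bool canRunStandaloneHousekeeper(const LeaseBackend& backend) {
    return backend.supports(CAP_MULTI_PROCESS);
}
```

The server (or a house keeper) would check the flag at startup and
refuse a configuration that the backend cannot support.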

> 3. "Second, stand alone process would call all external actions in a
> separate process, so users writing hook libraries will need to develop
> multiprocess capabilities, even if we have only one Kea server."
> 
> That is not really a major problem unless users want some application
> that needs to link lease expirations with other lease events and needs
> to keep the details in memory while it runs. Even then, the difficulty
> of implementation depends on what needs to be done.
Right now, implementing a hooks library is a relatively straightforward
process. If we tell developers that they need to write
multi-threading/multi-process aware hooks, we'll be raising the bar
significantly. This is something I'd like to avoid.

> 4. "Third, I have no idea how to implement failover with stand alone
> house keeper."
> 
> Consider it a research project :-)
Is it one with a dedicated time budget and reasonable financing? Then
I'm all in :-)

> 5. "Finally, this is a new process that needs to be maintained (new
> makefiles, new man page, new binaries to install, new documentation
> etc.)."
> 
> Compared with writing the code, this overhead is small.  We should not
> let it deter us from any particular solution.

>> b) implement house-keeping as part of the existing process. There 
>> would be a configuration parameter that would tell the server to 
>> trigger the routine every X seconds. The server would call select()
>> for at most X seconds, run house-keeping and then call select()
>> again. This approach seems very flexible. If you really need a
>> notification the very second when the lease expires, set X to 1. If
>> you're experiencing an event where extra performance is needed, set
>> it to some large value, so expiration will be checked upon the next
>> day. Of course, there are tradeoffs. The longer value set means
>> that, although the procedure will not happen that often, each
>> house-keeping will take longer as it will have more leases to 
>> process. The benefit of this approach is that logging, hooks, ddns
>>  happen in the same process, so no multi-process hassles for us and
>>  for hook lib developers. Also, failover does not become more 
>> complicated. It is easy to implement as the whole design of 
>> existing code is ready for this (calculating timeout + calling 
>> select() with that timeout).
> 
> My only concern with this is that this means that expiration is
> competing with the allocation of leases, something that will limit
> performance. Also, if all leases that have expired in the interval are
> processed together, there is a risk of "bursty" performance - the
> server refusing to handle new leases for a period because it is
> processing expirations.
> 
> 
>> c) that's really b) + extra capability of being able to disable 
>> house-keeping, at least during normal operation. House-keeping 
>> would be called during startup and shutdown. So if you really need
>>  the absolute max performance and intermediate pauses needed for 
>> house-keeping are a problem for you, use this. You will get the 
>> lease expiration notification after some time ("Lease A expired Y 
>> minutes ago"), but that's ok if max performance is your goal.
> 
> Not quite.
> 
> If an expired lease is reused, you would need to do the lease
> expiration processing prior to reassigning the lease. (An alternative
> would be to log the expired lease to a journal file and have the file
> included in the lease expiration housekeeping.)
This is something we do already. When the allocation picks a lease
candidate and discovers that the lease is expired, it is reused.
That is working fine so far. If it isn't broken, don't fix it.

> On reflection, this consideration impacts all solutions to the lease
> expiration problem.  If we were to go for an external process to do
> the house keeping, the DHCP server process could not reuse expired
> leases - it would have to wait until they were cleaned up by the
> housekeeper.
So we're talking about race conditions here. I can imagine displeased
users who discover that the server claims to be out of leases, while
there are expired leases available.

>> d) That's b) + an optimization. When assigning or updating a lease,
>> we can keep the timestamp to the shortest expiration event and
>> dynamically set X to that value. So if there are no leases, we'll
>> select() for MAX_UINT. If there are 1000s of leases, we'll select()
>> until the shortest (or oldest) one expires. This is how expiration
>> is implemented in Dibbler. Seems to be working fine.
> 
> Does Dibbler keep the leases ordered by expiration time? For Kea, we'd
> have to add an index on that field to the lease database to allow
> selection of the lease with the next expiration time.
No, Dibbler doesn't. A new index would be useful if we wanted to pick
the lease that will expire first. We'll need to have a call similar
to getAllExpiredLeases(). It would be helpful if we could get all leases
in expiration time order. The code would process leases one by one
until it hits a lease that is not expired yet.
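
A rough sketch of that loop, assuming the leases are available ordered
by expiration time. A std::multimap stands in for the index here; the
Lease struct and both function names are hypothetical, not the real Kea
lease classes. The second helper shows how the select() timeout from d)
could be derived from the same index.

```cpp
#include <cstddef>
#include <ctime>
#include <map>
#include <string>

// Simplified, hypothetical lease record.
struct Lease {
    std::string address;
    std::time_t expires;
};

// Leases keyed by expiration time, so iteration starts with the lease
// that expires first.
typedef std::multimap<std::time_t, Lease> ExpirationIndex;

// Walks the leases in expiration order and stops at the first one that
// is still valid. Each expired lease is passed to onExpired and removed.
// Returns the number of leases processed.
template <typename Callback>
std::size_t processExpiredLeases(ExpirationIndex& index, std::time_t now,
                                 Callback onExpired) {
    std::size_t count = 0;
    while (!index.empty() && index.begin()->first <= now) {
        onExpired(index.begin()->second);
        index.erase(index.begin());
        ++count;
    }
    return count;
}

// Seconds until the next lease expires: 0 if one is already overdue,
// -1 if there are no leases (the caller can then wait indefinitely).
long secondsUntilNextExpiration(const ExpirationIndex& index,
                                std::time_t now) {
    if (index.empty()) {
        return -1;
    }
    std::time_t next = index.begin()->first;
    return next > now ? static_cast<long>(next - now) : 0;
}
```

With an index like this the cost of a pass is proportional to the number
of expired leases, not to the total number of leases.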

>> e) Over time, we will also need to implement a new command: 
>> lease-expire or db-cleanup. It would add extra capability for 
>> admins who tweaked something in the DB and want Kea to process the
>>  changes. Also, some admins would possibly want to use their 
>> external scripts to do lease expirations, e.g. during low traffic 
>> at 3am.
>>
>> Actual house-keeping routine implementation is lease database 
>> specific. For SQL-based backends, we'll probably implement a query
>>  that returns expired leases. For memfile, we'll either add an index
>>  or will have to traverse the leases. It would be good if we could
>>  do that using bisection. I haven't looked at what's available in 
>> multi-index containers.
>>
>> Thoughts? Comments?
> 
> Ultimately, the best method for lease expiration processing depends
> on whether we want to expand Kea beyond its single-thread
> implementation to embrace parallel processing; and if we do, what
> model we use.
In my opinion there's no such thing as the "best" method. It will
depend on the deployment scale. The "best" method is likely to be
different if you're running a home office with 5 computers than if
you run a network with millions of users. That's why I think that the
house-keeping routine (I call it a routine and not a process or function
so as not to skew the perception in any way) should be independent, so
it could be called both from the main process and from a dedicated
house-keeping process.

> As Kea is now a standalone package, we have two choices for parallelism:
> 
> 1. Make Kea multi-threaded.  This will work for all databases, including
> memfile.
> 
> 2. Make Kea multi-process, disabling this capability for the memfile
> database.

> If we do (1), then Tomasz's suggestion of handling lease expiration in
> the same process as the lease allocation is natural: it is just one more
> thread.
> 
> If we go for (2), I argue that the case against having the lease
> expiration in a separate process disappears. If we have multiple
I don't think the choice between (1) and (2) is a management decision.
The best action would be to implement prototypes and measure their
performance. It will be more work in the short term, but will let us
pick the right solution that will stick with us for many years.

> Coming down to the problem in hand - what do we do for Kea post 0.9 -
> the main task is to write the lease expiration code. We must make this
> as modular as possible: that allows for flexibility.  We could - for
> example - put the code in both a housekeeper process and in the main
> DHCP server, using the former with an SQL database (so obtaining
> parallelism) and the latter with memfile.
That's what I was planning.

Tomek