[kea-dev] Design for Kea Host Reservation

Marcin Siodelski marcin at isc.org
Wed Oct 8 12:35:09 UTC 2014

On 07/10/14 14:51, Tomek Mrugalski wrote:
> On 06/10/14 19:28, Marcin Siodelski wrote:
>>>   "If the allocation of the new lease fails for the reserved address or
>>>   prefix, the allocation engine retries using the client's hint. If that
>>>   fails, it proceeds with a normal allocation process."
>>> That's completely wrong, I'm afraid. Host reservation is not a
>>> guideline or a suggestion, it's a strict rule the server must follow. HR
>>> can be used to grant special service, but also to confine users
>>> in various ways. We can't simply give them a regular address instead.
>> I take this point. But, you might have seen my email sent earlier today
>> to Thomas where I state this:
>> "For the Host Reservation, there is an assumption that the server will
>> always try to use reserved resources for a host, if any. But, if the
>> reserved resource is unavailable for some reason (e.g. is in use) the
>> server should still be able to provision the client by allocating some
>> other resource. We may obviously speculate whether it is always
>> appropriate for the server to allocate a different address than the one
>> that the administrator wanted a client to get and whether the client
>> should rather not be provisioned in such case. But, I think it is not a
>> problem to make this configurable at the later time once the whole logic
>> is in place. So, this discussion is out of scope in the doc."
>> In my opinion we should not make too strict assumptions because I can
>> imagine customers having some use cases in which this would be allowed.
>> And, I state this again: it is much easier to restrict something (with a
>> configuration knob) than to extend the mechanism if the use case appears.
>> I tend to agree that this is going to be a rare case. For this
>> reason I don't foresee a massive escalation of issues because someone has
>> received a different address than the reserved one.
>> As you seem to be pretty confident here, I would like you to make a
>> final statement on this: "that we will never, ever need a configuration
>> knob which would allow for dynamic allocation if reserved address is
>> unavailable". If you can make this statement I can remove this from the
>> design.
> I can't speak for all the users. I thought about this a bit more. I'm ok
> with the server assigning an address different than in host reservation,
> under the condition of the reasons being clearly logged in a very
> visible (warning?) way. Something like "Client A has reservation for
> address B, but B is currently assigned to client C. Temporarily
> assigning a new address D to A. Will change that address as soon as C
> attempts to renew B. That correction is expected to happen in E
> seconds.". A bit long, but unambiguous. The other way to express it
> ("hey admin, you messed up and made a reservation for address that is in
> use, we'll correct this mess for you, but it will take some time") would
> likely be considered less appropriate ;)

I updated the design document with the considerations that there are two
approaches that the server could follow. I also pointed out that the
default behavior should be to reject the client for which the
reservation is made when the reserved address is in use.

>> One of the possible approaches would be to wait for the first client to
>> renew his address; once the server sees the renewing client it may
>> send 0 lifetimes to this client to say: "don't use this address anymore,
>> because I have reservation for it. Instead I am giving you this
>> dynamically allocated address". The reserved address gets back to the
>> server and waits for the second client to renew in which case the second
>> client gets the reserved address and the previously allocated address is
>> de-allocated. So, over time there is a transition and both clients
>> remain in service and they finally get their addresses as appropriate.
>> What is wrong with this?
> That's acceptable behaviour and we can go ahead with this. But it has
> its drawbacks. First, the client gets an address from a dynamic pool.
> That may be a problem if host reservation is used for confining or
> segregating clients, e.g. for a handful of clients that forgot to pay
> and are redirected to a captive portal. Second, the address the client
> gets (from dynamic pool) will change in the near future. I suppose both
> are ok, as this is a misconfiguration recovery mechanism.
> As you said, having a knob to allow admin to decide would be the best
> ultimate solution.
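
The transition described above can be sketched roughly as follows. This
is an illustrative sketch only, not actual Kea code; the struct, field
and method names are invented and the real allocation engine interface
will differ:

```cpp
#include <cassert>
#include <map>
#include <string>

// Rough sketch (not Kea code) of the recovery: client A currently
// holds an address that is reserved for client B.
struct Server {
    std::map<std::string, std::string> leases_;        // address -> client
    std::map<std::string, std::string> reservations_;  // address -> client
    std::string next_dynamic_ = "192.0.2.50";          // next free dynamic address

    // Renewal: if the renewing client's address is reserved for someone
    // else, reclaim it (0 lifetimes in the real exchange) and hand out a
    // dynamically allocated replacement instead.
    std::string renew(const std::string& client, const std::string& addr) {
        auto r = reservations_.find(addr);
        if (r != reservations_.end() && r->second != client) {
            leases_.erase(addr);              // reserved address returns to the server
            leases_[next_dynamic_] = client;  // temporary dynamic address
            return next_dynamic_;
        }
        return addr;                          // normal renewal
    }

    // Request: hand out the client's reserved address once it is free.
    std::string request(const std::string& client) {
        for (const auto& [addr, owner] : reservations_) {
            if (owner == client && leases_.count(addr) == 0) {
                leases_[addr] = client;
                return addr;
            }
        }
        return "";                            // dynamic allocation elided here
    }
};
```

When A renews it is moved to a dynamic address and the reserved one is
released; B's next request then succeeds with the reserved address, so
both clients stay in service throughout the transition.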


>> I don't want to get panicked by this. Maybe let's ask users? Maybe let's
>> disable this by default and display warnings when enabled?
> Sure, that will work.
>>> Here's how it should work in my opinion:
>>> 1. there are no reservations
>>> 2. client A gets address X
>>> 3. admin add reservation for address X to client B
>>> 4. client B requests an address, the server discovers that there is a
>>>    reservation, but also a lease for client A. It logs an error and
>>>    the client B is not assigned any address.
>>> 5. Client B repeats the discovery process in a loop, with exponential backoff.
>>> 6. Client A eventually renews, the server discovers that the address
>>> it has is reserved to someone else, sends X with 0 lifetime back to A
>>> and assigns another address Y.
>>> 7. Client B does another discovery attempt and gets reserved address X.
>>> Obviously, 3. is a misconfiguration, but we can't completely prevent
>>> that from happening.
>>> This is the right way to recover from a misconfiguration in my
>>> opinion. If we implement the way you propose, then users will start
>>> asking questions: why didn't the host reservation work? How long till
>>> the server starts using the host reservation I specified? And there
>>> would be no easy answers, because it would depend on T1 and lease
>>> lifetimes (think about clients that get an address and disappear: waiting
>>> till T1 wouldn't be enough, you'd have to wait till valid-lifetime).
>>> You may argue that the plan I described above generates more
>>> traffic. That is true, but it's a weak argument. First, such
>>> misconfiguration is expected to be a rare event. Second, it gives much
>>> better recovery time. The usual exponential backoff counts to 120
>>> seconds (or 3600 seconds if a client supports RFC 7083). That's still
>>> much better recovery time than some of the real networks we heard about
>>> (e.g. 7 days lifetime in cable networks).
>> Ok, so this is an exponential backoff for Client B. But Client B
>> still needs to wait for Client A to renew so that the server can
>> replace the address Client A is using with a new (not reserved) address.
>> So, Client B's retransmission period doesn't mean anything on its own.
>> If Client A waits for 7 days before it renews, Client B is out of
>> service for 7 days. Whereas, with the approach I described, it could use
>> some address during the transition period.
>>> In time, when we get reconfigure support, we will trigger it after
>>> step 4 to make the recovery much faster.
>> Obviously not for DHCPv4.
> Why? See RFC3203 (and RFC6704). Ok, I'm not sure how popular
> forcerenew implementations are.
>> I don't see a reason why the reservation can't be in the pool. So
>> you're proposing that when I define a reservation, the configuration
>> mechanism checks if this reservation happens to be in one of the pools
>> defined for a subnet? And if it is, reject the reservation? How would I
>> guarantee this for the HR configuration in the database? What about the
>> cases that someone reconfigured the server as we were discussing above?
> That's what I was thinking about. If the HR is defined in config, we
> could sanity check it during config reload. If the HR is in the database
> we could do two things. First, sanity check it during runtime when we
> happen to read it from the DB. Second, we could implement a command,
> something like host-reservation-check, that would go read all HRs and
> sanity check them. We may implement such a command anyway, regardless of
> what we decide for this particular case.
> The primary reason why in my opinion reservations shouldn't be in the
> pool is performance. As I said before, if you allow in-pool
> reservations, lease select will become slower.

I see. Performance is a valid point here. I am afraid that without
actual performance data we're not really able to assess what that impact
would be.

I think the in-pool reservations are in fact popular. For example, on my
home router I have a pool of addresses to be handed out to my home
devices. The router's administration panel lists the devices which were
handed out an address, and for each of them I have a "Reserve" button
which makes a reservation for the address they are currently using. This
is the in-pool reservation.

On the other hand, my home router doesn't have to support thousands of
leases per second because I have a couple of devices at best. So
performance is not critical. But, what precludes Kea from being a DHCP
server for small networks like my home network, or any other network
where performance is not a key requirement? This is an honest question.

Although the additional check for existing HRs during dynamic
allocation will always cause a performance penalty, I would like to
discuss the usefulness of caching here, which could help mitigate the
problem of excessive queries to the database. See this scenario:

Client A has a reservation for address X. Client B doesn't have any
reservations but will get an address from the dynamic pool. Client A
requests an address first and will be allocated an address reserved for
him. The HostMgr will have to make a query to the database to obtain
reservations for client A. The caching layer stores this reservation in
server's memory. Client B requests an address and the allocation engine
picks one from the pool. This address happens to be X. The server
doesn't have to make an additional query to the HR database because the
reservation has been cached. Of course, a lookup in memory for the
reservation has performance implications. But, the performance
assessment we made some time ago did not reveal that the server is
CPU-bound.
Assuming that Client B is the first one to send a request, the
HostMgr will not have the reservation for Client A cached. Hence, the
server would need to query the database for the reservation for the
specific address. This would have a performance impact on the dynamic
allocation for Client B. But, the caching layer would store this
reservation in memory, and when Client A shows up there is no need to
query the HR database because the entry has already been cached.

The problem shows up when the allocation engine happens to pick
addresses from the pool and they all happen to be reserved. Assuming
that the server would retry 100 times this would result in 100 queries
to the HR database. On the other hand, if I am an administrator aware of
the potential impact of in-pool reservations on performance, I
may still strive to create out-of-pool reservations where possible to
avoid conflicts. So, with a proper configuration I can mitigate the
problem. Nevertheless, additional HR queries (either to the SQL database
or to the cached data) will remain.
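
A minimal sketch of such a caching layer follows. This is illustrative
only: "HostBackend" stands in for the SQL database, and all names are
invented; the real HostMgr interface may differ:

```cpp
#include <cassert>
#include <map>
#include <string>

// Illustrative sketch only: a read-through cache in front of the HR
// backend. HostBackend stands in for the SQL database; the names here
// are invented for this example.
struct HostBackend {
    std::map<std::string, std::string> reservations;  // address -> client id
    int queries = 0;                                  // counts SQL round trips

    const std::string* lookup(const std::string& addr) {
        ++queries;  // each call models one query to the database
        auto it = reservations.find(addr);
        return it == reservations.end() ? nullptr : &it->second;
    }
};

struct CachingHostMgr {
    HostBackend& db;
    // Cache both positive and negative results; "" means "no reservation",
    // so repeated allocation attempts do not hit the database again.
    std::map<std::string, std::string> cache;

    std::string reservedFor(const std::string& addr) {
        auto it = cache.find(addr);
        if (it != cache.end()) {
            return it->second;  // served from memory, no SQL query
        }
        const std::string* owner = db.lookup(addr);
        return cache[addr] = (owner ? *owner : "");
    }
};
```

With this in place, the scenario above costs a single database query per
address regardless of which client shows up first; the 100-retry worst
case still pays for the first pass, but repeated lookups of the same
reserved addresses are served from memory.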

Thankfully, I believe that we can make a final decision on this after
doing some performance measurements, as the change in the code to take
HR reservations into account during dynamic allocation would be
straightforward (both ways).

> The second argument is monitoring. Right now you could do some
> measurements and try to optimize based on it. For example, check that
> your pool is 1000 addresses long and you already have 1000 valid leases
> in the db, so don't bother searching for available lease. You could do
> statistics and triggers based on it ("hey admin, you're running out of
> addresses, there are only 5 available left out of 1000 total") etc. If you
> allow in-pool HRs, you won't be able to do any of that.

Yes, there is that. But, if Kea allows both in-pool and out-of-pool
reservations, an administrator willing to do things like this may always
fall back to the sole use of out-of-pool reservations, in which case
he can assume that the pool contains only dynamically allocated
addresses. This obviously has a downside: we push the
responsibility for making sure that the reservations are distinct from
the dynamic pool onto the administrator. But, if someone really wants to
do things like this he can always automate the check using his own tools.

>>> Is there HR for this client? If yes, use whatever is reserved and be
>>> done with it. If not, use dynamic allocation as it is defined now,
>>> without performing any HR queries at all. It's faster and the code is
>>> simpler.
>> So you're proposing that the server doesn't check if the lease exists
>> for the particular address in the lease database when it has HR?
> No, I meant the opposite. When the server picks a candidate for a
> dynamic lease, it only checks whether the address is used (if there is a
> lease for it) or not. It doesn't check if that particular lease is
> reserved for someone else.
> In the general case, allowing in-pool reservations will degrade performance
> even if there are no HRs specified at all. The code would be sending HR
> queries anyway.

I don't disagree.

When I was reading your use case for monitoring and the trigger, I had a
thought that the pool definition could come with an optional parameter
which defines whether the pool has a relaxed or strict policy, i.e.
whether host reservations are allowed or disallowed within this pool.
The allocation engine gets the pool for the client and checks this
property. If it finds that it is a relaxed pool, it must use HostMgr to
check for existing reservations, which has all the bad sides you
mentioned. If the pool has a strict policy, the allocation engine doesn't
have to worry about HRs and proceeds as usual. Now that the pool is
defined as a map in the configuration file, adding this parameter would
be trivial. Of course, for each pool following a strict policy the
conflicts with HRs would need to be checked before the configuration is
accepted.

Having an additional switch always requires some more code and some more
work. So, it doesn't have to be done from day one. Maybe we could use the
strict approach first and then introduce the relaxed one as an extension.
If we do the opposite, someone using the relaxed approach (implemented
first) would be unhappy seeing his configuration invalidated by the next
release of Kea (assuming that the strict approach becomes the default).
But, which one should be the default is another discussion.

>> I also don't understand "without performing any HR queries at all". The
>> query to the HR database has to be made to determine whether a
>> reservation is specified for the host.
> Yup, but only once. For one incoming packet, it is queried once. We're
> talking about a case, where we checked that there's no HR for this
> client and decided to pick a dynamic address. The question here is
> whether we should keep sending HR queries for candidate leases that
> we're picking from dynamic pool.
> Anyway, if my arguments don't convince you, then so be it. We'll measure
> the performance after HR is implemented anyway. We may discover that
> allocating new leases will be slower than it used to be. If the
> degradation is significant, we could add a configuration knob to
> forbid in-pool reservations. Until there's empirical data, there's no
> point in continuing this discussion. Let's keep the design as you proposed.
>> I have no issue with performance tests. In my opinion we should run them
>> as soon as possible for all changes we make.
> Agree.

