[kea-dev] Initial proposal for Kea Control API

Wed Jul 13 19:41:12 UTC 2016

Ok, it definitely took me more time, but I finally got round to this one.

On 26.05.2016 22:01, Thomas Markwalder wrote:
> On 5/24/16 7:27 PM, Tomek Mrugalski wrote:
> This is a good first cut.
Thanks. And thanks for the detailed review.

As there are also comments from Shawn and Marcin pending, I didn't want
to alter the requirements numbering. In certain cases I added an extra
level (e.g. H.17.1). Once we finish collecting feedback from external
parties, I'll go through and renumber them properly.

> General:
> --------
> 
> 1. We have the command channel, across which we issue commands, yet you
> refer to them frequently as "calls".   As in the Kea Admin guide,  we
> should refer to them as "commands" throughout this document.
Updated.

> 2. We need requirements that describe command channel security.  You
> cited large response sizes as a potential DOS vector. Actually, once
> they have access to the command channel, the hammer can be dropped in
> any number of ways.  Making it secure is a vital aspect and needs to be
> covered by requirements.
Making something absolutely secure today is not possible. But I get your
point. Added requirement A.8 "Authentication MUST be supported on
command channel.". Note the wording implies it is possible to enforce
authentication, but does not mandate it on every deployment.

> 3. People may want an audit log of commands issued, so we should
> consider adding requirements for this.   These may be satisfied simply
> by a dedicated logger but that's an implementation detail.
Good idea. Added requirement A.9.

> Administrative Management
> --------------------------
> 
> 1. The paragraph under A.1,  "Supporting large command parameters end
> and responses..."
> 
> Really this is part of the A.2 discussion and should be incorporated there.
Updated.

> 2. A.5 - implies, but does not explicitly state, that content supplied
> with set-config would replace the entire existing, in-memory
> configuration.  In other words, set-config is intended to supply a full
> configuration, anything not supplied in it does not exist.  We should
> maybe state this explicitly.
Added text. Hopefully it's more explicit now.
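
To make the replacement semantics concrete, here is a minimal Python
sketch. It assumes the usual command channel JSON layout ("command" plus
"arguments"). The set_config() helper and the configuration keys are
made up for illustration and are not part of the proposal.

```python
import copy

def set_config(running: dict, command: dict) -> dict:
    """Return the configuration in effect after a set-config command.

    The supplied configuration replaces the running one wholesale.
    The old configuration ("running") is deliberately ignored: nothing
    is merged in from it.
    """
    if command.get("command") != "set-config":
        raise ValueError("not a set-config command")
    return copy.deepcopy(command["arguments"])

# Hypothetical running configuration and its replacement.
running = {"valid-lifetime": 4000,
           "subnet4": [{"id": 1, "subnet": "192.0.2.0/24"}]}
command = {"command": "set-config",
           "arguments": {"subnet4": [{"id": 2, "subnet": "198.51.100.0/24"}]}}

new_running = set_config(running, command)
# "valid-lifetime" was not supplied, so it is absent from new_running.
```

Anything the caller leaves out simply does not exist afterwards, which
is the behavior Thomas asked to have spelled out.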

> 3. One thing we might consider is to allow set-config to automatically
> dump a successful configuration to a file for diagnostic purposes. 
> Somebody does a set-config, possibly from a remote box and no one
> records the config. If Kea then crashes we would have no certain
> knowledge of its configuration at the time it went down.   The file
> could be  saved to a time-based file name.  This would not overwrite the
> existing configuration, nor be reloaded at startup (though that has
> interesting possibilities).
I would be cautious about it. For a moment I had a write-config command
described, but then realized how dangerous such a call could be. If
exploited remotely, it could allow Kea to write files in locations
specified by the attacker. I think a better approach is to add a flag
("write") that, when set, would cause Kea to write down its current
configuration somewhere, but that somewhere must not be arbitrary. I
don't know, maybe we will limit it to the Kea state directory?
Anyway, that's something to be figured out in the design.
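
As a rough illustration of "not arbitrary", the check could look like
the Python sketch below. The state directory path, the file name and the
resolve_dump_path() helper are all assumptions of mine. The real rule is
a design decision that has not been made yet.

```python
from pathlib import Path

def resolve_dump_path(state_dir: str, filename: str) -> Path:
    """Map a requested file name to a path inside state_dir.

    Raises ValueError if the result would land outside state_dir,
    e.g. because of "../" components or an absolute path.
    """
    base = Path(state_dir).resolve()
    candidate = (base / filename).resolve()
    # The target must be strictly below the state directory.
    if base not in candidate.parents:
        raise ValueError(f"refusing to write outside {base}: {filename!r}")
    return candidate

# Hypothetical state directory and time-based file name.
target = resolve_dump_path("/var/lib/kea", "kea-dhcp4-20160713-194112.json")
```

With this kind of check, names such as "../../etc/passwd" or
"/etc/passwd" are rejected before anything is written.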

> Lease Management
> ----------------
> 
> 3. "Those two calls will be used to retrieve..."  should be "These two
> commands will be..."
Updated.

> 4.  "Q: Do we want to have a single query (e.g. get-lease4) with
> multiple parameter sets or do we want separate queries..."
> 
> Initially I thought they should be separate commands, but I think
> having a single ("overloaded") command is more flexible should we decide
> to add variants in the future.  I don't think the extra parameter logic
> to deal with permutations would be significant.  Whatever we do decide
> here we should apply universally throughout the API.  Either we
> "overload" commands or we do not.
Updated the text to go with overloaded approach.
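
A minimal sketch of what "overloaded" could mean in practice: one
command name, and the set of parameter names picks the lookup. The two
parameter sets and the lookup labels below are assumptions for
illustration only. The document defines the real ones.

```python
# Hypothetical parameter sets accepted by an overloaded get-lease4.
PARAMETER_SETS = {
    frozenset({"ip-address"}): "by-address",
    frozenset({"identifier-type", "identifier", "subnet-id"}): "by-identifier",
}

def select_lookup(arguments: dict) -> str:
    """Pick the lookup variant from the names of the supplied parameters."""
    key = frozenset(arguments)  # iterating a dict yields its keys
    try:
        return PARAMETER_SETS[key]
    except KeyError:
        raise ValueError(
            f"unsupported parameter set: {sorted(arguments)}") from None

variant = select_lookup({"ip-address": "192.0.2.1"})
```

Adding a new variant later means adding one more entry to the table,
which is the flexibility Thomas mentioned.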

> 5. "Q: Do we want to support multi-tenancy..."
> 
> This seems like a broader question, than just at our command API level. 
> This is likely to have ramifications other places.  In such a scenario
> then, what would get-lease6(ip-addr) return if there were more than one
> lease on different subnets?  It could be the first such lease we found OR a
> collection of the leases.
The comments you and Shawn made give me some ideas. If we want to do
multi-tenancy, it likely should be on a global scope rather than leases
or subnets. In any case, this is out of scope for now, so the question
has been removed.

> We could decide now that all get commands return collections, just as
> SQL selects return rows/result sets, giving us ample flexibility for any
> number of future requirements.
I prefer for each call to return one or zero objects (leases, subnets,
etc). get-something type of calls are relatively easy to implement, but
set-something is trickier. We would have to implement some sort of
transaction (to roll back the leases we already inserted if the next
lease insertion fails, and do the same for subnets and hosts). This would
be very difficult to implement (e.g. the change caused a subnet with all
leases in it to be removed, then the next subnet failed, so we need to
recreate the subnet *and* all leases that we just removed). To avoid this
sort of elaborate logic, it's simpler to go with a one object per call
approach.
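
Here is a small sketch of the one object per call idea. The add_subnet()
handler and the in-memory dictionary are hypothetical. The result codes
follow the existing command channel convention (0 for success, 1 for
error).

```python
def add_subnet(subnets: dict, subnet: dict) -> dict:
    """Handle one hypothetical add-subnet command and report its outcome."""
    if subnet["id"] in subnets:
        return {"result": 1,
                "text": f"subnet id {subnet['id']} already exists"}
    subnets[subnet["id"]] = subnet
    return {"result": 0, "text": "subnet added"}

subnets = {}
requests = [
    {"id": 1, "subnet": "192.0.2.0/24"},
    {"id": 1, "subnet": "198.51.100.0/24"},  # duplicate id, will fail
    {"id": 2, "subnet": "203.0.113.0/24"},
]
# One command per object: each one succeeds or fails on its own,
# so there is never a half-applied batch to roll back.
results = [add_subnet(subnets, request) for request in requests]
```

The caller sees exactly which object failed and decides what to do
next. The server never has to undo work it has already done.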

> 6. Does update equate to replacing the entire lease with what is sent
> with the update command?  In other words, is an update equivalent to
> delete/add?  If so it implies that every value for the updated lease
> must be in the JSON supplied to the update.  I'm not saying this is a
> bad thing, I'm simply looking for clarification.
Added extra text. It should be possible to update only some parameters.
The example given is that the sysadmin wants to change the lease
lifetime, so he only has to specify the IP address and lifetime. In this
case only the lifetime will be updated, as the IP address stays the same
for the duration of the lease. That's convenient, because he just wants the
client to keep its lease longer and doesn't want to be bothered with
details, like what the subnet-id or cltt was.
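
A minimal sketch of that partial update, assuming the IP address is the
key and everything else in the command is optional. The update_lease()
helper and the field names (other than subnet-id and cltt) are
illustrative only.

```python
def update_lease(leases: dict, arguments: dict) -> dict:
    """Update only the fields supplied; the IP address identifies the lease."""
    address = arguments["ip-address"]
    lease = leases[address]  # raises KeyError if there is no such lease
    changes = {name: value for name, value in arguments.items()
               if name != "ip-address"}
    lease.update(changes)    # fields not mentioned keep their old values
    return lease

# Hypothetical lease store keyed by IP address.
leases = {"192.0.2.10": {"ip-address": "192.0.2.10", "subnet-id": 1,
                         "cltt": 1468438872, "valid-lifetime": 4000}}

updated = update_lease(leases, {"ip-address": "192.0.2.10",
                                "valid-lifetime": 86400})
```

Here subnet-id and cltt are untouched, because the sysadmin never
mentioned them.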

> 7. Should we allow them to change the subnet id of lease?  This might
> come in handy for repairing some unseen situation but I'm not certain it
> is a good idea.
Yes. The commands can be dangerous if you misuse them. We will put the
necessary warnings in the documentation. But a clueless sysadmin can
wreak havoc by writing directly to the DB anyway. It's sorta a "here's a
shotgun, this is your foot, have a nice day" attitude :)

> 8. Do we not also have the multi-tenancy question with update and delete?
I removed the multi-tenancy from the doc for now.

> 9. "Q: Do we want a way to delete all leases in a subnet? ..."
> 
> Yes, I think this is useful.
Ok, added.

> 
> 10. "Q: Do we want to delete all leases that belong to certain identifier?"
> 
> Are you talking about an identifier having leases in more than one subnet?
> I imagine this could also be useful.
Added overloaded delete-lease{4,6}.

> 11. "Note: There are currently no plans to implement calls that retrieve
> multiple leases..."
> 
> We could implement a row limit with some reasonable number, and maybe a
> flag or parameter for overriding that limit.
> As noted above under general comments, security is a bigger issue than
> just this item.  In theory only admins should be using this and
> ultimately it is up to them to use the commands safely.
I decided to leave this out of scope, but also added it to the list as a
note that we considered this alternative. I'm hoping to get some
feedback from a couple of friendly companies. It's good to have some
alternatives.

> Host Reservation Management
> -----------------------------
> 
> 12. H.17, as with updating leases, is update-reservation equivalent to
> delete/add?
Added clarification. A user has to specify only those parameters that
are to be changed.

> 13. "Q: For IPv6 there may be multiple IPv6 addresses and/or prefixes
> reserved. There is no easy way to identify them..."
> 
> This question also applies to host options no?  Users might find it
> equally useful to add, update, or delete options without having to
> update the entire reservation.  We could add these commands as "MAY"
> support.
Yes. Added requirements H.17.1 and H.17.2 to

> 14. "Q: Do we want to specify delete-reservation with (identifier-type,
> identifier, subnet-id)?"
> 
> Yes I think we should include this.
Updated the text.

> Subnet Management
> -----------------
> 16. "TBD: What to do about subnets modification? There are several
> options:..."
> 
> I think a mode parameter is a good idea, certainly one that allows
> choosing between #1 and #3.
> 
> I'm not sure how we would implement #2's subnet validity check as a
> parameter pertaining to a single update.  These checks would have to
> continue over time, rather than as part of the update processing.  How
> would one turn this off again?  What we could do, if we are worried
> about performance is make it a global level parameter, that admins turn
> on or off, if we think the performance impact warrants this or it
> prohibits some form of host reservation behavior.
Updated text. I decided to keep #2 for now, but added a note that it is
not the preferred way. If we get more voices against it, we'll remove
it. To somewhat defend it, I think having such a check (if the lease
belongs to an active pool) would be useful as a way for Kea to
automatically recover from someone or something messing up its database.
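
The check itself is cheap. A sketch in Python, assuming pools are given
as (first, last) address pairs, which is my simplification and not how
the configuration is necessarily stored:

```python
import ipaddress

def in_active_pool(address: str, pools: list) -> bool:
    """Return True if address falls inside any (first, last) pool range."""
    addr = ipaddress.ip_address(address)
    for first, last in pools:
        low = ipaddress.ip_address(first)
        high = ipaddress.ip_address(last)
        # Skip pools of the other address family; v4 and v6 don't compare.
        if low.version != addr.version or high.version != addr.version:
            continue
        if low <= addr <= high:
            return True
    return False

# Hypothetical pools of a single subnet.
pools = [("192.0.2.10", "192.0.2.100"), ("2001:db8::10", "2001:db8::ff")]
ok = in_active_pool("192.0.2.50", pools)
stale = in_active_pool("192.0.2.200", pools)
```

A lease for which this returns False after a subnet update is the
kind of entry the check in option #2 is meant to catch.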

> 17."Q: How do we want the subnet removal procedure to work? There are
> several possible options:"
> 
> I do not believe #2 is viable because it introduces a violation of
> referential integrity.  This is true even though Memfile doesn't have
> foreign key constraints and for RDBMS's they aren't mandatory. 
> 
> As with the update, I think we should provide a delete "mode" parameter
> that lets them pick either #1 or #3, "retire" or "immediate".
> 
> Under "retire", new subnets cannot have the same ID as a retiring
> subnet, so there is no break in integrity. Retired subnets are deleted
> only after their last lease is removed.  Mode #3 avoids the issue
> entirely by performing a cascading delete. 
> 
> I think that #4, "reconfigure process", might actually be a subnet
> command in itself.  Wouldn't your use case be something like this:
> 
> 1. Delete the current subnet
> 2. Add the new subnet
> 3. Tell clients of the subnet to reconfigure
Ok, I think this is scope creep. The goal here was to design the API,
not the underlying features. We don't have reconfigure support now, so
anything related to reconfigure is vague at best at this stage.

> 18. As an aside, for a datacenter type setup, where they are going to delete
> thousands of subnets and add thousands more, at some point the ID pool
> runs out.  We may need an administrative command for handling this.
We have it already. You can explicitly specify subnet-id when specifying
a subnet. And even when letting Kea assign subnet-ids automatically, it's
not going to run out. Let's assume 20k subnets. That's more than I ever
heard anyone was using. Let's further assume this particular deployment
is insane and removes and then adds all of those 20k subnets every 5
minutes. At this pace they would loop the subnet-id in a bit over 2 years.

That's not a problem, though. Their subnet-id (assuming they use
automatic numbering) would simply loop back to 0 and start counting up
again. If you really think that's an issue, we can consider upgrading
subnet-id to 64 bits.
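
For the record, the back-of-the-envelope calculation as a Python sketch.
It assumes subnet-id is currently a 32-bit unsigned value and that every
re-added subnet consumes a fresh id.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60

def years_until_wrap(id_space: int, ids_per_cycle: int,
                     cycle_minutes: float) -> float:
    """Years until automatic numbering has consumed the whole id space."""
    cycles = id_space / ids_per_cycle
    return cycles * cycle_minutes / MINUTES_PER_YEAR

# 20k subnets removed and re-added every 5 minutes.
years_32 = years_until_wrap(2 ** 32, 20_000, 5)  # roughly 2 years
years_64 = years_until_wrap(2 ** 64, 20_000, 5)  # billions of years
```

So the 32-bit space lasts a bit over 2 years even in this absurd
scenario, and a 64-bit one would effectively never wrap.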

> Options management - Option Definition commands
> -----------------------------------------------
>  
> 19. "Kea allows specifying options in several scopes:..."  Does not
> mention client class as a scope.
Updated.

> 20. The add-optionX-def commands appear to be the only add commands
> which support adding multiple elements
Because we don't have any way to reference them. We could do something
like saying: get-option offset x (i.e. the xth option specified), but
that would be even more confusing.

> - Why support multiples with this one and not others? 
> Consistency and clarity in APIs are important.  We could decide, that
> all add commands should accept a collection of objects.  If I can add
> one host reservation, why not ten?  Or maybe we adopt a naming
I think we should go the other way and try to make all calls operate on
single objects. There are several serious issues with handling multiple
objects. First, we may get hit by fragmentation. It's solvable, but
requires extra effort. The second issue is much more important. What if
we process half of the objects and then the next one fails for whatever
reason? Do we continue or roll back? Sometimes rollback may be extremely
difficult or even impossible. For example if you delete subnets, you may
trigger DNS removals that are already ongoing. You can't roll them back.
So for these reasons I prefer to go with the 1 call = 1 object approach.

So we have 2 choices here:

a) update the set/get-options{4,6}-def commands once we figure out how
to reference options that are already added. Any suggestions?

b) decide that it's ok to work on option sets. Symmetry between
different calls is nice, but not that critical. Oh, and we already allow
setting multiple options, multiple IPv6 reservations in host (for the
same reason).

Which one do you prefer?

> - If we're going to support adding more than one at a time, what happens
> if one of the entries is invalid, does the whole addition "rollback" and fail?
Yup, that's a problem that is really not solvable. See the DNS update
example above.

> 21. "set-optionsX-def calls set all option definitions. New options will
> replace whatever old definitions may have been there."
> - Does this include the pre-defined "standard" option definitions or
> does this only apply to custom options. I'm not sure I see the
> usefulness of this command.  Do you have use case in mind?
No, I thought only about custom options. You can't have option
definitions for standard options. These are built in and cannot be
overridden from config file.

> - What if there is an option definition X, and values for option X have
> been specified by two subnets and three host reservations, but the new
> set of definitions does not define option X?  Does the set command fail?
I wasn't really planning to check for that, because it's unenforceable.
Right now you can insert into the database an entry that does not match
your config, and there is no way to enforce consistency there. If you
really think this is an issue, maybe the way to go would be to have a
sanity-check type of a call? It would verify that all your data is
consistent.
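
To give an idea of what such a sanity-check call might verify, here is a
sketch that looks for option values without a matching definition. The
data layout (space/code pairs plus a free-form scope label) and the
sanity_check() name are assumptions for illustration. Nothing like this
is specified yet.

```python
def sanity_check(definitions: list, option_values: list) -> list:
    """Return a list of problems; an empty list means data is consistent."""
    defined = {(d["space"], d["code"]) for d in definitions}
    problems = []
    for value in option_values:
        key = (value["space"], value["code"])
        if key not in defined:
            problems.append(
                f"{value['scope']}: option {key[1]} in space "
                f"'{key[0]}' has no definition")
    return problems

# Hypothetical custom definitions and values gathered from all scopes.
definitions = [{"space": "dhcp4", "code": 224}]
option_values = [
    {"scope": "subnet 1", "space": "dhcp4", "code": 224},
    {"scope": "host 01:02:03:04:05:06", "space": "dhcp4", "code": 225},
]
problems = sanity_check(definitions, option_values)
```

The call only reports inconsistencies. What to do about them is left to
the administrator.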

> 23. The delete command names abbreviate "delete" to "del".  We need to
> be consistent. Either always spell it out or always abbreviate it.
We already have commands that use full names. So no abbreviations.

> 24. As with the set command, how do we handle a delete option definition
> command that  attempts to delete a definition that is in use?
Depends on where it is in use. If it's in a configuration, we can
probably catch that and reject the configuration. However, if it's used
somewhere in the host reservations stored in the database, this would be a
problem, as we would have to iterate over all reservations from the DB
and check them one by one. Note we don't have any API for that, so we
would have to extend all backends. Doable, but not worth the effort imho.

> 25. "Q: Do we want also get-optionX-def which would return a single
> option definition?"
> Yes, I think we do.
Ok, added.

> Options management - Option commands
> -----------------------------------------------
> 
> 26. "Kea allows option specification on global and per subnet level. Both
> can be manipulated using the same call. There will be an optional
> parameter subnet-id. If it's not specified, the code applies to global
> level. If there is subnet-id specified, the change applies to specific
> subnet. The same rule applies to all calls related to options."
> 
> Actually they can be set at  global, subnet, host and eventually class
> level.  Are these commands intended only to address the global and
> subnet scopes, while options for hosts or classes will be handled under
> object specific commands?
Yes. There's a comment for add-reservation and update-reservation that
explains it would also cover options. There's currently no API designed
for options defined on class level. I simply decided it's not worth the
effort at this stage.
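
The global versus subnet rule quoted above can be sketched as follows.
The configuration layout and the get_options() helper are simplified
assumptions. Host and class scopes are intentionally left out, as
explained.

```python
def get_options(config: dict, subnet_id=None) -> list:
    """Return option values for the global scope or for one subnet."""
    if subnet_id is None:
        # No subnet-id supplied: the command applies to the global scope.
        return config.get("option-data", [])
    for subnet in config.get("subnets", []):
        if subnet["id"] == subnet_id:
            return subnet.get("option-data", [])
    raise LookupError(f"no subnet with id {subnet_id}")

# Hypothetical configuration with one global and one subnet option.
config = {
    "option-data": [{"name": "domain-name-servers", "data": "192.0.2.53"}],
    "subnets": [{"id": 1, "option-data": [
        {"name": "routers", "data": "192.0.2.1"}]}],
}
global_options = get_options(config)
subnet_options = get_options(config, subnet_id=1)
```

The same optional parameter would select the scope for the add, set and
delete variants.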

> 27. "add-option4 and add-option6 add new DHCPv4 or DHCPv6 option values."
> 
> Do these commands take one option specification or a collection of one or
> more? It's not clear.
Added "of a single DHCPvX option". Hopefully it's clear now.

> 29. "set-options4 and set-options6 set all option values for a given
> scope...." 
> 
> 30. "get-options4 and get-options6 command returns all option values
> defined for a given scope. It has one optional parameter: subnet-it. If
> it's defined, all global options are returned. If it's defined, only
> options for a given scope are returned."
> 
> This has a few issues. First, "subnet-it" should be "subnet-id".  And I
> think you mean "if it is not defined, global options will be returned; if
> it is defined..."
> 
> 
> Interfaces Management
> ---------------------
> 
> 31. "Q: Do we want those calls at all? ...."
> 
> I think we probably do want these eventually.  You could change them
> from MUST to MAY.  But I could see somebody having everything else
> correct but forgetting an interface name or something.  Certainly, I can
> see them wanting the redetect-interfaces.
> 
> 
> Client Classification
> ---------------------
> 
> 32. You don't think we know enough about them to write the
> requirements?  I won't press the issue but I think we do.
At this stage I simply ran out of steam to propose something. If we
implement all those features and people start using them and asking
for more, that would be a good time to implement a class modification
API. Personally I consider this document a long lived one (think years).
We will add it eventually, but not in the first iteration.

> Runtime Operations
> ------------------
> 
> 33. Since these are only statistics commands, you could maybe rename
> this section.
No. These are statistics only *for now*. I think over time we will have
more parameters, like maybe CPU usage, free memory, database connection
status etc. So I really meant that name when I chose it.

Tomek