[kea-dev] Some thoughts about client classification

Wed Sep 23 05:21:43 UTC 2015

The following includes comments about what the current ISC DHCP code does.
This should not be construed as a requirement or even a suggestion that Kea
adopt the same functionality - it is merely a statement of what people appear
to have found useful over time.

I would like to take a brief step backwards and ask what our (vague)
requirements for classification are, either in the upcoming release or in the
future.

There are two areas that seem to be most interesting to me.
1) How many different classes do we think we need or want to support with this
functionality?  Depending on how we implement the classification step, this
could become a bottleneck with large numbers of classes.  I think this can
be handled by providing a simple version first; then either we or the users
can supply a more specialized hook if the number of classes becomes overly large
or the classification step becomes overly complex.

I also think that this argues for splitting the classification step from the use
of the class (as is done in Stephen’s straw man).

2) How many classes can a single client be in at one time?  In the current
ISC DHCP code a client can (usefully) be in up to 5 classes at once.  It will
inherit options from all of the classes.  In theory this can be useful but I’m
not sure if it really gets used in practice.  Imagine going to work with your
smartphone.  There might be a class for smartphones (perhaps with a short
lease time as they come and go quickly), a class for the type of smartphone
(say it’s an Apple and there is a specific option it uses), and a host
declaration (so you get some specific service).

I think for the current release we only need to support one class at a time
but in the future we may want to support multiple classes per client and the
design should allow for that.
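
To make the split concrete, here is a rough sketch (Python purely for brevity;
the class names, predicates and lease times are all invented, and none of this
is existing Kea code).  The classification step returns a list of class names,
so moving from one class per client to several would not change the interface:

```python
from typing import Callable, Dict, List

# Hypothetical packet representation: option name -> value as text.
Packet = Dict[str, str]

# Invented classes; each maps a name to a predicate over the packet.
CLASS_TESTS: Dict[str, Callable[[Packet], bool]] = {
    "smartphone": lambda p: "phone" in p.get("vendor-class-identifier", ""),
    "ios": lambda p: p.get("vendor-class-identifier", "").startswith("ios"),
}


def classify(packet: Packet) -> List[str]:
    """Classification step: return every class the packet falls into."""
    return [name for name, test in CLASS_TESTS.items() if test(packet)]


def lease_time(classes: List[str]) -> int:
    """Use step: decide from class names only (values are illustrative)."""
    return 600 if "smartphone" in classes else 86400
```

The use step never looks at the packet, which is what would let a hook replace
classify() without touching the rest of the server.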

**

In Stephen’s addition he mentions having a class block for both global and then
subnet options.  In ISC DHCP there is only one block of options per class directly
but a given class may also be associated with a specific pool which could have
options as well.  As the pool options are always dealt with, this avoids the
extra check on the class subnet options.  However, that assumes that one can
have options in the pool structure.

While I’m not sure adding options to the pool would be a good idea I do think
associating the class with a pool might work better than associating it with a
subnet.  Returning to my smartphone example above - it is possible that the
administrators are only willing to supply a limited amount of address space
for phones and want other space in the subnet for laptops or desktops.  It may
be possible to modify the subnets to handle this feature but in many cases the
group administering the DHCP servers is not the same as those administering
the network and routing so the subnets may be effectively fixed as far as the
DHCP team is concerned.

In ISC DHCP each pool may allow or deny specific classes.  The current
suggestion only has an implied allow and only handles one class per
subnet.  I think we can probably defer deny for now (and possibly forever).
I’m unsure about a limit on the number of classes that could be associated
with a subnet (or pool if we change that).  Returning again to the smartphone
example, one way to handle multiple types of smartphones would be to
have a class per type (for example Android vs iOS) which has the type-specific
information and then have a pool that allows both Android and iOS classes
and that includes the general smartphone information.  A different way would be
to have a class for smartphones defining the general information and then
the two type-specific classes.  The pool would allow the smartphone class.
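
The two layouts can be sketched as follows (Python for brevity; the pool names
and the rule that an empty allow list means "open to everyone" are my
assumptions for illustration, not a description of ISC DHCP or Kea internals):

```python
from typing import Dict, List, Optional

# Layout 1: one class per phone type; the pool allows both types.
POOLS_BY_TYPE = [
    {"name": "phones", "allow": ["android", "ios"]},
    {"name": "general", "allow": []},
]

# Layout 2: a general "smartphone" class; the pool allows only that.
POOLS_BY_GENERAL_CLASS = [
    {"name": "phones", "allow": ["smartphone"]},
    {"name": "general", "allow": []},
]


def select_pool(client_classes: List[str], pools: List[Dict]) -> Optional[str]:
    """Return the first pool whose allow list admits the client.

    Assumption: an empty allow list means the pool is open to everyone.
    """
    for pool in pools:
        allowed = pool["allow"]
        if not allowed or any(c in allowed for c in client_classes):
            return pool["name"]
    return None
```

Note that the second layout only works if a client can be in more than one
class at once (for example "smartphone" and "ios"), which ties back to point 2
above.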

**

There are a couple more comments inline.

> On Sep 22, 2015, at 5:25 AM, Stephen Morris <stephen at isc.org> wrote:
> 
> On 22/09/15 10:12, Marcin Siodelski wrote:
>> On 21.09.2015 21:08, Stephen Morris wrote:
> 
> 
>> For comparisons involving a single option field compared against a
>> specific value (e.g. integer) I don't see why we couldn't try to convert
>> the value specified in the definition of the class to the type held in
>> the option. For example, if the specific option is an array of unsigned
>> integers, this option is represented by the OptionInt class. This class
>> holds the array of integers internally. This class should expose
>> comparison method(s) which would accept the index of the field within
>> the option to compare to, and the value defined in the client class as a
>> string. So for example:
>> 
>> OptionInt<uint32_t>::compare(unsigned int field_index, const
>> std::string& op_type, const std::string& value);
>> 
>> This function would know which type the given field (specified by the
>> field_index) has and would lexical_cast value to this type for comparison.
>> 
>> For the match operation it would probably do it on strings as you
>> propose, but for others it makes more sense to convert value to the
>> specific type, rather than the other way around because if someone types
>> "01" instead of "1" it wouldn't pass the test if compared on strings.
> 
> If that is easy to do, I'm OK with that.  I only suggested strings in an
> attempt to simplify implementation.
> 

I agree with the direction of Marcin’s comment about comparisons.  It is
quite common for users to have some difficulty determining exactly what
they need to match on - which section of a string or which value and
what the format is.  Doing what we can to make this easier (for example
allowing for the use of integers) is probably useful.
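
Marcin’s "01" versus "1" case is easy to demonstrate.  This is only a sketch
with made-up function names (Python for brevity, not the proposed OptionInt
method):

```python
def eq_as_strings(option_value: int, configured: str) -> bool:
    """Compare by turning the option's integer into text."""
    return str(option_value) == configured


def eq_as_integers(option_value: int, configured: str) -> bool:
    """Compare by converting the configured text to the option's type."""
    return option_value == int(configured)
```

With an option holding the integer 1 and a configured value of "01", the
string comparison fails while the integer comparison succeeds.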

>> I'd also think that for Kea 1.0 we could restrict the number of
>> supported operations to "eq" and "match". We could also consider "ne".
> 
> That would be OK (although "eq" is not strictly necessary, as "match" to
> "^string$" is the equivalent).  It does strike me that we would also
> want a "not-match" operator.
> 
> 
>>> "test": ["chaddr", "eq", "08002b02deadbeef"]
>>> ... matches if the hardware address (in the "chaddr" field) is that
>>> specified.
>>> 
>>> "test": ["chaddr[0:8]", "eq", "08002b02"]
>>> As above but matching all clients if their hardware address starts with
>>> the string specified.
>>> 
>> 
>> One has to note that chaddr is not an option but the field in the
>> packet, so it would require some special code paths. I think that in the
>> first step we don't require classification based on the contents of the
>> chaddr because this is what host reservation is intended to do, with its
>> own semantics.
> 
> My thought here was that for simplicity for the user, the fields in the
> packet are accessed in the same way as options.  But you are right, the
> parsing has to identify that the name is a field in the packet and take
> a special code path.  We could perhaps leave this out for the first release.

I can see the logic behind leaving it out but I’m a bit concerned that it might
be something a lot of people already use.  Perhaps if we leave it out of 1.0
we can try to ensure it gets into one of the next releases (1.1 or 1.2)?
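
For what it is worth, the special code path need not be large.  A minimal
sketch, assuming a packet is modelled as two dictionaries (the "fields" and
"options" keys and the lookup() name are invented for illustration):

```python
from typing import Dict, Optional

# Fixed fields of the DHCPv4 header that a selector might name.
PACKET_FIELDS = {"chaddr", "ciaddr", "giaddr"}


def lookup(packet: Dict[str, Dict[str, str]], selector: str) -> Optional[str]:
    """Resolve a selector to a value, or None if it is absent.

    Packet fields take the special code path; anything else is
    treated as an option name.
    """
    if selector in PACKET_FIELDS:
        return packet["fields"].get(selector)
    return packet["options"].get(selector)
```

From the user's side "chaddr" and "vendor-class-identifier" would then look
the same in a test, which is the simplicity Stephen was after.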

> 
>> 
>>> "test": ["vendor-class-identifier", "match", "foo"]
>>> Matches if the vendor class identifier option contains the string "foo"
>>> somewhere in it.
>>> 
>>> "test": ["vendor-class-identifier", "match", "bar$"]
>>> Matches if the vendor class identifier ends with the letters "bar".
>>> 
>>> 
>> 
>> I think we also need some example for comparison of some specific option
>> fields, which seems to be quite frequent use case. So, rather than
>> matching the whole option we may sometimes do:
>> 
>> "test": ["vendor-class-identifier[2]", "eq", "docsis3.0"]
>> 
>> where [2] is an index of the option field. Note that we don't track
>> names of the fields.
>> 
>> One also has to note that the options have suboptions. These options
>> have to be referenced somehow. Maybe we don't need to do it for 1.0 but
>> some form of encapsulation notation would be needed.
> 
> Again I was trying to keep things simple.  But if we can manage "[n]" to
> indicate an element of an array as well as "[n:m]" to indicate a
> substring, that would be good.  But we are likely to have to deal with
> constructs such as:
> 
>   option[n].suboption[i:j]
> 
> ... which will complicate parsing.
> 
> With regular expression matching we might be able to get away without
> needing to allow a substring specification in the first implementation, as
> 
>   "option[3:5]", "eq", "abc"
> 
> ... could also be written as
> 
>   "option", "match", "^...abc"

Possibly we can try to limit the complexity of the basic class matching code
and move any more complicated classification into a hook?
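
The substring/regex equivalence Stephen mentions can be checked in a few
lines.  This sketch assumes "[3:5]" means characters 3 through 5 inclusive,
which is the reading under which a three-dot pattern followed by "abc" is
equivalent; the function names are made up:

```python
import re


def eq_on_substring(value: str) -> bool:
    # "option[3:5]", "eq", "abc" with 3..5 read as inclusive indexes.
    return value[3:6] == "abc"


def match_on_regex(value: str) -> bool:
    # "option", "match", "^...abc": any three characters, then abc.
    return re.search(r"^...abc", value) is not None
```

The two agree for ordinary values.  In Python's default mode "." does not
match a newline, so they would differ if one of the first three characters
were a newline.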

> 
> 
>>> The Kea manual already illustrates how a specific address can be
>>> returned (by incorporating the "client-class" keyword in a "subnet"
>>> clause, section 7.2.13.1). As a subnet within a subnet{4,6} clause can
>>> already contain options that override the global data, maybe the
>>> quickest way to allow the choice of options based on class is to permit
>>> overlapping subnets and pools within the subnet clauses.  This is
>>> illustrated in the fuller example, below:
>>> 
>>> (example omitted)
>> 
>> I think it is hackish and not acceptable in the long run. It will put a
>> serious overhead on the administrator to copy and paste all these subnets to
>> define new classes. And, who knows how many classes there will be. Also,
>> it would require modification to how we parse subnet configuration
>> because each subnet instance comes with its own, usually self-generated
>> subnet id, so they would actually get different ids.
>> 
>> I think the proper way to do it is to, unfortunately, add a "class" map
>> into the subnet structure and for each of them hold options specific to
>> this class.
> 
> You are probably right, I was focusing on getting something running, and
> this is probably not good in the long-term.
> 
> If we are doing this though, we have to do it properly, which implies
> that we need a class map at the global level to override global options
> and a class map at the subnet level to override subnet options.
> 
> Adding a couple of ideas that occurred to me since the last email, how
> about:
> 
> Client class definition
> ---
> 
> "client-classes": [
>    {
>        "name": "class-name",
>        "combine": "and",
>        "tests": [
>            [ selector, op, value ],
>            [ selector, op, value ], ...
>        ]
>    },
>    {
>        "name": "class-name-2",
>        "combine": "or",
>        "tests": [
>            [ selector, op, value ],
>            [ selector, op, value ], ...
>        ]
>    }
> ]
> 
> The modification here is that each class definition has one or more
> tests and there is an extra field that tells how the tests are combined
> for the classification to succeed.  If omitted, the combination defaults
> to "and".
> 
> Note that the "or" could also be implemented by repeating the
> classification element, i.e.
> 
> "client-classes": [
>   {
>      "name": "foo",
>      "combine": "or",
>      "tests": [
>         [ selector1, op1, value1 ],
>         [ selector2, op2, value2 ], ...
>      ]
>   }
> ]
> 
> ... is equivalent to
> 
> "client-classes": [
>    {
>        "name": "foo",
>        "tests": [
>            [ selector1, op1, value1 ]
>        ]
>    },
>    {
>        "name": "foo",
>        "tests": [
>            [ selector2, op2, value2 ]
>        ]
>    }
> ]
> 
> I think it would be more effort to prohibit that alternative
> representation than to allow it.  In fact, allowing multiple definitions
> of a class also allows the creation of more complicated logical
> expressions than simple AND and OR.
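
A small sketch of how the "combine" field and repeated definitions might be
evaluated (Python for brevity; the packet is modelled as a plain dictionary of
option name to text, only "eq" and "match" are handled, and every name here
is hypothetical):

```python
import re
from typing import Dict, List

Packet = Dict[str, str]


def run_test(packet: Packet, test: List[str]) -> bool:
    """Evaluate one [selector, op, value] triple against the packet."""
    selector, op, value = test
    field = packet.get(selector)
    if field is None:
        return False
    if op == "eq":
        return field == value
    if op == "match":
        return re.search(value, field) is not None
    raise ValueError("unsupported operator: " + op)


def definition_matches(packet: Packet, definition: Dict) -> bool:
    """Combine the tests with "or" if requested, otherwise with "and"."""
    results = [run_test(packet, t) for t in definition["tests"]]
    if definition.get("combine", "and") == "or":
        return any(results)
    return all(results)


def classify(packet: Packet, definitions: List[Dict]) -> List[str]:
    """A class matches if any definition carrying its name matches."""
    matched: List[str] = []
    for d in definitions:
        if d["name"] not in matched and definition_matches(packet, d):
            matched.append(d["name"])
    return matched
```

With this reading, two "and" definitions sharing a name give
(A and B) or (C and D), which is the kind of expression a single "combine"
field cannot express.
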
> 
> 
> Client class use
> ---
> The tricky thing is to add a syntax that is compatible with what we have
> at the moment.  Currently, options are defined in the "option-data"
> array, either at the server level or at the subnet level.
> 
> I suggest adding a "class-option-data" array of the form:
> 
> "class-option-data": [
>    {
>        "client-class": "class-name",
>        "option-data": [ ... ]
>    }
> ]
> 
> ... which associates an option-data definition with a class.  This can
> be added to both the top-level and the subnet declarations to provide
> class-specific option data.
> 
> The search path for an option value is then:
> 
> 1. Subnet class-option-data matching the client class
> 2. Subnet option-data
> 3. Global class-option-data matching the client class
> 4. Global option-data
> 
> As an example, consider the following configuration (apologies for the
> length).  It may be easier to read the notes at the bottom then look at
> the relevant parts of the configuration.
> 
> 
> "Dhcp4": {
> 
> # (Class definitions have been omitted from this example.)
> 
>    "option-data": [
>        {
>            "name": "domain-name-servers",
>            "data": "192.0.2.3"
>        }
>    ],
>    "class-option-data": [
>        {
>            "client-class": "beta",
>            "option-data": [
>                {
>                    "name": "domain-name-servers",
>                    "data": "10.0.0.1"
>                }
>            ]
>        },
>        {
>            "client-class": "gamma",
>            "option-data": [
>                {
>                    "name": "time-servers",
>                    "data": "10.2.3.4"
>                }
>            ]
>        }
>    ],
>    "subnet4": [
>        {
>            "subnet": "192.0.2.0/24",
>            "client-class": "alpha"
>        },
>        {
>            "subnet": "192.0.3.0/24",
>            "client-class": "beta"
>        },
>        {
>            "subnet": "192.0.4.0/24",
>            "client-class": "gamma"
>        },
>        {
>            "subnet": "192.0.5.0/24",
>            "client-class": "delta",
>            "option-data": [
>                {
>                    "name": "domain-name-servers",
>                    "data": "10.0.0.2"
>                }
>             ]
>        },
>        {
>             "subnet": "192.0.6.0/24",
>             "option-data": [
>                  {
>                       "name": "domain-name-servers",
>                       "data": "10.0.0.3"
>                  }
>             ],
>             "class-option-data": [
>                  {
>                       "client-class": "epsilon",
>                       "option-data": [
>                           {
>                              "name": "domain-name-servers",
>                              "data": "10.0.0.4"
>                           }
>                       ]
>                  }
>             ]
>        }
>    ]
> }
> 
> (Extraneous option definition information omitted.)
> 
> The logic for picking up the options is that if a client is classified as:
> 
> * "alpha": The subnets are searched sequentially, and this class matches
> the class restriction of the first subnet 192.0.2.0/24.  As there is no
> option definition in the "subnet" clause, Kea checks the global options.
> The only class option definitions are for the classes "beta" and "gamma",
> so it picks up the global option definitions, giving the DNS server address
> as 192.0.2.3.
> 
> * "beta": The second subnet (192.0.3.0/24) is selected with the class
> matching.  Again there are no option definitions in the subnet, so the
> global options are searched.  There are global class option definitions
> for "beta", so those options are picked up, giving the DNS server as
> 10.0.0.1.  The generic global options are checked but as the only option
> defined is for domain-name-servers - which Kea has already found - that
> definition is ignored.
> 
> * "gamma": The subnet 192.0.4.0/24 is selected.  Like "beta" there is a
> global class option definition for this class, so "time-servers" is set
> to 10.2.3.4.  Unlike "beta" though, when the global option-data is
> examined, Kea finds domain-name-servers defined.  This is an option that
> has not already been found, so in addition to "time-servers", the option
> "domain-name-servers" is picked up with a value of 192.0.2.3.
> 
> * "delta": domain-name-servers is defined in the matching subnet
> definition, so the value of 10.0.0.2 is used.  No other options are
> defined in the search path, so that is the only option picked up.
> 
> * "epsilon": Kea will settle on the subnet 192.0.6.0/24 as all the other
> subnets are restricted to other classes.  Within this subnet, there is a
> class option definition for domain-name-servers matching the class
> "epsilon" so the defined value 10.0.0.4 is used.  There are no other
> options in the search path, so this is the only option returned.
> 
> * Other classes: again the subnet 192.0.6.0/24 is used as that is the
> only one that matches.  The class-option-data clause in that subnet
> definition only matches "epsilon", so other classes will use the
> "option-data" clause and the value of domain-name-servers as 10.0.0.3.
> Again, as there are no other options in the search path, this is the only
> option used.
> 
> Stephen
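
The search path and the worked example can be checked mechanically.  The
sketch below (Python for brevity, all function names invented) assumes one
class per client, picks the first subnet that is unrestricted or restricted to
that class, and resolves each option name separately so that earlier steps in
the path win:

```python
from typing import Dict, Optional, Tuple


def opt(name: str, data: str) -> Dict[str, str]:
    return {"name": name, "data": data}


def class_opts(client_class: str, *options: Dict[str, str]) -> Dict:
    return {"client-class": client_class, "option-data": list(options)}


DNS = "domain-name-servers"

# Stephen's example, with the class definitions omitted as in the original.
CONFIG = {
    "option-data": [opt(DNS, "192.0.2.3")],
    "class-option-data": [
        class_opts("beta", opt(DNS, "10.0.0.1")),
        class_opts("gamma", opt("time-servers", "10.2.3.4")),
    ],
    "subnet4": [
        {"subnet": "192.0.2.0/24", "client-class": "alpha"},
        {"subnet": "192.0.3.0/24", "client-class": "beta"},
        {"subnet": "192.0.4.0/24", "client-class": "gamma"},
        {"subnet": "192.0.5.0/24", "client-class": "delta",
         "option-data": [opt(DNS, "10.0.0.2")]},
        {"subnet": "192.0.6.0/24",
         "option-data": [opt(DNS, "10.0.0.3")],
         "class-option-data": [class_opts("epsilon", opt(DNS, "10.0.0.4"))]},
    ],
}


def select_subnet(config: Dict, client_class: str) -> Dict:
    """First subnet that is unrestricted or restricted to this class."""
    for subnet in config["subnet4"]:
        if subnet.get("client-class") in (None, client_class):
            return subnet
    raise LookupError("no subnet admits class " + client_class)


def layer(scope: Dict, client_class: Optional[str]) -> Dict[str, str]:
    """Options from one level of the search path.

    With a class name, read the matching class-option-data entries;
    with None, read the plain option-data array.
    """
    if client_class is None:
        entries = scope.get("option-data", [])
    else:
        entries = [o for c in scope.get("class-option-data", [])
                   if c["client-class"] == client_class
                   for o in c["option-data"]]
    found: Dict[str, str] = {}
    for o in entries:
        found.setdefault(o["name"], o["data"])
    return found


def resolve(config: Dict, client_class: str) -> Tuple[str, Dict[str, str]]:
    """Return (subnet, options) using the four-step search path."""
    subnet = select_subnet(config, client_class)
    options: Dict[str, str] = {}
    for scope, cls in ((subnet, client_class), (subnet, None),
                       (config, client_class), (config, None)):
        for name, data in layer(scope, cls).items():
            options.setdefault(name, data)  # earlier steps win
    return subnet["subnet"], options
```

Calling resolve(CONFIG, "gamma") picks up both time-servers and
domain-name-servers, matching the "gamma" case described above, and a class
with no definitions of its own falls through to 192.0.6.0/24.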