[kea-dev] Requirements for statistics module in Kea

Thu Apr 9 05:48:32 UTC 2015

> On Apr 3, 2015, at 8:20 AM, Tomek Mrugalski <tomasz at isc.org> wrote:
> 
> Folks,
> One of the major features in upcoming 0.9.2 release are statistics. I
> just wrote an initial set of requirements for this piece of code:
> 
> http://kea.isc.org/wiki/StatsRequirements
> 
> I'd love to hear your comments. I plan to work on the design next week.
> There's no strict deadline for your feedback, but the sooner you provide
> it the better.
> 
> Thanks,
> Tomek

In reading through the requirements and Marcin’s comments, I have
concerns about overloading the servers with work to collect and
maintain the statistics.  I would move all the requirements to process
the data out of the server and into something else.  This could be
a process that is supplied with Kea or a process that a user creates.
(It isn’t clear from the requirements if you are already thinking along
these lines or were considering having the server do all the work itself.)

This would avoid the server processes spending time trying to
keep any derived information updated but would still allow a user
to derive that information themselves if and when they wanted it.

If we feel that having the derived information is an early requirement
we could include work on that process as well as the data collection,
but it’s also something we might be able to leave to the users for now.

In my model this process would also be where we decide how to
save the statistics over time and how often to poll the server
processes.  This suggests that this process might
need to be able to poll for objects in some sort of grouped fashion
(for example get global stats every 10 seconds, get pool usage every
60 seconds, get DDNS usage every 10 minutes).
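As a rough sketch of what grouped polling could look like in the external
stats process (the group names and the PollGroup/due_groups helpers are
hypothetical, not anything that exists in Kea), each group carries its own
interval and the process asks which groups are due at a given time:

```python
from dataclasses import dataclass


@dataclass
class PollGroup:
    name: str          # hypothetical group label, e.g. "global"
    interval: int      # seconds between polls of this group
    next_due: int = 0  # time (in seconds) at which the group is next polled


def due_groups(groups, now):
    """Return the names of the groups due at `now` and reschedule them."""
    due = []
    for group in groups:
        if now >= group.next_due:
            due.append(group.name)
            group.next_due = now + group.interval
    return due


# Intervals taken from the example above: 10 s, 60 s and 10 minutes.
groups = [
    PollGroup("global", 10),
    PollGroup("pools", 60),
    PollGroup("ddns", 600),
]
```

The process would call due_groups() from its main loop and fetch only the
groups it returns.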

This would also require two externally visible items: 1) something to
describe the various objects that are available (essentially a MIB from
the SNMP days), and 2) a path for the stats process to get the information
from the servers (in the short term this could be the CSV file somebody
else mentioned).
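To make the two items concrete, here is a minimal sketch that assumes a
"name,value" CSV layout and a small catalogue of object descriptions.  Both
the layout and the statistic names are made up for illustration; the
requirements do not define them.

```python
import csv
import io

# Hypothetical catalogue (the "MIB"): statistic name -> (kind, description).
CATALOGUE = {
    "pkt4-discover-received": ("counter", "DHCPDISCOVER packets received"),
    "pool-1-addresses-in-use": ("gauge", "Addresses assigned from pool 1"),
}


def read_stats(csv_text):
    """Parse 'name,value' rows, keeping only names listed in the catalogue."""
    stats = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        name, value = row
        if name in CATALOGUE:
            stats[name] = int(value)
    return stats
```

The stats process would use the catalogue to know what each value means and
the CSV reader as its short-term path to the server.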

**

One issue that should be addressed, especially when exporting the data,
is the atomicity of the objects.  If I poll the global values for v4, will
they all represent the same time?  (In some cases it is useful to have them
be atomic with respect to each other, but I think in general that isn’t required
in DHCP.  However, we do need to be clear on what the rules for the objects are.)
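If we did want values that are atomic with respect to each other, one
possible rule is "everything returned by a single poll is copied under one
lock and stamped with one time".  A minimal sketch of that rule follows; the
StatsRegistry class is hypothetical and only illustrates the idea.

```python
import threading
import time


class StatsRegistry:
    """Hypothetical registry whose snapshot() returns a consistent copy."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def increment(self, name, delta=1):
        with self._lock:
            self._values[name] = self._values.get(name, 0) + delta

    def snapshot(self):
        # All values are copied while holding the lock, so they describe
        # the same instant; later increments do not change the copy.
        with self._lock:
            return {"taken_at": time.time(), "values": dict(self._values)}
```

The cost is that every update takes the lock, which is exactly the kind of
overhead we would need to weigh against how much atomicity DHCP really needs.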

**

I would also consider whether it is worthwhile to have an easy method to disable
the actual gathering of the statistics.  Ideally this could be enabled / disabled
without a restart of the system.  I may not normally care about the stats
but want to be able to enable them if something seems to be broken.
(This would work better for some counters than others - starting to count
the number of DISCOVERS is pretty easy to understand, starting to count
the number of addresses in use would be a lot less useful.)

This may make more sense if some of the counters are specified as
primarily for debugging.  For example, one might want to have a set of
different counters for why a packet didn’t generate a response (such as
malformed packet, malformed options, unknown options, bad options,
bad checksum, etc.) but not count them most of the time.
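A small sketch of debug-only counters that can be switched on and off
without a restart.  The Counters class and the counter names are
hypothetical; how the flag would actually be flipped at runtime (a
management command, a config reload) is left open.

```python
class Counters:
    """Hypothetical counter set with a runtime switch for debug counters."""

    def __init__(self):
        self.values = {}
        self.debug_enabled = False  # could be toggled while the server runs

    def bump(self, name, debug=False):
        # Debug counters are skipped entirely unless the switch is on.
        if debug and not self.debug_enabled:
            return
        self.values[name] = self.values.get(name, 0) + 1
```

Ordinary counters are always counted; the drop-reason counters only start
accumulating once debug_enabled is set.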

**

For 1 we then may not need the floating values - in general I would
probably try to limit the types of values to those that we actually need
and use but try to make the code extensible.  

For 4 we may want a specific incrementCounter("foo") if that provides a
useful performance improvement.  I imagine that the great bulk of the calls
will be a simple increment and so I think it is probably worth optimizing that
path.
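One shape the optimized path could take is a handle that is looked up by
name once and then incremented directly.  This is only an API sketch under
that assumption (the Stats and Counter classes are hypothetical), and Python
will not show the performance difference the real server code would see.

```python
class Counter:
    """A single integer counter; __slots__ keeps the object small."""

    __slots__ = ("value",)

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1


class Stats:
    def __init__(self):
        self._counters = {}

    def counter(self, name):
        """Look up (or create) a counter once; callers may keep the handle."""
        c = self._counters.get(name)
        if c is None:
            c = self._counters[name] = Counter()
        return c

    def increment_counter(self, name):
        """Convenience path: name lookup plus increment in one call."""
        self.counter(name).increment()
```

Hot paths would hold on to the handle from counter("foo") and call
increment() on it, avoiding the name lookup for every packet.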

I would move 5, 12, and 16 into the stats process I described above.

Can you elaborate a bit on 6?  I can see that meaning something like 
the number of DISCOVERS received or the total number of addresses in
use (or both).  In the first case it is probably a direct counter.  In the second
that might either be an item that is directly counted or something that is
derived, perhaps by adding up all the addresses in use in all of the pools.
In some cases we may wish to include such a derived value while in others
we may simply have the consumer do the derivation themselves.
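For the derived case, the consumer-side derivation can be as simple as
summing the per-pool values.  The naming scheme below
("pool-N.addresses-in-use") is an assumption made for the example only.

```python
# Hypothetical per-pool statistics as a consumer might receive them.
pool_stats = {
    "pool-1.addresses-in-use": 12,
    "pool-2.addresses-in-use": 30,
    "pool-2.addresses-total": 100,
}


def derive_total(stats, suffix):
    """Sum every statistic whose name ends with `suffix`."""
    return sum(value for name, value in stats.items() if name.endswith(suffix))
```

Here derive_total(pool_stats, ".addresses-in-use") adds up the two in-use
values and ignores the pool size.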

For 9 and 11 we may wish to be able to get some grouped subset of the
statistics (by server, by pool…).
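If statistic names were hierarchical, a grouped subset could be selected by
prefix.  Again this assumes a naming convention ("dhcp4.pool-1. ...") that
the requirements do not specify; select_group is a hypothetical helper.

```python
def select_group(stats, prefix):
    """Return only the statistics whose names start with `prefix`."""
    return {name: value for name, value in stats.items() if name.startswith(prefix)}


# Hypothetical names: <server>.<pool or "global">.<statistic>
sample = {
    "dhcp4.global.discover-received": 7,
    "dhcp4.pool-1.addresses-in-use": 12,
    "dhcp6.global.solicit-received": 3,
}
```

For example, select_group(sample, "dhcp4.") returns the two dhcp4 entries and
select_group(sample, "dhcp4.pool-1.") returns just the one pool value.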

For the specific requirements we may want to look at a Case diagram to
track the packets through processing and see where we might want to put
counters.  It can be useful to have a counter for all the places a packet could go.