[kea-dev] Statistics design proposal for 0.9.2

Wed Apr 15 08:43:42 UTC 2015

On 14.04.2015 15:18, Tomek Mrugalski wrote:
> Hi all,
> One of the features planned for 0.9.2 are statistics. Here is the
> proposed design: http://kea.isc.org/wiki/StatsDesign.
>
> This design is not the most advanced or fully featured. It's a
> compromise between what we could do and what actually can do in the
> limited timeframe of 0.9.2 release.
>
> The basic concept is that the statistics are currently simple, but they
> can evolve over future releases. Whatever evolution path we'll choose,
> the API should remain stable, if possible.
>
> Please review and comment.
>

I would like to clarify the comment I made at some point about the
use of concurrency when gathering statistical information. I didn't
really mean that the statistics manager should run in a separate
process. I was rather thinking that it should run in a separate thread.
This thread could create a socket and listen on the fd belonging to this
socket. This would make the stats manager more responsive when many
DHCP packets are being received, possibly on many interfaces. It would
also allow certain independent tasks, such as receiving a command,
parsing the JSON, and creating and sending the response, to be performed
concurrently with the main thread, which handles the DHCP stream. There
is a problem with concurrent access to the StatsMgr: certain values
have to be locked for writing while the second thread is reading them.
But that is something that can be solved in the design of the StatsMgr.
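
A minimal sketch of the locking mentioned above, assuming a single
mutex that guards all values. The class name StatsMgrSketch and its
methods are hypothetical and are not taken from the proposed design;
the point is only that the packet-processing thread and the thread
serving commands would go through the same lock.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for the StatsMgr; not the actual Kea class.
class StatsMgrSketch {
public:
    // Would be called from the thread that processes DHCP packets.
    void addValue(const std::string& name, int64_t delta) {
        std::lock_guard<std::mutex> lock(mutex_);
        values_[name] += delta;
    }

    // Would be called from the thread that serves statistics commands.
    // Returns 0 for a statistic that has never been recorded.
    int64_t getValue(const std::string& name) const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::map<std::string, int64_t>::const_iterator it = values_.find(name);
        return (it == values_.end() ? 0 : it->second);
    }

private:
    mutable std::mutex mutex_;               // guards values_
    std::map<std::string, int64_t> values_;  // statistic name -> value
};
```

A single coarse lock is the simplest choice; whether it is cheap
enough on the packet-processing path would have to be measured.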

The stats manager's operation is going to be based on time intervals,
for example: keep statistics collected for the last 5 minutes. The use
of threads would probably make it much easier to use asio-based timers,
which are asynchronous, i.e. based on callbacks invoked when specific
timers expire. You can't do that easily when you are blocked in the call
to select() in the main thread.
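
To make the "last 5 minutes" idea concrete, here is a small sketch of
a counter that forgets events older than a configured interval. The
class WindowedCounter is hypothetical. It expires old events lazily,
whenever it is touched, and takes the current time as an argument; in
a threaded design the same expiry step could instead be driven by a
timer callback.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <deque>

// Hypothetical helper, not part of the proposed design.
class WindowedCounter {
public:
    typedef std::chrono::steady_clock Clock;

    explicit WindowedCounter(std::chrono::seconds interval)
        : interval_(interval) {}

    // Records one event that happened at time "now".
    void record(Clock::time_point now) {
        expire(now);
        events_.push_back(now);
    }

    // Returns the number of events within the interval ending at "now".
    std::size_t count(Clock::time_point now) {
        expire(now);
        return (events_.size());
    }

private:
    // Drops events that have fallen out of the window.
    void expire(Clock::time_point now) {
        while (!events_.empty() && (now - events_.front() > interval_)) {
            events_.pop_front();
        }
    }

    std::chrono::seconds interval_;
    std::deque<Clock::time_point> events_;  // oldest event at the front
};
```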

I take the point about the limited time for 0.9.2, but I am afraid we
are getting too attached to the idea of synchronous processing even
where we could do better. I leave it up to you, but if I understand
correctly what Shawn said at some point, the lack of concurrency with
respect to statistics is a problem in isc-dhcp.

I wonder how statistics are going to be configured. I understand that
you're planning to add calls to the StatsMgr in multiple places in the
code where you're going to bump the counters. But is it going to be
possible to enable/disable specific counters so that they are not bumped
if not needed? Or is it assumed that the counter bumping operation is
fast enough that such an optimization would not bring much benefit?
From the "Performance Optimization" section, however, it seems that
this has been a concern of yours.
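
A sketch of what per-counter enabling could look like, assuming
counters are identified by name. SelectiveStats and its methods are
hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical class: bumps only the counters that were enabled.
class SelectiveStats {
public:
    void enable(const std::string& name) { enabled_.insert(name); }
    void disable(const std::string& name) { enabled_.erase(name); }

    // Increments the counter by one, but only if it is enabled.
    void bump(const std::string& name) {
        if (enabled_.count(name) > 0) {
            ++values_[name];
        }
    }

    // Returns the current value, or 0 if the counter was never bumped.
    int64_t get(const std::string& name) const {
        std::map<std::string, int64_t>::const_iterator it = values_.find(name);
        return (it == values_.end() ? 0 : it->second);
    }

private:
    std::set<std::string> enabled_;
    std::map<std::string, int64_t> values_;
};
```

Note that in this form the enabled check is itself a lookup, so it
costs about as much as the bump it avoids. That suggests the
optimization only pays off if the check can be made cheaper than the
update, which bears on the question of whether it is worth having.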

I am iffy about the naming of per-subnet statistics. You say
"subnet[0].packets-received". But what if I remove the subnet with
index 0 from the configuration? The subnets will get renumbered and the
statistics will now apply to the wrong subnet. Wouldn't it be better to
identify subnets using SubnetID, which is supposed to be unique?
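
As an illustration, the statistic name could be built from the
SubnetID instead of the position in the configuration. The helper
subnetStatName is hypothetical, the 32-bit alias for SubnetID is an
assumption made here, and the "subnet[N]" form simply reuses the
notation from the design with N now meaning the SubnetID.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Assumed here to be a 32-bit unsigned identifier.
typedef uint32_t SubnetID;

// Builds a per-subnet statistic name keyed by SubnetID, so the name
// stays the same when other subnets are added or removed.
std::string subnetStatName(SubnetID id, const std::string& stat) {
    std::ostringstream name;
    name << "subnet[" << id << "]." << stat;
    return (name.str());
}
```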

On a related note: does this also account for per-interface
statistics?

In the data extraction section we should keep in mind that
communication over the unix socket requires two sockets: one for the
client and one for the server. So I guess you'll need to extend the
"control-socket" parameter to specify two names? I am also not so sure
that a string is the right choice for the control-socket parameter.
If you want to use the same parameter for future sockets (TCP, UDP or
whatever else), it may quickly turn out that you need more parameters.
If I am correct about the two names for socket files, you already have
three parameters that describe the socket communication.
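
For illustration only, the parameter could be a map rather than a
string, so that more keys can be added later without changing its
type. The key names "socket-type" and "socket-name" and the path are
assumptions, not something taken from the design.

```json
{
    "control-socket": {
        "socket-type": "unix",
        "socket-name": "/path/to/the/unix/socket"
    }
}
```

A TCP or UDP socket could then use the same parameter with a different
"socket-type" and its own keys, such as an address and a port.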

It would be useful if the design included some sample JSON requests and
responses, including responses which report errors in statistics
gathering. The organization of the JSON query and response should be
subject to review, because it will be troublesome to modify it once
people start implementing proprietary clients.
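
Something along the following lines would do as a starting point for
the review. The command name, the "arguments", "result" and "text"
keys, the result codes and the values are all placeholders invented
for this example; none of them is defined by the design.

A request for a single statistic:

```json
{
    "command": "get-statistic",
    "arguments": { "name": "subnet[1].packets-received" }
}
```

A successful response:

```json
{
    "result": 0,
    "arguments": { "subnet[1].packets-received": 1234 }
}
```

A response reporting an error:

```json
{
    "result": 1,
    "text": "unknown statistic 'subnet[9].packets-received'"
}
```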

I also wonder whether this "protocol" should be the base for the remote
management API, in which case we should take the use cases for the
management API into account here. Not that I want to start implementing
the management API right now; I just want to make sure that this will
be compatible when we do implement it.

On the class diagram, I still think that it may be useful to make it
generic and allow for some additional types apart from the ones you
listed, in particular a string value. Suppose someone writes a hook and
wants to store some textual information in it, such as the last error
found.
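
A rough sketch of an observation that can hold either a number or a
string. ObservationSketch is a hypothetical class, and the integer
type merely stands in for the types listed in the design; only the
string case is the addition suggested here.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical class; not the Observation from the class diagram.
class ObservationSketch {
public:
    enum Type { STAT_INTEGER, STAT_STRING };

    explicit ObservationSketch(int64_t value)
        : type_(STAT_INTEGER), int_value_(value) {}
    explicit ObservationSketch(const std::string& value)
        : type_(STAT_STRING), int_value_(0), str_value_(value) {}

    Type getType() const { return (type_); }

    // Setters replace the value; the type is fixed at construction.
    void setValue(int64_t value) {
        checkType(STAT_INTEGER);
        int_value_ = value;
    }
    void setValue(const std::string& value) {
        checkType(STAT_STRING);
        str_value_ = value;
    }

    int64_t getInteger() const {
        checkType(STAT_INTEGER);
        return (int_value_);
    }
    const std::string& getString() const {
        checkType(STAT_STRING);
        return (str_value_);
    }

private:
    // Throws if the observation does not hold the expected type.
    void checkType(Type expected) const {
        if (type_ != expected) {
            throw std::invalid_argument("observation type mismatch");
        }
    }

    Type type_;
    int64_t int_value_;
    std::string str_value_;
};
```

A hook could then record its last error with something like
setValue(std::string("...")) on a string observation.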

Doesn't Observation require setValue modifiers?

Marcin