[kea-dev] Requirements for the Logging and Diagnostics in Kea
marcin at isc.org
Thu Apr 9 18:30:23 UTC 2015
On 09.04.2015 07:48, Shawn Routhier wrote:
>> On Apr 7, 2015, at 4:10 AM, Marcin Siodelski <marcin at isc.org> wrote:
>> Hi All,
>> One of the goals for the Kea 0.9.2 release is "Logging and Diagnostics Improvements". This includes a number of small improvements and tweaks to the existing logging system, but some of the possible enhancements are much more significant than that.
>> I have created the requirements document http://kea.isc.org/wiki/DiagnosticsRequirements for "Logging and Diagnostics Improvements", which is meant to summarize the requirements mentioned in different discussions we've had so far.
>> This document is still a "draft" because there is an outstanding requirement which we're discussing internally, and I am not sure if there is a reliable way to implement it. That is:
>> "Kea MUST provide a utility to automatically locate a core file after a server crash, to prevent future server runs from overwriting the core file."
>> Note that this requirement not only provides a way to prevent the core file from being overwritten, but it also allows for a convenient way to put all useful debugging information in a single place (the same utility could copy the current configuration file, lease file, log file, etc.).
>> However, I am not familiar with any standard and reliable way to locate core files. On Linux I know that there is a core_pattern file, but it may redirect the core dump to a program rather than save it at a specific location on disk. In that case, you don't really know where the core dump goes.
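For what it's worth, the piped-vs-on-disk distinction can at least be detected before deciding where to look. A minimal sketch (the function name is mine, not a proposed Kea interface), following the rules described in core(5):

```shell
#!/bin/sh
# Classify a Linux core_pattern value: piped to a helper program,
# written to an absolute path template, or dropped in the crashed
# process's current working directory.
classify_core_pattern() {
    pattern=$1
    case "$pattern" in
        "|"*)
            # Leading '|': the kernel pipes the core to a helper
            # (e.g. systemd-coredump, apport); the final on-disk
            # location is up to that program.
            echo "piped"
            ;;
        /*)
            # Absolute path template; %e, %p, %t etc. are expanded
            # by the kernel.
            echo "absolute"
            ;;
        *)
            # Relative pattern: core lands in the process's cwd.
            echo "cwd-relative"
            ;;
    esac
}

# Example (Linux only): classify the live setting.
# classify_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
```

In the "piped" case a collection utility genuinely cannot know where the core went without cooperating with that specific helper, which is the unreliability mentioned above.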
>> Please review all other requirements in the document and provide your feedback.
>> Marcin Siodelski
>> DHCP SW Engineer
> I have one high level concern which is to make sure that the logging doesn’t end up
> impacting the performance of the server in general. Using different log levels should
> help some there but we may want to do some testing to see if anything more needs
> to be done (allowing some of the logging to be enabled/disabled at run time or at compile time).
> 1-8 - this is written such that it implies a single tool. Is that what you mean? Or would
> several tools be acceptable (for example one tool per backend each with something to read
> that backend and that uses a common method to output to the CSV file).
I very much want to make it a single tool. Multiple tools mean more
maintenance. My initial thought was that we create a C++ tool which
links with the libdhcpsrv library and connects to the appropriate
backends. Once the tool is connected to a backend, it uses the common
logic to gather all leases from that backend and dumps them to a CSV
file, using the functions of the Memfile backend. This approach would
have the following advantages:
- If the SQL schema is extended with new columns, we just update the
backends and both the server and the tool use the updated schema.
- Since it would be a C++ tool, we could use Gtest for unit testing.
- We could have a command line parameter pointing to the Kea
configuration file, and the tool could extract the database
configuration from it, without the need to pass the database name, user name, etc.
However, we have decided that we don't want to go down this path because
of time constraints. Instead, we want to use the tools shipped with MySQL
and PostgreSQL to dump the database in the CSV format. This lacks the
advantages listed above, but it has the major advantage that it can be
implemented quickly. We're going to do it with a shell script. There is nothing that
precludes the shell script from exposing a command line parameter to
select whether you want to connect to MySQL or PostgreSQL. The other
options would be the same for both database types, so there is probably no
reason to split it into two separate tools.
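As a sketch of that single-script idea (the table name lease4 and the exact mysql/psql invocations are my assumptions, not a settled interface), the backend switch could be confined to one small function:

```shell
#!/bin/sh
# Print the shell command that dumps the lease4 table to CSV for the
# requested backend; the caller can eval it.  Purely illustrative.
dump_leases_cmd() {
    backend=$1 dbname=$2 outfile=$3
    case "$backend" in
        mysql)
            # mysql --batch emits tab-separated rows; tr turns the
            # tabs into commas.
            echo "mysql --batch $dbname -e 'SELECT * FROM lease4' | tr '\t' ',' > $outfile"
            ;;
        pgsql)
            # \copy runs client-side, so no server-side file
            # permissions are needed for the output file.
            echo "psql $dbname -c \"\\copy lease4 TO '$outfile' WITH CSV HEADER\""
            ;;
        *)
            echo "unsupported backend: $backend" >&2
            return 1
            ;;
    esac
}
```

Everything outside this function (option parsing, credentials, output path) would indeed be identical for both database types, which supports the single-tool argument.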
> Note that in the CSV case we may have several files that represent the “lease file”. As they
> are in the proper format we would probably want to simply copy all of them as part of the
> collection process.
So, you're proposing to add a requirement for the tool to merge the
CSV files into a single file? Although it would probably be convenient, I
guess we can get away without it for now and just browse multiple files.
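If merging ever does become desirable, it is nearly a one-liner, assuming every CSV fragment carries the same header row (a hypothetical helper, not part of any plan above):

```shell
#!/bin/sh
# Concatenate several CSV lease files, keeping only the first file's
# header line.  FNR==1 is true on the first line of each input file;
# NR!=1 excludes only the very first line of the whole stream.
merge_csv() {
    awk 'FNR == 1 && NR != 1 { next } { print }' "$@"
}
```

Usage would be along the lines of `merge_csv leases-*.csv > merged.csv`.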
> 9, 10, 11, 12 - I’d like to clarify that this is happening for each individual callout. So if
> there are two callouts at the same hook we would get
> info about calling callout 1
> info about returning from callout 1
> info about calling callout 2
> info about returning from callout 2
> If so we might also want a log message to indicate the start and end of the whole process
> and might be able to move some of the duplicated information there so:
> info about entering hook
> same as above - except no hook information as that is already available.
> info about leaving hook
IMHO, we should also have separate log messages saying that we are
entering the hook and that we're leaving the hook. So, I think we're on
the same page.
> We may want to include some sort of internal transaction id on each packet and use that
> to connect the various log messages together.
Perhaps we can just use the transaction id from the received packet?
But, even then, there may be messages which will lack the transaction
id, depending on how deep in the call stack they are.
> In 12 we have the callout execution time, I’m wondering if there are other times we might want
> as well - potentially the total execution time and the time waiting for responses from the backend
> DB might be useful.
Probably, but I'd not add it for 0.9.2. The major concern is that
someone will write a hook which is slow and impacts the server's
performance. The time spent sending queries to the database is more
predictable, because we have run our benchmarks and tested it.
> In Tomek’s mail he mentioned logging information similar to that from ISC DHCP. There
> are three useful notes to consider.
> 1) There may be a couple of items to add in some of the
> messages. Sadly I don’t recall them offhand but there were a couple of them where you
> couldn’t tell quite what had happened from the message.
> 2) we may wish to make some of
> them more or less visible (possibly by use of log level). With the current code we can get
> rather large log files fairly quickly on a busy system, if Kea does get much better performance
> this will be much worse. Trying to notice “interesting” items within a large log file can be
> difficult. Yes you can search for things such as DHCPINFORM if you know that your
> enterprise doesn’t use that but if we can provide a good way to stress odd items people
> would probably like that.
Kea uses multiple logging levels. The default logging level is usually
INFO. At this level we will not log any messages on a per-packet basis,
e.g. "received DHCPINFORM from X via Y". Such messages would be logged
at the DEBUG level. The DEBUG severity is associated with a debuglevel
value ranging from 0 to 99, so it is possible to differentiate the
logging messages by debuglevel. I presume that we will make heavy use
of that mechanism.
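For reference, in the configuration syntax that later Kea releases adopted, selecting DEBUG with a particular debuglevel for one logger looks roughly like this (a sketch; the exact placement of the "Logging" map and the debuglevel value shown are illustrative):

```json
"Logging": {
    "loggers": [ {
        "name": "kea-dhcp4",
        "severity": "DEBUG",
        "debuglevel": 45,
        "output_options": [ { "output": "/var/log/kea-dhcp4.log" } ]
    } ]
}
```

A lower debuglevel keeps only the coarser per-packet messages; raising it toward 99 enables the most detailed tracing.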
> 3) With ISC DHCP many people use scripts to review their log
> files. We try hard not to change the log entries once we have started using them to avoid
> breaking existing scripts. It would be best if we decide on the structure and format of
> these log entries for Kea by 1.0 and don’t change them thereafter. Also providing a good
> description of them in the docs would probably be appreciated.
I agree with this. But there may be issues with this when we migrate to
the multi-threaded/multi-process model. But I think that is a whole
different discussion.