[bind10-dev] Statistics features & tasks
Jerry Scharf
scharf at isc.org
Fri Dec 17 02:13:58 UTC 2010
Hi,
A few comments that might be helpful.
The users always want more DNS statistics, but they also want faster
code and a smaller footprint. So the question needs to be posed back to
the requesters: "What exactly do you want, and how much performance and
footprint are you willing to pay for it?" Matching what BIND 9 does is a
plenty high goal to start with.
One user theme that has come up often is a module that would take some
of the statistics data and turn it into SNMP data. I really dislike SNMP
for just about anything, but it is the tool many people expect to use.
IMO, the most interesting SNMP data is related to total query volume,
views, how many queries are dropped if there are filters, and how many
return the various DNS errors. You have this on your list; I just wanted
to put out some concrete things that might go into it.
There is also a question as to how one talks to this SNMP module as
compared to any system-level SNMP engine. (IMO, I would not try to feed
stuff to the system engine; there are too many possibilities out there.)
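To make that concrete, here is a rough sketch of the shape such a module
could take, assuming the stats daemon can hand back a flat dict of
counters. The counter names and the OID subtree are made up for
illustration; a real module would take them from an agreed-upon MIB.

# Hypothetical mapping from BIND 10 statistics names to SNMP OIDs.
# Counter names and the OID prefix are placeholders, not a real MIB.
OID_PREFIX = "1.3.6.1.4.1.99999.1"   # placeholder enterprise subtree

STAT_TO_OID = {
    "auth.queries.udp":     OID_PREFIX + ".1.1",
    "auth.queries.tcp":     OID_PREFIX + ".1.2",
    "auth.queries.dropped": OID_PREFIX + ".2.1",
    "auth.rcode.SERVFAIL":  OID_PREFIX + ".3.2",
    "auth.rcode.NXDOMAIN":  OID_PREFIX + ".3.3",
}

def stats_to_varbinds(stats):
    """Turn a flat {stat_name: counter} dict into (oid, value) pairs,
    ready to hand to whatever SNMP agent library ends up being used."""
    return [(oid, stats.get(name, 0)) for name, oid in STAT_TO_OID.items()]

if __name__ == "__main__":
    sample = {"auth.queries.udp": 123456, "auth.rcode.NXDOMAIN": 789}
    for oid, value in stats_to_varbinds(sample):
        print(oid, value)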
Another thing that people will want is what I class as runtime
statistics (as compared with DNS statistics). This is stuff like how
many instances of each module are running, what their memory and CPU
usage is, and things like that. Given the way BIND 10 runs compared to
BIND 9, this is going to be very important to ops teams. They will need
to figure out exactly what is going on, and asking them to dig through
the output of various system tools to get what they want is not a good
way to encourage migration.
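Most of that data is already reachable from Python, so the collection
side need not be a big deal. A sketch (psutil is just one possible way
to read it, and the b10-* process names are an assumption on my part):

# Sketch only: per-module instance count, memory and CPU for BIND 10.
# Assumes the modules show up as processes named "b10-*"; psutil is
# one option for reading this, not a decided dependency.
import psutil

def runtime_stats(prefix="b10-"):
    stats = {}
    for proc in psutil.process_iter(["name"]):
        name = proc.info["name"] or ""
        if not name.startswith(prefix):
            continue
        try:
            rss = proc.memory_info().rss
            cpu = proc.cpu_percent(interval=0.1)
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue
        entry = stats.setdefault(name, {"instances": 0, "rss": 0, "cpu": 0.0})
        entry["instances"] += 1
        entry["rss"] += rss
        entry["cpu"] += cpu
    return stats

if __name__ == "__main__":
    for module, data in sorted(runtime_stats().items()):
        print(module, data)

Getting this onto the normal stats channel means ops people can ask
bindctl instead of poking at ps and top.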
Thinking forward a little farther, it would be great if there were a
filtered stream mechanism. Thinking crudely about it: I connect up, load
the particular events and stats I care about, and only those things are
sent back on my stream. It's much more efficient to filter the messages
before they leave the server than to send the whole stream and have the
client drop what it doesn't care about. There are a bunch of subtleties
that make me say this should be done a good bit later than the simple
send-it-all mode.
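Crudely sketched, the server side of that is just a subscription object
whose match happens before anything is put on the wire; all the names
here are invented:

# Sketch of server-side filtering for an event/stats stream.
# A subscriber registers the item names (glob patterns) it cares
# about; everything else is dropped before serialization.
import fnmatch

class StreamSubscription:
    def __init__(self, patterns):
        self.patterns = list(patterns)      # e.g. ["auth.queries.*"]

    def wants(self, item_name):
        return any(fnmatch.fnmatch(item_name, p) for p in self.patterns)

    def filter(self, events):
        # events: iterable of (name, value) pairs produced by the server
        for name, value in events:
            if self.wants(name):
                yield name, value

sub = StreamSubscription(["auth.queries.*"])
events = [("auth.queries.udp", 10), ("boss.restarts", 1), ("auth.queries.tcp", 2)]
print(list(sub.filter(events)))   # only the two query counters survive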
One theme that often comes up is looking for the highest-traffic clients
of a recursive server (it could apply to auth too, but it's less
interesting there). It can be useful both for planning and for detecting
bots. The challenge is that all this machinery needs to be added to the
path for every query. Not an early task, IMO.
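For what it's worth, the bookkeeping itself is simple; the issue is that
the update sits on the hot path of every query. Something like this
(purely illustrative) is what I have in mind:

# Sketch: track the busiest clients of a recursive server.
# The per-query cost is one counter update, but it is still an update
# on the path of every single query, hence "not an early task".
from collections import Counter

class TopClients:
    def __init__(self):
        self.counts = Counter()

    def record(self, client_ip):
        self.counts[client_ip] += 1          # called once per query

    def top(self, n=10):
        return self.counts.most_common(n)

    def reset(self):
        self.counts.clear()

tracker = TopClients()
for ip in ("192.0.2.1", "192.0.2.1", "198.51.100.7", "192.0.2.1"):
    tracker.record(ip)
print(tracker.top(2))   # [('192.0.2.1', 3), ('198.51.100.7', 1)]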
Everyone wants to be able to put the stats into an SQL database. There is
a design question as to whether that data should live on the BIND
server. If this is the kind of thing that gets written into the
database and never read out (you'd be amazed how often this is true),
then having it on the server makes sense. If this data is to be read
many times, then it makes more sense for the database to live somewhere
else and have the server only load the data into it.
One way to look at this is to define a fixed schema for the stats.
Another way is to load the data into a schema of the user's design and
have the SQL commands to do that be the responsibility of the user. I
like the latter approach. My thinking would be to have something that is
kind of the inverse of what the BIND 9 DLZ stuff did, where the data
elements are exposed and then substituted into the SQL commands
according to rules that the user defines.
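In concrete terms, I am imagining something like the sketch below: the
user owns the schema and the SQL, and the stats code only binds the
exposed data elements into it. Table, column and statistic names are
all made up, and SQLite is used only to keep the example self-contained,
not because it is the decided backend:

# Sketch of the "inverse DLZ" idea: user-supplied SQL, server-supplied
# values. Everything named here is hypothetical.
import sqlite3, time

USER_SQL = """
INSERT INTO dns_stats (taken_at, udp_queries, tcp_queries, nxdomain)
VALUES (:timestamp, :auth_queries_udp, :auth_queries_tcp, :rcode_nxdomain)
"""

def store(conn, stats):
    row = dict(stats, timestamp=int(time.time()))
    conn.execute(USER_SQL, row)
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dns_stats (taken_at INTEGER, udp_queries INTEGER,"
             " tcp_queries INTEGER, nxdomain INTEGER)")
store(conn, {"auth_queries_udp": 12345, "auth_queries_tcp": 67,
             "rcode_nxdomain": 89})
print(conn.execute("SELECT * FROM dns_stats").fetchall())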
For the visual tool part, just feed rrdtool and then anyone else can do
whatever they want from there. I assume this is more demonstration code
than anything else.
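For instance (pure demonstration glue, with the file name, data source
and RRD layout invented), a tiny script that creates an RRD once and
then appends whatever the stats daemon reports:

# Sketch: push one counter into rrdtool; graphing is then someone
# else's problem. 5-minute step, one day of averages, all arbitrary.
import os, subprocess

RRD = "b10-queries.rrd"

def ensure_rrd():
    if not os.path.exists(RRD):
        subprocess.check_call([
            "rrdtool", "create", RRD, "--step", "300",
            "DS:queries:COUNTER:600:0:U",
            "RRA:AVERAGE:0.5:1:288",
        ])

def push(total_queries):
    # "N" means now; rrdtool turns successive counter values into a rate
    subprocess.check_call(["rrdtool", "update", RRD, "N:%d" % total_queries])

ensure_rrd()
push(123456)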
jerry
On 12/16/2010 02:52 PM, Shane Kerr wrote:
> All,
>
> Kambe-san has worked with Aharen-san and Fujiwara-san to produce a list
> of features & tasks for the statistics work. The idea is that we can
> then fit the statistics into the Scrum model.
>
> I think it makes sense to discuss them a bit, and then we can think
> about estimates.
>
> First, we have the features with a suggested ordering:
>
> High
> * Extend collection of statistics from Auth
> * HTTP/XML statistics reporting in BIND 9 style
> * Add collection of statistics from Recursor
> * Add collection of statistics from Xfrin/Xfrout
> * Add/extend collection of statistics from modules such as:
> - Boss
> - Msgq
> - Cfgmgr
> Low
> * SNMP statistics reporting
>
> I think this matches my own preferences, although perhaps Larissa has
> other feelings.
>
> A few new features were suggested:
>
> * Adopting visual graph drawing including third parties
>
> I'm not exactly sure what the extent of this is. Is the idea to provide
> data in a format that specific 3rd party visualization tools can use?
> I'm not opposed to this in principle, although this will probably be
> "contrib"-style additions rather than core BIND.
>
> * Adopting statistics storage which stores the data once it is
> calculated by the stats daemon
> (Something like SQLite or an in-memory DB?)
>
> The idea here is that the statistics will remain across reboots? That
> makes sense to me, assuming that is the idea.
>
> * Management of statistics items
> (It's currently defined in 'config_data' of the stats spec file.)
>
> Okay, this is also important. Note that I'll be posting a version of
> Jerry Scharf's work on the BIND 10 command tool soon, which should help
> define how this will work. (Don't be scared, it's good stuff.)
>
>
> Moving on from the features to the task list:
>
> Task related to Stats:
> - Overhauling the current design of the stats daemon to implement
> HTTP/XML reporting
>
> I think we need to discuss this. I assumed that HTTP/XML reporting was
> to be done by a separate process. That way if you don't want HTTP/XML
> reporting you don't have to run the code at all, plus any errors in the
> code won't affect other statistics reporting, and so on. These are all
> of the usual reasons for using separate processes for different
> functionality.
>
> If the redesign is intended to support HTTP/XML functionality in a
> separate task then that is fine. If the redesign is meant to include
> this in the statistics collection daemon, then I'd like to know the
> motivations.
>
> Tasks related to Auth (after #347):
> - Merge IntervalTimer into lib/asiolink
> - Definition of auth statistics to be tracked
> - Review of definition of auth statistics to be tracked
> - Code for configurable interval of submitting statistics via Cmdctl
> - Review of code for configurable interval of submitting statistics
> - Code for parsing response from statistics module
> - Review of code for parsing response from statistics module
> - Design for reducing performance overhead of auth statistics
> - Review of design for reducing performance overhead of auth
> statistics
> - Code for reducing performance overhead of auth statistics
> - Review of code for reducing performance overhead of auth
> statistics
>
> These other tasks all make sense to me. :)
>
> --
> Shane
>
> _______________________________________________
> bind10-dev mailing list
> bind10-dev at lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind10-dev