[bind10-dev] Comment to Statistics Ideas

Shane Kerr shane at isc.org
Tue Jan 19 11:54:35 UTC 2010


Fujiwara-san,

On Fri, 2010-01-15 at 17:13 +0900, fujiwara at wide.ad.jp wrote:
> I describe my idea for a statistics daemon in this mail, along with
> comments on "Statistics Ideas" at http://bind10.isc.org/wiki/StatisticsIdeas.

Thanks!

> Please comment.

Some comments inline below...

> My idea for the statistics daemon was:
> 
> * Assumption
>  - Each AuthSrv has the same dataset in one BIND 10 system.
>  - Each AuthSrv has its own set of statistics counters
>  - Stats daemon collects statistics data (counter set) from all AuthSrvs
>  - Stats daemon publishes statistics
>  - The querylog is managed by another system
> 
> * My idea was
>  - Stats daemon periodically sends a collection command to each AuthSrv
>     via the CC channel
>  - Stats daemon sums each counter collected from the AuthSrvs
>  - Stats daemon publishes a single combined counter set
>  - Stats daemon holds only short-term data
>  - SNMP stats server responds with the most recent counter set

So, the main purpose of the stats daemon is to collect counters from a
number of identical daemons and combine them. Do you think this would be
a general-purpose thing, or specific to AuthSrv counters?

> What we need to define and fix:
> 
>  - Stats counter list
>    - static or dynamic?
>    - if dynamic, we need to consider a counter management command.

I tend towards dynamic, as it allows us flexibility for future changes,
and avoids versioning issues (matching the stats daemon with specific
AuthSrv versions).

>  - Statistics collecting command/protocol in CC channel
>    - data format

Simple XML makes sense, unless the statistics become huge. Maybe there
should be a way to ask for changes only, in order to minimize the
overhead?

So, something like:

  Command:   getAllStats
  Arguments: NONE
  Returns:   statistics, stat_id
  
        Returns all statistics from a server, and a statistics
        identifier that can be used to retrieve stats efficiently in the
        future.

  Command:   getNewStats
  Arguments: stat_id
  Returns:   statistics, stat_id

        Returns all statistics from a server that have been modified
        since the given stat_id, as well as a statistics identifier that
        can be used to retrieve stats efficiently in the future.
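
On the server side, one possible way to implement stat_id is as a
generation number that increases on every update, with each counter
remembering the generation in which it last changed. This is only a
sketch under that assumption; CounterStore and increment() are
hypothetical names, while getAllStats/getNewStats follow the commands
above and return (statistics, stat_id) in that order.

```python
from typing import Dict, Tuple


class CounterStore:
    """Hypothetical server-side store behind getAllStats/getNewStats.

    stat_id is modelled as a generation number that increases on every
    update; each counter records the generation of its last change.
    """

    def __init__(self) -> None:
        self._values: Dict[str, int] = {}
        self._changed_at: Dict[str, int] = {}
        self._generation = 0

    def increment(self, name: str, amount: int = 1) -> None:
        self._generation += 1
        self._values[name] = self._values.get(name, 0) + amount
        self._changed_at[name] = self._generation

    def getAllStats(self) -> Tuple[Dict[str, int], int]:
        # Everything, plus the current generation as the new stat_id.
        return dict(self._values), self._generation

    def getNewStats(self, stat_id: int) -> Tuple[Dict[str, int], int]:
        # Only counters that changed after the caller's stat_id.
        changed = {name: value for name, value in self._values.items()
                   if self._changed_at[name] > stat_id}
        return changed, self._generation
```

A restarted server would reset the generation to zero, so a real
protocol would probably also need a way for the client to notice that
and fall back to getAllStats; that is left out here.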

The use of these commands could be hidden behind a class that manages them:

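class statsConnection:
    def __init__(self, conn):
        self.conn = conn      # object providing getAllStats()/getNewStats()
        self.stat_id = None   # no statistics fetched yet
        self.stats = {}

    def getStats(self):
        if self.stat_id is None:
            # First call: fetch everything (returns statistics, stat_id).
            (self.stats, self.stat_id) = self.conn.getAllStats()
        else:
            # Later calls: fetch only what changed since stat_id and merge
            # it in; changed counters overwrite their old values.
            (new_stats, self.stat_id) = self.conn.getNewStats(self.stat_id)
            self.stats.update(new_stats)
        return self.stats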

This kind of thing is only necessary if we are returning a lot of stats,
of course! :)
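
For the "simple XML" format itself, here is a minimal sketch using
Python's standard xml.etree.ElementTree. The element and attribute names
(statistics, counter, name, server) are invented for illustration; they
are not the BIND 9 XML statistics schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def stats_to_xml(server: str, counters: Dict[str, int]) -> str:
    """Serialize one server's counters as a flat, hypothetical XML document."""
    root = ET.Element("statistics", {"server": server})
    for name, value in sorted(counters.items()):
        ET.SubElement(root, "counter", {"name": name}).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_stats(text: str) -> Dict[str, int]:
    """Parse the format produced by stats_to_xml back into a dict."""
    root = ET.fromstring(text)
    return {c.get("name"): int(c.text) for c in root.iter("counter")}
```

A flat name/value list like this keeps the stats daemon ignorant of
which counters exist, which fits a dynamic counter list.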

>  - Collecting frequency

The simplest thing is to have a configuration parameter and let the
administrator set this. We probably want to use a fractional number of
seconds, since some people may want this updated more than once per
second (crazy but maybe nice for GUIs).

This may be something that various stats reporting programs want to
modify. So, for example, a BIND 10 system may dump stats to a log file
every hour, but someone may start SNMP monitoring during a problem and
want these every 30 seconds.
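
A minimal sketch of a collector whose interval is a fractional number of
seconds and can be changed while it is running (for example when SNMP
monitoring starts). PeriodicCollector and its methods are hypothetical;
the sleeping thread is woken on a change, so a new interval takes effect
immediately rather than after the old one expires.

```python
import threading
from typing import Callable


class PeriodicCollector:
    """Hypothetical helper: call `collect` every `interval` seconds.

    Fractional intervals such as 0.5 are allowed, and set_interval() can
    change the rate at runtime (e.g. hourly log dumps -> 30 s for SNMP).
    """

    def __init__(self, collect: Callable[[], None], interval: float) -> None:
        self.collect = collect
        self.interval = self._check(interval)
        self._stopped = False
        self._wake = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    @staticmethod
    def _check(interval: float) -> float:
        if interval <= 0:
            raise ValueError("interval must be a positive number of seconds")
        return float(interval)

    def _run(self) -> None:
        while True:
            # wait() takes a float timeout; it returns True only if woken early.
            woken = self._wake.wait(self.interval)
            self._wake.clear()
            if self._stopped:
                return
            if not woken:
                self.collect()

    def start(self) -> None:
        self._thread.start()

    def set_interval(self, interval: float) -> None:
        self.interval = self._check(interval)
        self._wake.set()  # restart the current wait with the new interval

    def stop(self) -> None:
        self._stopped = True
        self._wake.set()
        self._thread.join()
```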

Another possibility is that these are collected "on demand". This may be
useful for XML where we want stats, but if someone isn't asking then we
don't need them updated. Of course, this may cause a bit of latency when
getting stats via XML, but I think this should be quite minimal.
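
The on-demand approach could be sketched as a small cache: collection
happens only when someone asks, and a result younger than max_age
seconds is reused, so a burst of XML requests causes a single round of
collection. OnDemandStats and max_age are hypothetical names, and the
injectable clock exists only to make the behaviour easy to test.

```python
import time
from typing import Callable, Dict, Optional


class OnDemandStats:
    """Collect only when asked, reusing a result up to max_age seconds old."""

    def __init__(self, collect: Callable[[], Dict[str, int]], max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.collect = collect
        self.max_age = max_age
        self.clock = clock
        self._cached: Optional[Dict[str, int]] = None
        self._fetched_at = 0.0

    def get(self) -> Dict[str, int]:
        now = self.clock()
        if self._cached is None or now - self._fetched_at > self.max_age:
            self._cached = self.collect()  # the only place latency is paid
            self._fetched_at = now
        return self._cached
```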

>  - How to publish
>    - SNMP
>    - XML ?

We have to have XML that is compatible with the BIND 9 XML stats (only
better). ;)

>    - command?

Probably useful. I'm not sure whether it makes sense to go through the
command & control daemon or use the XML daemon for this.

>  - How to implement internal database.

I suggest SQLite, which is probably the most straightforward option.
(See below for more...)

> After here, comment to StatisticsIdeas in-line.
> -------------------------------------------------------------------------------
> | = Statistics Ideas =
> | 
> | This wikipage is for brainstorming ideas for the statistics interface.
> | 
> | == Existing BIND 9 features ==
> | 
> |   * have around 130 counters, such as IPv6 requests received, Requests with TSIG received, Zone transfer requests rejected, etc.
> |   * almost all statistics counters supported in BIND 8
> 
> We need to define which counters each AuthSrv has.
> 
> |   * XML-based statistics interface
> 
> Does the interface output the totals of all AuthSrvs' counters?

Yes, I think so.

> |   * statistics counters about internal status (sockets/tasks/memory usage)
> 
> These counters cannot be totaled/merged.

So, these are handled outside of the statistics daemon, right?
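
If such values do reach the stats daemon, it would need to know which
values may be summed. A minimal sketch, assuming a hypothetical table of
counter kinds: "counter" values are totaled across AuthSrvs, while
"gauge" values (memory, sockets, tasks) are kept per server. Treating
unknown names as counters is also an assumption.

```python
from typing import Dict, Tuple

# Hypothetical kind table; names not listed are assumed to be summable.
KINDS = {"requests.udp": "counter", "memory.inuse": "gauge"}


def merge_reports(
    reports: Dict[str, Dict[str, int]],
) -> Tuple[Dict[str, int], Dict[str, Dict[str, int]]]:
    """Split per-server reports into summed counters and per-server gauges."""
    totals: Dict[str, int] = {}
    per_server: Dict[str, Dict[str, int]] = {}
    for server, values in reports.items():
        for name, value in values.items():
            if KINDS.get(name, "counter") == "counter":
                totals[name] = totals.get(name, 0) + value
            else:
                # Gauges describe a single process; never add them together.
                per_server.setdefault(server, {})[name] = value
    return totals, per_server
```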

> |   * file based statistics (but not needed if going to just XML)
> |   * per zone statistics
> 
> The number of zones may be large.
> We need to define how to add/remove zones whose statistics are collected.

Not disagreeing... we also need to decide on the default. I think we
should turn zone statistics "on" by default, since I think most servers
won't have a huge number of zones and may be interested.
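
A per-zone table with statistics on by default and explicit add/remove
might look like the following sketch. ZoneStats, its methods, and the
enabled switch are hypothetical; zone names are lower-cased and given a
trailing dot so that "Example.COM" and "example.com." share counters.

```python
from typing import Dict, Optional


class ZoneStats:
    """Hypothetical per-zone counters: on by default, zones added explicitly."""

    def __init__(self, enabled: bool = True) -> None:
        self.enabled = enabled  # hypothetical config switch, on by default
        self._zones: Dict[str, Dict[str, int]] = {}

    @staticmethod
    def _key(zone: str) -> str:
        # DNS names compare case-insensitively; normalise case and trailing dot.
        return zone.lower().rstrip(".") + "."

    def add_zone(self, zone: str) -> None:
        self._zones.setdefault(self._key(zone), {})

    def remove_zone(self, zone: str) -> None:
        self._zones.pop(self._key(zone), None)

    def count(self, zone: str, name: str, amount: int = 1) -> None:
        counters = self._zones.get(self._key(zone))
        if self.enabled and counters is not None:
            counters[name] = counters.get(name, 0) + amount

    def get(self, zone: str) -> Optional[Dict[str, int]]:
        counters = self._zones.get(self._key(zone))
        return None if counters is None else dict(counters)
```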

> | == Ideas ==
> | 
> |   * per remote host statistics
> 
> We need to consider how to collect these.
> 
> # I use the querylog to evaluate per-remote-host statistics.
> 
> |   * RESTful interface to get to specific statistics in XML (not everything at once)
> 
> We need to define the output interface.

Well... output is mostly done... I think we can use the BIND 9.7 interface.

> |   * all BIND 10 components may have statistics counters (even if not DNS related)
> 
> This needs to be defined.

Yes, and probably out of scope for the statistics daemon itself.

> |   * Separate daemon to handle HTTP requests and/or generate XML reports; BIND 10 will have separate processes running so having a central location to submit counter information might be useful.
> 
> Is this the statistics daemon?

Based on your description, I think this is a separate program.

> |   * will the daemon store details on all statistics?
> 
> An internal database? An external database?

I think initially via an internal SQLite database. At some point
someone will want these pushed into a "real" database and we'll want to
make a way of connecting to arbitrary SQL back-ends, but I think it is
best to implement that when the need arises. (Or even better, have a
patch submitted when this happens.)
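
A minimal sketch of such an internal SQLite store using Python's
standard sqlite3 module. The single samples table, its columns, and the
helper names are assumptions for illustration; prune() reflects the idea
of keeping only short-term data, and ":memory:" versus a file path
covers both the in-memory and on-disk cases.

```python
import sqlite3
import time
from typing import Dict, Optional


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS samples (
                      taken_at REAL NOT NULL,
                      name     TEXT NOT NULL,
                      value    INTEGER NOT NULL,
                      PRIMARY KEY (taken_at, name))""")
    return db


def store_sample(db: sqlite3.Connection, counters: Dict[str, int],
                 taken_at: Optional[float] = None) -> None:
    ts = time.time() if taken_at is None else taken_at
    with db:  # commits on success, rolls back on error
        db.executemany("INSERT INTO samples VALUES (?, ?, ?)",
                       [(ts, n, v) for n, v in counters.items()])


def prune(db: sqlite3.Connection, keep_seconds: float,
          now: Optional[float] = None) -> None:
    # Drop samples older than keep_seconds (short-term storage only).
    cutoff = (time.time() if now is None else now) - keep_seconds
    with db:
        db.execute("DELETE FROM samples WHERE taken_at < ?", (cutoff,))


def latest(db: sqlite3.Connection) -> Dict[str, int]:
    rows = db.execute("""SELECT name, value FROM samples
                         WHERE taken_at = (SELECT MAX(taken_at) FROM samples)""")
    return dict(rows.fetchall())
```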

> |   * use CC / msgq for subscribing to statistics and to send or receive stats data
> 
> Yes, I assumed so.
> 
> |   * How much will the stats daemon hold?
> 
> Only for a short time?
> 
> |   * Will the stats daemon periodically write to disk to store its collected data?
> 
> Yes. The stats daemon periodically writes to an in-memory or on-disk database.
> 
> |   * "statistics daemon" (ala syslogger) that is generic to listen for stats messages, then keeps counters or other growing stats data, and then can export them in some format(s) which can be used for reports and graphs. (Evi Nemeth shared the initial idea for this stats daemon at June 2009 face-to-face meeting.)
> 
> True, I think.
> 
> |   * It will also listen for queries so can provide the individual stats counters (or data) in near real-time.
> 
> I don't think so. It's the AuthSrv's job.
> 
> |   * A daemon because many different programs may be sending stats data to it simultaneously; for example, a HTTPD webserver could submit various counters to it and a DNS server could submit counters to it also.
> 
> We need to define the counter list.
> 
> |   * Does it need to be a daemon? Can't it be a tool that's invoked by things that works on files?
> 
> If all components run on the same machine,
> shared memory is the best solution, I think.
> 
> But, for example, the query counter would have to be the sum of the
> counters from each AuthSrv.
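
A sketch of the shared-memory idea, assuming a hypothetical layout of
one row per AuthSrv and one 64-bit slot per counter. It uses an
anonymous mmap, which is only shared with processes forked from the same
parent; unrelated processes would need a named segment instead (for
example Python's multiprocessing.shared_memory). Each AuthSrv writes
only its own row, so writers never contend, and the reader sums each
column to get the query counter for the whole system.

```python
import mmap
import struct
from typing import Iterable

SLOT = struct.Struct("<Q")  # one unsigned 64-bit counter per slot


class SharedCounters:
    """Hypothetical layout: row = AuthSrv index, column = counter name."""

    def __init__(self, n_servers: int, names: Iterable[str]) -> None:
        self.names = list(names)
        self._index = {name: i for i, name in enumerate(self.names)}
        self.n_servers = n_servers
        # Anonymous mapping, zero-filled; shared with fork()ed children.
        self.mem = mmap.mmap(-1, n_servers * len(self.names) * SLOT.size)

    def _offset(self, server: int, name: str) -> int:
        return (server * len(self.names) + self._index[name]) * SLOT.size

    def increment(self, server: int, name: str, amount: int = 1) -> None:
        # Called by AuthSrv number `server`, which owns that row.
        off = self._offset(server, name)
        (value,) = SLOT.unpack_from(self.mem, off)
        SLOT.pack_into(self.mem, off, value + amount)

    def total(self, name: str) -> int:
        # Called by the reader: sum one counter across all rows.
        return sum(SLOT.unpack_from(self.mem, self._offset(s, name))[0]
                   for s in range(self.n_servers))
```
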
> 
> |   * What's wrong with many different programs invoking a utility to alter an on-disk database?
> |   * On-disk may be too slow? So this would only write to disk periodically (even if every second).
> 
> The stats daemon may store data in an on-disk database, I think.
> 
> |   * Then again, the initial program submitting the stats would have its own counters and doesn't need to submit them in real time. So maybe speed doesn't matter.
> 
> If there are multiple AuthSrvs with the same data, the stats daemon
> needs to add up each counter, and it is preferable to collect the
> information from all AuthSrvs at the same time.
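
Collecting from every AuthSrv at (nearly) the same moment could be done
by sending all requests concurrently and summing whatever comes back
within a timeout. A minimal sketch using Python's concurrent.futures;
each fetch function stands in for a hypothetical "send the collection
command over CC and wait for the reply" call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Tuple

Fetch = Callable[[], Dict[str, int]]  # hypothetical: asks one AuthSrv


def poll_all(fetchers: List[Fetch],
             timeout: float = 1.0) -> Tuple[float, Dict[str, int], int]:
    """Query every AuthSrv concurrently and sum the counters that arrive.

    Returns (start timestamp, summed counters, number of servers missing).
    """
    started = time.time()
    pool = ThreadPoolExecutor(max_workers=max(1, len(fetchers)))
    futures = [pool.submit(fetch) for fetch in fetchers]
    done, not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # don't block on servers that have not answered
    totals: Dict[str, int] = {}
    missing = len(not_done)
    for fut in done:
        if fut.exception() is not None:
            missing += 1
            continue
        for name, value in fut.result().items():
            totals[name] = totals.get(name, 0) + value
    return started, totals, missing
```

Reporting the number of missing servers lets a consumer tell a real drop
in traffic apart from an AuthSrv that simply did not answer.
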
> 
> |   * What about top-ten lists, such as top ten records getting queries? If the server is hosting 1 million zones, would the stats daemon need to keep track of all of these? Or should the auth server component itself keep track?
> 
> We need to define what AuthSrv collects and the collection protocol.
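
One point for the collection protocol: a merged top-ten list is exact
only if each AuthSrv reports full per-name counts. Merging each server's
local top ten can miss a name that ranks eleventh everywhere but first
overall. A minimal sketch of the exact version, with hypothetical names:

```python
import heapq
from typing import Dict, Iterable, List, Tuple


def merged_top_n(per_server: Iterable[Dict[str, int]],
                 n: int = 10) -> List[Tuple[str, int]]:
    """Sum full per-name counts from every server, then take the n largest."""
    totals: Dict[str, int] = {}
    for counts in per_server:
        for key, value in counts.items():
            totals[key] = totals.get(key, 0) + value
    return heapq.nlargest(n, totals.items(), key=lambda kv: kv[1])
```

For a server hosting a million zones, this is where the memory cost
shows up, which is one argument for keeping such lists in AuthSrv.
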
> 
> |   * Why not just query the individual components for their details and just have non-daemon tool do these queries and generate custom reports? (So no stats daemon needed or always running.)
> 
> The stats program may be started by SNMP queries, run periodically, or run continuously.
> 
> |   * What about just having the zone databases also have data fields for different per-record and per-zone counters?
> 
> I don't know. We need to define whether each AuthSrv has the same data or not.
> 
> |   * Would it be useful to keep track of record creation/birth time, record change/modification time? record last access/read time? (per record or per RRset?)
> 
> I think this may be a separate function.
> DNS Updates may be logged by the logging daemon or by AuthSrv.

--
Shane
