[bind10-dev] Python logging framework proposal

Sun Jun 20 16:06:41 UTC 2010

On 5/18/2010 12:58 AM, zhanglikun wrote:
>> Along with logging, alerting is often important.  This is where bind 9
>> fails; we are very good at reporting things to syslog, but we are really
>> bad at any sort of "scoreboard" type of alarm status or error reporting
>> that can be polled.  Do we want to combine these or not?  I would say
>> not right now, but "is my server healthy?" is a question we should be
>> able to answer without requiring someone to look through log files.
> 
> Yeah, agree,
> How about the alerting is done by stats module? like, If stats find there
> are 10 error or warning logs in 1 minutes, ok, the server isn't healthy now,
> send one alert email to administrator to report it.

This is an interesting discussion. You might consider doing something
similar to a product I work with. Alerts get sent to another process on
another server which stores the raw data in a database and then conducts
analysis on it later. In that product, the alerts include things like a
database or browser interaction taking too long. For BIND 10, you could set
up different categories for things like slow responses from an authoritative
server, invalid responses from a server, TC responses, etc. This is of
course not an exhaustive list. You could then write analyses of what's
going on. Such an alert server should be set up to handle data from
multiple systems since people usually have more than one nameserver that
they deal with and you don't want to have to set up one of these for
each server. The alert server can also conduct alive checks to make sure
the servers haven't gone down, and send email to the responsible parties. It
could also send daily and weekly summaries of possible issues.
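
To make the idea a bit more concrete, here is a rough Python sketch of what
the collecting side might look like. Everything in it is an assumption for
illustration only: the class and method names, the category names, and the
thresholds (10 alerts in 60 seconds, borrowed from the example quoted above)
are made up, and a plain list stands in for the database. It is not a
proposal for an actual BIND 10 API.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical category names, based on the examples in this message.
CATEGORIES = {"slow_authoritative", "invalid_response", "tc_response", "heartbeat"}


@dataclass(frozen=True)
class Alert:
    server: str       # name of the nameserver that reported the alert
    category: str     # one of CATEGORIES
    timestamp: float  # seconds since the epoch


class AlertCollector:
    """Collects alerts from several nameservers and answers simple health queries."""

    def __init__(self, window=60.0, threshold=10, alive_timeout=120.0):
        self.window = window                # length of the counting window, in seconds
        self.threshold = threshold          # alerts per window that mark a server unhealthy
        self.alive_timeout = alive_timeout  # silence longer than this fails the alive check
        self.alerts = []                    # raw alerts (stand-in for a database table)
        self.last_seen = {}                 # server -> time of its most recent report

    def record(self, alert):
        """Store one alert and note that its server is still reporting."""
        if alert.category not in CATEGORIES:
            raise ValueError("unknown category: %s" % alert.category)
        previous = self.last_seen.get(alert.server, alert.timestamp)
        self.last_seen[alert.server] = max(previous, alert.timestamp)
        if alert.category != "heartbeat":  # heartbeats only feed the alive check
            self.alerts.append(alert)

    def summary(self, now):
        """Count alerts per (server, category) in the window ending at `now`."""
        cutoff = now - self.window
        return Counter((a.server, a.category)
                       for a in self.alerts if cutoff < a.timestamp <= now)

    def unhealthy(self, now):
        """Servers with at least `threshold` alerts in the current window."""
        per_server = Counter()
        for (server, _category), count in self.summary(now).items():
            per_server[server] += count
        return sorted(s for s, n in per_server.items() if n >= self.threshold)

    def silent(self, now):
        """Servers that have not reported anything for longer than `alive_timeout`."""
        return sorted(s for s, t in self.last_seen.items()
                      if now - t > self.alive_timeout)
```

A periodic job could call unhealthy() and silent() and mail the results to
the responsible parties, and the same summary() call over a longer window
could feed the daily and weekly reports.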

Hope this idea is useful.

Danny