Generating more useful statistics from BIND 9

sthaug at nethelp.no
Thu Mar 24 11:27:12 UTC 2005


Having tested Nominum CNS (and loved it! wish we could afford it!), I
find the standard statistics facilities of "out of the box" BIND 9 to be
lacking. For a recursive name server, I would like to see:

- Number of client queries this name server is receiving per interval
- Number of server queries (queries to other, authoritative name servers)
this name server is sending per interval
- Hit rate in the name server's cache, either per interval or accumulated
- Number of failures (SERVFAIL) per interval
- Number of outstanding client queries

The number of outstanding client queries can be obtained via "rndc status",
and the number of failures per interval can be derived from "rndc stats".
However, the number of client queries per interval, the number of server
queries per interval, and the cache hit rate cannot.

I patched bind-9.3.1rc1 to generate the necessary statistics and return
them all on one line in the form of an "rndc stats2" command. This gives
me output of the type shown below (4 consecutive "rndc stats2" commands
at 10 second intervals):

20050322 22:03:18 rclients 159/1000 nxdomain 158410678 recursion 119906469 failure 27626551 clntquery 541173877 srvquery 148937516
20050322 22:03:28 rclients 176/1000 nxdomain 158411768 recursion 119907328 failure 27626713 clntquery 541179730 srvquery 148938545
20050322 22:03:38 rclients 199/1000 nxdomain 158413016 recursion 119908197 failure 27626896 clntquery 541185353 srvquery 148939623
20050322 22:03:48 rclients 154/1000 nxdomain 158414360 recursion 119909142 failure 27627172 clntquery 541191401 srvquery 148940768

These can then be run through an awk/perl/whatever script to
produce the desired statistics, e.g.:

20050322 22:03:28 176  85   2  585 102  85  77
20050322 22:03:38 199  86   3  562 107  84  77
20050322 22:03:48 154  94   4  604 114  84  77

Columns: Date, time, # of outstanding client queries, # of recursions
per second, # of SERVFAILs as percentage of client queries in interval,
# of client queries per second, # of server queries per second (which
this name server generates), hitrate in cache for interval, cumulative
hitrate.

The awk script I use is:

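BEGIN { i = 10   # sampling interval in seconds
        print "Date Time Rcli Recu Fail Cqry Sqry Hitrate CumHitrate" }
{
    split($4, cl, "/"); rec = $8; fail = $10; cq = $12; sq = $14
    drec = rec - prec; dfail = fail - pfail; dcq = cq - pcq; dsq = sq - psq
    # The first sample has no predecessor, so print from the second one on;
    # also skip intervals without client queries (avoids division by zero).
    if (NR > 1 && dcq > 0)
        printf("%s %s %3i %3i %3i %4i %3i %3i %3i\n", $1, $2, cl[1],
               drec/i, dfail*100/dcq, dcq/i, dsq/i,
               (dcq-drec)*100/dcq, (cq-rec)*100/cq)
    prec = rec; pfail = fail; pcq = cq; psq = sq
}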
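
The arithmetic behind the columns can be cross-checked with a small C
sketch. It is not part of the patch; the struct and function names
(sample, rates, parse_sample, compute_rates) are made up for
illustration, and it assumes the exact "rndc stats2" line layout shown
above, with integer truncation as in the awk printf.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed "rndc stats2" line (hypothetical struct, illustration only). */
struct sample {
	int rclients;                 /* outstanding recursive clients */
	unsigned long long recursion; /* client queries that needed recursion */
	unsigned long long failure;   /* SERVFAIL count */
	unsigned long long clntquery; /* client queries received */
	unsigned long long srvquery;  /* queries sent to other name servers */
};

/* Figures for one interval, truncated to integers. */
struct rates {
	unsigned long long recu, fail_pct, cqry, sqry, hit_pct, cum_hit_pct;
};

/* Parse one line; returns 1 on success, 0 if the line does not match. */
static int
parse_sample(const char *line, struct sample *s) {
	int n = sscanf(line,
	    "%*s %*s rclients %d/%*d nxdomain %*llu recursion %llu "
	    "failure %llu clntquery %llu srvquery %llu",
	    &s->rclients, &s->recursion, &s->failure,
	    &s->clntquery, &s->srvquery);
	return (n == 5);
}

/*
 * Derive per-interval figures from two consecutive samples taken
 * `interval` seconds apart. Returns 0 when no rate can be computed
 * (no client queries in the interval, or inconsistent counters).
 */
static int
compute_rates(const struct sample *prev, const struct sample *cur,
	      unsigned int interval, struct rates *r) {
	unsigned long long drec, dfail, dcq, dsq;

	assert(prev != NULL && cur != NULL && r != NULL);
	drec = cur->recursion - prev->recursion;
	dfail = cur->failure - prev->failure;
	dcq = cur->clntquery - prev->clntquery;
	dsq = cur->srvquery - prev->srvquery;
	if (interval == 0 || dcq == 0 || drec > dcq ||
	    cur->recursion > cur->clntquery)
		return (0);

	r->recu = drec / interval;          /* recursions per second */
	r->fail_pct = dfail * 100 / dcq;    /* SERVFAIL % of client queries */
	r->cqry = dcq / interval;           /* client queries per second */
	r->sqry = dsq / interval;           /* server queries per second */
	r->hit_pct = (dcq - drec) * 100 / dcq;  /* cache hit rate, interval */
	r->cum_hit_pct = (cur->clntquery - cur->recursion) * 100 /
			 cur->clntquery;        /* cache hit rate, cumulative */
	return (1);
}
```

Feeding it the 22:03:18 and 22:03:28 lines above with an interval of 10
gives 85, 2, 585, 102, 85 and 77, which matches the first output row.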

Notes:

- The "recursion" statistic above (same as from "rndc stats") is, as far
as I can see, the number of client queries that result in the recursive
name server having to generate queries to other, authoritative name
servers. But it is *not* the same as the number of such queries actually
generated (typically there are more server queries generated than the
"recursion" statistic indicates - I assume this is due to retries etc.)

I would like a statistic that is closer to the number of packets
actually sent - which is why the "recursion" statistic is not used
directly. However, given the number of client queries, the "recursion"
statistic can be used to calculate hit rate in the recursive name server
cache: (client queries - recursion) / (client queries), either per
interval or accumulated.

- Should maybe count update, notify, iquery and unknown opcodes as
"client query" too. See the code from bin/named/client.c routine
client_request, starting at line 1548 (currently client queries are
incremented only in ns_query_start line 3386 called from client_request
line 1551).

- The extended statistics (and the names of these) are stored in
statically allocated arrays (see lib/dns/inc_stats2.c) - no fancy
creation & destruction routines here. Works okay for me in my
non-threaded/uniprocessor environment, probably not suitable for the
general threaded/multiprocessor case.
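
For the threaded case, one possible direction is to make the counters
atomic. The sketch below is not what the patch does and not how BIND
handles its own statistics; it assumes a compiler with C11
<stdatomic.h>, and the names (stats2_counters, stats2_inc, stats2_get)
are invented for illustration.

```c
#include <stdatomic.h>

/* Same two counters as in the patch, under hypothetical names. */
enum stats2_counter {
	STATS2_CLIENTQRY = 0,	/* client generated query */
	STATS2_SERVERQRY = 1,	/* server generated query */
	STATS2_NCOUNTERS = 2
};

/* Static storage, so all counters start at zero. */
static atomic_ullong stats2_counters[STATS2_NCOUNTERS];

/*
 * Increment a counter. Relaxed ordering is enough here because the
 * counters are pure statistics and do not guard any other data.
 */
static inline void
stats2_inc(enum stats2_counter c) {
	atomic_fetch_add_explicit(&stats2_counters[c], 1,
				  memory_order_relaxed);
}

/* Read a counter; concurrent increments may or may not be included. */
static inline unsigned long long
stats2_get(enum stats2_counter c) {
	return (atomic_load_explicit(&stats2_counters[c],
				     memory_order_relaxed));
}
```

This keeps the "no creation & destruction routines" property of the
static arrays while avoiding lost updates when several threads
increment the same counter.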

Steinar Haug, Nethelp consulting, sthaug at nethelp.no

----------------------------------------------------------------------

--- bin/named/include/named/control.h.orig	Fri Sep  3 05:43:32 2004
+++ bin/named/include/named/control.h	Tue Feb 22 23:19:59 2005
@@ -38,6 +38,7 @@
 #define NS_COMMAND_REFRESH	"refresh"
 #define NS_COMMAND_RETRANSFER	"retransfer"
 #define NS_COMMAND_DUMPSTATS	"stats"
+#define NS_COMMAND_STATS2	"stats2"
 #define NS_COMMAND_QUERYLOG	"querylog"
 #define NS_COMMAND_DUMPDB	"dumpdb"
 #define NS_COMMAND_TRACE	"trace"
--- bin/named/include/named/server.h.orig	Mon Mar  8 05:04:21 2004
+++ bin/named/include/named/server.h	Tue Feb 22 23:28:14 2005
@@ -163,6 +163,12 @@
 ns_server_dumpstats(ns_server_t *server);
 
 /*
+ * Print extended statistics
+ */
+isc_result_t
+ns_server_stats2(ns_server_t *server, isc_buffer_t *text);
+
+/*
  * Dump the current cache to the dump file.
  */
 isc_result_t
--- bin/named/server.c.orig	Wed Nov 10 23:13:56 2004
+++ bin/named/server.c	Tue Feb 22 23:56:35 2005
@@ -60,6 +60,7 @@
 #include <dns/rootns.h>
 #include <dns/secalg.h>
 #include <dns/stats.h>
+#include <dns/stats2.h>
 #include <dns/tkey.h>
 #include <dns/view.h>
 #include <dns/zone.h>
@@ -3577,6 +3578,11 @@
 		fprintf(fp, "%s %" ISC_PRINT_QUADFORMAT "u\n",
 			dns_statscounter_names[i],
 			server->querystats[i]);
+	ncounters = DNS_STATS2_NCOUNTERS;
+	for (i = 0; i < ncounters; i++)
+		fprintf(fp, "%s %" ISC_PRINT_QUADFORMAT "u\n",
+			dns_stats2counter_names[i],
+			dns_querystats2[i]);
 	
 	zone = NULL;
 	for (result = dns_zone_first(server->zonemgr, &zone);
@@ -4024,6 +4030,23 @@
 		     soaqueries, server->log_queries ? "ON" : "OFF",
 		     server->recursionquota.used, server->recursionquota.max,
 		     server->tcpquota.used, server->tcpquota.max);
+	if (n >= isc_buffer_availablelength(text))
+		return (ISC_R_NOSPACE);
+	isc_buffer_add(text, n);
+	return (ISC_R_SUCCESS);
+}
+
+isc_result_t
+ns_server_stats2(ns_server_t *server, isc_buffer_t *text) {
+	unsigned int n;
+
+	n = snprintf((char *)isc_buffer_used(text),
+		     isc_buffer_availablelength(text),
+		     "rclients %d/%d nxdomain %llu recursion %llu failure %llu clntquery %llu srvquery %llu",
+		     server->recursionquota.used, server->recursionquota.max,
+		     server->querystats[3], server->querystats[4],
+		     server->querystats[5],
+		     dns_querystats2[0], dns_querystats2[1]);
 	if (n >= isc_buffer_availablelength(text))
 		return (ISC_R_NOSPACE);
 	isc_buffer_add(text, n);
--- bin/named/query.c.orig	Wed Jun 30 16:13:05 2004
+++ bin/named/query.c	Tue Feb 22 00:03:25 2005
@@ -40,6 +40,7 @@
 #include <dns/resolver.h>
 #include <dns/result.h>
 #include <dns/stats.h>
+#include <dns/stats2.h>
 #include <dns/tkey.h>
 #include <dns/view.h>
 #include <dns/zone.h>
@@ -3381,6 +3382,8 @@
 	dns_rdatatype_t qtype;
 
 	CTRACE("ns_query_start");
+
+	inc_stats2(dns_statscounter_clientqry);
 
 	/*
 	 * Ensure that appropriate cleanups occur.
--- bin/named/control.c.orig	Fri Sep  3 05:43:31 2004
+++ bin/named/control.c	Tue Feb 22 23:19:45 2005
@@ -103,6 +103,8 @@
 		result = ISC_R_SUCCESS;
 	} else if (command_compare(command, NS_COMMAND_DUMPSTATS)) {
 		result = ns_server_dumpstats(ns_g_server);
+	} else if (command_compare(command, NS_COMMAND_STATS2)) {
+		result = ns_server_stats2(ns_g_server, text);
 	} else if (command_compare(command, NS_COMMAND_QUERYLOG)) {
 		result = ns_server_togglequerylog(ns_g_server);
 	} else if (command_compare(command, NS_COMMAND_DUMPDB)) {
--- lib/dns/include/dns/stats2.h.orig	Sun Mar  6 16:52:37 2005
+++ lib/dns/include/dns/stats2.h	Tue Feb 22 23:13:29 2005
@@ -0,0 +1,19 @@
+#ifndef DNS_STATS2_H
+#define DNS_STATS2_H 1
+
+/*
+ * Client / server query statistics counter types.
+ */
+typedef enum {
+	dns_statscounter_clientqry = 0,	/* Client generated query */
+	dns_statscounter_serverqry = 1	/* Server generated query */
+} dns_stats2counter_t;
+
+#define DNS_STATS2_NCOUNTERS 2
+
+extern const char *dns_stats2counter_names[DNS_STATS2_NCOUNTERS];
+extern unsigned long long dns_querystats2[DNS_STATS2_NCOUNTERS];
+
+void inc_stats2(dns_stats2counter_t counter);
+
+#endif /* DNS_STATS2_H */
--- lib/dns/resolver.c.orig	Wed Feb  9 00:59:44 2005
+++ lib/dns/resolver.c	Mon Feb 21 22:58:58 2005
@@ -47,6 +47,7 @@
 #include <dns/rdatatype.h>
 #include <dns/resolver.h>
 #include <dns/result.h>
+#include <dns/stats2.h>
 #include <dns/tsig.h>
 #include <dns/validator.h>
 
@@ -1310,6 +1311,8 @@
 	if ((query->options & DNS_FETCHOPT_TCP) == 0)
 		address = &query->addrinfo->sockaddr;
 	isc_buffer_usedregion(buffer, &r);
+
+	inc_stats2(dns_statscounter_serverqry);
 
 	/*
 	 * XXXRTH  Make sure we don't send to ourselves!  We should probably
--- lib/dns/Makefile.in.orig	Thu Dec  9 05:07:15 2004
+++ lib/dns/Makefile.in	Mon Feb 21 23:31:47 2005
@@ -52,7 +52,9 @@
 DNSOBJS =	acl.@O@ adb.@O@ byaddr.@O@ \
 		cache.@O@ callbacks.@O@ compress.@O@ \
 		db.@O@ dbiterator.@O@ dbtable.@O@ diff.@O@ dispatch.@O@ \
-		dnssec.@O@ ds.@O@ forward.@O@ journal.@O@ keytable.@O@ \
+		dnssec.@O@ ds.@O@ forward.@O@ \
+		inc_stats2.@O@ \
+		journal.@O@ keytable.@O@ \
 		lib.@O@ log.@O@ lookup.@O@ \
 		master.@O@ masterdump.@O@ message.@O@ \
 		name.@O@ ncache.@O@ nsec.@O@ order.@O@ peer.@O@ portlist.@O@ \
--- lib/dns/inc_stats2.c.orig	Sun Mar  6 16:51:41 2005
+++ lib/dns/inc_stats2.c	Tue Feb 22 23:12:27 2005
@@ -0,0 +1,14 @@
+#include <dns/stats2.h>
+
+/*
+ * Second version of inc_stats from query.c - only called to increment
+ * client query or server query counters, don't bother with zone.
+ */
+unsigned long long dns_querystats2[DNS_STATS2_NCOUNTERS];
+const char *dns_stats2counter_names[DNS_STATS2_NCOUNTERS] =
+	{"clntquery", "srvquery"};
+
+void
+inc_stats2(dns_stats2counter_t counter) {
+	dns_querystats2[counter]++;
+}

