bind9 is taking little Breaks for Some Reason.

Martin McCormick martin at dc.cis.okstate.edu
Fri May 25 15:55:01 UTC 2007


	I have done more testing and am almost certain that the
problem lies in the box itself, though I am just as mystified as
ever as to what is happening.

	The one diagnostic tool that spotlights the problem for
sure is netstat run as:

netstat -w5

This prints counts of input packets and total bytes received as
well as output packets, errors and bytes sent during a 5-second
interval.

	In one test, I started the readings at exactly 16:00 on
Thursday and let them accumulate all night with the following
command:

netstat -w5 | tee counts

which prints both to the screen and to the file named counts.

The next morning, I stopped the test and searched the syslog of our
DHCP server, which runs on a different box, for the first "timed
out" complaints.  The first one was logged a few seconds after
16:21.

	I then stripped out the headers that netstat puts in so
all that was left was columns of numbers and sorted on the 6th
column, which is bytes out.  In the entire roughly 15-hour
period, there were several lines with only 178 bytes sent out
the Ethernet interface in 5 seconds.  On our master DNS, there
are normally tens to hundreds of kilobytes sent in 5 seconds.
The first output drought occurred just 21 minutes after I started
the test and coincided with the first "timed out" message.  Here
is what the minute in which the outage occurred looked like.
The first and last lines are normal, and in between you see the hit.

            input        (Total)           output
   packets  errs      bytes    packets  errs      bytes colls
      1372     0     145160       1246     0     206202     0
       676     0      63491        217     0      33595     0
       578     0      50200          6     0        560     0
       647     0      55292          2     0        242     0
       681     0      58581          1     0        178     0
       763     0      65570          3     0        302     0
       725     0      63723          2     0        246     0
       781     0      66721          3     0        302     0
       746     0      64504         15     0       1222     0
       770     0      66112         14     0       1150     0
       942     0      84863        298     0      88924     0
      2419     0     256723       2234     0     381206     0
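
	For anyone who wants to repeat the strip-and-sort step on
their own log, here is a rough sketch.  It is not the exact
command I ran.  The function name quietest_rows is made up, and
it assumes that every header line contains letters, that every
data line has 7 numeric columns, and that each data line covers
one 5-second interval, so line n ends about n * 5 seconds after
the start of the test.

```shell
# Hypothetical helper: list the quietest intervals in a saved "netstat -w5" log.
# Reads the log on stdin; $1 is how many rows to show.
# Output columns: seconds since start (end of interval), bytes out, packets in.
quietest_rows() {
    awk '
        /[A-Za-z]/ { next }            # drop the header lines, which contain letters
        NF != 7    { next }            # drop blank or incomplete lines
        { n++; print n * 5, $6, $1 }   # assumes one data row per 5-second interval
    ' | sort -k2,2n | head -n "$1"     # numeric sort on bytes out, lowest first
}

# Usage: quietest_rows 10 < counts
```

Numbering the rows before sorting keeps the position of each
sample, so a low count can still be matched against the time of
a "timed out" message after the sort has shuffled the lines.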

	Basically, during one of these narcoleptic seizures the
packets keep going in and hardly anything comes back out for
several seconds, although the interface stays up.  After about a
minute, the output comes back up to normal and the sun comes out
and the birds sing for anywhere from 30 minutes to a couple of
hours, and then there is another hiccup.
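
	To see how long each hiccup lasts and how far apart they
are, consecutive quiet lines can be grouped into episodes.  The
following is only a sketch under assumptions.  The function name
list_droughts and the byte threshold passed to it are
hypothetical, and it assumes one data line per sampling interval
with 7 numeric columns.

```shell
# Hypothetical helper: group consecutive quiet intervals into episodes.
# $1 = byte threshold below which an interval counts as quiet (an assumption),
# $2 = sampling interval in seconds.  Reads a saved "netstat -w" log on stdin.
list_droughts() {
    awk -v limit="$1" -v step="$2" '
        # Print the episode collected so far, if any, then reset the counter.
        function report() {
            if (len > 0)
                printf "start +%ds, duration %ds, lowest %d bytes\n", (first - 1) * step, len * step, low
            len = 0
        }
        /[A-Za-z]/ { next }            # header lines contain letters
        NF != 7    { next }            # blank or incomplete lines
        {
            n++                        # n-th data row since the start of the log
            if ($6 + 0 < limit + 0) {  # column 6 is bytes out
                if (len == 0) { first = n; low = $6 + 0 }
                if ($6 + 0 < low) low = $6 + 0
                len++
            } else {
                report()
            }
        }
        END { report() }               # flush an episode that runs to end of file
    '
}

# Usage: list_droughts 2000 5 < counts
```

The start offsets are relative to the first data line, so they
have to be added to the time the test began before comparing
them with syslog entries on another machine.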

	As I said earlier, nothing is complaining about
anything.  bind looks quite tranquil based on an rndc status
that I ran from an expect script triggered by the timeout
messages.

Martin McCormick WB5AGZ  Stillwater, OK 
Systems Engineer
OSU Information Technology Department Network Operations Group


