BIND 9.2.1 - Unexpected cache behavior

Todd Herr todd at angrysunguy.com
Mon Oct 27 16:23:44 UTC 2003


Greetings.

We've got a fleet of Sun V100 servers deployed as caching-only
DNS servers.  These servers all run BIND 9.2.1 on Solaris 8.  The
servers have 1GB of RAM, and one CPU.

Load on the servers varies widely by location.  Servers at
location A consistently average 875 queries per second, while
those at location B may receive only 200 queries per second.

What we're seeing at our busy sites suggests to me that TTL
values are not being honored in the cache on these busy servers.
For instance, a record with a TTL of 24 hours shows up in our
logs as a createfetch entry 58 separate times in a recent
five-hour period.  I infer from this that the record expired
from cache 57 times in that five-hour period.
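
For anyone who wants to reproduce that count, here is a rough
sketch of the tally.  It assumes the resolver log lines contain
text of the form "createfetch: <name> <type>"; the function names
and the sample log lines are hypothetical, made up purely for
illustration, and the pattern would need adjusting to match the
real log.

```python
import re

# Assumed line shape: "... createfetch: <name> <type>". Adjust to the real log.
CREATEFETCH = re.compile(r"createfetch: (\S+) (\S+)")


def count_createfetch(lines, name, rrtype="A"):
    """Count log lines that record a createfetch for the given name and type."""
    wanted = name.rstrip(".").lower()
    count = 0
    for line in lines:
        m = CREATEFETCH.search(line)
        if not m:
            continue
        same_name = m.group(1).rstrip(".").lower() == wanted
        same_type = m.group(2).upper() == rrtype.upper()
        if same_name and same_type:
            count += 1
    return count


def max_fetches_if_ttl_honored(window_seconds, ttl_seconds):
    """Upper bound on fetches in a window if each answer stays cached for its TTL."""
    return window_seconds // ttl_seconds + 1


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real server.
    sample = [
        "resolver: debug 1: createfetch: www.example.com A",
        "resolver: debug 1: createfetch: www.example.com A",
        "resolver: debug 1: createfetch: mail.example.com MX",
    ]
    print(count_createfetch(sample, "www.example.com"))
    # A 24-hour TTL inside a 5-hour window should need at most one fetch.
    print(max_fetches_if_ttl_honored(5 * 3600, 24 * 3600))
```

With a 24-hour TTL and a five-hour window the bound is one fetch,
so 57 of the 58 entries are the ones I can't account for.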

This behavior is consistent with max-cache-size being reached, I
would guess, but max-cache-size does not appear in the named.conf
file, so by default it would be unlimited, yes?  The server
indicates, through top, that the named process has a size of
538MB, and that the server itself has 157MB of memory free.  top
also indicates that the named process is consuming 85-90% of
CPU resources, all of it in user and kernel space.

The only options explicitly specified in named.conf are:

   options {
        directory "/";
        dump-file "log/name_dump.db";
        allow-query {
          localhost;
          ...
        };
        auth-nxdomain yes;
        recursion yes;
        statistics-file "log/server_stats";
        allow-recursion {
          localhost;
          ...
        };
        recursive-clients 15000;
   };

(Note: ... here substitutes for some address-match-lists that we
 have defined.)

For logging, I'm only logging the resolver category by default;
query logging is configured into my logging options, but is
turned off during normal operation.

The questions I have are:

1. Can anyone hazard a guess as to the reason I'm apparently
   seeing entries prematurely expire from cache?  Alternatively,
   if I'm not seeing entries prematurely expire from cache, can
   anyone tell me what it is that I'm seeing?

2. Are there Solaris kernel tuning options/strategies available
   to me to make these boxes perform better?

3. The BIND 9 ARM says that each recursive client uses on the
   order of 20 kilobytes of RAM.  Does named reserve that much
   per recursive client at startup?  If so, does that mean that
   300MB (15000 clients x 20KB) of my 538MB of process size is
   not cache but just space for recursive clients?

4. These servers are expected to be able to handle upwards of
   1500 to 2000 queries per second.  What would be a good number
   for recursive-clients, if not 15000?
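
To spell out the arithmetic behind question 3, here is a small
sketch.  It takes the ARM's "on the order of 20 kilobytes" at
face value and assumes decimal units; whether named actually
reserves this memory up front is exactly what I'm asking, so the
result is an upper-bound guess, not a measurement.  The function
name is made up for illustration.

```python
BYTES_PER_KB = 1000        # decimal units; the ARM figure is only approximate
BYTES_PER_MB = 1_000_000


def recursive_client_memory_mb(clients, kb_per_client=20):
    """Memory in MB if every recursive client slot used kb_per_client KB."""
    return clients * kb_per_client * BYTES_PER_KB / BYTES_PER_MB


if __name__ == "__main__":
    clients = 15000        # recursive-clients value from named.conf
    process_mb = 538       # named process size reported by top

    client_mb = recursive_client_memory_mb(clients)
    print(f"recursive clients: {client_mb:.0f} MB")               # 300 MB
    print(f"left for cache:    {process_mb - client_mb:.0f} MB")  # 238 MB
```

If the reservation really happens, only about 238MB of the
process would be cache and everything else.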

Thanks for any clues that you folks can provide.

-- 
Todd Herr                                    todd at angrysunguy.com

