reconfig times

Kelsey Cummings kgc at sonic.net
Tue May 24 18:38:27 UTC 2005


I've been trying to track down a couple of performance-related problems
with bind9 and have found at least one thing that's causing us trouble
today.

When one of my name servers has just been started, it consistently
processes a reconfig in under a second, usually around 0.6.  That is a
perfectly acceptable time to reload and validate the config and start
answering requests again.  However, after a number of days of operation
(these are also recursive servers), the server suddenly starts taking
longer to come back from a reconfig, until it takes more than 10 seconds,
which causes my anycast system to withdraw routes to the servers.
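
A minimal sketch of how such a reconfig time could be measured and compared
against the 10 second withdrawal limit mentioned above.  The helper names
(time_command, exceeds_threshold) are hypothetical, and the sketch assumes
that `rndc reconfig` does not return until the server has finished, which is
not confirmed here; timing test queries against the server would be a more
direct probe.

```python
import subprocess
import sys
import time

# From the post: the anycast system withdraws routes after roughly 10 seconds.
WITHDRAW_THRESHOLD_S = 10.0


def time_command(argv):
    """Run argv to completion and return (elapsed_seconds, exit_code)."""
    start = time.monotonic()
    proc = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.monotonic() - start, proc.returncode


def exceeds_threshold(elapsed, threshold=WITHDRAW_THRESHOLD_S):
    """True if a measured time would trip the (assumed) withdrawal limit."""
    return elapsed > threshold


if __name__ == "__main__":
    # Default to the real command; any other command can be passed instead.
    argv = sys.argv[1:] or ["rndc", "reconfig"]
    elapsed, code = time_command(argv)
    status = "TOO SLOW" if exceeds_threshold(elapsed) else "ok"
    print(f"{' '.join(argv)}: {elapsed:.2f}s exit={code} {status}")
```

Run periodically and logged alongside the server's uptime and cache size, the
output would show when the time starts to climb.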

It appears that all of the time is spent in dumping and reloading the
entire red/black tree to check it for consistency with the new
configuration.  What puzzles me is that there is no apparent linear relation
between the cache size and the length of time that it takes for the reconfig
to finish.  If anything, it appears to depend on how long the server has
been in operation.

For example, 4 of my servers are currently running caches of ~180MB and
returning from a reconfig in ~0.6 seconds, whereas 2 other servers with
just over 200MB of cache are taking ~12.9 seconds.  Another, with 260MB
of cache, is currently taking 1.48 seconds.
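
As a rough check on the "no linear relation" observation, the figures above
can be put through a Pearson correlation.  This is only a sketch under stated
assumptions: "just over 200" is rounded to 200, every server in a group is
given that group's approximate figures, and seven points in three groups is
too few to prove anything.

```python
import math

# (cache size in MB, reconfig time in seconds), taken from the figures above.
# Assumption: each server in a group gets the group's approximate values.
SAMPLES = [(180, 0.6)] * 4 + [(200, 12.9)] * 2 + [(260, 1.48)]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print(f"Pearson r (cache size vs reconfig time): {pearson(SAMPLES):.2f}")
```

A coefficient near zero would be consistent with cache size alone not
explaining the reconfig time; adding uptime as a second column would allow
the same check against time in operation.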

Has anyone else seen this kind of behavior?  This, combined with the
apparent cache-cleaning CPU spinning 'bug' that a few other people have
seen, is making it hard to run bind9 in high availability environments.


-Kelsey


