Memory utilisation problem on busy bind resolver
cswiger at mac.com
Tue Aug 9 15:09:29 UTC 2011
On Aug 9, 2011, at 7:31 AM, Dennis Perisa wrote:
> We are running a number of BIND 9.7.3-p3 caching nameservers. In the
> last couple of months, we've observed the memory utilisation of named
> increasing at a steady rate of 1-2% per day on our busiest resolver
> with no indication of subsiding - on occasion, there have been large
> step increases of 1 GB or so.
Yeah, I've seen similar things on machines used to perform DNS resolution for busy webserver logfiles. It seemed like BIND-9.4 (.4.ESV.4) was ignoring the max-cache-size setting entirely, while BIND-9.6.x seemed to do OK. I wonder if there's a regression in BIND-9.7.x?
> All our other resolvers are configured identically but are behaving
> themselves with memory utilisation remaining at fairly constant
> I've looked at all the named logs until my eyeballs have almost fallen
> out of my head but am unable to determine the cause of this. So I'm
> taking a step back and hoping to get some advice - what else can I do
> to find the cause of this? Or is it something I simply need to live
It could be anything from memory leaks in named or the system libraries like libc, to a bug in named not honoring the cache size settings. Does a cache flush actually help reduce VM usage of named in your case?
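A rough way to answer the cache-flush question is to record named's resident memory, run "rndc flush", wait a bit, and compare. The sketch below assumes a ps that accepts "-o rss= -p PID" and reports RSS in KiB (true on Linux and the BSDs as far as I know), and that rndc is already configured to talk to the server. The function names and the 30-second settle time are my own choices, not anything from BIND.

```python
import subprocess
import time


def parse_rss_kib(ps_output: str) -> int:
    """Parse the single number printed by `ps -o rss= -p PID` (assumed KiB)."""
    return int(ps_output.strip())


def rss_kib(pid: int) -> int:
    """Return the resident set size of a process as reported by ps."""
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_rss_kib(out)


def percent_change(before_kib: int, after_kib: int) -> float:
    """Relative change in memory use; negative means usage went down."""
    return 100.0 * (after_kib - before_kib) / before_kib


def flush_and_compare(pid: int, settle_seconds: float = 30.0) -> float:
    """Measure RSS, flush named's cache via rndc, then measure again."""
    before = rss_kib(pid)
    subprocess.run(["rndc", "flush"], check=True)
    time.sleep(settle_seconds)  # arbitrary pause to let named release memory
    after = rss_kib(pid)
    return percent_change(before, after)


# Example (hypothetical PID):
#   change = flush_and_compare(1234)
#   print(f"RSS change after flush: {change:+.1f}%")
```

Keep in mind that many malloc implementations hold on to freed memory rather than returning it to the OS, so an unchanged RSS right after the flush doesn't by itself prove the cache wasn't emptied; whether the growth curve flattens afterwards is probably the more telling sign.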
You haven't mentioned which platform you are using, but looking for leaks can involve anything from "env MALLOC_OPTIONS='U' ktrace named" for many BSD flavors, "leaks named" for OS X, mtrace() for GNU libc, to running named under Valgrind or rebuilding it and its libraries with Purify or a similar tool.
Or, one can also use gdb to attach to a running named and inspect the current state of the cache and so forth; someone from ISC who is more familiar with the exact code there can probably give more specific debugging hints.
> We're looking at measures such as periodic cache flushes, and tuning
> max-cache-size, max-cache-ttl and max-cache-nttl params to limit
> memory usage, but this may only be treating the symptom and costs us
> extra cpu cycles.
I hear this-- in my prior case, tuning max-cache-size, recursive-clients, etc. didn't make any difference...
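For reference, the knobs under discussion all live in the options block of named.conf. The fragment below is only a sketch with made-up values, not a recommendation; note that the negative-cache TTL option is spelled max-ncache-ttl in BIND.

```
options {
    // Illustrative values only; size these for your own resolver.
    max-cache-size 512M;      // upper bound on cache memory (applies per view)
    max-cache-ttl 86400;      // cap cached positive answers at 1 day (seconds)
    max-ncache-ttl 3600;      // cap cached negative answers at 1 hour (seconds)
    recursive-clients 1000;   // simultaneous recursive lookups (1000 is the default)
};
```

After changing these, "rndc reconfig" or a restart is needed for named to pick them up.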