memory leak or other problem

JINMEI Tatuya / 神明達哉 jinmei at isl.rdc.toshiba.co.jp
Tue Oct 2 18:12:07 UTC 2007


At Mon, 01 Oct 2007 08:25:23 -0400,
David Filion <df at auto123.com> wrote:

> >> I have a strange problem with my bind servers where the process runs 
> >> great until the memory usage gets very high. I am running CentOS 4.5 and 
> >> the latest version of Bind for that distribution bind-9.2.4-24. On one 
> >> of my servers there is not enough RAM in the system and about every 24 
> >> hours I have to restart the bind server. After I restart everything runs 
> >> great until I get to about 80% memory usage.
> > 
> > We are seeing something similar on one of our SLES 10 boxes 
> > (bind-9.3.2-17.15).  The server in question is a secondary DNS.  Every 
> > 40 minutes it gets reloaded to pick up any new zones.  named's memory 
> > footprint just keeps growing until it stops responding.
> 
> Any updates on this issue?  One of our DNS stopped responding and Bind 
> needed to be restarted to get it answering again.

I'd first suggest migrating to BIND 9.4 or later, because prior
versions rely on libc malloc (by default), which can perform very
poorly depending on the library implementation details.

If 9.4 still shows the same problem (I recall someone else in this
thread reporting that), I strongly suspect it's related to BIND9's
heavyweight cache cleaning mechanism: it goes through the entire cache
database periodically and/or in over-memory situations to see if
there are any disposable cache entries.  I wrote a patch that
completely eliminates this overhead by cleaning stale entries with an
LRU-like purge policy, without any batch operations.
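
To illustrate the general idea only (this is not the patch itself, and
not BIND code), here is a toy sketch in Python.  All names in it
(LruTtlCache, purge_per_op, and so on) are made up for the example, and
the clock is passed in explicitly as "now" to keep it simple.  Each
put/get looks at a bounded number of entries at the least recently used
end and drops those that are expired or over the size limit, so there
is never a sweep over the whole cache.

```python
from collections import OrderedDict


class LruTtlCache:
    """Toy cache (hypothetical, not BIND's implementation).

    Entries are kept in least-recently-used-first order.  Stale or
    excess entries are removed a few at a time on each operation.
    """

    def __init__(self, max_entries, purge_per_op=2):
        self.max_entries = max_entries
        self.purge_per_op = purge_per_op  # upper bound on work per call
        self._entries = OrderedDict()     # key -> (value, expires_at)

    def _purge_some(self, now):
        # Inspect only the LRU end; stop as soon as the oldest entry is
        # still valid and the cache is within its size limit.
        for _ in range(self.purge_per_op):
            if not self._entries:
                return
            key, (_, expires_at) = next(iter(self._entries.items()))
            if expires_at > now and len(self._entries) <= self.max_entries:
                return
            del self._entries[key]

    def put(self, key, value, ttl, now):
        self._entries[key] = (value, now + ttl)
        self._entries.move_to_end(key)    # mark as most recently used
        self._purge_some(now)

    def get(self, key, now):
        item = self._entries.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at <= now:
            del self._entries[key]        # expired: drop on access
            return None
        self._entries.move_to_end(key)
        self._purge_some(now)
        return value

    def __len__(self):
        return len(self._entries)
```

The point of the design is that the cost of cleaning is spread over
normal operations in small, bounded pieces instead of being paid in one
large batch.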

I found it worked pretty well, but since I have never seen an
inactivity problem like the ones reported in this thread myself, I'm
not 100% sure this really solves the problem.  I'm very interested
in the effectiveness of this approach, so please let me know if any
of you are willing to test it with real query traffic that would cause
the reported problem.

					JINMEI, Tatuya
					Communication Platform Lab.
					Corporate R&D Center, Toshiba Corp.
					jinmei at isl.rdc.toshiba.co.jp


