dnsperf and BIND memory consumption

ivan jr sy ivan_jr at yahoo.com
Wed Nov 26 18:34:59 UTC 2008


Hi all,

I know this is an old thread, but I'd like to resurrect it in the hope of finding answers.

BIND 9.5 with threads on FreeBSD 7 performs better, but it has this memory consumption problem.

BIND 9.4 with threads on FreeBSD 7 delivers less than half of that performance (34,000 vs. 82,000 QPS), but has no issue like this. 9.5 without threads doesn't have the issue either, but its performance is the same as 9.4's.

More data below. It's basically the same as Vinny's, but I want to stress that 9.5 with threads performs well.

I'm hoping someone can shed some light on where to get a patch for this issue.

Thanks!
- Ivan


system:
FreeBSD 7.0 RELEASE AMD64
Server is a Dell SC1435 with 4 CPUs, no Hyperthreading, 2GB of RAM and a 150GB RAID1
Dnsperf run from a different server on the same network segment over Gig-E

1. FreeBSD 7-RELEASE+BIND 9.4.2-P2 = 34,000 QPS, 94MB mem

2. FreeBSD 7-RELEASE+BIND 9.5.0-P2 threaded = 82,000 QPS, 1.5GB mem! (memory keeps growing until the test script ends, and does not go back to its original level afterwards)

3. FreeBSD 7-RELEASE+BIND 9.5.0-P2 non-threaded = 34,000 QPS, 95MB mem

FIRST TEST
# pkg_info | grep bind
bind94-base-9.4.2.2 The BIND DNS suite with updated DNSSEC and threads
# named -v
BIND 9.4.2-P2
# ldd /usr/sbin/named
/usr/sbin/named:
        libcrypto.so.5 => /lib/libcrypto.so.5 (0x8007a9000)
        libthr.so.3 => /lib/libthr.so.3 (0x800a3b000)
        libc.so.7 => /lib/libc.so.7 (0x800b51000)

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
13677 bind        7 100    0 93704K 77912K select 1   6:13 194.43% named

Notes:
1. Regardless of how many times the script was run, memory consumption
remained the same.

2. A few seconds after the script was terminated, CPU usage returned to normal:
  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
13677 bind        7  98    0 93704K 77912K select 3   7:57  0.00% named

SECOND TEST
# pkg_info | grep bind
bind95-base-9.5.0.2 The BIND DNS suite with updated DNSSEC and threads
# named -v
BIND 9.5.0-P2
# ldd /usr/sbin/named
/usr/sbin/named:
        libcrypto.so.5 => /lib/libcrypto.so.5 (0x8007bf000)
        libxml2.so.5 => /usr/local/lib/libxml2.so.5 (0x800a51000)
        libz.so.4 => /lib/libz.so.4 (0x800c95000)
        libiconv.so.3 => /usr/local/lib/libiconv.so.3 (0x800da9000)
        libm.so.5 => /lib/libm.so.5 (0x800fa2000)
        libthr.so.3 => /lib/libthr.so.3 (0x8010bc000)
        libc.so.7 => /lib/libc.so.7 (0x8011d2000)

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
67304 bind        7  99    0  1524M  1509M select 1   2:10 200.54% named

Notes:
1. Memory consumption reached 1.5GB after running the script only 26 times. That's 1.3 million authoritative queries.

2. After the script was terminated, memory consumption stayed the same.
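
To track the growth over time instead of reading single top snapshots, something like the following could be used. It is only a sketch: the function names sample_mem and rss_growth and the log file name are made up for illustration, and it assumes ps accepts "-o vsz= -o rss= -p PID".

```sh
#!/bin/sh
# Sketch: sample a process's memory with ps and summarize RSS growth.
# Assumption: ps accepts "-o vsz= -o rss= -p PID" (FreeBSD and Linux ps do).

# sample_mem PID: print "epoch_seconds vsz_kb rss_kb"; fail if the PID is gone.
sample_mem() {
    mem=$(ps -o vsz= -o rss= -p "$1" 2>/dev/null) || return 1
    [ -n "$mem" ] || return 1
    # $mem is left unquoted on purpose so the shell collapses ps's padding.
    echo "$(date +%s)" $mem
}

# rss_growth: read sample lines on stdin, print last RSS minus first RSS (KB).
rss_growth() {
    awk 'NR == 1 { first = $3 } { last = $3 } END { if (NR) print last - first }'
}

# Example (67304 is the named PID from the top output above):
#   while sample_mem 67304; do sleep 5; done > named-mem.log
#   rss_growth < named-mem.log
```

The sampling loop ends by itself when named exits, so the log also shows whether memory is ever given back after the test script stops.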

THIRD TEST
(results very similar to the first test)


Hardware Details
CPU: Quad-Core AMD Opteron(tm) Processor 2350 (1995.01-MHz K8-class CPU)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #1 Launched!
usable memory = 2133491712 (2034 MB)
avail memory  = 2058821632 (1963 MB)

# uname -a
FreeBSD jaljeb.infoweapons.com 7.0-RELEASE FreeBSD 7.0-RELEASE #0: Sun Feb 24 10:35:36 UTC 2008     root at driscoll.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC



--- On Fri, 8/8/08, Vinny Abello <vinny at tellurian.com> wrote:

> From: Vinny Abello <vinny at tellurian.com>
> Subject: RE: dnsperf and BIND memory consumption
> To: "JINMEI Tatuya / 神明達哉" <Jinmei_Tatuya at isc.org>
> Cc: "bind-users at isc.org" <bind-users at isc.org>
> Date: Friday, August 8, 2008, 2:33 AM
> > -----Original Message-----
> > From: bind-users-bounce at isc.org [mailto:bind-users-bounce at isc.org] On
> > Behalf Of JINMEI Tatuya / 神明達哉
> > Sent: Thursday, August 07, 2008 3:56 AM
> > To: Vinny Abello
> > Cc: bind-users at isc.org
> > Subject: Re: dnsperf and BIND memory consumption
> >
> > At Thu, 7 Aug 2008 00:58:23 -0400,
> > Vinny Abello <vinny at tellurian.com> wrote:
> >
> > > OK. I've recompiled BIND 9.5.0-P2 (from ports) without threads
> > > enabled. I no longer see the memory leak at all. I'm running dnsperf
> > > and I see a constant of 18MB which is much more reasonable for what
> > > I am doing. For me it's easy to reproduce. Some more information
> > > that may help reproduce it:
> >
> > > FreeBSD 7.0 STABLE AMD64 (cvsup'ed within the past week)
> > > BIND 9.5.0-P2 installed via ports with threads enabled
> > > Server is a Dell PowerEdge 2850 with 2 CPU's, Hyperthreading
> > > disabled, 4GB of RAM and a 36GB RAID1 array on a Perc4 controller
> > > (LSI MegaRAID chipset)
> > > Dnsperf run from a different server on the same network segment
> > > over Gig-E
> >
> > This looks quite similar to the one we heard before.  I suspect this
> > is due to some bad interaction between BIND9 and the FreeBSD's thread
> > library or its kernel, rather than application memory leak (in which
> > case you can confirm it by stopping named while its memory is growing
> > and seeing it crash).  Here is what I suggested at that time to
> > identify the memory eater (but unfortunately we couldn't get any
> > feedback on it at that time), could you try it?
> 
> Sure, I can give it a shot.
> 
> > =======================================================================
> > - create a symbolic link from "/etc/malloc.conf" to "X":
> >  # ln -s X /etc/malloc.conf
> 
> What exactly is this trying to accomplish here? JFYI, I
> don't have a file /etc/malloc.conf on my server. Did you
> mean /etc/make.conf? Where is X being referenced?
> 
> > - start named with a moderate limitation of virtual memory size, e.g.
> >  # /usr/bin/limits -v 384m $path_to_named/named <command line options>
> >
> > Then the named process will eventually abort itself with a core dump
> > due to malloc failure.  Please show us the stack trace at that point.
> > Hopefully it will reveal the malloc call that keeps consuming memory.
> 
> How would I show the trace that you require once this
> happens?
> 
> >
> > Notes:
> > - of course, this is a very radical way of diagnosing; you need to
> >  keep watching the process because it's "guaranteed" to be aborted.
> > - the VM size must be carefully chosen so that malloc failure won't
> >  happen due to normal named processing.  I think 384MB is reasonable
> >  enough according to the statistics you provided so far, but I'm not
> >  100% sure about that.
> > - it's better to keep my latest patch to adb.c and to run named with
> >  '-n 1' so that the mutex_init in adb.c won't trigger the malloc
> >  failure.
> > - the global symbolic link from /etc/malloc.conf affects other
> >  processes.  So, if you're running a different process than named
> >  that can consume a lot of memory or can cause malloc failure, we
> >  should find an alternative approach (there are some, but they are
> >  more complicated so let's discuss those only when they are really
> >  necessary).
> 
> Shouldn't be a problem here. Again, it's just being
> tested and this is the only thing the server is doing.
> 
> > =======================================================================
> >
> > BTW, you should be able to find the previous discussion on this matter
> > by searching the bind-users at isc.org list with the subject of
> > "max-cache-size doesn't work with 9.5.0b1".
> 
> I may have to go back and find this thread.
> 
> >
> > ---
> > JINMEI, Tatuya
> > Internet Systems Consortium, Inc.
> >
> > p.s. I'm pretty sure it's different from the 'memory leak' issue of
> > BIND9/Windows.  Let's forget it in this context.
> 
> Fair enough. I'll trust you on that.
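
The diagnostic steps in the quoted message (malloc.conf link, limits, stack trace) could be put together roughly as below. This is a hedged sketch, not something from the thread: the run and diagnose helpers, the DRYRUN switch, the /usr/sbin/named path, and the named.core file name are assumptions, and the -f flag (keep named in the foreground so the script waits for the abort) is added here for convenience. By default it only prints the commands.

```sh
#!/bin/sh
# Sketch of the diagnostic steps from the quoted message. Dry run by default.
# Assumptions (not from the thread): named lives at /usr/sbin/named and the
# core file is written as named.core in the current directory.
NAMED=${NAMED:-/usr/sbin/named}
CORE=${CORE:-named.core}
VMLIMIT=${VMLIMIT:-384m}

# run CMD...: print the command; execute it only when DRYRUN=0.
run() {
    echo "+ $*"
    if [ "${DRYRUN:-1}" = 0 ]; then "$@"; fi
}

# diagnose [extra named options]
diagnose() {
    # "X" makes malloc abort with a core dump instead of returning failure.
    run ln -s X /etc/malloc.conf
    # Cap virtual memory; -f keeps named in the foreground until it aborts.
    run /usr/bin/limits -v "$VMLIMIT" "$NAMED" -f -n 1 "$@"
    # At the (gdb) prompt, type "bt" to print the stack trace.
    run gdb "$NAMED" "$CORE"
    # The symlink is global, so remove it when done.
    run rm /etc/malloc.conf
}
```

Calling diagnose with the usual named options (for example "diagnose -u bind") prints the four commands; setting DRYRUN=0 first would actually execute them as root. The "bt" output from gdb is the stack trace asked for in the quoted message.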


      


