bind-9.10.0-P2 memory leak?

lconrad at go2france.com lconrad at go2france.com
Tue Sep 23 08:15:43 UTC 2014





On Monday 08/09/2014 at 9:05 pm, lconrad at go2france.com wrote:
>
> On Tuesday 09/09/2014 at 9:22 am, Mike Hoskins (michoski)  wrote:
>> Do you guys have max-cache-size set?  I didn't see it in the borderworlds
>> named.conf.  I've seen similar growth problems when testing 9.x before
>> setting that (an experiment at the time, just to see what would happen,
>> and it confirmed this behavior).  Set sensible resource limits based on
>> available resources.
>>
>> -----Original Message-----
>> From: Vinícius Ferrão <ferrao at if.ufrj.br>
>> Date: Tuesday, September 9, 2014 at 10:17 AM
>> To: Thomas Schulz <schulz at adi.com>
>> Cc: "bind-users at isc.org" <bind-users at isc.org>
>> Subject: Re: bind-9.10.0-P2 memory leak?
>>
>>>
>>> I'm having exactly the same issue. Take a look at my post on ServerFault:
>>> http://serverfault.com/questions/616752/bind-9-10-constantly-killed-on-freebsd-10-0-with-out-of-swap-space
>>>
>>> Sent from my iPhone
>>>
>>> On 09/09/2014, at 11:15, "Thomas Schulz" <schulz at adi.com> wrote:
>>>
>>>>
>>>>>
>>>>> Hello
>>>>>
>>>>> I recently upgraded my authoritative nameservers to bind-9.10.0-P2 and
>>>>> after a while one of them ended up using all its swap and the named
>>>>> process got killed. The other servers are seeing similar behaviour, but
>>>>> I restarted named on all of them to postpone further crashes.
>>>>>
>>>>> I am using rate-limiting as well as DLZ with PostgreSQL. The server has
>>>>> two views. The operating system is FreeBSD 8.4.
>>>>>
>>>>> My configuration:
>>>>> http://borderworlds.dk/~xi/named-leak/named.conf
>>>>>
>>>>> Log of the memory usage:
>>>>> http://borderworlds.dk/~xi/named-leak/named-mem-usage.log
>>>>>
>>>>> As you can see, in less than a week, named has grown more than 900MB
>>>>> in size.
>>>>>
>>>>> Is anyone else experiencing something similar?
>>>>>
>>>>> If I need to provide more information, I will be happy to do so.
>>>>>
>>>>> --
>>>>> Christian Laursen
>>>>
>>>> What version did you upgrade from? I am seeing bind 9.9.5 and 9.9.6
>>>> grow without any evidence that it will ever stop. See my mail to this
>>>> list with the subject "Re: Process size versus cache size." Mine is
>>>> growing slower than yours, but it is now up to 548 MB.
>>>>
>>>> Tom Schulz
>>>> Applied Dynamics Intl.
>>>> schulz at adi.com
>
> FreeBSD 10.0, bind-9.10.0-P2
>
> Logging the RSS field for the named process:
>
> less /var/tmp/bind_rss_history.txt
>
> 2014-09-06  17:03:34     338224
> 2014-09-06  18:00:00     395828
> 2014-09-06  19:00:00     444008
> 2014-09-06  20:00:00     487236
> 2014-09-06  21:00:00     525892
> 2014-09-06  22:00:00     567940
> 2014-09-06  23:00:00     611120
> 2014-09-07  00:00:00     644772
> 2014-09-07  01:00:00     674904
> 2014-09-07  02:00:00     700492
> 2014-09-07  03:00:00     726364
> 2014-09-07  04:00:00     748328
> 2014-09-07  05:00:00     774316
> 2014-09-07  06:00:00     799064
> 2014-09-07  07:00:00     827808
> 2014-09-07  08:00:00     867444
> 2014-09-07  09:00:00     917444
> 2014-09-07  10:00:00     972268
> 2014-09-07  11:00:00    1029304
> 2014-09-07  12:00:00    1088408
> 2014-09-07  13:00:00    1142456
> 2014-09-07  14:00:00    1184344
> 2014-09-07  15:00:00    1226052
> 2014-09-07  16:00:00    1267760
> 2014-09-07  17:00:00    1309736
> 2014-09-07  18:00:00    1347532
> 2014-09-07  19:00:00    1383300
> 2014-09-07  20:00:00    1418932
> 2014-09-07  21:00:00    1459112
> 2014-09-07  22:00:00    1506108
> 2014-09-07  23:00:00    1544512
> 2014-09-08  00:00:00    1576344
> 2014-09-08  01:00:00    1600988
> 2014-09-08  02:00:00    1623128
> 2014-09-08  03:00:00    1644520
> 2014-09-08  04:00:00    1665716
> 2014-09-08  05:00:00    1688844
> 2014-09-08  06:00:00    1713836
> 2014-09-08  07:00:00    1748720
> 2014-09-08  08:00:00     240072
> 2014-09-08  09:00:00     371388
> 2014-09-08  10:00:00     456952
> 2014-09-08  11:00:00     530696
> 2014-09-08  12:00:00     599792
> 2014-09-08  13:00:00     666280
> 2014-09-08  14:00:00     727884
> 2014-09-08  15:00:00     789672
> 2014-09-08  16:00:00     853456
> 2014-09-08  17:00:00     916520
> 2014-09-08  18:00:00     967940
> 2014-09-08  19:00:00    1011616
> 2014-09-08  20:00:00    1051452
> 2014-09-08  21:00:00    1095352
> 2014-09-08  22:00:00    1146388
> 2014-09-08  23:00:00    1198776
> 2014-09-09  00:00:00    1241256
> 2014-09-09  01:00:00    1279640
> 2014-09-09  02:00:00    1312936
> 2014-09-09  03:00:00    1342592
> 2014-09-09  04:00:00    1372092
> 2014-09-09  05:00:00    1407444
> 2014-09-09  06:00:00    1441632
> 2014-09-09  07:00:00    1483464
>
> This never happened with earlier BIND 9 versions, and our mx1 uses this
> recursive BIND machine for all domain/PTR lookups.
>
> I've never seen any BIND take over 1GB of RAM.
>
> max-cache-size isn't the solution, only a band-aid.
>
> The sawtooth above is from restarting named.
>
> named has halted twice in the past couple of weeks. We suspected some
> kind of attack; the only trace we had was in syslog, with something
> like "swap space failed, named halted". But with a dedicated DNS box
> and 3 GB, there should never be any swapping.  I set a watcher for
> "swap used > 1%".  When I got an alert, I saw the named RSS was 1.9GB.
> I restarted bind and wrote the named RSS logging script.
>
> Len
I added

max-cache-size  512m;

and ran rndc reconfig, but after 12+ hours RSS is already past that figure:


 ps auxw | egrep named

USER      PID %CPU %MEM    VSZ    RSS TT  STAT STARTED        TIME COMMAND
bind    48153 12.9 27.0 869128 843444  -  Rs    3:34PM   111:42.29 /usr/local/sbin/named -t /var/named -u bind -c /usr/local/etc/na
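
For anyone who wants to collect the same kind of history, here is a minimal sketch of an hourly RSS logger. It is not the script used for the logs in this thread. It assumes that `ps -axo rss,comm` is available and reports RSS in KB, and that the process shows up as `named`. The function names are made up for illustration; the file path and the line layout follow the history file shown in this thread.

```python
import subprocess
from datetime import datetime


def rss_kb_for(ps_text, command="named"):
    """Sum the RSS (KB) of every process called `command`.

    `ps_text` is expected to be the output of `ps -axo rss,comm`
    (an assumption; adjust the columns for your platform).
    """
    total = 0
    for line in ps_text.splitlines():
        fields = line.split(None, 1)
        # Skip the header line and anything that is not "<number> <name>".
        if len(fields) == 2 and fields[0].isdigit() and fields[1].strip() == command:
            total += int(fields[0])
    return total


def log_line(when, rss_kb):
    """Format one history line: date, two spaces, time, RSS right-aligned."""
    return f"{when:%Y-%m-%d}  {when:%H:%M:%S}{rss_kb:>11}"


def sample(path="/var/tmp/bind_rss_history.txt"):
    """Take one sample and append it to the history file."""
    ps_text = subprocess.run(
        ["ps", "-axo", "rss,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    with open(path, "a") as fh:
        fh.write(log_line(datetime.now(), rss_kb_for(ps_text)) + "\n")
```

Calling `sample()` once an hour, for example from cron, would produce a file in the same layout as the hourly log.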

Here is the hourly log of the named RSS (in KB):


2014-09-22  08:00:00    2313544
2014-09-22  09:00:00    2364360
2014-09-22  10:00:00    2417516
2014-09-22  11:00:00    2473336
2014-09-22  12:00:00    2525620
2014-09-22  13:00:00    2574624
2014-09-22  14:00:00    2625256
2014-09-22  15:00:00    2665212   < got a MONIT alert that mem swap size > 1%

2014-09-22  16:00:00     144168   <<<  reconfig with max-cache-size 512m;
2014-09-22  17:00:00     229640
2014-09-22  18:00:00     292020
2014-09-22  19:00:00     340384
2014-09-22  20:00:00     382100
2014-09-22  21:00:00     432468
2014-09-22  22:00:00     475600
2014-09-22  23:00:00     511724
2014-09-23  00:00:00     546976
2014-09-23  01:00:00     574872
2014-09-23  02:00:00     599428
2014-09-23  03:00:00     621684
2014-09-23  04:00:00     645568
2014-09-23  05:00:00     672608
2014-09-23  06:00:00     702096
2014-09-23  07:00:00     741240
2014-09-23  08:00:00     789264
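
To put numbers on the log above, here is a rough sketch that splits the samples at each restart, computes the average growth per hour, and finds the first sample above 512 MB. The data is copied from the log, from 13:00 on 2014-09-22 onward, without the annotations. It assumes the values are KB, as ps reports them, and it treats any drop in RSS as a restart. The function names are made up for illustration. Note that max-cache-size limits the cache, not the whole process, so RSS above 512 MB is a hint rather than proof that the limit is being ignored.

```python
from datetime import datetime

LOG = """\
2014-09-22  13:00:00    2574624
2014-09-22  14:00:00    2625256
2014-09-22  15:00:00    2665212
2014-09-22  16:00:00     144168
2014-09-22  17:00:00     229640
2014-09-22  18:00:00     292020
2014-09-22  19:00:00     340384
2014-09-22  20:00:00     382100
2014-09-22  21:00:00     432468
2014-09-22  22:00:00     475600
2014-09-22  23:00:00     511724
2014-09-23  00:00:00     546976
2014-09-23  01:00:00     574872
2014-09-23  02:00:00     599428
2014-09-23  03:00:00     621684
2014-09-23  04:00:00     645568
2014-09-23  05:00:00     672608
2014-09-23  06:00:00     702096
2014-09-23  07:00:00     741240
2014-09-23  08:00:00     789264
"""


def parse(text):
    """Return (timestamp, rss_kb) pairs, skipping lines that do not parse."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        try:
            when = datetime.strptime(" ".join(parts[:2]), "%Y-%m-%d %H:%M:%S")
            rows.append((when, int(parts[2])))
        except (ValueError, IndexError):
            continue
    return rows


def split_at_restarts(rows):
    """Start a new segment whenever RSS drops (taken to mean a restart)."""
    segments = [[rows[0]]]
    for prev, cur in zip(rows, rows[1:]):
        if cur[1] < prev[1]:
            segments.append([])
        segments[-1].append(cur)
    return segments


def growth_kb_per_hour(segment):
    """Average growth between the first and last sample of a segment."""
    (t0, r0), (t1, r1) = segment[0], segment[-1]
    hours = (t1 - t0).total_seconds() / 3600
    return (r1 - r0) / hours if hours else None


CAP_KB = 512 * 1024  # 512 MB expressed in KB, for comparison with RSS

segments = split_at_restarts(parse(LOG))
after_restart = segments[-1]
first_over = next((when for when, rss in after_restart if rss > CAP_KB), None)

for seg in segments:
    print(seg[0][0], "->", seg[-1][0], growth_kb_per_hour(seg), "KB/hour")
print("first sample above 512 MB:", first_over)
```

Splitting on any drop is a simplification. It works for this log because RSS only ever falls at a restart.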


named -v
BIND 9.10.0-P2


uname -a
FreeBSD 10.0-RELEASE-p7

Len