bind-9.10.0-P2 memory leak?
lconrad at go2france.com
Tue Sep 23 08:15:43 UTC 2014
On Monday 08/09/2014 at 9:05 pm, lconrad at go2france.com wrote:
> On Tuesday 09/09/2014 at 9:22 am, Mike Hoskins (michoski) wrote:
>> Do you guys have max-cache-size set? I didn't see it in the borderworlds
>> named.conf. I've seen similar growth problems when testing 9.x before
>> setting that (experiment at the time just to see what would happen, and
>> confirmed this behavior). Set sensible resource limits based on available
>> resources.
>>
>> -----Original Message-----
>> From: Vinícius Ferrão <ferrao at if.ufrj.br>
>> Date: Tuesday, September 9, 2014 at 10:17 AM
>> To: Thomas Schulz <schulz at adi.com>
>> Cc: "bind-users at isc.org" <bind-users at isc.org>
>> Subject: Re: bind-9.10.0-P2 memory leak?
>>
>>>
>>> I'm having exactly the same issue. Take a look at my post @ServerFault:
>>> http://serverfault.com/questions/616752/bind-9-10-constantly-killed-on-freebsd-10-0-with-out-of-swap-space
>>>
>>> Sent from my iPhone
>>>
>>> On 09/09/2014, at 11:15, "Thomas Schulz" <schulz at adi.com> wrote:
>>>
>>>>
>>>>>
>>>>> Hello
>>>>>
>>>>> I recently upgraded my authoritative nameservers to bind-9.10.0-P2 and
>>>>> after a while one of them ended up using all its swap and the named
>>>>> process got killed. The other servers are seeing similar behaviour, but
>>>>> I restarted named on all of them to postpone further crashes.
>>>>>
>>>>> I am using rate-limiting as well as DLZ with PostgreSQL. The server has
>>>>> two views. The operating system is FreeBSD 8.4.
>>>>>
>>>>> My configuration:
>>>>> http://borderworlds.dk/~xi/named-leak/named.conf
>>>>>
>>>>> Log of the memory usage:
>>>>> http://borderworlds.dk/~xi/named-leak/named-mem-usage.log
>>>>>
>>>>> As you can see, in less than a week, named has grown more than 900MB
>>>>> in size.
>>>>>
>>>>> Is anyone else experiencing something similar?
>>>>>
>>>>> If I need to provide more information, I will be happy to do so.
>>>>>
>>>>> --
>>>>> Christian Laursen
>>>>
>>>> What version did you upgrade from? I am seeing bind 9.9.5 and 9.9.6
>>>> grow without any evidence that it will ever stop. See my mail to this
>>>> list with the subject "Re: Process size versus cache size." Mine is
>>>> growing slower than yours, but it is now up to 548 MB.
>>>>
>>>> Tom Schulz
>>>> Applied Dynamics Intl.
>>>> schulz at adi.com
>
> freebsd 10.0, bind-9.10.0-p2
>
> logging the rss field for named process:
>
>
> less /var/tmp/bind_rss_history.txt
>
> 2014-09-06 17:03:34 338224
> 2014-09-06 18:00:00 395828
> 2014-09-06 19:00:00 444008
> 2014-09-06 20:00:00 487236
> 2014-09-06 21:00:00 525892
> 2014-09-06 22:00:00 567940
> 2014-09-06 23:00:00 611120
> 2014-09-07 00:00:00 644772
> 2014-09-07 01:00:00 674904
> 2014-09-07 02:00:00 700492
> 2014-09-07 03:00:00 726364
> 2014-09-07 04:00:00 748328
> 2014-09-07 05:00:00 774316
> 2014-09-07 06:00:00 799064
> 2014-09-07 07:00:00 827808
> 2014-09-07 08:00:00 867444
> 2014-09-07 09:00:00 917444
> 2014-09-07 10:00:00 972268
> 2014-09-07 11:00:00 1029304
> 2014-09-07 12:00:00 1088408
> 2014-09-07 13:00:00 1142456
> 2014-09-07 14:00:00 1184344
> 2014-09-07 15:00:00 1226052
> 2014-09-07 16:00:00 1267760
> 2014-09-07 17:00:00 1309736
> 2014-09-07 18:00:00 1347532
> 2014-09-07 19:00:00 1383300
> 2014-09-07 20:00:00 1418932
> 2014-09-07 21:00:00 1459112
> 2014-09-07 22:00:00 1506108
> 2014-09-07 23:00:00 1544512
> 2014-09-08 00:00:00 1576344
> 2014-09-08 01:00:00 1600988
> 2014-09-08 02:00:00 1623128
> 2014-09-08 03:00:00 1644520
> 2014-09-08 04:00:00 1665716
> 2014-09-08 05:00:00 1688844
> 2014-09-08 06:00:00 1713836
> 2014-09-08 07:00:00 1748720
> 2014-09-08 08:00:00 240072
> 2014-09-08 09:00:00 371388
> 2014-09-08 10:00:00 456952
> 2014-09-08 11:00:00 530696
> 2014-09-08 12:00:00 599792
> 2014-09-08 13:00:00 666280
> 2014-09-08 14:00:00 727884
> 2014-09-08 15:00:00 789672
> 2014-09-08 16:00:00 853456
> 2014-09-08 17:00:00 916520
> 2014-09-08 18:00:00 967940
> 2014-09-08 19:00:00 1011616
> 2014-09-08 20:00:00 1051452
> 2014-09-08 21:00:00 1095352
> 2014-09-08 22:00:00 1146388
> 2014-09-08 23:00:00 1198776
> 2014-09-09 00:00:00 1241256
> 2014-09-09 01:00:00 1279640
> 2014-09-09 02:00:00 1312936
> 2014-09-09 03:00:00 1342592
> 2014-09-09 04:00:00 1372092
> 2014-09-09 05:00:00 1407444
> 2014-09-09 06:00:00 1441632
> 2014-09-09 07:00:00 1483464
>
> This never happened with earlier BIND9, and our mx1 uses this
> recursive BIND machine for all domain/ptr lookups
>
> I've never seen any bind take over 1GB of RAM.
>
> max-cache-size isn't the solution, only a band-aid
>
> the sawtooth above is from restarting named.
>
> named has halted twice in the past couple of weeks. We suspected some
> kind of attack; the only trace we had was in syslog, with something
> like "swap space failed, named halted". But with a dedicated DNS box
> and 3 GB, there should never be any swapping. I set a watcher for
> "swap used > 1%", got an alert, and saw the named RSS at 1.9GB. I
> restarted bind and wrote the named RSS logging script.
>
> Len
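For anyone who wants to collect the same kind of data, here is a minimal sketch of an hourly RSS logger. It is not the script mentioned above, just one way it could be done. It assumes Python 3.7+ and a ps that accepts "-o rss= -p PID" (FreeBSD and Linux both do); the function names are made up for this example.

```python
import subprocess
from datetime import datetime


def read_rss_kb(pid: int) -> int:
    """Return the resident set size of `pid` in kilobytes, as reported by ps."""
    # "-o rss=" prints only the RSS column, with no header line.
    out = subprocess.run(
        ["ps", "-o", "rss=", "-p", str(pid)],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


def format_sample(when: datetime, rss_kb: int) -> str:
    """Format one sample as 'YYYY-MM-DD HH:MM:SS <rss_kb>', like the log in this thread."""
    return f"{when:%Y-%m-%d %H:%M:%S} {rss_kb}"


def log_once(pid: int, path: str) -> None:
    """Append one timestamped RSS sample for `pid` to the file at `path`."""
    line = format_sample(datetime.now(), read_rss_kb(pid))
    with open(path, "a") as fh:
        fh.write(line + "\n")
```

Calling log_once(pid, "/var/tmp/bind_rss_history.txt") once an hour from cron, with pid taken from wherever named writes its pid on your system, should produce lines in the same format as the history file quoted above.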
I added
max-cache-size 512m;
... did rndc reconfig, but after 12+ hours
ps auxw | egrep named
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
bind 48153 12.9 27.0 869128 843444 - Rs 3:34PM 111:42.29 /usr/local/sbin/named -t /var/named -u bind -c /usr/local/etc/na
here is the log of bind sizes per hour:
2014-09-22 08:00:00 2313544
2014-09-22 09:00:00 2364360
2014-09-22 10:00:00 2417516
2014-09-22 11:00:00 2473336
2014-09-22 12:00:00 2525620
2014-09-22 13:00:00 2574624
2014-09-22 14:00:00 2625256
2014-09-22 15:00:00 2665212 < got a MONIT alert that mem swap size > 1%
2014-09-22 16:00:00 144168 <<< reconfig with max-cache-size 512m;
2014-09-22 17:00:00 229640
2014-09-22 18:00:00 292020
2014-09-22 19:00:00 340384
2014-09-22 20:00:00 382100
2014-09-22 21:00:00 432468
2014-09-22 22:00:00 475600
2014-09-22 23:00:00 511724
2014-09-23 00:00:00 546976
2014-09-23 01:00:00 574872
2014-09-23 02:00:00 599428
2014-09-23 03:00:00 621684
2014-09-23 04:00:00 645568
2014-09-23 05:00:00 672608
2014-09-23 06:00:00 702096
2014-09-23 07:00:00 741240
2014-09-23 08:00:00 789264
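To put a number on the growth since the reconfig, here is a rough sketch that takes the 16:00 and 08:00 samples from the log above and works out the average growth per hour. The helper names are made up for this example. Note that RSS covers the whole named process, not just the cache, so comparing it with max-cache-size is only indicative.

```python
from datetime import datetime


def parse_sample(line: str) -> tuple:
    """Split 'YYYY-MM-DD HH:MM:SS <rss_kb> [note]' into (timestamp, rss_kb).

    Anything after the third field, such as a trailing note, is ignored.
    """
    date, time, rss = line.split()[:3]
    return datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S"), int(rss)


def growth_kb_per_hour(first: str, last: str) -> float:
    """Average RSS growth in KB per hour between two log lines."""
    t0, r0 = parse_sample(first)
    t1, r1 = parse_sample(last)
    hours = (t1 - t0).total_seconds() / 3600
    return (r1 - r0) / hours


# First sample after the reconfig and the latest sample in the log above.
rate = growth_kb_per_hour(
    "2014-09-22 16:00:00 144168",
    "2014-09-23 08:00:00 789264",
)
limit_kb = 512 * 1024  # max-cache-size 512m expressed in KB

print(f"{rate:.1f} KB/hour")   # 40318.5 KB/hour
print(789264 > limit_kb)       # True
```

That is about 40 MB per hour over the 16 hours, and the 08:00 RSS of 789264 KB is already well past 524288 KB (512 MB).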
named -v
BIND 9.10.0-P2
uname -a
FreeBSD 10.0-RELEASE-p7
Len