memory management troubles/rndc flush hangs bind

Cathy Almond cathya at isc.org
Tue Jun 8 08:51:46 UTC 2010


Hi Stas,

I've raised a bug ticket (#21479) with your report below.  In general,
for a problem like this, if it hasn't already been raised and explained
on bind-users, please report it by email to bind9-bugs.

9.5 introduced an LRU cache - this is most likely why you are seeing a
difference between 9.4 and 9.6.

We'll be in touch via the bug ticket report.

Kind regards,

Cathy

Stas Pirogov wrote:
> Hello,
> 
> first let me apologize for the length of this message.
> I will try to be as short as I can.
> 
> Today we have around 20 servers running bind 9.4 and 9.6 (latest versions)
> on CentOS 5.x (between 5.2 and 5.5) with a 64-bit 2.6 kernel.
> 
> Our servers have around 35000 zones, which take up about 250M of disk
> space overall.
> 
> On load bind takes around 900M of memory.
> 
> For bind 9.4 we used a max-cache of 1000M, which let named grow
> to up to 2.3G of resources in memory.
> 
> Since bind 9.6 (I didn't try this on 9.5) we have had trouble managing the
> amount of memory that bind uses. Even with max-cache at the default 2M,
> named eventually grows to more than 3G of resources, and at this point
> strange things begin to happen:
> 
> 1. With non-multithreaded bind, 'rndc flush' (which we run once a day)
> crashes bind and produces the following log entries:
> 
> 05-Jun-2010 05:10:03.684 general: info: received control channel command 'flush'
> 05-Jun-2010 05:10:03.684 general: critical: cache.c:978: fatal error:
> 05-Jun-2010 05:10:03.684 general: critical: RUNTIME_CHECK(((*((&cache->cleaner.lock)))++ == 0 ? 0 : 34) == 0) failed
> 05-Jun-2010 05:10:03.684 general: critical: exiting (due to fatal error in library)
> 
> This is from bind 9.7.0-P2. Line 978 of cache.c contains:
> 
> LOCK(&cache->cleaner.lock);
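
The failed check in the log above spells out what LOCK expands to in a non-threaded build: the lock is a plain integer that is post-incremented, and the call succeeds only if the previous value was 0; otherwise it yields 34. A RUNTIME_CHECK failure here therefore suggests the lock was already held when the single thread tried to take it again. The sketch below models that behaviour in Python for brevity. The class and method names are hypothetical, and the unlock rule is an assumption, since the log shows only the lock expression.

```python
class CounterLock:
    """Hypothetical model of the integer 'lock' implied by the logged expression."""

    def __init__(self):
        self.count = 0

    def lock(self):
        # Mirrors ((*lock)++ == 0 ? 0 : 34): always increment,
        # succeed (0) only if the lock was free beforehand.
        previous = self.count
        self.count += 1
        return 0 if previous == 0 else 34

    def unlock(self):
        # Assumption: unlocking simply decrements the counter.
        self.count -= 1


cleaner_lock = CounterLock()
first = cleaner_lock.lock()    # lock was free, so this succeeds
second = cleaner_lock.lock()   # lock already held, so this is the failing case
print(first, second)
```

In this model the second call is the one that would trip RUNTIME_CHECK, because the counter was already nonzero.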
> 
> 2. With threaded bind, 'rndc flush' leaves named still running but
> providing no service.
> 
> Here is some output from such a hung process running bind 9.6.2-P1:
> 
> ps auxww:
> 
> root      2248 25.3 74.8 3153312 3029568 ?     Ssl  May17 7918:25 /usr/local/sbin/named -4 -n 2
> root     15281  0.0  0.0  39292  1456 ?        Ssl  05:09   0:00 /usr/local/sbin/rndc flush
> 
> pstack:
> 
> Thread 5 (Thread 0x41206940 (LWP 2249)):
> #0  0x000000377fc0aee9 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x0000000000560c2a in run ()
> #2  0x000000377fc0673d in start_thread () from /lib64/libpthread.so.0
> #3  0x000000377f4d3d1d in clone () from /lib64/libc.so.6
> Thread 4 (Thread 0x41c07940 (LWP 2250)):
> #0  0x000000377fc0d4c4 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x000000377fc08e1a in _L_lock_1034 () from /lib64/libpthread.so.0
> #2  0x000000377fc08cdc in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x00000000004564d7 in water ()
> #4  0x0000000000554820 in isc__mem_get ()
> #5  0x0000000000493a8b in createiterator ()
> #6  0x000000000045633a in dns_cache_flush ()
> #7  0x000000000050698d in dns_view_flushcache ()
> #8  0x000000000041e1bf in ns_server_flushcache ()
> #9  0x000000000040b720 in ns_control_docommand ()
> #10 0x000000000040e718 in control_recvmessage ()
> #11 0x0000000000560d9c in run ()
> #12 0x000000377fc0673d in start_thread () from /lib64/libpthread.so.0
> #13 0x000000377f4d3d1d in clone () from /lib64/libc.so.6
> Thread 3 (Thread 0x42711940 (LWP 2251)):
> #0  0x000000377fc0b150 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x0000000000573c00 in isc_condition_waituntil ()
> #2  0x0000000000562df9 in run ()
> #3  0x000000377fc0673d in start_thread () from /lib64/libpthread.so.0
> #4  0x000000377f4d3d1d in clone () from /lib64/libc.so.6
> Thread 2 (Thread 0x43112940 (LWP 2252)):
> #0  0x000000377f4d4108 in epoll_wait () from /lib64/libc.so.6
> #1  0x0000000000570b8d in watcher ()
> #2  0x000000377fc0673d in start_thread () from /lib64/libpthread.so.0
> #3  0x000000377f4d3d1d in clone () from /lib64/libc.so.6
> Thread 1 (Thread 0x2ae3ff041530 (LWP 2248)):
> #0  0x000000377f4307bf in sigsuspend () from /lib64/libc.so.6
> #1  0x000000000056426e in isc_app_run ()
> #2  0x00000000004124eb in main ()
> 
> pmap:
> 
> 2248:   /usr/local/sbin/named -4 -n 2
> Address           Kbytes     RSS   Dirty Mode   Mapping
> 0000000000400000    1860    1428       0 r-x--  named
> 00000000007d0000      56      48      28 rw---  named
> 00000000007de000       8       8       8 rw---    [ anon ]
> 0000000012b40000  750300  750084  750084 rw---    [ anon ]
> 0000000040806000       4       0       0 -----    [ anon ]
> 0000000040807000   10240      36      36 rw---    [ anon ]
> 0000000041207000       4       0       0 -----    [ anon ]
> 0000000041208000   10240      36      36 rw---    [ anon ]
> 0000000041d11000       4       0       0 -----    [ anon ]
> 0000000041d12000   10240       8       8 rw---    [ anon ]
> 0000000042712000       4       0       0 -----    [ anon ]
> 0000000042713000   10240       8       8 rw---    [ anon ]
> 000000377f000000     112      48       0 r-x--  ld-2.5.so
> 000000377f21b000       4       4       4 r----  ld-2.5.so
> 000000377f21c000       4       4       4 rw---  ld-2.5.so
> 000000377f400000    1336     432       0 r-x--  libc-2.5.so
> 000000377f54e000    2044       0       0 -----  libc-2.5.so
> 000000377f74d000      16      16       8 r----  libc-2.5.so
> 000000377f751000       4       4       4 rw---  libc-2.5.so
> 000000377f752000      20      16      16 rw---    [ anon ]
> 000000377f800000       8       0       0 r-x--  libdl-2.5.so
> 000000377f802000    2048       0       0 -----  libdl-2.5.so
> 000000377fa02000       4       4       4 r----  libdl-2.5.so
> 000000377fa03000       4       4       4 rw---  libdl-2.5.so
> 000000377fc00000      88      64       0 r-x--  libpthread-2.5.so
> 000000377fc16000    2044       0       0 -----  libpthread-2.5.so
> 000000377fe15000       4       4       4 r----  libpthread-2.5.so
> 000000377fe16000       4       4       4 rw---  libpthread-2.5.so
> 000000377fe17000      16       4       4 rw---    [ anon ]
> 0000003780000000     520       8       0 r-x--  libm-2.5.so
> 0000003780082000    2044       0       0 -----  libm-2.5.so
> 0000003780281000       4       4       4 r----  libm-2.5.so
> 0000003780282000       4       4       4 rw---  libm-2.5.so
> 0000003780400000      80       4       0 r-x--  libz.so.1.2.3
> 0000003780414000    2044       0       0 -----  libz.so.1.2.3
> 0000003780613000       4       4       4 rw---  libz.so.1.2.3
> 0000003781c00000    1228      12       0 r-x--  libxml2.so.2.6.26
> 0000003781d33000    2048       0       0 -----  libxml2.so.2.6.26
> 0000003781f33000      36      20      16 rw---  libxml2.so.2.6.26
> 0000003781f3c000       4       0       0 rw---    [ anon ]
> 0000003782c00000      12       4       0 r-x--  libcap.so.1.10
> 0000003782c03000    2048       0       0 -----  libcap.so.1.10
> 0000003782e03000       4       4       4 rw---  libcap.so.1.10
> 00002aaaaaacc000     188     188     188 rw---    [ anon ]
> 00002aaaaaafc000   85540   85540   85540 rw---    [ anon ]
> 00002aaaafe86000   18460   18444   18444 rw---    [ anon ]
> 00002aaab10f6000     264     260     260 rw---    [ anon ]
> 00002aaab11a1000     260     260     260 rw---    [ anon ]
> 00002aaab11e3000   11440   11440   11440 rw---    [ anon ]
> 00002aaab1d10000    3120    3112    3112 rw---    [ anon ]
> 00002aaab201d000   13260   13232   13232 rw---    [ anon ]
> 00002aaab2d11000    6760    6744    6744 rw---    [ anon ]
> 00002aaab33ac000    5200    5196    5196 rw---    [ anon ]
> 00002aaab38c1000    3380    3372    3372 rw---    [ anon ]
> 00002aaab3c0f000    5200    5156    5156 rw---    [ anon ]
> 00002aaab4124000   13260   13184   13184 rw---    [ anon ]
> 00002aaab4e18000     520     520     520 rw---    [ anon ]
> 00002aaab4e9b000   23660   23656   23656 rw---    [ anon ]
> 00002aaab65b7000    7280    7276    7276 rw---    [ anon ]
> 00002aaab6cd4000     780     780     780 rw---    [ anon ]
> 00002aaab6d98000    7280    7276    7276 rw---    [ anon ]
> 00002aaab74b5000    2860    2852    2852 rw---    [ anon ]
> 00002aaab7781000    1820    1820    1820 rw---    [ anon ]
> 00002aaab7949000    8320    8320    8320 rw---    [ anon ]
> 00002aaab816a000   22880   22868   22868 rw---    [ anon ]
> 00002aaab97c3000    1820    1820    1820 rw---    [ anon ]
> 00002aaab998b000   10660   10656   10656 rw---    [ anon ]
> 00002aaaba3f5000   10400   10400   10400 rw---    [ anon ]
> 00002aaabae1e000   23660   23644   23644 rw---    [ anon ]
> 00002aaabc53a000    1040    1036    1036 rw---    [ anon ]
> 00002aaabc63f000     780     776     776 rw---    [ anon ]
> 00002aaabc703000     780     776     776 rw---    [ anon ]
> 00002aaabc7c7000    1560    1556    1556 rw---    [ anon ]
> 00002aaabc94e000    2600    2584    2584 rw---    [ anon ]
> 00002aaabcbd9000    1040    1032    1032 rw---    [ anon ]
> 00002aaabccde000    1560    1556    1556 rw---    [ anon ]
> 00002aaabce65000     780     776     776 rw---    [ anon ]
> 00002aaabcf29000     780     776     776 rw---    [ anon ]
> 00002aaabcfed000     780     776     776 rw---    [ anon ]
> 00002aaabd0b1000     520     516     516 rw---    [ anon ]
> 00002aaabd134000     780     772     772 rw---    [ anon ]
> 00002aaabd1f8000     520     520     520 rw---    [ anon ]
> 00002aaabd27b000    1560    1560    1560 rw---    [ anon ]
> 00002aaabd402000     780     780     780 rw---    [ anon ]
> 00002aaabd4c6000    1040    1040    1040 rw---    [ anon ]
> 00002aaabd5cb000     780     776     776 rw---    [ anon ]
> 00002aaabd68f000    1820    1816    1816 rw---    [ anon ]
> 00002aaabd857000     780     780     780 rw---    [ anon ]
> 00002aaabd91b000     780     776     776 rw---    [ anon ]
> 00002aaabd9df000    1040    1036    1036 rw---    [ anon ]
> 00002aaabdae4000    1040    1032    1032 rw---    [ anon ]
> 00002aaabdbe9000    1300    1300    1300 rw---    [ anon ]
> 00002aaabdd2f000     780     776     776 rw---    [ anon ]
> 00002aaabddf3000     520     520     520 rw---    [ anon ]
> 00002aaabde76000    1820    1812    1812 rw---    [ anon ]
> 00002aaabe03e000     780     776     776 rw---    [ anon ]
> 00002aaabe102000    1300    1292    1292 rw---    [ anon ]
> 00002aaabe248000     520     520     520 rw---    [ anon ]
> 00002aaabe2cb000    1300    1300    1300 rw---    [ anon ]
> 00002aaabe411000     780     780     780 rw---    [ anon ]
> 00002aaabe4d5000    1300    1296    1296 rw---    [ anon ]
> 00002aaabe61b000     780     776     776 rw---    [ anon ]
> 00002aaabe6df000    1560    1560    1560 rw---    [ anon ]
> 00002aaabe866000     520     520     520 rw---    [ anon ]
> 00002aaabe8e9000     520     520     520 rw---    [ anon ]
> 00002aaabe96c000     520     516     516 rw---    [ anon ]
> 00002aaabe9ef000     780     780     780 rw---    [ anon ]
> 00002aaabeab3000     780     776     776 rw---    [ anon ]
> 00002aaabeb77000    1560    1556    1556 rw---    [ anon ]
> 00002aaabecfe000     520     520     520 rw---    [ anon ]
> 00002aaabed81000     520     520     520 rw---    [ anon ]
> 00002aaabee04000    1300    1288    1288 rw---    [ anon ]
> 00002aaabef4a000    1040    1036    1036 rw---    [ anon ]
> 00002aaabf04f000    1040    1040    1040 rw---    [ anon ]
> 00002aaabf154000    2340    2328    2328 rw---    [ anon ]
> 00002aaabf39e000    2860    2852    2852 rw---    [ anon ]
> 00002aaabf66a000    1820    1808    1808 rw---    [ anon ]
> 00002aaabf832000     520     520     520 rw---    [ anon ]
> 00002aaabf8b5000    1820    1816    1816 rw---    [ anon ]
> 00002aaabfa7d000    1040    1036    1036 rw---    [ anon ]
> 00002aaabfb82000    1820    1820    1820 rw---    [ anon ]
> 00002aaabfd4a000     780     780     780 rw---    [ anon ]
> 00002aaabfe0e000    1820    1808    1808 rw---    [ anon ]
> 00002aaabffd6000    2080    2060    2060 rw---    [ anon ]
> 00002aaac01df000     520     520     520 rw---    [ anon ]
> 00002aaac0262000     520     520     520 rw---    [ anon ]
> 00002aaac02e5000    1300    1292    1292 rw---    [ anon ]
> 00002aaac042b000     520     516     516 rw---    [ anon ]
> 00002aaac04ae000     780     776     776 rw---    [ anon ]
> 00002aaac0572000     520     520     520 rw---    [ anon ]
> 00002aaac05f5000    1300    1296    1296 rw---    [ anon ]
> 00002aaac073b000    1040    1036    1036 rw---    [ anon ]
> 00002aaac0840000     780     772     772 rw---    [ anon ]
> 00002aaac0904000    1560    1556    1556 rw---    [ anon ]
> 00002aaac0a8b000     780     780     780 rw---    [ anon ]
> 00002aaac0b4f000    1300    1296    1296 rw---    [ anon ]
> 00002aaac0c95000    1560    1552    1552 rw---    [ anon ]
> 00002aaac0e1c000     520     516     516 rw---    [ anon ]
> 00002aaac0e9f000     780     780     780 rw---    [ anon ]
> 00002aaac0f63000    2080    2072    2072 rw---    [ anon ]
> 00002aaac116c000     780     776     776 rw---    [ anon ]
> 00002aaac1230000     520     516     516 rw---    [ anon ]
> 00002aaac12b3000    2340    2324    2324 rw---    [ anon ]
> 00002aaac14fd000     780     780     780 rw---    [ anon ]
> 00002aaac15c1000    1560    1560    1560 rw---    [ anon ]
> 00002aaac1748000     780     776     776 rw---    [ anon ]
> 00002aaac180c000    1820    1816    1816 rw---    [ anon ]
> 00002aaac19d4000    1820    1812    1812 rw---    [ anon ]
> 00002aaac1b9c000    1040    1040    1040 rw---    [ anon ]
> 00002aaac1ca1000    2600    2600    2600 rw---    [ anon ]
> 00002aaac1f2c000   10660   10608   10608 rw---    [ anon ]
> 00002aaac29f1000   20280   20168   20168 rw---    [ anon ]
> 00002aaac3ed1000    1024    1024    1024 rw---    [ anon ]
> 00002aaac3fe3000   65056   65056   65056 rw---    [ anon ]
> 00002aaac8000000   65508   65340   65340 rw---    [ anon ]
> 00002aaacbff9000      28       0       0 -----    [ anon ]
> 00002aaacc000000   65480   65480   65480 rw---    [ anon ]
> 00002aaacfff2000      56       0       0 -----    [ anon ]
> 00002aaad0000000   63504   63504   63504 rw---    [ anon ]
> 00002aaad4000000   65332   65332   65332 rw---    [ anon ]
> 00002aaad7fcd000     204       0       0 -----    [ anon ]
> 00002aaad8000000   65356   65356   65356 rw---    [ anon ]
> 00002aaadbfd3000     180       0       0 -----    [ anon ]
> 00002aaadc000000   65420   65420   65420 rw---    [ anon ]
> 00002aaadffe3000     116       0       0 -----    [ anon ]
> 00002aaae0000000   61648   61648   61648 rw---    [ anon ]
> 00002aaae4000000   65500   65500   65500 rw---    [ anon ]
> 00002aaae7ff7000      36       0       0 -----    [ anon ]
> 00002aaae8000000   64428   64428   64428 rw---    [ anon ]
> 00002aaaebeeb000    1108       0       0 -----    [ anon ]
> 00002aaaec000000   64156   64156   64156 rw---    [ anon ]
> 00002aaaefea7000    1380       0       0 -----    [ anon ]
> 00002aaaf0000000   64896   64896   64896 rw---    [ anon ]
> 00002aaaf4000000   65340   64116   64116 rw---    [ anon ]
> 00002aaaf7fcf000     196       0       0 -----    [ anon ]
> 00002aaaf8000000   64840   64840   64840 rw---    [ anon ]
> 00002aaafc000000   64820   64820   64820 rw---    [ anon ]
> 00002aaafff4d000     716       0       0 -----    [ anon ]
> 00002aab00000000   64780   64780   64780 rw---    [ anon ]
> 00002aab04000000   65292   65292   65292 rw---    [ anon ]
> 00002aab07fc3000     244       0       0 -----    [ anon ]
> 00002aab08000000   65452   65452   65452 rw---    [ anon ]
> 00002aab0bfeb000      84       0       0 -----    [ anon ]
> 00002aab0c000000   65252   65252   65252 rw---    [ anon ]
> 00002aab0ffb9000     284       0       0 -----    [ anon ]
> 00002aab10000000   63060   62212   62212 rw---    [ anon ]
> 00002aab13d95000    2476       0       0 -----    [ anon ]
> 00002aab14000000   65488   64704   64704 rw---    [ anon ]
> 00002aab17ff4000      48       0       0 -----    [ anon ]
> 00002aab18000000  256296  256036  256036 rw---    [ anon ]
> 00002aab28000000   65372   65372   65372 rw---    [ anon ]
> 00002aab2bfd7000     164       0       0 -----    [ anon ]
> 00002aab2c000000   61408   61152   61152 rw---    [ anon ]
> 00002aab30000000   63468   63468   63468 rw---    [ anon ]
> 00002aab33dfb000    2068       0       0 -----    [ anon ]
> 00002aab34000000   47816   47540   47540 rw---    [ anon ]
> 00002aab38000000   33204   14516   14516 rw---    [ anon ]
> 00002aab3a06d000   32332       0       0 -----    [ anon ]
> 00002ae3ff02d000       4       4       4 rw---    [ anon ]
> 00002ae3ff03e000     276     276     276 rw---    [ anon ]
> 00007fff8fa4a000      84      20      20 rw---    [ stack ]
> ffffffffff600000    8192       0       0 -----    [ anon ]
> ----------------  ------  ------  ------
> total kB         3161504 3029568 3027536
> 
> strace -fp:
> 
> Process 2248 attached with 5 threads - interrupt to quit
> [pid  2252] epoll_wait(7,  <unfinished ...>
> [pid  2251] clock_gettime(CLOCK_REALTIME,  <unfinished ...>
> [pid  2250] futex(0x2aaab104c088, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
> [pid  2249] futex(0x2ae3ff047084, FUTEX_WAIT_PRIVATE, 4239917443, NULL <unfinished ...>
> [pid  2248] rt_sigsuspend([] <unfinished ...>
> [pid  2251] <... clock_gettime resumed> {1275976570, 97551000}) = 0
> [pid  2251] futex(0x2ae3ff048074, FUTEX_WAIT_PRIVATE, 93569205, {0, 301251000}) = -1 ETIMEDOUT (Connection timed out)
> [pid  2251] futex(0x2ae3ff048020, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  2251] clock_gettime(CLOCK_REALTIME, {1275976570, 400051000}) = 0
> [pid  2251] futex(0x2ae3ff048074, FUTEX_WAIT_PRIVATE, 93569207, {0, 252521000}) = -1 ETIMEDOUT (Connection timed out)
> [pid  2251] futex(0x2ae3ff048020, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  2251] clock_gettime(CLOCK_REALTIME, {1275976570, 654023000}) = 0
> [pid  2251] futex(0x2ae3ff048074, FUTEX_WAIT_PRIVATE, 93569209, {0, 75751000}) = -1 ETIMEDOUT (Connection timed out)
> [pid  2251] futex(0x2ae3ff048020, FUTEX_WAKE_PRIVATE, 1) = 0
> [pid  2251] clock_gettime(CLOCK_REALTIME, {1275976570, 731031000}) = 0
> [pid  2251] futex(0x2ae3ff048074, FUTEX_WAIT_PRIVATE, 93569211, {0, 155742000}) = -1 ETIMEDOUT (Connection timed out)
> [pid  2251] futex(0x2ae3ff048020, FUTEX_WAKE_PRIVATE, 1) = 0
> 
> From what I can understand, the threads are hanging waiting for a lock and
> nothing happens afterwards.
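
One possible reading of the Thread 4 trace, offered as an assumption rather than a confirmed diagnosis: dns_cache_flush() takes a lock, the allocation made under createiterator() triggers water(), and water() then waits for a lock that its own thread already holds. The Python sketch below shows that re-entry pattern on a non-reentrant lock. The function names are hypothetical stand-ins for the frames in the trace, and a non-blocking acquire is used so that the example reports the problem instead of hanging.

```python
import threading

cache_lock = threading.Lock()   # non-reentrant: a second acquire by the holder blocks
events = []


def water():
    # Hypothetical stand-in for the callback in frame #3.
    acquired = cache_lock.acquire(blocking=False)
    if acquired:
        events.append("water acquired lock")
        cache_lock.release()
    else:
        # With a blocking acquire, this thread would wait here forever.
        events.append("water would block")


def mem_get():
    # Hypothetical stand-in for an allocation that triggers the callback.
    water()


def flush():
    # Hypothetical stand-in for the flush path holding the lock while allocating.
    with cache_lock:
        mem_get()


flush()
print(events)
```

If this reading is right, it would also be consistent with the non-threaded crash, where the same re-entry is caught by the RUNTIME_CHECK instead of blocking.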
> 
> Without running 'rndc flush', bind eventually reaches 4G and crashes
> with a different error, which I don't currently have to hand.
> 
> So far we have tried different max-cache settings and threaded/non-threaded
> builds without much difference.
> 
> In all cases named is a 64-bit executable.
> 
> The problem never happens with the bind 9.4.3-P5 that we run (nor with older
> versions of 9.4), so it seems that memory management changed in 9.6 (maybe
> even in 9.5). I also ran tests with 9.7.0-P1/P2, with the same outcome.
> 
> Any help on the issue will be greatly appreciated. I'm open to any suggestions.
> 
> Thanks in advance.
> 
> Stas Pirogov
> 013 Netvision
> _______________________________________________
> bind-users mailing list
> bind-users at lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users



