Bind 9.3.0 on FreeBSD 5.3 exits with signal 11

Mark Andrews Mark_Andrews at isc.org
Wed Jan 19 21:44:57 UTC 2005


> Hi,
> 
> I've got a problem with Bind9.3.0 under the following circumstances:
> 
> - Server: Sun Fire V65x, 2 x Xeon 2.8 GHz (running with Hyperthreading),
>   2 GB RAM
> - OS: FreeBSD 5.3-STABLE, interface configured with two IPs
> - 2 completely separate instances of Bind 9.3.0 compiled from source,
>   running with uid bind and each listening on a different IP
> - I've compiled bind with the following options:
>   --prefix=/usr/local/bind9/bind9-res1 \
>   --exec-prefix=/usr/local/bind9/bind9-res1 \
>   --with-openssl=no
> 
> The server is one of our caching-only nameservers (serves only 3 zones and
> does recursion otherwise) for our customers (in use for just about 3 days)
> and each instance of bind processes about 500-1000 queries/s.
> Everything works fine so far, but there is one major problem:
> The instances of named exit with signal 11 and dump core approximately
> every 14-18 hours, always at times when the cache of each named has grown
> to about 550 MB and the queries are at their peak (max. 1000 queries/s).
> What I've done so far is to limit the max-cache-size to 500 MB and modify
> recursive-clients/tcp-clients, but without any effect.
> Here are the relevant parts of the options-statement:

	You are hitting FreeBSD's 512 MB datasize limit and memory
	requests are failing.  This is then exercising error paths
	which unfortunately have bugs in them.  Stack backtraces
	from the core dumps would be useful, as they would give us
	a chance to remove the bugs (bind9-bugs at isc.org).

	To raise the datasize limit above 512 MB you need to tune the
	kernel.  You will also need to specify datasize in the options
	statement or raise the per-process limit before starting named.
	Note: you don't want to raise the datasize above the amount
	of real memory in the machine.  Named does not perform well
	if it needs to page.
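
	As a rough sketch of the named.conf side, assuming the kernel
	limit has already been raised and that roughly 768 MB per
	instance suits a 2 GB machine running two instances (the
	value is illustrative only, not a tested recommendation):

		options {
			// Ask named to raise its own data segment limit
			// at startup.  It cannot go beyond the maximum
			// the kernel allows.
			datasize 768M;
			// Keep the cache cap below datasize so there is
			// headroom for memory used outside the cache.
			max-cache-size 500M;
		};

	On FreeBSD the kernel ceiling is usually adjusted through the
	kern.maxdsiz loader tunable or the MAXDSIZ kernel option;
	check the documentation for your release before relying on
	either.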

	Note that max-cache-size only limits the memory used for the
	cache, not the memory used for other things.

> ++++++++++++++++++++++++++++++++++++++++++
> options {
>         listen-on { xxx.xxx.xxx.xxx; };
>         listen-on-v6 { none; };
>         directory "/usr/local/bind9/bind9-res1";
>         pid-file "named.pid";
>         allow-query { known_hosts; !any; };
>         allow-transfer { trusted; !any; };
>         blackhole { block; };
>         notify no;
>         recursive-clients 5000;
>         tcp-clients 500;
>         querylog no;
>         max-cache-size 500M;
> };
> ++++++++++++++++++++++++++++++++++++++++++
> 
> Another thing that I've encountered with this version of bind is that once
> the cache has grown, approximately every hour each instance of bind
> consumes around 95% CPU for a short period (estimated 1-2 minutes), which
> produces a high load on the server. During this time named answers only a
> limited number of queries (around 100-200 queries/s); the others will
> probably time out.

	This is the periodic cache cleaning, which removes stale records.
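
	How often this runs is controlled by cleaning-interval in the
	options statement (in minutes; the default is 60, which matches
	the hourly pattern described).  A sketch, with 15 as an
	arbitrary illustrative value rather than a recommendation:

		options {
			// Run the periodic cache cleaning every 15
			// minutes instead of every 60, in the hope
			// that each pass finds fewer expired records.
			cleaning-interval 15;
		};

	Whether shorter, more frequent passes actually reduce the load
	spikes would need to be measured on the server itself.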
 
> Has anybody an idea to solve this issue? Any help or hints would be
> appreciated.
> 
> Kind Regards,
> 
> Dirk
> 
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: Mark_Andrews at isc.org


