Bind 9 / Bind 8 / NOTIFY updates and system load

Mark_Andrews at isc.org
Tue Dec 23 01:05:14 UTC 2003


> Bind Users:
> 
> I am having several problems with my Bind Infrastructure lately and I want
> some advice:
> 
> 2 major issues in summary: 
> 
> 	1: memory grows to exceed what the system has available; when I put a
> memory size limit on the process, it crashed when it reached that size
> instead of clearing out memory.
> 
> 	2: I have about 9000 domains, and have a hierarchical setup, and
> NOTIFY updates are taking sometimes 2-3 hours to be honored by slaves.
> 
> 
> Details:
> 	Platform: Solaris 8 and 9.
> 	Software: Resolvers for internal services ( Mail servers ) Bind 9
> (latest)
> 		Master xfer host is running Bind 9 latest.
> 		Resolvers for external customer ( Dialup ) Bind 8 ( want to
> move to 9 )
> 	Setup: All zone data is in a database and is extracted to zone files
> on a xfer master box, all this Bind 9 box does is send Notifies that a zone
> has changed, and it serves up the files to the slaves. This xfer box does
> not allow recursive, and is strictly for updating the slaves. 
> 
> 
> 	Problem 1: I have tried tuning down the datasize to 400m, but
> eventually the server crashes. Can someone give me a breakdown of
> recommended settings for a Bind 9 server that handles on average several
> hundred queries per second, as in a nameserver for an Email Cluster? If I
> remove the 400m limit the server will remain stable for about 2 weeks
> and then just start losing domains.

	You are using the wrong control.  You should be using
	max-cache-size.  max-cache-size sets up a soft limit on the
	amount of memory used by the cache.  When the memory exceeds
	7/8 of the limit the cache will start throwing away entries
	at random until the memory drops to 3/4 of the limit.
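
	As a minimal sketch of the above, assuming a cache budget of 256M
	(a made-up figure, not a recommendation), the options block might
	look like this.  The thresholds in the comments are just 7/8 and
	3/4 of that figure.

```
options {
	// Hypothetical value; size it to the RAM you can spare for the cache.
	max-cache-size 256M;
	// With 256M, cleaning would start near 224M (7/8 of the limit)
	// and stop near 192M (3/4 of the limit).
	// No "datasize" line here: datasize is a hard limit on the whole
	// process, not a control on the cache.
};
```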

	Note this does not affect the adb cache in 9.2.  In 9.3 the
	adb cache is also controlled by this.

	You should also separate the authoritative and caching roles.
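
	One way to read "separate the roles" is two named instances (or two
	hosts), each with its own named.conf.  The following is only a
	sketch; the client network and cache size are hypothetical
	placeholders.

```
// named.conf on the authoritative-only server:
// it serves the slaved zones and answers no recursive queries.
options {
	recursion no;
};

// named.conf on the caching-only server (a separate instance or host):
// it holds no customer zones and only resolves for known clients.
options {
	recursion yes;
	allow-recursion { 192.0.2.0/24; };	// hypothetical client network
	max-cache-size 256M;			// hypothetical value
};
```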

> 	Problem 2: Slaves are taking almost 2 hours to actually do a
> transfer of a Notify, is this caused by load? Some config entry? I have the
> parallel number of axfr's allowed set to over 100, there are about 12 slave
> servers all pulling primary zones from one master server. I see the NOTIFY
> go out, and then I watch for how long the slaves take to honor it. Sometimes
> it has taken until midnight, 6-8 hours after the change was made.
> Should it take this long?

	Turn off notify on the slave zones unless they are a master for another
	server.

	Notifies and refresh queries are rate limited via a single queue,
	serial-query-rate (default 10).  There are 108000 notifies sent out
	at startup with the configuration you have.
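
	For scale: 108000 matches 9000 zones times 12 slaves, and at 10 per
	second that queue takes 10800 seconds, i.e. about 3 hours, to drain.
	A sketch of the slave side follows, assuming a zone name, master
	address and file path that are placeholders only.  Raising
	serial-query-rate is shown as an option to consider, not as a
	recommended value.

```
options {
	// Hypothetical value; the default quoted above is 10.
	serial-query-rate 20;
};

zone "example.com" {
	type slave;
	masters { 192.0.2.1; };		// hypothetical address of the xfer master
	file "slave/example.com.db";	// hypothetical path
	notify no;			// this slave is not a master for another server
};
```

	If none of the zones on a slave feed another server, "notify no;"
	can instead go once in the options block rather than in each zone.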
 
> I can provide more detail on the above two problems if you let me know
> specifically what you want to know.
> 
> 
> Thanks in advance..
> 
> 
> Shane Brath
> 
--
Mark Andrews, Internet Software Consortium
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: Mark.Andrews at isc.org

