IXFR journal dump making 9.2.4 server non-responsive

Thu Dec 16 21:11:21 UTC 2004

> In article <cpsk8r$22ej$1 at sf1.isc.org>, Derek D. wrote:
> > We subscribe to an e-mail DNS RBL that we zone transfer via IXFR and
> > have noticed what we believe to be a correlation between BIND ceasing
> > to answer queries and the dumping of the journal file to disk.
> > 
> > The server is a Sun v120 with 2GB of RAM running Solaris 8 and BIND
> > 9.2.4.
> > 
> > I noticed the BIND 9 ARM mentions that the default time for dumping the
> > journal file to disk is 15 minutes, but we seem to be seeing it at
> > about 20 minutes.  For example the end of transfer log entry for the
> > zone is at 00:56:25 and all is well until 01:16:51 when log entries
> > stopped.  Then at 01:20:45 queries start getting logged again.  During
> > this outage the machine is running pretty close to 100% CPU and a truss
> > shows that a new zone file is being dumped to disk.  Normally the
> > machine is running with a load average of about 0.2.  A fresh start of
> > BIND takes about 8 to 10 minutes to load this zone plus the others that
> > it has.  The RBL zone file is about 102MB.
> 
> Presumably you mean the zone file is being dumped, rather than the
> journal - the journal is constantly updated as updates to the zone come
> in.
> 
> I've seen this problem before on a large zone slaved using IXFR. The
> problem appears to be that, 15 minutes after an update, BIND will write
> out the zone file. While it's doing this, the in-memory copy is locked,
> which prevents access to it. Any thread which attempts to read this
> copy will block until it becomes unlocked. In doing so, the thread is
> prevented from doing any other work (normally, the zone file would
> be written out and unlocked in a few milliseconds, so this wouldn't
> be an issue).
> 
> If sufficient queries are made against the zone in question, all the
> threads on your server will be taken up waiting for the zone to finish
> writing, and you'll stop responding to all queries.
> 
> > Does the above make any sense?
> > Would a dual CPU box help this?
> 
> Not really. You'll be able to have more threads, but if your server
> is busy enough, they'll still all eventually block. Disk IO speed is
> probably the real limiting factor.
> 
> > Any ideas or suggestions?
> 
> Increase the number of threads (beware of overloading the server if it's
> busy, though), remove the "file" directive from the zone config (if you
> can live with having to refetch the entire zone every time you start
> the nameserver), or put the file into a memory filesystem, syncing it to
> disk every 15 minutes or so, and putting it back after a reboot.
> 
> None of these are ideal solutions. I wish I could tell you how I solved
> the problem when I saw it, but I ended up not having to slave the
> huge zone, so the issue went away.
> 
> Brian
> -- 
>    *  *   * *  **       *  * ** ** *   *
>    *  ** *      *      ** *   *  *    *
>  *    *        *     *  *             *

	Upgrade to 9.3.0, which has incremental master file dumping.
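
	For illustration only, the worker starvation described in the
	quoted text can be sketched as a toy model. This is not BIND code:
	the function name, the FIFO dispatch, the 1 ms service time and the
	zone labels "big" and "other" are all assumptions made up for the
	example. A query for the locked zone holds its worker until the dump
	ends, so once every worker is held, queries for other zones wait too.

```python
import heapq

SERVICE_MS = 1  # assumed time to answer one query when nothing is locked


def simulate(workers, dump_start, dump_end, arrivals):
    """Toy FIFO model of a fixed worker pool (hypothetical, not BIND's design).

    arrivals: list of (arrival_ms, zone) sorted by arrival time.
    Zone "big" is locked during [dump_start, dump_end).
    Returns a list of (arrival_ms, zone, answered_ms).
    """
    # Min-heap of the times at which each worker next becomes free.
    free_at = [0] * workers
    heapq.heapify(free_at)
    results = []
    for arrival, zone in arrivals:
        # The query is picked up by whichever worker frees up first.
        start = max(arrival, heapq.heappop(free_at))
        if zone == "big" and dump_start <= start < dump_end:
            # The worker blocks on the zone lock until the dump finishes.
            done = dump_end + SERVICE_MS
        else:
            done = start + SERVICE_MS
        heapq.heappush(free_at, done)
        results.append((arrival, zone, done))
    return results


if __name__ == "__main__":
    queries = [(9000, "other"), (10000, "big"), (10100, "big"),
               (10200, "other"), (21000, "other")]
    for pool in (2, 4):
        for arrival, zone, done in simulate(pool, 10000, 20000, queries):
            print(pool, zone, arrival, done - arrival)
```

	With two workers, the query for "other" that arrives during the
	dump waits for the dump to finish, because both workers are stuck on
	"big". With four workers it is answered at once, but only until
	enough "big" queries arrive to occupy those workers as well.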
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: Mark_Andrews at isc.org