[bind10-dev] b10-auth stops responding for several tens of seconds after receiving IXFR

JINMEI Tatuya / 神明達哉 jinmei at isc.org
Tue Jan 8 07:09:01 UTC 2013


At Tue, 08 Jan 2013 15:06:35 +0900,
Yoshitaka Aharen <aharen at jprs.co.jp> wrote:

> > - check if this also happens when the zone is reloaded from a file
> >   (not from the SQLite3 DB)
> No, it didn't for MasterFiles.

Okay, then I guess the zone builder thread of b10-auth somehow blocks
the other thread within the sqlite3 library.  One possible cause I can
think of is: http://bind10.isc.org/ticket/1756
The builder thread needs to iterate over the new version of the zone
using a zone iterator, but due to the problem described in #1756, this
takes a very long time for a large zone.  Depending on what's happening
in the iteration SQL query and on thread scheduling details, the other
thread might be blocked while the builder thread is busy working in
sqlite3.

So, one thing that might be worth trying is to manually add the indexes
proposed in #1756:
sqlite> CREATE INDEX records_byrname_and_type ON records (rname, rdtype);

Without this, the following SQL query should take a very long time:
> SELECT rdtype, ttl, sigtype, rdata, name FROM records WHERE zone_id = XX ORDER BY rname, rdtype;
(where "XX" is the zone ID value of the jp zone in your experiment),
while it will be reasonably fast after adding the indexes.
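
As a rough way to see what the index changes, here is a minimal sketch
using Python's sqlite3 module on an in-memory database.  The table
definition below is a simplified assumption (not the actual BIND 10
schema), and the sample rows are made up; only the index and the query
are the ones quoted above.  It compares the query plan before and after
adding the index rather than measuring real timings.

```python
import sqlite3

# The iteration query quoted above, with the zone ID as a parameter.
QUERY = ("SELECT rdtype, ttl, sigtype, rdata, name FROM records "
         "WHERE zone_id = ? ORDER BY rname, rdtype")


def plan(conn, zone_id):
    """Return the 'detail' strings of EXPLAIN QUERY PLAN for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (zone_id,)).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")

# Simplified, assumed schema: just the columns the query touches.
conn.execute(
    "CREATE TABLE records ("
    " id INTEGER PRIMARY KEY, zone_id INTEGER NOT NULL,"
    " name TEXT NOT NULL, rname TEXT NOT NULL, ttl INTEGER NOT NULL,"
    " rdtype TEXT NOT NULL, sigtype TEXT, rdata TEXT NOT NULL)"
)

# Made-up sample data; (rname, rdtype) is unique so the order is stable.
conn.executemany(
    "INSERT INTO records (zone_id, name, rname, ttl, rdtype, sigtype, rdata)"
    " VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "www.example.jp.", "jp.example.www.", 3600, "A", None, "192.0.2.1"),
        (1, "example.jp.", "jp.example.", 3600, "NS", None, "ns.example.jp."),
        (1, "ns.example.jp.", "jp.example.ns.", 3600, "A", None, "192.0.2.53"),
        (2, "example.org.", "org.example.", 3600, "NS", None, "ns.example.org."),
    ],
)

before = plan(conn, 1)
rows_before = conn.execute(QUERY, (1,)).fetchall()

# The index suggested above.
conn.execute(
    "CREATE INDEX records_byrname_and_type ON records (rname, rdtype)"
)

after = plan(conn, 1)
rows_after = conn.execute(QUERY, (1,)).fetchall()

print("plan before index:", before)
print("plan after index: ", after)
```

Without the index the plan should show a separate sort step for the
ORDER BY; with it, sqlite3 can walk the index in order instead.  The
exact wording of the plan output varies between sqlite3 versions.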

I'd be interested in whether it can also affect the query handling
during the reload period.

> > BTW, what exactly do you mean by "About 1 minute (after)"?  You first
> > said it took 4 minutes to finish the update.  Did this "1 minute"
> > follow those 4 minutes, or was that the final 1 minute of those 4
> > minutes?  Did b10-auth respond to queries *while* reloading (with the
> > older version of zone)?
> I meant the latter, and no, it doesn't respond for about 30 seconds.
> 
> First, b10-xfrin retrieves the IXFR and updates the SQLite3 DB (it takes
> a few minutes). Then b10-auth stops responding to queries for about 30
> seconds. Then b10-auth resumes responding with the older version of the
> zone. About 1 minute after b10-auth resumes responding, it answers with
> the new version of the zone. In total it takes about 4 minutes to finish
> the update.

If I understand it correctly, this looks quite strange in a few respects.
First, if the IXFR only consists of a small number of updates, the
update phase isn't expected to take a few minutes.  And, if it really
takes such a long period, I suspect b10-auth was actually returning
SERVFAIL for some period due to the recently reported issue:
http://bind10.isc.org/ticket/2609
even if it didn't stop responding.

Did this also happen, or did b10-auth respond with correct answers while
xfrin was handling the IXFR?  If you can share log messages for these
events, that may also help.

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.

