[bind10-dev] b10-auth stops responding for several tens of seconds after receiving IXFR
Yoshitaka Aharen
aharen at jprs.co.jp
Tue Jan 8 09:01:41 UTC 2013
Hello,
On Mon, 07 Jan 2013 23:09:01 -0800
JINMEI Tatuya / 神明達哉 <jinmei at isc.org> wrote:
> At Tue, 08 Jan 2013 15:06:35 +0900,
> Yoshitaka Aharen <aharen at jprs.co.jp> wrote:
>
> > > - check if this also happens when the zone is reloaded from a file
> > > (not from the SQLite3 DB)
> > No, it didn't for MasterFiles.
>
> Okay, then I guess the zone builder thread of b10-auth somehow blocks
> the other thread within the sqlite3 library. One possible cause I can
> think of is this: http://bind10.isc.org/ticket/1756
> The builder thread needs to iterate over the new version of the zone
> using a zone iterator, but due to the problem described in #1756, it
> takes a very long time for a large zone. Depending on what's happening
> in the iteration SQL query and on thread scheduling details, the other
> thread might be blocked while the builder thread is busy working in
> sqlite3.
Thank you for the information.
> So, one thing it might be worth trying is to manually add the indexes
> proposed in #1756:
> sqlite> CREATE INDEX records_byrname_and_type ON records (rname, rdtype);
>
> Without this index, the following SQL query should take a very long time:
> > SELECT rdtype, ttl, sigtype, rdata, name FROM records WHERE zone_id = XX ORDER BY rname, rdtype;
> (where "XX" is the zone ID value of the jp zone in your experiment),
> while it will be reasonably fast after adding the indexes.
The query takes 16 seconds with the index (sorry, I forgot to measure it
before adding the index).
> I'd be interested in whether it can also affect the query handling
> during the reload period.
It works. With the index, b10-auth does not stop responding to queries
while reloading. Thank you for your suggestion.
However, the index slows down xfrin processing, which takes about 1.5
times longer than without the index.
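
For anyone who wants to reproduce the comparison on a scratch database, here is a minimal sketch using Python's standard sqlite3 module. The table layout is a simplified assumption (only the columns named in the query above, filled with synthetic data), not the actual BIND 10 schema; the index statement is the one proposed in #1756. The extra index also has to be maintained on every write, which may be one reason xfrin becomes slower.

```python
import sqlite3
import time

# The iteration query quoted in this thread, with the zone ID as a parameter.
QUERY = ("SELECT rdtype, ttl, sigtype, rdata, name FROM records "
         "WHERE zone_id = ? ORDER BY rname, rdtype")


def build_db(n_records=20000):
    """Create an in-memory DB with a simplified, hypothetical 'records' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (id INTEGER PRIMARY KEY, zone_id INTEGER NOT NULL, "
        "name TEXT NOT NULL, rname TEXT NOT NULL, ttl INTEGER NOT NULL, "
        "rdtype TEXT NOT NULL, sigtype TEXT, rdata TEXT NOT NULL)"
    )
    # Synthetic rows; rname is assumed to hold the label-reversed owner name.
    rows = (
        (1, f"host{i}.example.jp.", f"jp.example.host{i}.", 3600, "A", None, "192.0.2.1")
        for i in range(n_records)
    )
    conn.executemany(
        "INSERT INTO records (zone_id, name, rname, ttl, rdtype, sigtype, rdata) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def time_iteration(conn, zone_id=1):
    """Run the iteration query; return (elapsed seconds, fetched rows)."""
    start = time.perf_counter()
    rows = conn.execute(QUERY, (zone_id,)).fetchall()
    return time.perf_counter() - start, rows


def query_plan(conn, zone_id=1):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (zone_id,))]


def add_index(conn):
    """Add the index proposed in ticket #1756."""
    conn.execute("CREATE INDEX records_byrname_and_type ON records (rname, rdtype)")
    conn.commit()


if __name__ == "__main__":
    conn = build_db()
    before, _ = time_iteration(conn)
    print("plan without index:", query_plan(conn))
    add_index(conn)
    after, _ = time_iteration(conn)
    print("plan with index:   ", query_plan(conn))
    print(f"without index: {before:.3f}s, with index: {after:.3f}s")
```

The numbers from a small in-memory table will not resemble those of a zone the size of jp; the point is only to compare the two query plans and timings side by side.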
> > > BTW, what exactly do you mean by "About 1 minute (after)"? You first
> > > said it took 4 minutes to finish the update. Did this "1 minute"
> > > follow those 4 minutes, or was that the final 1 minute of those 4
> > > minutes? Did b10-auth respond to queries *while* reloading (with the
> > > older version of zone)?
> > I meant the latter, and no, it doesn't respond for about 30 seconds.
> >
> > First, b10-xfrin retrieves the IXFR and updates the SQLite3 DB (this
> > takes a few minutes). Then b10-auth stops responding to queries for
> > about 30 seconds. Then b10-auth resumes responding with the older
> > version of the zone. About 1 minute after it resumes responding, it
> > answers with the new version of the zone. In total, it takes about 4
> > minutes to finish the update.
>
> If I understand it correctly, this looks quite strange in some respects.
> First, if the IXFR only consists of a small number of updates, the
> update phase isn't expected to take a few minutes. And, if it really
> takes such a long period, I suspect b10-auth was actually returning
> SERVFAIL for some period due to the recently reported issue:
> http://bind10.isc.org/ticket/2609
> even if it didn't stop responding.
>
> Did this also happen, or did b10-auth respond with correct answers while
> xfrin was handling the IXFR? If you can share log messages for these
> events, it may also help.
No, it responded with correct answers while xfrin was handling the IXFR,
at least for SOA queries. And I think the situation described in the
ticket does not match ours; we have enabled the in-memory data source. I
understand that (re)loadzone can fail if it is executed while b10-xfrin
is processing an IXFR.
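
In case it helps others measure the same behaviour, below is a rough sketch of a probe that sends an SOA query once a second and reports timeouts. It uses only the Python standard library and builds the query by hand in RFC 1035 wire format. The server address (192.0.2.53), the interval, and the function names are placeholders for illustration, not part of our actual test setup.

```python
import socket
import struct
import time


def build_soa_query(zone, query_id):
    """Build a minimal DNS query for the SOA record of `zone`."""
    # Header: ID, flags (0 = standard query), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    labels = [label for label in zone.split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    # QTYPE=6 (SOA), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 6, 1)


def probe(server, zone, port=53, timeout=1.0):
    """Send one SOA query over UDP; return round-trip seconds, or None on timeout."""
    query = build_soa_query(zone, int(time.time() * 1000) & 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(query, (server, port))
        try:
            reply, _ = sock.recvfrom(4096)
        except socket.timeout:
            return None
        # Ignore replies whose ID does not match our query.
        if reply[:2] != query[:2]:
            return None
        return time.monotonic() - start


if __name__ == "__main__":
    # Placeholder address from the documentation range; replace with the real server.
    while True:
        rtt = probe("192.0.2.53", "jp.")
        stamp = time.strftime("%H:%M:%S")
        print(stamp, "TIMEOUT" if rtt is None else f"{rtt * 1000:.1f} ms")
        time.sleep(1.0)
```

This only shows whether a reply arrives in time; it does not parse the SOA serial, so it cannot tell the old zone version from the new one.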
Thanks,
--
Yoshitaka Aharen <aharen at jprs.co.jp>
Japan Registry Services Co., Ltd.