[bind10-dev] b10-auth stops responding for several tens of seconds after receiving IXFR
JINMEI Tatuya / 神明達哉
jinmei at isc.org
Wed Jan 9 03:29:50 UTC 2013
At Tue, 08 Jan 2013 18:01:41 +0900,
Yoshitaka Aharen <aharen at jprs.co.jp> wrote:
> > without this, this SQL query should take a very long time:
> > > SELECT rdtype, ttl, sigtype, rdata, name FROM records WHERE zone_id = XX ORDER BY rname, rdtype;
> > (where "XX" is the zone ID value of the jp zone in your experiment),
> > while it will be reasonably fast after adding the indexes.
> It takes 16 seconds with the index (sorry I forgot to measure it before
> adding the index).
>
> > I'd be interested in whether it can also affect the query handling
> > during the reload period.
> It works. With the index, b10-auth does not stop responding to queries
> while reloading. Thank you for your suggestion.
> However it slows down xfrin processing. It takes about 1.5 times longer
> than without the index.
I'd first suggest checking if it's really IXFR (i.e., it doesn't fall
back to AXFR, e.g., because xfrin cannot find the given). In any
case, we need to discuss the acceptable scalability level with
SQLite3, and the conclusion is probably just to support other DB
backends for higher scalability. Until then, I think we need to work
around this with the additional indexes at the cost of increased size
of the DB file and longer update time (but, again, it's surprising if it
really matters for IXFR with a small number of updates).
Another thing: I think I figured out where the block happens. The
builder thread needs to acquire a lock shared with the other auth
thread in the preparation of a zone reload to avoid a tricky race
condition. The assumption here is that the preparation is not
time-consuming, but it involves creating a data source iterator, which
sends an SQL query, and in this SQLite3 case, the query handling is
really time-consuming as we now know. We should probably delay the
first query so that we can guarantee the iterator creation is not a
"blocking" operation (whether it's the restrictive SQLite3 or other
server-based DB systems). If we do this, I guess this particular
problem of yours will be resolved (although the iteration query is
still heavy so it'd still take unnecessarily long). If you're
interested, a quick hack workaround is the patch copied below. If my
guess is correct, it will also solve the blocking problem even without
the additional indexes (of course, this patch is a quick hack and is
not correct in that it simply removes the necessary lock, but it
should be safe in your setup where auth only refers to the SQLite3
data source for loading, not for serving).
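
To illustrate the "delay the first query" idea above, here is a minimal
sketch. It is not BIND 10 code: LazyZoneIterator, prepareReload and the
Query callback are hypothetical names made up for this example, and the
callback simply stands in for the slow SQLite3 query. The point is only
that the shared mutex is held while the iterator object is created, not
while the query runs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data source iterator (not the BIND 10 API).
class LazyZoneIterator {
public:
    // Stand-in for "send the SQL query and fetch the rows".
    typedef std::function<std::vector<std::string>()> Query;

    // Construction only stores the query; nothing is sent to the database.
    explicit LazyZoneIterator(Query query)
        : query_(std::move(query)), pos_(0), started_(false) {}

    // The (possibly slow) query runs on the first call, not in the
    // constructor.  Returns false once all rows have been consumed.
    bool getNext(std::string& row) {
        if (!started_) {
            rows_ = query_();
            started_ = true;
        }
        if (pos_ >= rows_.size()) {
            return false;
        }
        row = rows_[pos_++];
        return true;
    }

    bool started() const { return started_; }

private:
    Query query_;
    std::vector<std::string> rows_;
    std::size_t pos_;
    bool started_;
};

// Hypothetical builder-side helper: the mutex shared with the query
// handling thread is held only while the iterator object is created,
// so the lock holder never waits on the database.
LazyZoneIterator
prepareReload(std::mutex& map_mutex, LazyZoneIterator::Query query) {
    assert(query && "a query callback is required");
    std::lock_guard<std::mutex> lock(map_mutex);
    return LazyZoneIterator(std::move(query));
}
```

With this shape, the builder thread would call getNext() only after
prepareReload() has returned and the lock has been released, so a slow
query delays the reload itself but not the thread answering DNS queries.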
---
JINMEI, Tatuya
Internet Systems Consortium, Inc.
diff --git a/src/bin/auth/datasrc_clients_mgr.h b/src/bin/auth/datasrc_clients_mgr.h
index 5bbdb99..05df8b0 100644
--- a/src/bin/auth/datasrc_clients_mgr.h
+++ b/src/bin/auth/datasrc_clients_mgr.h
@@ -622,7 +622,9 @@ DataSrcClientsBuilderBase<MutexType, CondVarType>::getZoneWriter(
     // source for lookup. So we need to protect the access here.
     datasrc::ConfigurableClientList::ZoneWriterPair writerpair;
     {
+#if 0 // experimentally disabled
         typename MutexType::Locker locker(*map_mutex_);
+#endif
         writerpair = client_list.getCachedZoneWriter(origin);
     }