[bind10-dev] scalability issue with SQL(ite3) select with 'like'

Mon Mar 5 19:10:02 UTC 2012

At Mon, 5 Mar 2012 11:09:37 +0100,
João Damas <joao at bondis.org> wrote:

> you would probably be better off with varchar(255) for a domain name or varchar(63) for a label rather than generic TEXT

Perhaps, but we cannot simply assume the max length is 255 (or 63)
because it's a textual representation and could contain something like
'\DDD'

> > - But this trivial change still doesn't work, because we specify an
> >  expression for the right-hand side of LIKE:
> > 
> > const char* const q_count_str = "SELECT COUNT(*) FROM records "
> >    "WHERE zone_id=?1 AND rname LIKE (?2 || '%');";
> > 
> >  SQLite3 doesn't use indices for this query; it only uses indices
> >  "a string literal or a parameter bound to a string literal that does
> >  not begin with a wildcard character" (see again the above link).
> 
> right, it can't use indices fully if the % is anywhere but the end.
> 
> you may want to look at representing the names using their component labels rather than as a flat string (labels in the same zone, not talking about delegations as that would be a different zone)

You mean like storing each label in a separate column?  That's indeed
one of the possible approaches I mentioned (I think) in my original
message of this thread.  It would also help handle other tricky cases
like a label containing a 'dot'.

> > But queries for a non existent name was still slow in my experiment,
> > and I figured out that bottleneck too.  It was this query:
> > 
> > const char* const q_previous_str = "SELECT name FROM records "
> >    "WHERE zone_id=?1 AND rdtype = 'NSEC' AND "
> >    "rname < $2 ORDER BY rname DESC LIMIT 1";
> 
> why store retype as a string instead of their integer value. Integer comparison is several orders of magnitude faster, with or without indices for the strings.

Good question, and I don't know the answer:-) I've actually wondered
whether it may make sense to use numeric representation for RR types
and classes for several reasons including the search performance and
eliminating ambiguities (such as TYPE1 and A).  I guess one reason for
the choice of text (although may not be very intentional) was for
readability for humans, especially when we allow admins to modify the
database by hand, not via the BIND10 APIs.

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.