[bind10-dev] scalability issue with SQL(ite3) select with 'like'

Mon Mar 5 18:52:45 UTC 2012

At Mon, 5 Mar 2012 10:55:22 +0100,
Michal 'vorner' Vaner <michal.vaner at nic.cz> wrote:

> > const char* const q_count_str = "SELECT COUNT(*) FROM records "
> >     "WHERE zone_id=?1 AND rname LIKE ?2;";
> > 
> >   and have the caller append '%' to the string passed to
> >   sqlite3_bind_text().
> 
> Won't the sqlite library do too much escaping and „deactivate“ our %? Probably
> not, but just checking.

Apparently not (but that's a good question) as the
"findRRsetEmptyNode" test fails if we don't append "%" here:
    const string revname_text = name.reverse().toText() + "%";
while all tests pass if we do.

> > - I also noticed this query is slow:
> >     "SELECT rdtype, ttl, sigtype, rdata, name FROM records " // ITERATE
> >         "WHERE zone_id = ?1 ORDER BY rname, rdtype",
> >   Because it builds a temporary B-tree for "ORDER BY".  Like the
> >   "previous" case, we could solve this by adding an index on (rname,
> >   rdtype) in this order.  But in this case I wonder we could rather
> >   omit the "order by rdtype" part.  Adding another compound index
> >   makes the DB file even bigger and would make update slower.  Unless
> >   we need this ordering by the API contract it's probably better to be
> >   less strict.
> 
> Well, this is used to make sure RRs are aggregated to RRsets correctly, they
> need to be consecutive. I think we don't need anything else than that single
> RRset „sticks together“ in the output of the query and we probably could do it
> in a different way. But I don't have a quick idea how.

Hmm, okay.  Realistically, (if I understand it correctly) we're
currently using the iterator result only for xfrout, in which case we
don't even need the concept of "RRset".  We might want to use it for
dumping a zone content into a textual zone file in future, but it also
doesn't require the RRset concept, although it would be less readable
if RRs of the same same RRset are not listed as a group.  So I guess
we could make a compromise here in practice.

One possible approach to ensure the 'name then type' ordering would be
to use numeric value for types, not texts.  Then maybe we can get the
result efficiently without a separate index or even if we need an
index it may be much more lightweight.

Another note: in terms of the concept of RRsets, just ordering by the
RR type is not sufficient because for RRSIGs we should also consider
the covered type (but again, for the expected purposes it shouldn't be
critical).

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.