[bind10-dev] scalability issue with SQL(ite3) select with 'like'

Mon Mar 5 10:09:37 UTC 2012

On 5 Mar 2012, at 10:18, JINMEI Tatuya / 神明達哉 wrote:

> At Sat, 3 Mar 2012 13:59:25 +0100,
> Michal 'vorner' Vaner <michal.vaner at nic.cz> wrote:
> 
>>> Is my understanding correct?  If so, is there any easy way (easy = by
>>> not fundamentally changing the table organization) to get the result
>>> we want to get more efficiently?  What we need to know is basically
>>> that when we know there is no row whose name is
>>> "nxdomain.example.com." whethere there is a row whose name is a pure
>>> subdomain of it (e.g. "a.nxdomain.example.com.").
>> 
>> That is to find if the nxdomain.example.com is empty nonterminal,
>> right?
> 
> Not only for that.  We use it to ensure that a wildcard match is
> actually the best one:
> 
> Zone has *.example.com.
> Query is a.b.example.com.
> 
> We need to make sure there's no name on or under b.example.com.  To do
> so we use LIKE as  "rname LIKE com.example.b.%" (to be precise the
> current SQLite3 data source implementation doesn't actually do it, and
> the new version uses "name LIKE %.b.example.com."...but ignore these
> subtle gaps for simplicity).
> 
> Anyway, I was actually wrong in saying with LIKE we cannot use
> indices.  But the proposed change in ticket #324 still doesn't work
> for several (one stupid, other more subtle) reasons:
> 
> - We specify "STRING" as the data type for "text-like" columns, but
>  that's not a valid type name for SQLite3 (or SQL in general I
>  suspect), and SQLite3 regard it as an "NUMERIC" affinity:
>  http://www.sqlite.org/datatype3.html#affinity
>  SQLite3 doesn't use indices for LIKE unless the corresponding column
>  is of TEXT affinity:
>  http://www.sqlite.org/optoverview.html
>  (See Section 4.0 "The LIKE optimization")

you would probably be better off with varchar(255) for a domain name or varchar(63) for a label rather than generic TEXT

> 
>  This is the stupid error.  We should have used TEXT instead of
>  STRING from the beginning.
> 
> - But this trivial change still doesn't work, because we specify an
>  expression for the right-hand side of LIKE:
> 
> const char* const q_count_str = "SELECT COUNT(*) FROM records "
>    "WHERE zone_id=?1 AND rname LIKE (?2 || '%');";
> 
>  SQLite3 doesn't use indices for this query; it only uses indices
>  "a string literal or a parameter bound to a string literal that does
>  not begin with a wildcard character" (see again the above link).

right, it can't use indices fully if the % is anywhere but the end.

you may want to look at representing the names using their component labels rather than as a flat string (labels in the same zone, not talking about delegations as that would be a different zone)

>  So, instead of concatenating the parameter and wildcard (%) in the
>  template string, we should use a single parameter
> 
> const char* const q_count_str = "SELECT COUNT(*) FROM records "
>    "WHERE zone_id=?1 AND rname LIKE ?2;";
> 
>  and have the caller append '%' to the string passed to
>  sqlite3_bind_text().
> 
>  This is the subtle one.
> 
> But queries for a non existent name was still slow in my experiment,
> and I figured out that bottleneck too.  It was this query:
> 
> const char* const q_previous_str = "SELECT name FROM records "
>    "WHERE zone_id=?1 AND rdtype = 'NSEC' AND "
>    "rname < $2 ORDER BY rname DESC LIMIT 1";

why store retype as a string instead of their integer value. Integer comparison is several orders of magnitude faster, with or without indices for the strings.

> 
> Because (I suspect) the result of "rname < $2" can be big and there's
> no index on "rdtype", so SQLite3 needs to perform an expensive scan
> for the large search space.  (If the comparison for name or rname is
> an equality check it's reasonably fast - I guess this is because
> SQLite3 uses the expensive scan only for the index search result, and
> in the equality case the scan range is quite limited in practice).
> 
> This can be solved by adding this index:
> CREATE INDEX records_bytype_and_rname ON records (rdtype, rname)
> Note that the order of rdtype and rname is important.  Since the
> comparison for rname is an inequality check, the corresponding column
> in the index must be placed last (See section 1.0 of
> http://www.sqlite.org/optoverview.html).  Also, simply adding an index
> on "rdtype" doesn't really work well, because even though the indices
> are (seemingly) used for searches themselves, SQLite3 creates a
> temporary B-tree for the "ORDER BY" part.  So, while it's fast if the
> zone is unsigned (because the tree would be empty), it will be quite
> slow for a signed zone, for which this query is actually needed.
> 
> I've made these updates to the trac324new branch and pushed it to the
> public repo (I've already started another task, so I'll release the
> ownership for now).  If someone wants to complete this task, please
> do.
> 
> Some additional notes:
> 
> - This query in sqlite3_accessor.cc (the new implementation) doesn't
>  seem to be good for the same reason(s):
>    "SELECT rdtype, ttl, sigtype, rdata " // ANY_SUB
>        "FROM records WHERE zone_id=?1 AND name LIKE (\"%.\" || ?2)",
>  it uses an expression, and it places the wildcard first ('%').
>  We'll have to fix that, too.
> - The bottleneck for "find previous" could be avoided for unsigned
>  zones.  We should probably first identify whether we need to know
>  the previous name before (i.e., whether the zone is NSEC signed)
>  actually trying to find it.
> - I also noticed this query is slow:
>    "SELECT rdtype, ttl, sigtype, rdata, name FROM records " // ITERATE
>        "WHERE zone_id = ?1 ORDER BY rname, rdtype",
>  Because it builds a temporary B-tree for "ORDER BY".  Like the
>  "previous" case, we could solve this by adding an index on (rname,
>  rdtype) in this order.  But in this case I wonder we could rather
>  omit the "order by rdtype" part.  Adding another compound index
>  makes the DB file even bigger and would make update slower.  Unless
>  we need this ordering by the API contract it's probably better to be
>  less strict.
> 
> I plan to make tickets for (some of) these points.
> 
> ---
> JINMEI, Tatuya
> Internet Systems Consortium, Inc.
> _______________________________________________
> bind10-dev mailing list
> bind10-dev at lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind10-dev