[bind10-dev] scalability issue with SQL(ite3) select with 'like'
João Damas
joao at bondis.org
Mon Mar 5 10:09:37 UTC 2012
On 5 Mar 2012, at 10:18, JINMEI Tatuya / 神明達哉 wrote:
> At Sat, 3 Mar 2012 13:59:25 +0100,
> Michal 'vorner' Vaner <michal.vaner at nic.cz> wrote:
>
>>> Is my understanding correct? If so, is there any easy way (easy = by
>>> not fundamentally changing the table organization) to get the result
>>> we want to get more efficiently? What we need to know is basically
>>> that when we know there is no row whose name is
>>> "nxdomain.example.com." whethere there is a row whose name is a pure
>>> subdomain of it (e.g. "a.nxdomain.example.com.").
>>
>> That is to find if the nxdomain.example.com is empty nonterminal,
>> right?
>
> Not only for that. We use it to ensure that a wildcard match is
> actually the best one:
>
> Zone has *.example.com.
> Query is a.b.example.com.
>
> We need to make sure there's no name on or under b.example.com. To do
> so we use LIKE as "rname LIKE com.example.b.%" (to be precise the
> current SQLite3 data source implementation doesn't actually do it, and
> the new version uses "name LIKE %.b.example.com."...but ignore these
> subtle gaps for simplicity).
>
> Anyway, I was actually wrong in saying with LIKE we cannot use
> indices. But the proposed change in ticket #324 still doesn't work
> for several (one stupid, other more subtle) reasons:
>
> - We specify "STRING" as the data type for "text-like" columns, but
> that's not a valid type name for SQLite3 (or SQL in general I
> suspect), and SQLite3 regard it as an "NUMERIC" affinity:
> http://www.sqlite.org/datatype3.html#affinity
> SQLite3 doesn't use indices for LIKE unless the corresponding column
> is of TEXT affinity:
> http://www.sqlite.org/optoverview.html
> (See Section 4.0 "The LIKE optimization")
you would probably be better off with varchar(255) for a domain name or varchar(63) for a label rather than generic TEXT
>
> This is the stupid error. We should have used TEXT instead of
> STRING from the beginning.
>
> - But this trivial change still doesn't work, because we specify an
> expression for the right-hand side of LIKE:
>
> const char* const q_count_str = "SELECT COUNT(*) FROM records "
> "WHERE zone_id=?1 AND rname LIKE (?2 || '%');";
>
> SQLite3 doesn't use indices for this query; it only uses indices
> "a string literal or a parameter bound to a string literal that does
> not begin with a wildcard character" (see again the above link).
right, it can't use indices fully if the % is anywhere but the end.
you may want to look at representing the names using their component labels rather than as a flat string (labels in the same zone, not talking about delegations as that would be a different zone)
> So, instead of concatenating the parameter and wildcard (%) in the
> template string, we should use a single parameter
>
> const char* const q_count_str = "SELECT COUNT(*) FROM records "
> "WHERE zone_id=?1 AND rname LIKE ?2;";
>
> and have the caller append '%' to the string passed to
> sqlite3_bind_text().
>
> This is the subtle one.
>
> But queries for a non existent name was still slow in my experiment,
> and I figured out that bottleneck too. It was this query:
>
> const char* const q_previous_str = "SELECT name FROM records "
> "WHERE zone_id=?1 AND rdtype = 'NSEC' AND "
> "rname < $2 ORDER BY rname DESC LIMIT 1";
why store retype as a string instead of their integer value. Integer comparison is several orders of magnitude faster, with or without indices for the strings.
>
> Because (I suspect) the result of "rname < $2" can be big and there's
> no index on "rdtype", so SQLite3 needs to perform an expensive scan
> for the large search space. (If the comparison for name or rname is
> an equality check it's reasonably fast - I guess this is because
> SQLite3 uses the expensive scan only for the index search result, and
> in the equality case the scan range is quite limited in practice).
>
> This can be solved by adding this index:
> CREATE INDEX records_bytype_and_rname ON records (rdtype, rname)
> Note that the order of rdtype and rname is important. Since the
> comparison for rname is an inequality check, the corresponding column
> in the index must be placed last (See section 1.0 of
> http://www.sqlite.org/optoverview.html). Also, simply adding an index
> on "rdtype" doesn't really work well, because even though the indices
> are (seemingly) used for searches themselves, SQLite3 creates a
> temporary B-tree for the "ORDER BY" part. So, while it's fast if the
> zone is unsigned (because the tree would be empty), it will be quite
> slow for a signed zone, for which this query is actually needed.
>
> I've made these updates to the trac324new branch and pushed it to the
> public repo (I've already started another task, so I'll release the
> ownership for now). If someone wants to complete this task, please
> do.
>
> Some additional notes:
>
> - This query in sqlite3_accessor.cc (the new implementation) doesn't
> seem to be good for the same reason(s):
> "SELECT rdtype, ttl, sigtype, rdata " // ANY_SUB
> "FROM records WHERE zone_id=?1 AND name LIKE (\"%.\" || ?2)",
> it uses an expression, and it places the wildcard first ('%').
> We'll have to fix that, too.
> - The bottleneck for "find previous" could be avoided for unsigned
> zones. We should probably first identify whether we need to know
> the previous name before (i.e., whether the zone is NSEC signed)
> actually trying to find it.
> - I also noticed this query is slow:
> "SELECT rdtype, ttl, sigtype, rdata, name FROM records " // ITERATE
> "WHERE zone_id = ?1 ORDER BY rname, rdtype",
> Because it builds a temporary B-tree for "ORDER BY". Like the
> "previous" case, we could solve this by adding an index on (rname,
> rdtype) in this order. But in this case I wonder we could rather
> omit the "order by rdtype" part. Adding another compound index
> makes the DB file even bigger and would make update slower. Unless
> we need this ordering by the API contract it's probably better to be
> less strict.
>
> I plan to make tickets for (some of) these points.
>
> ---
> JINMEI, Tatuya
> Internet Systems Consortium, Inc.
> _______________________________________________
> bind10-dev mailing list
> bind10-dev at lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind10-dev
More information about the bind10-dev
mailing list