[bind10-dev] scalability issue with SQL(ite3) select with 'like'

Tue Mar 13 14:42:01 UTC 2012

Jinmei,

Sorry for the late reply. I'd been meaning to give a detailed look at
this since you posted it, but haven't carved out the time yet.

On Monday, 2012-03-05 01:18:46 -0800, 
JINMEI Tatuya / 神明達哉 <jinmei at isc.org> wrote:

> - We specify "STRING" as the data type for "text-like" columns, but
>   that's not a valid type name for SQLite3 (or SQL in general I
>   suspect), and SQLite3 regard it as an "NUMERIC" affinity:
>   http://www.sqlite.org/datatype3.html#affinity
>   SQLite3 doesn't use indices for LIKE unless the corresponding column
>   is of TEXT affinity:
>   http://www.sqlite.org/optoverview.html
>   (See Section 4.0 "The LIKE optimization")
> 
>   This is the stupid error.  We should have used TEXT instead of
>   STRING from the beginning.

Actually this turns out not to be an error at all. SQLite treats STRING
declarations like TEXT declarations:

sqlite> create table foo (bar string);
sqlite> insert into foo values ("baz");
sqlite> select * from foo;
baz
sqlite> select typeof(bar) from foo;
text

> - But this trivial change still doesn't work, because we specify an
>   expression for the right-hand side of LIKE:
> 
> const char* const q_count_str = "SELECT COUNT(*) FROM records "
>     "WHERE zone_id=?1 AND rname LIKE (?2 || '%');";
> 
>   SQLite3 doesn't use indices for this query; it only uses indices
>   "a string literal or a parameter bound to a string literal that does
>   not begin with a wildcard character" (see again the above link).
>   So, instead of concatenating the parameter and wildcard (%) in the
>   template string, we should use a single parameter
> 
> const char* const q_count_str = "SELECT COUNT(*) FROM records "
>     "WHERE zone_id=?1 AND rname LIKE ?2;";
> 
>   and have the caller append '%' to the string passed to
>   sqlite3_bind_text().
> 
>   This is the subtle one.

I'm not 100% sure, but I think this is also not an issue. If I compare
the opcodes that SQLite produces I don't see a problem here:

sqlite> create index bar_idx on foo (bar);
sqlite> explain select * from foo where bar like 'xxx%';
0|Trace|0|0|0||00|
1|Goto|0|13|0||00|
2|OpenRead|0|2|0|1|00|
3|Rewind|0|11|0||00|
4|String8|0|2|0|xxx%|00|
5|Column|0|0|3||00|
6|Function|1|2|1|like(2)|02|
7|IfNot|1|10|1||00|
8|Column|0|0|4||00|
9|ResultRow|4|1|0||00|
10|Next|0|4|0||01|
11|Close|0|0|0||00|
12|Halt|0|0|0||00|
13|Transaction|0|0|0||00|
14|VerifyCookie|0|6|0||00|
15|TableLock|0|2|0|foo|00|
16|Goto|0|2|0||00|
sqlite> explain select * from foo where bar like 'xxx'||'%';
0|Trace|0|0|0||00|
1|String8|0|2|0|xxx|00|
2|String8|0|3|0|%|00|
3|Concat|3|2|1||00|
4|Goto|0|16|0||00|
5|OpenRead|0|2|0|1|00|
6|Rewind|0|14|0||00|
7|Copy|1|4|0||00|
8|Column|0|0|5||00|
9|Function|1|4|3|like(2)|02|
10|IfNot|3|13|1||00|
11|Column|0|0|6||00|
12|ResultRow|6|1|0||00|
13|Next|0|7|0||01|
14|Close|0|0|0||00|
15|Halt|0|0|0||00|
16|Transaction|0|0|0||00|
17|VerifyCookie|0|6|0||00|
18|TableLock|0|2|0|foo|00|
19|Goto|0|5|0||00|

It is possible that the parameter code within the SQLite C library
breaks this, but it seems like a strange anti-optimization to me. I
haven't timed things, but my theory is that this is a non-issue.

> But queries for a non existent name was still slow in my experiment,
> and I figured out that bottleneck too.  It was this query:
> 
> const char* const q_previous_str = "SELECT name FROM records "
>     "WHERE zone_id=?1 AND rdtype = 'NSEC' AND "
>     "rname < $2 ORDER BY rname DESC LIMIT 1";
> 
> Because (I suspect) the result of "rname < $2" can be big and there's
> no index on "rdtype", so SQLite3 needs to perform an expensive scan
> for the large search space.  (If the comparison for name or rname is
> an equality check it's reasonably fast - I guess this is because
> SQLite3 uses the expensive scan only for the index search result, and
> in the equality case the scan range is quite limited in practice).
> 
> This can be solved by adding this index:
> CREATE INDEX records_bytype_and_rname ON records (rdtype, rname)
> Note that the order of rdtype and rname is important.  Since the
> comparison for rname is an inequality check, the corresponding column
> in the index must be placed last (See section 1.0 of
> http://www.sqlite.org/optoverview.html).  Also, simply adding an index
> on "rdtype" doesn't really work well, because even though the indices
> are (seemingly) used for searches themselves, SQLite3 creates a
> temporary B-tree for the "ORDER BY" part.  So, while it's fast if the
> zone is unsigned (because the tree would be empty), it will be quite
> slow for a signed zone, for which this query is actually needed.

In the old trac324 branch the following indexes were created:

CREATE INDEX records_byname 
    ON records (zone_id, name, rdtype, sigtype, id, ttl, rdata);
CREATE INDEX records_byrname 
    ON records (zone_id, rname, rdtype, name);
CREATE INDEX nsec3_byhash 
    ON nsec3 (zone_id, hash, rdtype, id, ttl, rdata)

These address this problem, although in a very slightly different
manner. That's why I pointed them out in the comment on the ticket.

> I've made these updates to the trac324new branch and pushed it to the
> public repo (I've already started another task, so I'll release the
> ownership for now).  If someone wants to complete this task, please
> do.
> 
> Some additional notes:
> 
> - This query in sqlite3_accessor.cc (the new implementation) doesn't
>   seem to be good for the same reason(s):
>     "SELECT rdtype, ttl, sigtype, rdata " // ANY_SUB
>         "FROM records WHERE zone_id=?1 AND name LIKE (\"%.\" || ?2)",
>   it uses an expression, and it places the wildcard first ('%').
>   We'll have to fix that, too.

This is a more serious issue, and the reason we have the RNAME field.
It is possible in more sophisticated SQL like PostgreSQL (or presumably
Oracle) to index on a reverse string so a prefixed LIKE will be
indexed, but I doubt that this is true on SQLite, and we certainly
don't do it.

> - The bottleneck for "find previous" could be avoided for unsigned
>   zones.  We should probably first identify whether we need to know
>   the previous name before (i.e., whether the zone is NSEC signed)
>   actually trying to find it.

Certainly not doing a lookup at all seems like a reasonable
optimization. :) 

> - I also noticed this query is slow:
>     "SELECT rdtype, ttl, sigtype, rdata, name FROM records " //
> ITERATE "WHERE zone_id = ?1 ORDER BY rname, rdtype",
>   Because it builds a temporary B-tree for "ORDER BY".  Like the
>   "previous" case, we could solve this by adding an index on (rname,
>   rdtype) in this order.  But in this case I wonder we could rather
>   omit the "order by rdtype" part.  Adding another compound index
>   makes the DB file even bigger and would make update slower.  Unless
>   we need this ordering by the API contract it's probably better to be
>   less strict.

The indexes from the original trac324 should address this issue.

> I plan to make tickets for (some of) these points.

I don't recall seeing these... did you create these tickets? Certainly
fixing the LIKE '%'||?2 issue and avoiding any lookups at all if the
zone are unsigned seem useful.

Anyway, we'll discuss this on our call today.

Cheers,

--
Shane