[bind10-dev] missing C API for PostgreSQL: row-by-row query result retrieval

Wed Oct 3 12:52:31 UTC 2012

Jinmei,

On Monday, 2012-10-01 20:47:27 -0700, 
JINMEI Tatuya / 神明達哉 <jinmei at isc.org> wrote:
> While doing experiments I mentioned in the other thread(*), I noticed
> that the PostgreSQL's C API misses one important feature except in the
> very latest version, 9.2: the ability of incremental retrieving a
> query result.
> * https://lists.isc.org/pipermail/bind10-dev/2012-October/003866.html
> 
> It seems to be supported in 9.2
> http://www.postgresql.org/docs/9.2/static/libpq-single-row-mode.html,
> but before that it seems we have to store all results of a query in
> memory at once and then process it.  It makes some usage effectively
> unusable (or at least very expensive) such as axfr-out or loading from
> DB in-memory for a very large zone.

Good that this is finally included, it seems to have been something
missing for quite a while:

http://archives.postgresql.org/pgsql-interfaces/2000-09/msg00082.php

> If our next target for the DB backend is PostgreSQL, we need to think
> about how to deal with that, including:
> 
> - Implement everything using pre-9.2 APIs, and say that it won't work
>   well for large scale environments
> - Support both the new and old APIs based on availability, and say
>   that for large scale environments 9.2 or higher will have to be
>   used.

Since 9.2 came out less than a month ago, we can safely assume that
nobody is using it yet. :( (Although perhaps the reported speed
improvements will encourage serious users to jump quickly.)

I definitely think we should use the 9.2 single row mode, although we
clearly can only use this where available.

A third option is:

- Identify the queries we have that can return a large data set, and
  force the results into a loop on the client side.

So, for example, we assuming we want to transfer a zone, we could do
something like:

    SELECT rdtype, ttl, sigtype, rdata, name, rname FROM records 
        WHERE zone_id = ?1
        ORDER by rname, rdtype
	LIMIT 100 

This would give us no more than 100 rows. 

We can then call again, using the last rname and rdtype, like this:

    SELECT rdtype, ttl, sigtype, rdata, name, rname FROM records 
        WHERE (zone_id = ?1) AND (rname >= ?2) AND (rdtype >= ?3)
        ORDER by rname, rdtype
	LIMIT 100 

We continue this until we get less than our maximum 100 rows back.

There is a bit of a corner case when our 100th record is in the middle
of the same rname/rdtype (which is why I have >= and not > there).

It's icky and I'm not sure what the performance penalty would be, but
perhaps better than loading yet another copy of an entire zone into
memory. 

> By the way, MySQL's C API has been supporting incremental retrieval
> for quite some time, so we don't have this particular issue with
> MySQL.

I guess the adage that "programmers like MySQL and DBA's like
PostgreSQL" came about for a reason. :)

--
Shane