[bind10-dev] initial ideas on the "difference" design

Wed Oct 12 11:20:48 UTC 2011

On 12/10/2011 11:07, Michal 'vorner' Vaner wrote:

> What is the reason for having the ID? We don't need to reference 
> exact row of database ever, so there's no need to have an 
> identifier and I guess the order inside the diff does not matter 
> provided deletions are first, right? Because the diff is atomic 
> anyway.

In the response packet, the RRs within a difference sequence have to
be ordered as follows:

SOA of old version
RRs to be deleted
SOA of new version
RRs to be added

... and the difference sequences within the packet have to be in order
of increasing versions.

Although the order of the RRs within each section does not matter, the
position of the SOAs does.  By assigning an ID to each RR and making
sure you add RRs (including the SOAs) to the difference table in the
correct order, ordering the selection by increasing ID will ensure
that the retrieved records are in the right order to go into the DNS
message.
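
As a rough illustration of that idea, here is a minimal sketch using
Python's sqlite3 module.  The column layout, the "rr" text column, the
helper add_sequence() and the encoding operation 0 = delete, 1 = add
are assumptions made for the example, not an agreed schema.

```python
import sqlite3

DELETE, ADD = 0, 1  # assumed encoding of the "operation" column

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE diffs ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " zone_id INTEGER, version INTEGER, operation INTEGER, rr TEXT)"
)


def add_sequence(conn, zone_id, old, new, deleted, added):
    """Insert one difference sequence in the order it goes on the wire."""
    rows = [(zone_id, old, DELETE, f"SOA serial {old}")]     # SOA of old version
    rows += [(zone_id, old, DELETE, rr) for rr in deleted]   # RRs to be deleted
    rows += [(zone_id, new, ADD, f"SOA serial {new}")]       # SOA of new version
    rows += [(zone_id, new, ADD, rr) for rr in added]        # RRs to be added
    conn.executemany(
        "INSERT INTO diffs (zone_id, version, operation, rr)"
        " VALUES (?, ?, ?, ?)", rows)


# Sequences are added in order of increasing version.
add_sequence(conn, 1, 1, 2, ["a A 192.0.2.1"], ["a A 192.0.2.2"])
add_sequence(conn, 1, 2, 3, ["b A 192.0.2.3"], ["c A 192.0.2.4"])

# Ordering by the ID reproduces the insertion order.
by_id = conn.execute(
    "SELECT version, operation, rr FROM diffs"
    " WHERE zone_id = 1 ORDER BY id ASC").fetchall()
```

Because the IDs increase in insertion order, by_id comes back as SOA,
deletions, SOA, additions for each sequence in turn, with nothing
needed beyond the "order by id".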

>> select * from diffs where zone_id = Z and id >= (select id from 
>> diffs where version = B and operation = 0 order by id asc limit 
>> 1) and id <= (select id from diffs where version = E and 
>> operation = 1 order by id desc limit 1);

I think an additional ordering by ID will be needed to get the result
in the right order.

> Maybe: select * from diffs where zone_id = Z and version >= B and 
> version <= E order by (version, operation);
> 
> (With some tweaks, of course). This one looks less complicated and
>  might be faster. This expects the version, zone id and operation 
> are indices (not necessarily primary ones).

That was what I initially thought, but the problem is that there are
two sets of RRs associated with each version:

* The records you added to the previous version of the data to get to
this version.
* The records you deleted from this version of the data to get to the
next version.

These sets are distinguished by the "operation" field.  So to get from
version B to E, you want:

* Records deleted from version B to get to version B+1. (These are
marked as version B records.)
* Records associated with all versions between B+1 and E-1 inclusive.
* Records added to version E-1 to get to version E. (These are marked
as version E records.)

The original select statement selects those records.  The simplified
statement selects them as well but also includes:

* Records added to version B-1 to get to version B. (These are marked
as version B records)
* Records deleted from version E to get to version E+1. (These are
marked as version E records.)
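
The difference between the two statements can be seen with a small
experiment.  This is only a sketch: it assumes SQLite through Python's
sqlite3 module, a made-up "rr" text column, and operation 0 = delete,
1 = add.  I have also added "zone_id" to the subqueries and an "order
by id" to both statements, which the statements as quoted do not have.

```python
import sqlite3

DELETE, ADD = 0, 1  # assumed encoding of the "operation" column

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE diffs ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " zone_id INTEGER, version INTEGER, operation INTEGER, rr TEXT)"
)

# Three difference sequences for zone 1: 1->2, 2->3 and 3->4.
for old, new in [(1, 2), (2, 3), (3, 4)]:
    conn.executemany(
        "INSERT INTO diffs (zone_id, version, operation, rr)"
        " VALUES (1, ?, ?, ?)",
        [(old, DELETE, f"del SOA {old}"), (old, DELETE, f"del rr@{old}"),
         (new, ADD, f"add SOA {new}"), (new, ADD, f"add rr@{new}")])

ORIGINAL = """
    SELECT rr FROM diffs WHERE zone_id = :z
      AND id >= (SELECT id FROM diffs
                 WHERE zone_id = :z AND version = :b AND operation = 0
                 ORDER BY id ASC LIMIT 1)
      AND id <= (SELECT id FROM diffs
                 WHERE zone_id = :z AND version = :e AND operation = 1
                 ORDER BY id DESC LIMIT 1)
    ORDER BY id ASC"""

SIMPLIFIED = """
    SELECT rr FROM diffs
    WHERE zone_id = :z AND version >= :b AND version <= :e
    ORDER BY id ASC"""

params = {"z": 1, "b": 2, "e": 3}  # B = 2, E = 3
original = [row[0] for row in conn.execute(ORIGINAL, params)]
simplified = [row[0] for row in conn.execute(SIMPLIFIED, params)]

# Rows the simplified statement returns that the original does not.
extra = [rr for rr in simplified if rr not in original]
```

With B = 2 and E = 3, the original statement returns exactly the 2->3
sequence, while "extra" holds the additions marked as version 2 (from
1->2) and the deletions marked as version 3 (from 3->4).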

In general, the statement will be simpler, because in most cases the
versions required will not be "B to E" but "B to latest".  In that
case, something like:

select * from
(select * from diffs where
zone_id = Z and version >= B
except
select * from diffs where
zone_id = Z and version = B and operation = 1)
order by id asc

That is not quite complete in itself: an additional access will be
required to check that the table holds difference information for
version B.  If it doesn't, we have to fall back to AXFR.
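
Putting the "B to latest" statement and the extra check together might
look like the sketch below.  It assumes SQLite via Python's sqlite3
module, a made-up "rr" column, and operation 0 = delete, 1 = add.  The
function name ixfr_records() is hypothetical, and so is the choice of
"a deletion row marked as version B exists" as the test for having
difference information for B.

```python
import sqlite3

DELETE, ADD = 0, 1  # assumed encoding of the "operation" column


def ixfr_records(conn, zone_id, begin):
    """Return diff rows from version `begin` to the latest version,
    or None when the caller should fall back to AXFR."""
    # Extra access: is there a difference sequence starting at `begin`?
    start = conn.execute(
        "SELECT 1 FROM diffs"
        " WHERE zone_id = ? AND version = ? AND operation = ? LIMIT 1",
        (zone_id, begin, DELETE)).fetchone()
    if start is None:
        return None
    # Everything from `begin` onwards, minus the additions that
    # produced `begin` itself.
    return conn.execute(
        "SELECT * FROM ("
        "  SELECT * FROM diffs WHERE zone_id = ? AND version >= ?"
        "  EXCEPT"
        "  SELECT * FROM diffs"
        "  WHERE zone_id = ? AND version = ? AND operation = ?"
        ") ORDER BY id ASC",
        (zone_id, begin, zone_id, begin, ADD)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE diffs ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " zone_id INTEGER, version INTEGER, operation INTEGER, rr TEXT)"
)
# Three difference sequences for zone 1: 1->2, 2->3 and 3->4.
for old, new in [(1, 2), (2, 3), (3, 4)]:
    conn.executemany(
        "INSERT INTO diffs (zone_id, version, operation, rr)"
        " VALUES (1, ?, ?, ?)",
        [(old, DELETE, f"del SOA {old}"), (old, DELETE, f"del rr@{old}"),
         (new, ADD, f"add SOA {new}"), (new, ADD, f"add rr@{new}")])

from_2 = ixfr_records(conn, 1, 2)    # rows for 2->3 and 3->4
unknown = ixfr_records(conn, 1, 7)   # no diffs for version 7
```

Here from_2 holds the rows for 2->3 followed by 3->4, and unknown is
None, which the caller would treat as "use AXFR".  A client that is
already at the latest version would presumably be handled before this
point, since the latest version has no deletion rows either.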

> I'm not sure I'd like a de-coupling of diff-commit from the normal
>  commit. What happens if the first succeeds and the second does 
> not? Should we delete the diff? Or try to re-apply it? Or call for 
> help? I think both storing the diff and the data themself should
> be part of the same transaction and become atomic.

I agree with that, although I think the decoupling is only in the
special case where the diffs are stored in a different database to the
zone.  Where they are in the same place, I think doing everything in a
single transaction is the best idea.
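
For the same-database case, a minimal sketch of the single-transaction
approach is below.  It assumes SQLite via Python's sqlite3 module; the
"records" table, the "rr" column, the helpers apply_update() and
snapshot(), and operation 0 = delete, 1 = add are all made up for the
example.

```python
import sqlite3

DELETE, ADD = 0, 1  # assumed encoding of the "operation" column

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (zone_id INTEGER, rr TEXT)")
conn.execute(
    "CREATE TABLE diffs ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " zone_id INTEGER, version INTEGER, operation INTEGER, rr TEXT)"
)
conn.executemany("INSERT INTO records VALUES (1, ?)",
                 [("SOA serial 1",), ("a A 192.0.2.1",)])
conn.commit()

LOG = ("INSERT INTO diffs (zone_id, version, operation, rr)"
       " VALUES (?, ?, ?, ?)")


def apply_update(conn, zone_id, old, new, deleted, added, fail=False):
    """Change the zone data and log the diff in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        for rr in deleted:
            conn.execute("DELETE FROM records WHERE zone_id = ? AND rr = ?",
                         (zone_id, rr))
            conn.execute(LOG, (zone_id, old, DELETE, rr))
        if fail:
            raise RuntimeError("simulated failure part-way through")
        for rr in added:
            conn.execute("INSERT INTO records VALUES (?, ?)", (zone_id, rr))
            conn.execute(LOG, (zone_id, new, ADD, rr))


def snapshot(conn):
    """Return (sorted zone records, number of logged diff rows)."""
    records = sorted(r[0] for r in conn.execute("SELECT rr FROM records"))
    logged = conn.execute("SELECT COUNT(*) FROM diffs").fetchone()[0]
    return records, logged


deleted = ["SOA serial 1", "a A 192.0.2.1"]
added = ["SOA serial 2", "a A 192.0.2.2"]

try:
    apply_update(conn, 1, 1, 2, deleted, added, fail=True)
except RuntimeError:
    pass
after_failure = snapshot(conn)   # neither table has changed

apply_update(conn, 1, 1, 2, deleted, added)
after_success = snapshot(conn)   # both tables updated together
```

After the simulated failure, both the zone data and the diffs table
are as they were before, so there is no half-applied diff to delete
or re-apply.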

In general, the overall design looks good and reflects the way I was
thinking about the problem.

Stephen