[bind10-dev] Datasource discussion
Shane Kerr
shane at isc.org
Thu Jan 14 13:25:09 UTC 2010
On Wed, 2010-01-13 at 16:52 -0600, Michael Graff wrote:
> Shane Kerr wrote:
> > All,
> >
> > On Wed, 2010-01-13 at 11:31 +0100, Jelte Jansen wrote:
> >> We started out talking a bit about updates, and transactions for that.
> >> Michael suggested earlier that we could look into using diffs; this
> >> could take the form of dynamic dns updates, i.e. a set of prerequisites,
> >> and a set of changes. Then we wouldn't need a full transaction
> >> interface, as the update methods can take care of that.
> >
> > DNS is very "fuzzy" in general, so maybe not using transactions is
> > indeed the way to go. It might allow us to do faster processing on
> > updates, if we can lock only part of the tree based on knowing what all
> > operations in a set are. OTOH, techies are familiar with transactions,
> > and understand their benefits and limitations.
>
> I think you misunderstand what was meant.
I think I understood exactly what you meant. :)
> One way to do this would be to have the upper-layer call:
>
> lock()
> update(this_record)
> update(that_record)
> unlock()
>
> Another would be to do this:
>
> record_change_set.add_change(this_record)
> record_change_set.add_change(that_record)
> update_atomically(record_change_set)
>
> and let the lower level API decide how it manages its locking, if any.
Locking implies the underlying model needs locking. With an SQL
back-end, you don't need locks - you use transactions. Which may result
in locking, but often will not(*).
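To make that concrete, here's a toy sketch of the transactional approach using SQLite (the schema and helper are invented for illustration, not BIND 10's actual back-end): the whole batch of updates goes into one transaction, and the database either commits all of it or rolls all of it back, with no explicit lock calls in the API.

```python
# Toy sketch: atomic batch updates via a transaction, not explicit locks.
# Schema and function names are illustrative only.
import sqlite3

def apply_updates(conn, updates):
    """Apply all (name, rtype, rdata) rows in a single transaction."""
    with conn:  # commits on success, rolls everything back on exception
        for row in updates:
            conn.execute("INSERT INTO records VALUES (?, ?, ?)", row)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (name TEXT, rtype TEXT, rdata TEXT)")

apply_updates(conn, [
    ("example.", "NS", "ns1.example."),
    ("ns1.example.", "A", "192.0.2.1"),
])
print(conn.execute("SELECT COUNT(*) FROM records").fetchone()[0])  # 2
```

If any insert in the batch fails, the `with conn:` block rolls back the partial work, so a reader never observes half an update.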
I understood your collected-updates model, which is why I foresaw
problems with queries, since subsequent queries depend on earlier
queries.
Again, I think it can be faster in some cases, because the underlying data
source will know up front what locking (if any) to do. I do worry it may
not be a natural model for developers, who are used to SQL transactions.
There may also be unforeseen problems with this, but I'm willing to
count on our willingness to refactor if those crop up.
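For reference, here's roughly what I understand the collected-updates model to look like in code. This is a Python toy (the class and method names are invented, echoing Michael's pseudocode above, not a real interface): callers accumulate changes, then hand the whole set to the data source, which applies it atomically however it likes.

```python
# Toy sketch of the change-set model; names are invented for illustration.
class RecordChangeSet:
    """Accumulates record changes to be applied as one unit."""
    def __init__(self):
        self.changes = []

    def add_change(self, record):
        self.changes.append(record)

class DataSource:
    """A trivial in-memory data source that applies change sets atomically."""
    def __init__(self):
        self.records = []

    def update_atomically(self, change_set):
        # The back-end decides its own locking strategy; here we simply
        # apply all changes in one step so readers never see a partial set.
        self.records.extend(change_set.changes)

ds = DataSource()
cs = RecordChangeSet()
cs.add_change(("example.", "NS", "ns2.example."))
cs.add_change(("ns2.example.", "A", "192.0.2.2"))
ds.update_atomically(cs)
print(len(ds.records))  # 2
```

The point is that the upper layer never calls lock()/unlock() itself; atomicity is a property of update_atomically(), implemented however the back-end prefers.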
> > One specific problem with this approach is that it doesn't work so good
> > with queries, unless you introduce a basic query which says something
> > like "give me the NS RRSET for foo.example, plus also any A or AAAA
> > RRSET that you have for the name server names we're authoritative for".
> > Otherwise your server can end up replying with an NS RRSET that has
> > non-existent A/AAAA records even if the administrator never configured
> > it in such a way. Not a *huge* problem I suppose, but we need to at
> > least recognize and document the possibility.
>
> This is an argument to mean a good DS would also know a bit about DNS
> protocols. I don't think this is a bad idea, and I also don't really
> mind if the data changes from under the client IF there is a way to, for
> DS drivers which support it, maintain version consistency.
I'd really like to avoid data source implementers having to know a lot
about DNS. That would raise the bar for people adding their own. I
consider it a big win if we can get people to add data sources in ways
that we never considered!
> > Maybe we need to ask the question "how important is always giving a
> > completely consistent answer?" to users? Or rather, "how much
> > performance would you be willing to give up to get a completely
> > consistent answer?"
>
> How does PowerDNS ensure sanity here? I somehow doubt it bothers.
> After all, if we don't respond with data the client can usually ask for
> it. If we provide old data, it's also ok so long as it's not "too" old.
> We're talking about sub-millisecond differences in data where a query
> may come in and get something strange. Note that my change-sets will
> mitigate a lot of this as an IXFR will contain both updates and apply
> them atomically.
PowerDNS ensures sanity by not supporting IXFR, DDNS, or any way to
update the database other than AXFR (or direct manipulation of the SQL
back-end). That is *one* possible approach - but not the one we can
take, since our users want IXFR, and DDNS, and so on. And when we show
PowerDNS users how much bandwidth and CPU time they'll save by not
having to AXFR their zones all the time, they'll insist on using BIND 10
everywhere! :)
The "fuzziness" I'm talking about is on query results, so for example:
1. Admin does a DDNS update to update an NS RRSET: modify the NS
RRSET to add a new name server, then add an A record for it.
(All in theory atomic since this is DDNS.)
2. User queries for this NS RRSET, and gets the updated NS RRSET,
but the A record is not yet there, so it is not added to the
additional section. This may result in an extra query
(hopefully), or a confused user.
I think this isn't that bad. It should at least be written down. :)
Using change sets will solve this problem if we insist that the
underlying data store treat them as atomic - which is possible, but
removes the possibility for increased SQL performance in situations
where we don't care (think "transaction isolation" if nothing else).
Anyway, if everyone else wants to go ahead with the change set model, I
guess we can give it a go. :)
--
Shane
* In a naive SQL implementation you'll always have locking,
because we always need to update the SOA. But one can be smarter
than this, once one recognizes that one can have multiple SOA
records, as long as readers understand that only the one with
the largest serial matters.
In this case, one can simply insert the new SOA, which means no
locking is needed. Cleanup of old SOA records can occur in a lazy
fashion, which with proper row-level locking shouldn't affect
any queries or updates.
In the relatively rare case where one is looking up the SOA, one
can use a SELECT like this (the MAX() has to go in a subquery,
since an aggregate can't appear directly in a WHERE clause):
SELECT * FROM table_name
WHERE increasing_id = (SELECT MAX(increasing_id) FROM table_name)
This will get you the current SOA record.
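Here's the whole insert-only SOA idea end to end, as a small SQLite sketch (the schema is made up for illustration): updates just insert a new row, and readers pick the row with the largest auto-incrementing id.

```python
# Toy sketch of the insert-only SOA pattern; schema is illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE soa (increasing_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " serial INTEGER, rdata TEXT)")

# Each SOA update is a plain INSERT, so no existing row is locked.
conn.execute("INSERT INTO soa (serial, rdata) VALUES (?, ?)", (1, "soa v1"))
conn.execute("INSERT INTO soa (serial, rdata) VALUES (?, ?)", (2, "soa v2"))

# Readers fetch only the newest SOA via a subquery on the max id.
row = conn.execute(
    "SELECT serial, rdata FROM soa"
    " WHERE increasing_id = (SELECT MAX(increasing_id) FROM soa)"
).fetchone()
print(row)  # (2, 'soa v2')
```

Old rows can then be deleted lazily in the background without readers ever noticing.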