[bind10-dev] proposed design of scalable in-memory zone loading/updating

Wed Jun 27 18:58:35 UTC 2012

At Tue, 26 Jun 2012 15:07:12 +0200,
Shane Kerr <shane at isc.org> wrote:

> Loading after startup is to be done in "chunks" of some reasonable
> size, presumably. I guess this is something we can limit based on
> wall-clock time to ensure reasonably low delays, right? So, say check
> the time after each 100 (or 10 or 1000) operations and pause loading
> after 0.001 (or 0.01 or 0.0001) seconds have passed?

That would be a better approach, although realistically I suspect
we'll use some heuristics-based upper limit on the number of RRs (or
RRsets) to be updated in a single chunk.  BIND 9 works that way.
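To make the two options concrete, here is a rough Python sketch of
chunked loading as a generator.  All names (load_in_chunks,
apply_rrset, max_rrsets, max_seconds, check_every) are made up for
illustration and are not part of any existing BIND 10 interface.  It
caps each chunk by a count of RRsets and, optionally, also checks the
wall clock every check_every operations:

```python
import time


def load_in_chunks(rrsets, apply_rrset, max_rrsets=1000,
                   max_seconds=None, check_every=100):
    """Apply RRsets one chunk per resume; yield the size of each chunk.

    Hypothetical sketch: 'apply_rrset' stands in for whatever adds one
    RRset to the in-memory zone being built.
    """
    it = iter(rrsets)
    while True:
        start = time.monotonic()
        count = 0
        for rrset in it:
            apply_rrset(rrset)
            count += 1
            # Count-based cap (the heuristic upper limit per chunk).
            if count >= max_rrsets:
                break
            # Optional time-based cap, checked only every N operations
            # so that reading the clock does not dominate the cost.
            if (max_seconds is not None and count % check_every == 0
                    and time.monotonic() - start >= max_seconds):
                break
        if count == 0:
            return  # nothing left to load
        yield count  # pause here; the caller resumes with next()
```

The caller would call next() on the generator between handling
queries; each call loads at most one chunk and then returns control.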

> Not really related to this particular proposal, but we may want to be
> careful with the IXFR event chain to ensure that we've loaded our
> in-memory database before we send NOTIFY packets. Even so, it is
> possible that a secondary could get a newer version of the zone than we
> are serving. (I am assuming that b10-xfrout reads directly from the
> SQL database.) This is truly an edge case, so probably nothing worth
> engineering around, but perhaps worth noting.

Good point.  You're probably also right that it isn't critical for
the initial implementation, but I think we should handle it in the
xfr-ng work.  (Even if b10-xfrout can provide the new version, the
preceding SOA query will be answered by auth, and that answer could
come from the older version of the zone and mislead the secondary
servers.)

> A final consideration is whether we want to have multiple Memory Event
> Handler instances per process. The use case here is when we are loading
> a large zone and don't want to block updates to other zones. It may
> complicate the design slightly, but if we don't include it now and add
> it later it seems like it may be quite a bit of work to refactor around
> this idea.

I believe it's possible with the initially proposed design and a
single Memory Event Handler:

1. auth forwards a "replace (full load)" command for zone A to the
   handler
2. the handler starts the replace task, creates (some kind of)
   continuation context for it, and returns it to auth
3. auth periodically resumes this incremental task; every time it
   does, the handler makes some progress on the replacement
4. before the replacement is completed, auth receives a "(partial)
   update" command for zone B.  auth forwards it to the handler.
5. the handler immediately completes the update task and returns the
   result to auth; the in-memory client will use the new version from
   this point
6. the handler still continues the replacement task as it's resumed by
   auth

If A != B (which should be the normal case in this scenario), this
should be safe because the corresponding memory regions are completely
different.  If A == B we'll need to handle it as a kind of error
condition (rejecting the latter update or aborting the replacement,
etc).
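
As a rough illustration of the steps above, here is a small Python
sketch with a single handler.  It uses a generator as the
"continuation context".  The class and method names
(MemoryEventHandler, start_replace, resume, update, ZoneBusy) and the
dict-based zone data are assumptions for this example only, not an
actual BIND 10 API, and it picks "reject the latter update" for the
A == B case:

```python
class ZoneBusy(Exception):
    """Raised when a command targets a zone that is being replaced."""


class MemoryEventHandler:
    def __init__(self, chunk_size=1000):
        self.zones = {}    # zone name -> published data (name -> rdata)
        self.pending = {}  # zone name -> in-progress replace task
        self.chunk_size = chunk_size

    def start_replace(self, zone, records):
        """Steps 1-2: start a full reload, return its continuation."""
        if zone in self.pending:
            raise ZoneBusy(zone)
        task = self._replace(zone, records)
        self.pending[zone] = task
        return task

    def _replace(self, zone, records):
        new = {}  # built separately from the published version
        for i, (name, rdata) in enumerate(records, 1):
            new[name] = rdata
            if i % self.chunk_size == 0:
                yield False  # chunk done, replacement not complete
        self.zones[zone] = new  # publish the new version at once
        del self.pending[zone]
        yield True

    def resume(self, task):
        """Steps 3 and 6: make progress; True once the load is done."""
        return next(task, True)

    def update(self, zone, changes):
        """Steps 4-5: complete a partial update immediately."""
        if zone in self.pending:
            raise ZoneBusy(zone)  # the A == B case: reject the update
        updated = dict(self.zones.get(zone, {}))
        updated.update(changes)
        self.zones[zone] = updated
```

With this shape, auth can interleave resume() calls for zone A with
update() calls for zone B, since the two touch separate data.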

---
JINMEI, Tatuya