[bind10-dev] revised/refactored data source design proposal

Fri Jun 3 06:45:11 UTC 2011

At Thu, 2 Jun 2011 14:51:49 +0200,
Michal 'vorner' Vaner <michal.vaner at nic.cz> wrote:

> I did read the wiki page (I didn't have a look at any of the
> experiment code), partly out of curiosity.

Thanks for the prompt feedback.

> I find it a reasonable design overall. But I have a few points that worry me a
> bit:
> 
>  * Naming conventions (might be bikeshedding, but I find that good
>    naming makes a big difference in code readability). For one,
>    isn't database a single word, so it should be written as Database,
>    not DataBase? The second looks like it would be some base in
>    which data are encoded, like base64, or so.
> 
>    Second, could we use something shorter, using namespaces, nested
>    classes or something? Names like DataBaseDataClientCreator are
>    quite long and unreadable, they kind of get confused one into
>    another (I was thinking „Oh god, this looks too like Java“).

In general, the overall naming is very tentative (the proposal focuses
more on the concept at the moment), and I don't think I'm good at
naming things, so suggestions are very much appreciated.  (And I agree
that naming can be crucial in some sense.)

Regarding s/DataBase/Database/, I have no objection.  I myself
wondered whether the latter might be better.

The length of names was my own concern, too.  On one hand, I agree
with using shorter names, especially because I don't like overly long
lines (longer class/variable names are major contributors to longer
lines).  On the other hand, I also believe awkward abbreviations
reduce readability.  This is why some of the proposed classes are
named with "DataSource" instead of "DataSrc".  So, the current result
is an incomplete attempt to find a good balance.  I'm quite open to
suggestions on this point.

>  * You say that each thread will have its own client, because some databases
>    can't share the connection. But on the other hand, we really don't want to
>    have multiple instances of the in-memory data loaded in memory; that would be
>    both a waste of memory and a performance penalty. Is it expected that the
>    in-memory clients will share the data?

Good question.  I did notice the problem of having redundant copies of
the data in the in-memory case.  So far, I've not given much thought to
this area, but some random thoughts include:
- we are still not sure how to benefit from multiple cores in
  b10-auth; whether to use threads or sub processes, etc.
- if we decide to use threads, we'll probably still use a separate
  Client object for each thread, but employ some techniques to
  share the underlying in-memory data.
- in the longer term, I hope we can use a more sophisticated approach
  such as a single memory image on the system shared by multiple BIND 10
  processes (not only by auth, but also xfrin/out, etc), using
  techniques such as shared memory or mmap.
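
To illustrate the second point above, here is a minimal sketch of
per-thread client objects that share one read-only in-memory image.
All class and function names (InMemoryZoneData, InMemoryClient,
run_workers) are hypothetical and not part of the proposal; it only
shows the idea that a client can be cheap if it holds nothing but a
reference to the shared data.

```python
import threading


class InMemoryZoneData:
    """Hypothetical read-only zone image, loaded once per process."""

    def __init__(self, records):
        # Map (name, rrtype) -> tuple of rdata strings; never mutated after load.
        self._records = {key: tuple(val) for key, val in records.items()}

    def find(self, name, rrtype):
        return self._records.get((name, rrtype), ())


class InMemoryClient:
    """Hypothetical per-thread client; holds only a reference to shared data."""

    def __init__(self, shared_data):
        self._data = shared_data

    def find(self, name, rrtype):
        return self._data.find(name, rrtype)


def run_workers(shared_data, queries):
    """Give each thread its own client and collect the answers in order."""
    results = [None] * len(queries)

    def worker(index, query):
        client = InMemoryClient(shared_data)  # one client per thread
        results[index] = client.find(*query)

    threads = [threading.Thread(target=worker, args=(i, q))
               for i, q in enumerate(queries)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because the shared image is never modified after loading, no locking
is needed in this sketch; dynamic updates would need something more.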

>  * You intend to not have transactions for reading operations. I find this wrong,
>    as we do multiple queries to the DB per single query. This could produce
>    inconsistent data.
> 
>    If we provided methods to start and end a read-only transaction, the
>    databases supporting this could implement it and the in-memory source would
>    just have empty methods and we would ensure the consistency in a different way
>    for it.

Actually, I simply didn't consider transactions for reading in that
proposal; I didn't intentionally exclude them.  As you said, it's
not difficult to use a single transaction for a single call to
ZoneHandle::find(), and I don't have an objection to that.  But if you
also wanted to have a transaction for the entire single DNS query
(which possibly consists of multiple calls to the find() method, one
for the answer, another for the authority, another for glue, etc), it
may become trickier.
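
For concreteness, a rough sketch of what a read-only transaction
spanning several find() calls could look like.  This is an
illustration only: read_transaction(), answer_query(), both client
classes and the table layout are hypothetical, and SQLite merely
stands in for a generic DB backend.  The in-memory client provides an
empty implementation, as suggested above.

```python
import sqlite3
from contextlib import contextmanager


class SQLiteClient:
    """Hypothetical DB-backed client; the schema is made up for this sketch."""

    def __init__(self, rows):
        # isolation_level=None leaves transaction control to explicit SQL.
        self._conn = sqlite3.connect(":memory:", isolation_level=None)
        self._conn.execute(
            "CREATE TABLE records (name TEXT, rrtype TEXT, rdata TEXT)")
        self._conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)

    @contextmanager
    def read_transaction(self):
        self._conn.execute("BEGIN")
        try:
            yield
        finally:
            self._conn.execute("COMMIT")  # nothing was written; just end it

    def find(self, name, rrtype):
        cur = self._conn.execute(
            "SELECT rdata FROM records WHERE name = ? AND rrtype = ? "
            "ORDER BY rdata", (name, rrtype))
        return [row[0] for row in cur]


class InMemoryClient:
    """Hypothetical in-memory client whose transaction is a no-op."""

    def __init__(self, rows):
        self._rows = list(rows)

    @contextmanager
    def read_transaction(self):
        yield  # consistency would have to be ensured some other way

    def find(self, name, rrtype):
        return sorted(r[2] for r in self._rows
                      if r[0] == name and r[1] == rrtype)


def answer_query(client, qname, zone):
    """Look up answer and authority data inside one read transaction."""
    with client.read_transaction():
        return {"answer": client.find(qname, "A"),
                "authority": client.find(zone, "NS")}
```

The sketch covers only the easy part; it says nothing about how long
such a transaction can reasonably be held open per DNS query.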

Also, for the entire DNS query, there's another possible source of
inconsistency: the answer may come from the hot spot cache (which
could be older) while data for other sections may not be cached and
require a DB transaction.  What should we do in this case?

Personally, I don't (yet) have a strong opinion on this point either
way.  But I'd point out that (as far as I understand it) neither BIND
9 DLZ nor PowerDNS tries hard to ensure this type of consistency using
DB transactions.  The fact that the two most widely used
implementations with a DB backend don't do this may mean something;
maybe users don't care much; maybe it's extremely difficult.  So, I'd
first like to hear from candidate users who'd seriously consider using
a DB backend for DNS.

Another note in case it was not clear from the description: the
ZoneIterator would run as a single DB operation (essentially it's
"select * from records") and so implicitly uses a kind of transaction
(this means that for a very big zone AXFR-out would probably be
inefficient, but that's a subject of another discussion).
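
As an illustration of the iterator case, a minimal sketch that walks a
whole zone from one SELECT statement.  The function name
iterate_zone(), the column names and the batch size are hypothetical;
only the "records" table name comes from the note above, and SQLite
again stands in for any DB backend.

```python
import sqlite3


def iterate_zone(conn, batch_size=2):
    """Yield every record of the zone from a single SELECT statement."""
    cursor = conn.execute(
        "SELECT name, rrtype, rdata FROM records "
        "ORDER BY name, rrtype, rdata")
    while True:
        # Fetch in batches so that a big zone is not held in memory at once.
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        for row in rows:
            yield row
```

The statement stays open for as long as the caller keeps iterating,
which is where the cost for a very big zone would come from.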

>    As we talked about this on the last F2F, we probably don't want
>    the in-memory to be directly writable, but somehow reload it from
>    permanent storage when it changes. So we could do some kind of
>    atomic reload (be it big read-write log, loading in background
>    and replacing the tree, shared pointer tricks and consistency,
>    etc).
> 
>    So, would it be possible to add a method to it as well? Or some
>    kind of transaction-ish object or so?

This is another area where details are still quite open.  But I
thought what we discussed at the last F2F was that we'd use a DB
backend to manage IXFR/dynamic update diffs (instead of BIND 9-style
homemade journal files) and further details were still open.  I have a
feeling that we'll still need a way to update the in-memory image
dynamically, as follows (in the case of IXFR-in):

- xfrin has its own DataSourceClient for the in-memory data source.
  When it performs IXFR, it stores the diff to the separate DB
  storage, and also updates the in-memory image (the latter would be
  necessary to eventually dump the latest zone to a separate file).
- xfrin then notifies some other processes (auth, xfrout, and perhaps
  more) of the changes.
- auth receives the notification, and updates its own in-memory image
  (either from the DB storage, or from the notification itself).

While I haven't fully thought it through, I guess the proposed
ZoneUpdater class can be used for this scenario, too (at least in the
case of single-threaded operation).
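
The IXFR-in scenario above can be sketched roughly as follows.  This
is a toy model under stated assumptions: DiffStore, InMemoryImage,
Xfrin and Auth are hypothetical stand-ins (not the proposed
ZoneUpdater API), the notification is a plain function call rather
than inter-process messaging, and serials are compared as plain
integers without serial number arithmetic.

```python
class DiffStore:
    """Hypothetical stand-in for the separate DB storage of diffs."""

    def __init__(self):
        self._diffs = []  # list of (new_serial, deleted, added)

    def append(self, serial, deleted, added):
        self._diffs.append((serial, tuple(deleted), tuple(added)))

    def since(self, serial):
        # Plain integer comparison; real serials would need wraparound care.
        return [d for d in self._diffs if d[0] > serial]


class InMemoryImage:
    """Hypothetical in-memory zone image owned by one process."""

    def __init__(self, serial, records):
        self.serial = serial
        self.records = set(records)

    def apply(self, serial, deleted, added):
        self.records = (self.records - set(deleted)) | set(added)
        self.serial = serial


class Auth:
    """Updates its own image from the diff storage when notified."""

    def __init__(self, image, store):
        self.image = image
        self._store = store

    def on_notify(self):
        for serial, deleted, added in self._store.since(self.image.serial):
            self.image.apply(serial, deleted, added)


class Xfrin:
    """Stores the diff, updates its own image, then notifies listeners."""

    def __init__(self, image, store, listeners):
        self.image = image
        self._store = store
        self._listeners = listeners

    def handle_ixfr(self, serial, deleted, added):
        self._store.append(serial, deleted, added)
        self.image.apply(serial, deleted, added)
        for notify in self._listeners:
            notify()
```

Here auth catches up from the diff storage; carrying the diff in the
notification itself would be the other option mentioned above.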

>  * What about stuff that is not supported by a given backend? Would
>    the method throw a not-implemented and the above level would take
>    care of it somehow?  (depending on the feature, if it was an attempt
>    to write to it, it would fail, but in case it would be some
>    DNSSEC support, it could just assume the zone is not signed if
>    the backend doesn't support it).

That would depend on the feature, yes.  To be honest I've not fully
considered these cases, although I've noticed there are some
possibilities like having a non-writable data source.  At this moment
it's still pretty open.
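
Just to make the idea in the question concrete, a small sketch of
feature-dependent handling.  Everything here is hypothetical
(ReadOnlyDataSource, is_signed(), add_record() and the two helper
functions); it only shows the "not implemented" pattern from the
question, not a decided design.

```python
class ReadOnlyDataSource:
    """Hypothetical backend that supports lookups only."""

    def __init__(self, records):
        self._records = dict(records)

    def find(self, name, rrtype):
        return self._records.get((name, rrtype), [])

    def add_record(self, name, rrtype, rdata):
        raise NotImplementedError("this data source is not writable")

    def is_signed(self, zone):
        raise NotImplementedError("no DNSSEC support in this data source")


def zone_is_signed(source, zone):
    """For DNSSEC, fall back to treating the zone as unsigned."""
    try:
        return source.is_signed(zone)
    except NotImplementedError:
        return False


def try_update(source, name, rrtype, rdata):
    """For writes there is no sensible fallback, so report failure."""
    try:
        source.add_record(name, rrtype, rdata)
    except NotImplementedError:
        return False
    return True
```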

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.