[bind10-dev] Data source configuration

Thu May 24 06:35:18 UTC 2012

Okay, I've thought about this more, referring to the original wiki
page: http://bind10.isc.org/wiki/datasrc_config
and this thread.

I thought it might be useful to take a few steps back and consider
whether we can benefit from some simplification and/or from revisiting
the concepts.

First, regarding the need to use multiple data sources at the same time.
Setting aside the special in-memory data source for now (see below about
that), while I think we should allow multiple sources, I suspect it's an
uncommon, if not unlikely, configuration.  I believe that in the vast
majority of cases there's just one real data source (sqlite3, or mysql
or postgres, or perhaps some form of plain text).

Second, regarding the matching strategy in case we have multiple data
sources.  If the first observation is reasonable, the matching
strategy wouldn't be that important.  And I tend to agree with Jelte
that anything other than best-match (which is upward compatible with
exact-match when only an exact match is sought in the first place) can
lead to confusing behavior.  So, I wonder whether it makes sense to
support only best-match, with the admin accepting the performance
consequences when multiple data sources are used (for a not-so-busy
server the overhead would probably be acceptable; for a very busy
server I guess the admin would want to avoid multiple sources anyway
to minimize any possible overhead).
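To make the best-match rule concrete, here is a minimal sketch in
Python.  It assumes each data source can be reduced to a name and the
set of zone origins it serves, which is a simplification (real data
source clients would be asked through their own interface), and the
helper names are hypothetical.  It also shows where the cost comes
from: every source has to be examined for every query.

```python
"""Best-match zone selection across several data sources (sketch).

Assumption: each data source is modeled only as a name and the set of
zone origins it serves; the helper names are hypothetical.
"""
from typing import Dict, Optional, Set, Tuple


def labels(name: str) -> Tuple[str, ...]:
    """Split a domain name into lower-cased labels, ignoring a trailing dot."""
    name = name.lower().rstrip(".")
    return tuple(name.split(".")) if name else ()


def is_subdomain(qname: str, zone: str) -> bool:
    """True if qname equals zone or lies below it (label-wise comparison)."""
    q, z = labels(qname), labels(zone)
    return len(q) >= len(z) and q[len(q) - len(z):] == z


def find_best_match(sources: Dict[str, Set[str]],
                    qname: str) -> Optional[Tuple[str, str]]:
    """Return (source name, zone) for the deepest zone containing qname.

    Every data source is examined, which is the per-query cost of
    best-match with multiple sources.  If two sources serve the same
    zone, the one listed first wins.
    """
    best: Optional[Tuple[str, str]] = None
    best_depth = -1
    for source_name, zones in sources.items():
        for zone in zones:
            depth = len(labels(zone))
            if depth > best_depth and is_subdomain(qname, zone):
                best, best_depth = (source_name, zone), depth
    return best


if __name__ == "__main__":
    sources = {
        "sqlite3": {"example.org", "example.com"},
        "mysql": {"child.example.com"},
    }
    print(find_best_match(sources, "www.child.example.com"))  # ('mysql', 'child.example.com')
    print(find_best_match(sources, "www.example.com"))        # ('sqlite3', 'example.com')
    print(find_best_match(sources, "www.example.net"))        # None
```

With a single data source the outer loop runs once, which is why the
overhead only matters in the (hopefully rare) multi-source setup.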

Third, about the "in-memory data source".  As we all know, it's special
in several respects, and that is one of the reasons why this
configuration is a difficult problem (e.g., we need to find a clean
way to avoid using in-memory for certain applications).  Now I wonder
whether we can simply consider the in-memory thing a "cache" that comes
with a specific concrete data source, rather than yet another data
source instance (even if the actual implementation is one of the data
source client derived classes).  I also think it makes sense for each
in-memory cache to be specific to one particular data source.  In fact,
given the first observation, this wouldn't be so different from a
single global cache (one that could cover multiple sources).  This
model also eliminates the matching-consistency issue, whether or not a
cache is used, since which data source answers a query no longer
depends on whether its cache is enabled.

Finally, about the per-application issue.  With the concept of a
first-class "cache", I'd simply let each application choose whether or
not to use a cache.  At the moment, auth will use caches while xfrout
won't, and neither will ddns or xfrin.

The following is a possible configuration snippet using these
concepts:

'datasource': [
  {
    'type': 'sqlite3', 'database-file': 'zones.sqlite3',
    'cache-zones': 'all'
  },
  {
    'type': 'mysql', 'db': 'bind10', 'user': 'bind10', 'password': 'xxx',
    'cache-zones': [ 'example.com' ]
  },
  {
    'type': 'static',
    'cache-zones': null
  }
]

'auth': { 'use-datasource-cache': true, ...}
'xfrout': { 'use-datasource-cache': false, ...}
'xfrin': { 'use-datasource-cache': false, ...}
'ddns': { 'use-datasource-cache': false, ...}
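A rough sketch of how the snippet above might be interpreted, in
Python.  Assumptions: the config is written as a Python dict (None for
null), the zone lists of the underlying databases are given directly
rather than read from them, a missing 'cache-zones' is treated like
null, and zones_to_cache()/plan() are hypothetical helpers, not
existing BIND 10 code.

```python
"""Interpreting the proposed 'datasource' / 'use-datasource-cache' config (sketch).

Assumptions: the config is written as a Python dict (None stands for null),
zone lists are given directly instead of being read from the databases,
and the helper names are hypothetical.
"""
from typing import Any, Dict, List, Set, Union

CacheZones = Union[str, List[str], None]


def zones_to_cache(cache_zones: CacheZones, available: Set[str]) -> Set[str]:
    """Expand one 'cache-zones' value into the set of zones to load in memory."""
    if cache_zones is None:
        return set()                    # this data source gets no cache
    if cache_zones == "all":
        return set(available)           # cache every zone the source serves
    if isinstance(cache_zones, list):
        unknown = set(cache_zones) - available
        if unknown:
            raise ValueError(f"zones not in data source: {sorted(unknown)}")
        return set(cache_zones)
    raise ValueError(f"bad 'cache-zones' value: {cache_zones!r}")


def plan(config: Dict[str, Any], app: str,
         zone_lists: List[Set[str]]) -> List[Set[str]]:
    """Return, per data source, the zones the given application reads from cache.

    An application with 'use-datasource-cache' false gets empty sets, i.e.
    it always goes to the underlying data sources.  A missing 'cache-zones'
    key is treated like null (an assumption).
    """
    use_cache = config[app].get("use-datasource-cache", False)
    return [zones_to_cache(ds.get("cache-zones"), avail) if use_cache else set()
            for ds, avail in zip(config["datasource"], zone_lists)]


CONFIG: Dict[str, Any] = {
    "datasource": [
        {"type": "sqlite3", "database-file": "zones.sqlite3", "cache-zones": "all"},
        {"type": "mysql", "db": "bind10", "user": "bind10", "password": "xxx",
         "cache-zones": ["example.com"]},
        {"type": "static", "cache-zones": None},
    ],
    "auth": {"use-datasource-cache": True},
    "xfrout": {"use-datasource-cache": False},
    "xfrin": {"use-datasource-cache": False},
    "ddns": {"use-datasource-cache": False},
}

# Hypothetical contents of each data source, in the same order as above.
ZONES = [{"example.org", "example.net"}, {"example.com", "child.example.com"}, {"bind"}]

if __name__ == "__main__":
    print(plan(CONFIG, "auth", ZONES))    # set element order may vary
    print(plan(CONFIG, "xfrout", ZONES))  # [set(), set(), set()]
```

Keeping 'cache-zones' per data source and the on/off switch per
application means neither setting has to know about the other.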

The 'cache-zones' config for the 'mysql' data source example could be
tricky in cases where the underlying data source also has
'child.example.com' (not listed in 'cache-zones') and auth gets a query
for the child zone: a cache holding only 'example.com' cannot tell that
the child zone is the better match.  One possible (maybe short-term?)
solution would be to allow only 'all' or 'none'.  A more flexible
solution would be for a cache to contain at least a complete list of
the zones in the underlying data source in memory; if a lookup detects
that the best-matching zone is not in memory but only in the
underlying database, the cache would somehow forward the query to the
real data source.
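The more flexible option might look roughly like the following Python
sketch.  Assumptions: Backend is a hypothetical stand-in for a
database-backed data source that just counts per-query lookups,
ZoneCache is likewise hypothetical, names are lowercase without a
trailing dot, and each owner name maps to a single record string.  The
point is that best-match is decided over the complete zone list held in
memory, so a cached zone is answered without touching the database, and
only a query whose best zone is not cached is forwarded.

```python
"""A zone cache that knows the names of all zones of its data source (sketch).

Assumptions: Backend and ZoneCache are hypothetical; names are lowercase
without a trailing dot, and each owner name maps to a single record string.
"""
from typing import Dict, Iterable, Optional, Set, Tuple


class Backend:
    """Stand-in for the underlying database-backed data source (e.g. mysql)."""

    def __init__(self, zones: Dict[str, Dict[str, str]]) -> None:
        self.zones = zones    # origin -> {owner name: record}
        self.queries = 0      # per-query lookups that reached the "database"

    def zone_names(self) -> Set[str]:
        return set(self.zones)

    def load_zone(self, origin: str) -> Dict[str, str]:
        """Bulk copy used when filling the cache (not counted as a query)."""
        return dict(self.zones[origin])

    def find(self, origin: str, qname: str) -> Optional[str]:
        self.queries += 1
        return self.zones[origin].get(qname)


def depth_if_contains(zone: str, qname: str) -> int:
    """Number of labels in zone if qname is at or below it, otherwise -1."""
    z, q = zone.split("."), qname.split(".")
    if len(q) >= len(z) and q[len(q) - len(z):] == z:
        return len(z)
    return -1


class ZoneCache:
    """In-memory copies of some zones plus the complete list of zone names."""

    def __init__(self, backend: Backend, cached: Iterable[str]) -> None:
        self.backend = backend
        self.all_zones = backend.zone_names()          # loaded once, kept in memory
        self.data = {o: backend.load_zone(o) for o in cached}

    def find(self, qname: str) -> Tuple[str, Optional[str]]:
        """Return (where the answer came from, record) for qname."""
        best = max(self.all_zones, key=lambda z: depth_if_contains(z, qname),
                   default=None)
        if best is None or depth_if_contains(best, qname) < 0:
            return "no-zone", None
        if best in self.data:
            return "cache", self.data[best].get(qname)
        # Best match is a zone that is not cached: forward to the data source.
        return "backend", self.backend.find(best, qname)


if __name__ == "__main__":
    db = Backend({
        "example.com": {"www.example.com": "192.0.2.1"},
        "child.example.com": {"www.child.example.com": "192.0.2.2"},
    })
    cache = ZoneCache(db, cached=["example.com"])  # like 'cache-zones': [ 'example.com' ]
    print(cache.find("www.example.com"))        # ('cache', '192.0.2.1')
    print(cache.find("www.child.example.com"))  # ('backend', '192.0.2.2')
    print(db.queries)                           # 1
```

With 'cache-zones': [ 'example.com' ], the query for
www.child.example.com goes to the child zone in the database instead
of being answered from the cached parent, and the cached query causes
no database lookup at all.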

This approach should meet the requirements (well, more or less):

- Be able to specify multiple data sources, including their specific
  configuration parameters.
=> Okay.
- Define some matching strategy deciding which data source should be
  used to handle each query.
=> Not really, but it simplifies the behavior by having a single
  matching strategy.
- In case the in-memory data is not going to be in shared memory (and
  it seems that won't happen soon), allow XfrOut not to use in-memory,
  but to use the underlying source it was loaded from instead.
=> Okay.  Possible via the xfrout config.
- Allow iterating through all the possible data sources for
  XfrIn/DDNS to find the correct one for update.
=> Okay, because there's only best-match.
- Make it simple enough so users understand what is happening.
=> Hopefully; I cannot be sure, since "simple" is a subjective term :-)
- Make sure the configuration doesn't create serious performance
  problems.
=> I hope so, although it might depend on what kind of performance is
  meant.  At least we don't have to list 1M zones in 'cache-zones'
  when that's the total number of zones in the underlying data source
  and we want to cache all of them, so the size of the written
  configuration should scale acceptably.  Also, since single-data-source
  usage is the primary case, the overhead of the best-match search is
  not an issue there.  And if an in-memory cache is used, then whether
  we only allow all/none or have the cache maintain an in-memory list
  of zones, there won't be any redundant database lookup when the
  in-memory search finds an answer.

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.