[bind10-dev] Data source configuration
Michal 'vorner' Vaner
michal.vaner at nic.cz
Thu May 24 08:38:52 UTC 2012
Hello
On Wed, May 23, 2012 at 11:35:18PM -0700, JINMEI Tatuya / 神明達哉 wrote:
> First, regarding the need for using multiple data sources at the same time.
> Forgetting the special in-memory for now (and see below about that),
> while I think we should allow that, I suspect it's an uncommon if not
> unlikely operation. I believe in the vast majority of cases there's just
> one real data source (sqlite3, or mysql or postgres, or perhaps some
> form of plain text).
Well, I could imagine having the mastered zones in master files and the
secondaried ones in some SQL thing. I think such a setup would be more convenient
for me, but that's probably because I have no „large“ deployment (actually, I
don't have any yet, but I'm thinking of getting a VPS).
Also, I wonder if some larger organizations might have several data sources of
different types, some for their own zones and some for their customers' zones,
but that's just speculation.
Anyway, we probably want to recommend using only a small number of data
sources.
My main concern was: what if I want to have some high-profile zones in In-Memory
and then need some low-profile zones in the database, because they appear and
disappear often, so I can't really cache them reasonably? What I imagine here is
a hosting provider with a large number of customers. A typical customer comes,
buys a domain and hosting for a few dollars, places 5 web pages there and is
done. There are so many of them that caching all their zones in In-Memory
would not be practical. But the provider has a few VIP customers with very high
load, so their zones need to go into In-Memory. And, if a query comes
for www.vip.com, they _know_ they did not sell it to anybody but vip.com, so if
they match this zone in In-Memory, they never want to touch the much slower
database. I'm not sure if this is a valid scenario, but I don't see why it
should not be.
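Just to illustrate what I mean (a rough Python sketch, all names made up): the
in-memory source holds only the VIP zones and is searched first, so any hit
there means the database is never asked.

    def best_match(qname, zone_names):
        """Longest zone in zone_names that qname is equal to or falls under."""
        labels = qname.rstrip('.').split('.')
        for i in range(len(labels) + 1):
            candidate = '.'.join(labels[i:]) or '.'
            if candidate in zone_names:
                return candidate
        return None

    def lookup(qname, sources):
        """Try the sources in configured order, stop at the first with a match."""
        for name, zones in sources:
            zone = best_match(qname, zones)
            if zone is not None:
                return name, zone
        return None

    sources = [('in-memory', {'vip.com'}),
               ('sqlite3', {'example.org', 'customer1.net'})]
    print(lookup('www.vip.com', sources))   # -> ('in-memory', 'vip.com')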
> Second, regarding the matching strategy in case we have multiple data
> sources. If the first observation is reasonable, the matching
> strategy wouldn't be that important. And, I tend to agree with Jelte
> that anything other than best-match (which is upward compatible with
> exact-match when only an exact match is sought in the first place) can
> lead to confusing behavior. So, I wonder whether it makes sense we
> only support best-match, and the admin needs to be willing to accept
> the performance consequence when multiple data sources are used (for
> not-so-busy server the overhead would probably be acceptable; for a
> very busy server I guess the admin would like to avoid multiple
> sources anyway to minimize any possible overhead due to that).
See above. I would discourage using anything other than best-match unless people
really need it and know what they are doing. But I don't think we should refuse
to support it.
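For contrast, best-match across all sources would look roughly like this (again
just a sketch, reusing best_match() from the snippet above); this is the
behaviour Jelte and you argue for:

    def lookup_best(qname, sources):
        """Ask every source and keep the deepest matching zone."""
        best = None
        for name, zones in sources:
            zone = best_match(qname, zones)
            if zone is None:
                continue
            # All matches are suffixes of qname, so the longer name is deeper.
            if best is None or len(zone) > len(best[1]):
                best = (name, zone)
        return best

If the database later gets 'child.vip.com', this version finds it, while the
first-match version above would still answer from 'vip.com' in memory, which is
exactly the confusing behaviour you mention.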
> Third, about "in memory data source". As we all know it's special in
> several points, and its specialty is one of the reasons why this
> configuration is a difficult problem (e.g., we need to find a clean
> way to avoid using in-memory for certain applications). Now I wonder
> whether we can just consider the in-memory thing a "cache" that comes
> with a specific concrete data source, rather than yet another instance
> of data sources (even if the actual implementation is one of data
> source client derived classes). Also, I think it probably makes
> sense if a single in-memory cache is specific to one particular data
> source. In fact, according to the first observation, it wouldn't be
> so different from a single global cache (that can possibly cover
> multiple sources). Also, this model will eliminate the matching
> consistency issue with or without using a cache.
I agree here.
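Just to check we mean the same thing, this is roughly how I picture it (the keys
and the file name are made up, only to show the shape): the cache hangs off one
concrete data source instead of being a data source of its own.

    datasources = [
        {'type': 'sqlite3',
         'file': '/var/lib/bind10/zones.sqlite3',   # made-up path
         'cache': True,                             # per-data-source cache
         'cache-zones': ['vip.com']},               # which zones to preload
    ]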
> 'auth': { 'use-datasource-cache': true, ...}
> 'xfrout': { 'use-datasource-cache': false, ...}
> 'xfrin': { 'use-datasource-cache': false, ...}
> 'ddns': { 'use-datasource-cache': false, ...}
Would things still work if the user tweaks these? Or would DDNS/XfrIn just break
and XfrOut misbehave? If the latter, we probably want to hardcode this instead of
providing the options.
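If the answer is that tweaking them can only break things, the choice could
simply live in the code rather than in the config, something along these lines
(purely a sketch, not real BIND 10 code):

    # Hypothetical hardcoded table: which modules may use the data source cache.
    USE_DATASOURCE_CACHE = {
        'auth':   True,    # answering queries is where the cache helps
        'xfrout': False,   # transfers need the real zone content
        'xfrin':  False,   # incoming transfers write to the real source
        'ddns':   False,   # updates must go to the real source as well
    }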
> The 'cache-zones' config for the 'mysql' data source example could be
> tricky in cases where the underlying data source has
> 'child.example.com' and auth gets a query for the child zone. One
> possible (maybe short term?) solution would be to allow only 'all' or
> 'none'. A more flexible solution would be that a cache at least
> contains a complete list of zones of the underlying data source
> in-memory, and if a lookup detects the best matching zone is not
> in-memory but in the underlying database, it somehow forwards the
> query request to the real data source.
All this has a slight problem: how do we know the list of zones has changed, so
we can reload it? The database can be filled or modified by some other system (I
think that is the main driving need for having a database in the first place:
the company has some system where users click in the e-shop and add their zone,
so it goes into the database right away).
So, I'm not rejecting your proposals, I'm just not sure we can make the
assumptions you are making.
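To make the concern concrete, this is roughly how I read the "keep the full zone
list in memory and forward misses to the database" idea (made-up interface,
reusing best_match() from the first snippet). The snapshot taken at construction
time is exactly the part that goes stale when the e-shop inserts a zone straight
into the database:

    class CachedDataSource:
        def __init__(self, real_source, zones_to_cache):
            self.real_source = real_source
            # One-time snapshot of the zone list; nothing tells us when it changes.
            self.zone_list = set(real_source.get_zone_names())
            self.cache = {z: real_source.load_zone(z) for z in zones_to_cache}

        def find(self, qname):
            zone = best_match(qname, self.zone_list)
            if zone is None:
                return None                  # not our zone (or the list is stale)
            if zone in self.cache:
                return self.cache[zone]      # hot zone, answered from memory
            return self.real_source.load_zone(zone)   # forward to the database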
With regards
--
Disclaimer: this message may contain information.
Michal 'vorner' Vaner