[bind10-dev] meta data source vs data source container (detail)

Tue May 8 00:18:47 UTC 2012

This is a more detailed discussion of the subject topic, following the
summary message I've just sent to the list.  As in the summary, I hope
I've described it objectively, but I don't deny it if it looks biased.

First, I show a conceptual implementation of both approaches (these
are just for explaining the concept in the form of code, not a
proposed implementation):

// On construction it's given a data source selection policy (first
// match, longest match, etc).
class MetaDataSourceClient : public DataSourceClient {
public:
    // Examine the clients_ vector elements, calling their findZone()
    // method for the given name until it gets a result that meets the
    // selection policy.
    virtual FindResult findZone(const Name& name);

    // The rest of the public methods work similarly.
    virtual ZoneIteratorPtr getIterator(const Name& name, ...);
    virtual ZoneUpdaterPtr getUpdater(const Name& name, ...);
    virtual pair<Result, ZoneJournalReaderPtr>
    getJournalReader(const Name& zone);

private:
    vector<DataSourceClient*> clients_;
};

class DataSourceClientSet {
public:
    // Similar to the "meta data source" version, but works as an
    // independent method.  This works as a shortcut of iterating over the
    // stored clients and calling findZone() on the appropriate client
    // (depending on some selection policy, which might be passed as
    // an optional argument to this method and/or specified on construction).
    DataSourceClient::FindResult findZone(const Name& name);

    // We can add other shortcut methods.  We can also add some kind
    // of iterator so the application can have greater flexibility of examining
    // the set of clients.

private:
    vector<DataSourceClient*> clients_;
};
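
To make the "selection policy" idea a bit more concrete, here is a
minimal, self-contained sketch of the loop that either class would run
internally.  Everything in it is a simplified stand-in, not BIND 10
code: ToyClient, Selection, MatchPolicy and selectClient() are
hypothetical names, zone names are plain strings without a trailing
dot, and case-insensitive comparison is ignored.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a data source client: it only knows its zone origins.
struct ToyClient {
    std::vector<std::string> zones;

    // True if name equals zone or is a subdomain of it (label-aligned).
    static bool isSubdomain(const std::string& name, const std::string& zone) {
        if (name == zone) {
            return true;
        }
        return name.size() > zone.size() &&
            name.compare(name.size() - zone.size(), zone.size(), zone) == 0 &&
            name[name.size() - zone.size() - 1] == '.';
    }

    // Returns the longest zone in this client enclosing name, or "" if none.
    std::string findZone(const std::string& name) const {
        std::string best;
        for (const std::string& z : zones) {
            if (isSubdomain(name, z) && z.size() > best.size()) {
                best = z;
            }
        }
        return best;
    }
};

// Hypothetical policies corresponding to "first match" and "longest match".
enum class MatchPolicy { FIRST, LONGEST };

struct Selection {
    const ToyClient* client;  // nullptr if no client has a matching zone
    std::string zone;         // the matching zone origin, "" if none
};

// Walks the clients in order and picks one according to the policy.
Selection selectClient(const std::vector<const ToyClient*>& clients,
                       const std::string& name, MatchPolicy policy) {
    Selection best{nullptr, ""};
    for (const ToyClient* c : clients) {
        const std::string zone = c->findZone(name);
        if (zone.empty()) {
            continue;  // this client has nothing for the name
        }
        if (policy == MatchPolicy::FIRST) {
            return Selection{c, zone};  // stop at the first client that matches
        }
        if (zone.size() > best.zone.size()) {
            best = Selection{c, zone};  // keep the longest match seen so far
        }
    }
    return best;
}
```

In the "meta" version this loop would sit behind the virtual
findZone(); in the "container" version it would be the body of the
non-virtual shortcut method.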

As far as I see (and remember the discussion) the major points to
consider in terms of which approach we should take are the following:

1. Configuration

In either way the user level configuration (i.e. what's stored in
b10-config.db) won't be much different in essence: We'd specify a list
of data sources, each element of which gives configuration information
of the corresponding data source:

"data_source": {
  "type": "meta",
  "match-policy": "first",
  "backends": [
     {"type": "memory", ...},
     {"type": "sqlite3", "database_file": "zones.sqlite3", ...},
     ...
  ]
}

Of course, "type" won't be necessary for the container-class
approach.  "match-policy" may also be omitted if we specify it in
its methods.

With the "meta" approach, the type may not necessarily be "meta" if
the user wants to use one concrete data source and only that one.

(I'm not sure what should happen if "backends" also contains a "meta"
type entry; a naive configuration and implementation would
subsequently cause an infinite loop).
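
One possible (and only illustrative) way to deal with the nesting
question is to detect it when the configuration is loaded, so the
application can reject or otherwise handle it.  The sketch below
assumes the configuration has already been parsed into a hypothetical
BackendConfig tree; that structure and hasNestedMeta() are made up for
this example, and whether nesting should be rejected at all is left
open.  (A vector of the enclosing struct type formally needs C++17.)

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory form of one configuration entry.
struct BackendConfig {
    std::string type;                     // "meta", "memory", "sqlite3", ...
    std::vector<BackendConfig> backends;  // only used when type is "meta"
};

// Returns true if a "meta" entry appears anywhere below top's "backends".
// The recursion terminates because the parsed tree is finite.
bool hasNestedMeta(const BackendConfig& top) {
    for (const BackendConfig& b : top.backends) {
        if (b.type == "meta" || hasNestedMeta(b)) {
            return true;
        }
    }
    return false;
}
```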

2. Instantiation from an application

For the "meta" approach, we'd pass the configuration parameters as
shown above to the polymorphic factory.  The factory will instantiate
a MetaDataSource object and return a (shared) pointer to it as a
pointer to a base class object.  For the application it's a
type-independent operation.  Existing instantiation code that uses the
polymorphic factory doesn't have to be changed (although, currently,
we hardcode type-specific instantiation in most parts of the code).

For the "container" approach, the application would pass the same
configuration parameters to the DataSourceClientSet constructor and
explicitly instantiate the container object.  Existing code that
instantiates data source client objects will have to be updated so
that it instead instantiates a DataSourceClientSet object.  (But
as noted in the previous paragraph, such existing code will generally
have to be modified anyway, as it is generally type-specific).

3. Use from an application

Existing code that uses the abstract data source client won't have to
be modified if we adopt the "meta" approach.  For example, we won't
have to modify the auth/Query class (in particular, its process()
method).

If we adopt the "container" approach, we'll need to adjust method
signatures and do some cleanups.  In the case of auth/Query, we'll at
least need to pass DataSourceClientSet instead of DataSourceClient to
the process() method and replace all references to the latter with the
former.  We'll also need to rename variables like "datasrc_client" as
a cleanup.  Existing Python code will also have to be updated in a
similar way, although it may be a bit easier because of duck typing
(as long as we provide compatible interfaces).

With the "container" approach, we can provide applications with
more options for manipulating/examining the underlying data
source(client)s (of course, at the cost of having applications be
aware of the concept of the container).  For example, it will be
easier to find a specific data source (client) that has a zone that
exactly matches a given zone name.  This will be necessary for xfrin
to check the SOA serial of the zone before attempting the transfer.
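
As a rough illustration of the exact-match case, the following sketch
shows a container that exposes both iteration and a convenience
lookup.  It is a toy under stated assumptions: ToyClient, ToyClientSet,
hasZone() and findExact() are hypothetical names, and zone names are
compared as plain strings.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a data source client that knows its zone origins.
struct ToyClient {
    std::string label;               // e.g. "memory" or "sqlite3"
    std::vector<std::string> zones;  // zone origins held by this client

    // True only if this client has a zone whose origin is exactly origin.
    bool hasZone(const std::string& origin) const {
        return std::find(zones.begin(), zones.end(), origin) != zones.end();
    }
};

// Toy container: the application can iterate over it or use a shortcut.
class ToyClientSet {
public:
    typedef std::vector<const ToyClient*>::const_iterator const_iterator;

    void add(const ToyClient* client) {
        assert(client != nullptr);
        clients_.push_back(client);
    }
    const_iterator begin() const { return clients_.begin(); }
    const_iterator end() const { return clients_.end(); }

    // Returns the first client holding exactly this zone, or nullptr.
    const ToyClient* findExact(const std::string& origin) const {
        for (const ToyClient* c : clients_) {
            if (c->hasZone(origin)) {
                return c;
            }
        }
        return nullptr;
    }

private:
    std::vector<const ToyClient*> clients_;
};
```

An xfrin-like caller would use findExact() with the zone name and then
query the returned client for the SOA; a parent zone in another client
would not be picked up by accident.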

If we want to do it with the "meta" approach, we'll need to introduce
an ability to specify such a less common and possibly complicated
search policy in the "match-policy" of the above example, or to break
the abstraction and have the application refer to the meta data source
(client) directly, maybe using specialized methods.

4. Python wrapper

We don't have to add a new Python wrapper for the "meta" approach.

We'll need to write a wrapper for the DataSourceClientSet class in the
"container" approach.

5. Performance

In general, performance implications should be marginal whichever
approach we take.

Probably the only case it may matter is the "findZone()" operation in
a performance sensitive context, e.g., for b10-auth to identify the
zone to search for an incoming query.

Both the "meta" and "container" approaches would involve one top-level call,
one iteration over the internal list (set) of data source(client)s,
and one (or maybe more for search) call to the findZone() method on
the identified data source(client).  So the overhead should be
generally comparable.

The "container" version might be slightly faster because the first
call isn't for a virtual method and could be further optimized by
inlining if we really care about this level of performance.

On the other hand, with the "meta" approach, if the user knows it only
needs exactly one specific data source and configures the server that
way, it would only need one single call to the virtual findZone(), and
will be slightly faster than the container version that contains
exactly one data source(client).

But in any event, these differences will probably be marginal in the
larger context of query processing.  Even the overhead of the
underlying findZone() call itself may be dominant in the entire "meta"
findZone() process.

6. Extendability

The "meta" approach will introduce a new characteristic to the
abstract data source(client) concept: its object may now encapsulate
multiple different data source(client)s.  This will restrict future
extension of the base class.  For example, it will not be well defined
if we want to add a "get capability" method (e.g., to see whether the
underlying data source is writable or not).  We'd either need to
define it in some ad hoc way (e.g., consider a meta data source
writable iff one/all of the contained data sources is/are writable),
which may or may not be what the application would expect at the
abstract level, or reject it by throwing a "not implemented"
exception.  We
actually already do the latter for some cases, but it's generally
considered a bad practice; the more such exceptional cases are added
to derived classes, the more broken the abstract level design will be.
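
The ambiguity of the "one/all" definition can be seen in a few lines.
This is only a sketch with made-up names (ToyClient, anyWritable() and
allWritable() are not part of any existing interface): for a mixed set
of backends the two plausible definitions of "writable" disagree, and
the abstract interface gives the caller no way to tell which one it is
getting.

```cpp
#include <algorithm>
#include <vector>

// Toy stand-in: the only capability modeled here is writability.
struct ToyClient {
    bool writable;
};

// "Writable if at least one contained data source is writable."
bool anyWritable(const std::vector<ToyClient>& clients) {
    return std::any_of(clients.begin(), clients.end(),
                       [](const ToyClient& c) { return c.writable; });
}

// "Writable only if every contained data source is writable."
// Treating an empty set as not writable is itself an ad hoc choice.
bool allWritable(const std::vector<ToyClient>& clients) {
    return !clients.empty() &&
        std::all_of(clients.begin(), clients.end(),
                    [](const ToyClient& c) { return c.writable; });
}
```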

Obviously the "container" approach doesn't have this issue.  We can
also freely add convenient interfaces to it as a "container" as we see
the need, such as a way to allow the application to iterate over the
underlying data source(client)s or to get a data source (client) that
matches a given condition.  We could add the same interface to the
MetaDataSource class, but if the application needs to directly refer
to the specific derived class, we'll lose most of the advantages of
the abstraction by inheritance.

The concept of "container" can be implemented using a "meta data
source" class, so we can switch from the "container" approach to the
"meta" approach later without breaking the application that relies on
the container class, if we find the "meta" approach really more
sensible.  On the other hand, once we introduce the "meta data
source(client)" and applications/users start relying on the abstract
interface containing it, we cannot extend the base class without
worrying about breaking the existing deployment.

---
JINMEI, Tatuya