[bind10-dev] Datasource API

Wed Jan 6 23:30:32 UTC 2010

At Tue, 05 Jan 2010 23:47:17 +0100,
Jelte Jansen <jelte at isc.org> wrote:

> > - in general, I'd like to separate the abstract base class from a
> >   concrete implementation class.  in that sense, I'd separate the
> >   default implementation class from the base DataSource class.
> 
> as opposed to providing an implementation in the base class for those
> functions that don't necessarily need to be changed for different data
> sources?

Providing common behavior in the base class is okay, but I'd prohibit
the base class from being instantiated...hmm, the proposed base class
is actually already abstract: getRRset() is a pure virtual function.
I missed that:-)
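
To make the "common behavior in an abstract base" point concrete, here
is a rough sketch.  Only getRRset() comes from the proposal; the Result
enum, hasName(), StaticDataSource, and the use of plain strings in place
of RRset objects are made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical result codes, not the proposed API's real ones.
enum class Result { success, name_not_found };

// Abstract base: it cannot be instantiated because getRRset() is pure
// virtual, yet it can still carry behavior shared by every backend.
class DataSource {
public:
    virtual ~DataSource() = default;

    // The primitive each concrete data source must implement.
    virtual Result getRRset(std::string& target,
                            const std::string& name) const = 0;

    // Common behavior built only on the primitive (hypothetical helper).
    bool hasName(const std::string& name) const {
        std::string ignored;
        return getRRset(ignored, name) == Result::success;
    }
};

static_assert(std::is_abstract<DataSource>::value,
              "DataSource must not be instantiable");

// Toy concrete backend that knows exactly one name.
class StaticDataSource : public DataSource {
public:
    Result getRRset(std::string& target,
                    const std::string& name) const override {
        if (name != "www.example.com") {
            return Result::name_not_found;
        }
        target = "192.0.2.1";
        return Result::success;
    }
};
```

With this shape, a separate default implementation class is only needed
if we want something instantiable out of the box.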

> > - how would we represent non trivial match?  For example, matching a
> >   CNAME, partial match at a zone cut?  Wildcard match (may be it's not
> >   special in this context)?
> > 
> > - What about RFC2181 data ranking?  Should we separate glue and
> >   authoritative data in matching?  How would that affect the API?
> 
> that's a big part of my question on where the intelligence should be :)

And another example is a query for type "ANY".

Basically, the proposed plan seems to make sense to me: we provide
low-level primitives without much intelligence about DNS-specific
transactions like the above, because that lets us support more
variants of backend.  But at the same time, if the middle or upper
layer that supports this advanced stuff becomes too complicated or
very inefficient, that wouldn't be a good architecture.

Right now I don't have a clear picture of the best balance on this
point.  So, let's begin with what we've come up with so far, keeping
these corner cases in mind.

> > - on a related note with the previous point.  we may have to provide
> >   more detailed information in the result codes.  on another related
> >   note: we'll need to represent "name exists but no data"
> 
> actually, in this example, noerror/nodata would just end up with an
> empty target rrset, but a success return code

Okay.  I think this is another point to revisit once we know how we'll
use this interface in the applications.

> > - interface to handle SIGs: should we separate this from getRRset()?
> >   FYI, BIND9 uses a unified "find" interface to get both the answer
> >   and its SIG by a single lookup.  If my memory is correct NSD does
> >   the same thing.
> 
> yep, and probably everything efficient should, but that's the reason i
> was thinking of lowlevel and highlevel functions, where if they are
> encapsulated in one class, the efficient end-result would have those
> high-level ones take quite a lot of shortcuts. Though i'm open to
> suggestions or loud no-shouting :)

I'm not sure what you mean by "the efficient end-result would..."

Anyway, aside from efficiency, another thing to consider is how to
ensure the "real" RRset and its RRSIG belong to the "same version".

With this method signature, 

    virtual result getRRSigs(RRsetPtr target, const RRsetPtr rrset);

it would be impossible unless "rrset" holds the version information.
But, since RRset{,Ptr} is designed to be a generic "RRset" class,
independent of how it is used, that won't be the case.

One possibility is to add an explicit method to retrieve the "current
version"

    virtual const DataSourceVersion& getVersion();

and pass it to findRRset() and its friends:

    version = datasrc.getVersion();
    datasrc.findRRset(version, target, Name("www.example.com"), IN, A);
    datasrc.getRRSigs(version, sigtarget, target);

Of course, if we can ensure that the sequence of findRRset() and
getRRSigs() calls is atomic, we could omit the "version" information.
But I'm afraid that's too optimistic even if we don't support multiple
threads.
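
A toy sketch of why passing the version helps.  Everything here is an
assumption for illustration: DataSourceVersion is just an integer,
strings stand in for RRsets, lookups are by name rather than by RRset,
and VersionedSource, Record, and update() are invented names.

```cpp
#include <cassert>
#include <map>
#include <string>

using DataSourceVersion = unsigned int;  // hypothetical representation

// An answer and its signature, stored together per name.
struct Record {
    std::string rdata;
    std::string rrsig;
};

class VersionedSource {
public:
    DataSourceVersion getVersion() const { return current_; }

    // Publish a change as a new version; older versions stay readable.
    void update(const std::string& name, const Record& rec) {
        std::map<std::string, Record> next = snapshots_[current_];
        next[name] = rec;
        ++current_;
        snapshots_[current_] = next;
    }

    bool findRRset(DataSourceVersion v, std::string& target,
                   const std::string& name) const {
        const Record* r = lookup(v, name);
        if (r == nullptr) return false;
        target = r->rdata;
        return true;
    }

    bool getRRSigs(DataSourceVersion v, std::string& sigtarget,
                   const std::string& name) const {
        const Record* r = lookup(v, name);
        if (r == nullptr) return false;
        sigtarget = r->rrsig;
        return true;
    }

private:
    const Record* lookup(DataSourceVersion v,
                         const std::string& name) const {
        auto snap = snapshots_.find(v);
        if (snap == snapshots_.end()) return nullptr;
        auto it = snap->second.find(name);
        return it == snap->second.end() ? nullptr : &it->second;
    }

    DataSourceVersion current_ = 0;
    std::map<DataSourceVersion, std::map<std::string, Record>> snapshots_;
};
```

If an update lands between the two lookups, both calls still read the
snapshot named by the version, so the answer and its signature match.
Copying the whole map per update is only for brevity; a real backend
would need something cheaper.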

> > - addRR: why not adding RR? (we have the "RR" class).  Also why not
> >   RRset?  it's also not clear whether this "add" is "replace" or
> >   "merge" if there's data of the same name/type exists.
> 
> good point, for single rr add(), i'd say always add, and use
> remove/readd to change. For rrsets this might be a bit more complicated
> (although simply replacing them would certainly be an option, you could
> always get them before)

FYI, BIND9 separates "replace" and "merge" by a flag argument to
dns_db_addrdataset(): if we set the DNS_DBADD_MERGE flag, the added
records will be merged into (any) existing RRset.  Otherwise the new
data will replace the existing one, unless the latter has a higher
(RFC2181) trust level.

Now, this is another aspect of how intelligent the low-level methods
should be: whether the method itself considers the trust level in an
add operation, or whether that's the caller's responsibility.
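
For illustration, here is a toy version of the "intelligent" variant,
where the add method itself checks trust.  It follows only the behavior
described above, not BIND9's actual code; AddMode, StoredRRset,
RRsetStore, and the integer trust values are all invented.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

enum class AddMode { replace, merge };  // hypothetical flag

struct StoredRRset {
    int trust;                     // larger means more trusted
    std::set<std::string> rdata;   // strings stand in for RDATA
};

class RRsetStore {
public:
    // Returns false when existing, more trusted data was left untouched.
    bool addRRset(const std::string& key, const StoredRRset& incoming,
                  AddMode mode) {
        auto it = sets_.find(key);
        if (it == sets_.end()) {
            sets_.emplace(key, incoming);
            return true;
        }
        if (mode == AddMode::merge) {
            // Merge into whatever exists; the stored trust is kept.
            it->second.rdata.insert(incoming.rdata.begin(),
                                    incoming.rdata.end());
            return true;
        }
        if (it->second.trust > incoming.trust) {
            return false;  // refuse to replace higher-trust data
        }
        it->second = incoming;
        return true;
    }

    // Number of records stored under the key (0 if the key is absent).
    std::size_t count(const std::string& key) const {
        auto it = sets_.find(key);
        return it == sets_.end() ? 0 : it->second.rdata.size();
    }

private:
    std::map<std::string, StoredRRset> sets_;
};
```

The alternative is to drop the trust check from addRRset() and have the
caller look up the existing RRset first, which keeps the backend simpler
at the cost of an extra lookup.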

> > 
> > - if this is a compound source of data (i.e., containing multiple
> >   zones), do we need to provide/add to an interface to indicate which
> >   zone a find matches?
> 
> yeah, at first i thought of trying to find the corresponding zone first,
> then use that as a handle for the actual data search, but perhaps we
> could also provide it all in passed arguments
> 
> > - in BIND9 (for example), we model the "data source" as consisting of
> >   + zones
> >   + zone DBs (1-to-1 mapping between zone and DB)
> >   + DB nodes (searched by domain names)
> >   + rdatasets in a DB node (per RR type)
> > 
> >   And we can search for either a "node" (key = name) or an rdataset
> >   (key = name + type).   If we get a "node", we can then iterate over
> >   all rdatasets in the node to examine all (RR) types of data.
> >   should we provide the same level of search granularity?
> > 
> > - about the high level method (getData()): I'm afraid this will make
> >   the class too monolithic.  maybe we can begin with a non member
> >   function only using other low-level public methods.
> 
> well see my second and fifth answer (on intelligence and efficiency),
> i'm certainly not set in stone on the current approach (nor for other
> choices made so far)

I don't have a strong suggestion right now.  I'll think about it more.

BTW, I'm not sure getNSECs()'s signature makes sense:

    virtual result getNSECs(RRsetPtr target, const RRsetPtr rrset);

What's the "rrset" argument in this context?  For example, if the
corresponding findRRset() resulted in name_not_found, I wouldn't expect
an RRset object to be returned.

A related random idea: if we provide a generic primitive, something like
findPreviousName(), we might even be able to hide the NSEC-related
details from the low-level API implementations.  Further, we could
also extend the notion of "RRtype" so that an extended version
encapsulates both the "real" type and the covered type (the latter is
meaningful only for RRSIG).
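
A rough sketch of what such an extended type could look like.  The
numeric codes for A, RRSIG, and NSEC are the standard ones; the NONE
sentinel, the class layout, and NodeData are assumptions made up here.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>

// NONE is a hypothetical sentinel meaning "no covered type".
enum class RRType : std::uint16_t { NONE = 0, A = 1, RRSIG = 46, NSEC = 47 };

class ExtendedRRType {
public:
    // Implicit on purpose, so a plain RRType can be passed wherever an
    // extended type is expected.
    ExtendedRRType(RRType type) : type_(type), covered_(RRType::NONE) {}
    ExtendedRRType(RRType type, RRType covered)
        : type_(type), covered_(covered) {}

    RRType type() const { return type_; }
    RRType covered() const { return covered_; }

    // Order by real type first, then by covered type, so the extended
    // type can serve as a lookup key.
    bool operator<(const ExtendedRRType& other) const {
        return std::tie(type_, covered_) <
               std::tie(other.type_, other.covered_);
    }

private:
    RRType type_;
    RRType covered_;
};

// Per-name storage keyed by extended type; strings stand in for RRsets.
using NodeData = std::map<ExtendedRRType, std::string>;
```

The backend then treats "RRSIG covering NSEC" and "RRSIG covering A" as
two different keys, without knowing what a signature is.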

For example, we could code the NXDOMAIN logic as follows:

    if (datasrc.findRRset(target, qname, IN, A) == name_not_found) {
        prevname = datasrc.findPreviousName(qname);
        datasrc.findRRset(nsectarget, prevname, IN, NSEC);
        // the signature we need is the one covering the NSEC
        datasrc.findRRset(sigtarget, prevname, IN, ExtendedRRType(RRSIG, NSEC));
        // use nsectarget and sigtarget for the proof of nxdomain
    }
    // ignoring the "version" issue I mentioned above for simplicity

This way, the underlying low level implementation doesn't have to know
anything about DNSSEC specifics.  Its only responsibility is to
maintain the names in the order of DNSSEC comparison, and search by
extended RRtypes.
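
As a sketch of the "names in the order of DNSSEC comparison" part,
assuming names are plain dotted ASCII strings (no escapes, no wire
format): compare labels from the rightmost one, case-insensitively, with
a missing label sorting first.  NameIndex, CanonicalLess, and
reversedLabels() are invented names, and returning an empty string when
nothing sorts earlier is my own simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split a name into lowercase labels, rightmost label first.
std::vector<std::string> reversedLabels(const std::string& name) {
    std::vector<std::string> labels;
    std::istringstream in(name);
    std::string label;
    while (std::getline(in, label, '.')) {
        if (label.empty()) continue;  // skip the trailing root dot
        std::transform(label.begin(), label.end(), label.begin(),
                       [](unsigned char c) {
                           return static_cast<char>(std::tolower(c));
                       });
        labels.push_back(label);
    }
    std::reverse(labels.begin(), labels.end());
    return labels;
}

// Label-by-label comparison; a name that is a suffix of another
// (fewer labels) sorts first.
struct CanonicalLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return reversedLabels(a) < reversedLabels(b);
    }
};

class NameIndex {
public:
    void add(const std::string& name) { names_.insert(name); }

    // Closest stored name that sorts strictly before qname, or an
    // empty string if there is none.
    std::string findPreviousName(const std::string& qname) const {
        auto it = names_.lower_bound(qname);
        if (it == names_.begin()) return std::string();
        return *std::prev(it);
    }

private:
    std::set<std::string, CanonicalLess> names_;
};
```

With example.com, a.example.com, z.a.example.com, and www.example.com
stored, the name before b.example.com is z.a.example.com, not
a.example.com as a plain string sort would suggest.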

Right now, I'm not necessarily advocating this idea though.  Just
food for thought.

---
JINMEI, Tatuya