[bind10-dev] flaws in reversed name-based approach for getting the "previous name"

JINMEI Tatuya / 神明達哉 jinmei at isc.org
Thu Oct 6 07:04:19 UTC 2011


At Wed, 05 Oct 2011 10:40:31 +0100,
Stephen Morris <stephen at isc.org> wrote:

> > According to today's biweekly call, the consensus seems to be that
> > we should address this issue.  In that case I personally think
> > Shane's proposal is simple and effective enough.  Stephen mentioned
> > some possible alternatives in the call, but I didn't fully
> > understand them at that time, much less whether they are better
> > than Shane's.  If you think they are better, could you explain your
> > ideas again here?
> > > Stephen
> 
> There were several ideas:
> 
> First, internally we should encode the \nnn format characters as
> single bytes in the domain name.  The conversion between internal
> representation and \nnn should be done on input and output -
> internally we should not have to worry about them.
> 
> For the "-" v "." problem, we should be able to alter the collating
> function used by sort functions.  In Sqlite3, the collation function
> can be set by the sqlite3_create_collationXxx  functions.  In-memory
> sorts generally require a comparison function.
> 
> Finally, the problem of \046 (the period) within a label could be
> handled by associating the name with another structure holding the
> location of these characters within the name.  The structure is
> created when the name is converted to the internal string and used
> when rendering the name into readable format.
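
For concreteness, the collation idea could look roughly like the
following with Python's sqlite3 module (a minimal sketch, not actual
BIND 10 code: the "records" table and "rname" column are invented, and
\nnn escapes are ignored entirely):

    import sqlite3

    def dns_name_collate(a, b):
        # Compare stored name texts label by label in reversed label
        # order, case-insensitively, approximating DNSSEC canonical
        # ordering.
        la = list(reversed(a.lower().rstrip('.').split('.')))
        lb = list(reversed(b.lower().rstrip('.').split('.')))
        return (la > lb) - (la < lb)

    conn = sqlite3.connect(':memory:')
    conn.create_collation('dnsname', dns_name_collate)
    conn.execute('CREATE TABLE records (rname TEXT)')
    conn.executemany('INSERT INTO records VALUES (?)',
                     [('a-b.example.',), ('a.example.',),
                      ('b.a.example.',)])
    for (name,) in conn.execute(
            'SELECT rname FROM records ORDER BY rname COLLATE dnsname'):
        print(name)
    # prints a.example., b.a.example., a-b.example. -- the order plain
    # string comparison would not give, since '-' sorts before '.'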

There may be some confusion here, so please let me check: in most
parts of our code there shouldn't be any problem with \nnn, \046, etc.
We use generic Name class objects, which internally keep the labels in
binary format and take care of any representation issues (some of our
Python programs tend to convert them to a string and are vulnerable to
these issues, but let's set those aside for now).  So the only place
where this matters is the interface between database backends and the
rest of the BIND 10 code (specifically, the "database" level of the
data source API).  Are we on the same page here?
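
To illustrate what the Name class shields the rest of the code from,
here is a deliberately naive stand-in (purely hypothetical, not the
real class, and handling only the \nnn decimal escape form): different
textual spellings reduce to the same label sequence.

    import re

    def to_labels(text):
        # Hypothetical helper: decode \nnn decimal escapes, lowercase,
        # and return the labels as raw bytes.
        def unescape(label):
            return re.sub(r'\\(\d{3})',
                          lambda m: chr(int(m.group(1))), label)
        return [unescape(label).lower().encode('latin-1')
                for label in text.rstrip('.').split('.') if label]

    # Both spellings denote the same name at the label level:
    assert to_labels(r'\119\119\119.example.') == to_labels('www.example.')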

If we are, the "internal" is all about how we represent names and
compare them in each database.  So far, we handle everything as normal
string (with case insensitive comparison).  This is advantages in that
it would be more readable when the operator directly examines the data
from the database (e.g., by "select * from records").  It would also
be easier to provide a backend interface (in our new API, the
"accessor" class) if everything is represented as bare string.  On the
other hand, this could result in a confusing or even problematic
behavior because a domain name can be represented in multiple ways.
The DNSSEC issue we are discussing is one serious case, but even a
normal lookup could be confused: if a name is stored as
\119\119\1119.example. (possibly "by hand"), a DNS query for
"www.example." would fail.

In any case, as long as we allow the non-captive mode (at least for
some types of databases), either approach can fail, so I don't see a
strong reason to introduce further complexity.  With a dedicated
collation function it might be a bit faster, but I suspect the
resulting higher-level performance (lookup performance in qps)
wouldn't change much.

> > Some other related notes:
> > - The consensus on the slightly related issue of whether to allow
> >   non-captive mode seemed to be that we should allow it, at least
> >   for some backends.  But as I said in my other message in this
> >   thread, I personally don't think that affects the solution to
> >   this problem (if someone disagrees, please speak up).
> 
> The fact that there are two representations of the data would seem to
> suggest a captive SQL database.  I would suggest that we store both
> the internal and external representations of the data in the
> database, as that would make it easier for applications that just
> want to read the data.  (Although if we do that, we would need some
> form of check/repair utility to ensure that the two representations
> remain in sync.)
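
A very rough sketch of the check pass such a utility would need,
assuming a hypothetical schema with both columns ("rname_wire" and
"rname_text") and some canonical_text() renderer standing in for
whatever the real Name class provides:

    def find_mismatches(conn, canonical_text):
        # Report rows whose stored text form disagrees with the text
        # rendered from the internal (wire-format) column.
        return [(rowid, text)
                for rowid, wire, text in conn.execute(
                    'SELECT rowid, rname_wire, rname_text FROM records')
                if canonical_text(wire) != text]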

This would be a perfect solution in terms of accuracy.  But I'm not so
sure it's the way to go in terms of the balance of cost and benefit.
What we are currently (maybe implicitly) assuming is, in my
understanding:

- We basically assume that users modify the database through our APIs.
- We still allow the user to modify the database directly, but we
  specify the required data format (e.g., a domain name must be stored
  in the form of the output of Name.to_text()) and note that the user
  does this at their own risk.
- In the intermediate layer (our database data source client) we
  validate the data from the data source; if it's bogus, the result
  would be something like SERVFAIL with a log message, and the user
  would then correct the offending data themselves (see the sketch
  below).
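
A rough sketch of the kind of validation I have in mind at that layer
(the exception class and logger name are placeholders, not actual
BIND 10 code):

    import logging

    logger = logging.getLogger('datasrc')

    class DataSourceError(Exception):
        """Placeholder for whatever exception the client layer maps to
        SERVFAIL."""

    def validated_name(text, parse_name):
        # parse_name would be something like the Name constructor; any
        # parse failure is logged and turned into a data source error
        # so the caller can answer SERVFAIL instead of serving bogus
        # data.
        try:
            return parse_name(text)
        except Exception as ex:
            logger.error('broken name %r from backend: %s', text, ex)
            raise DataSourceError('bogus data from backend') from ex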

Personally I think this is a reasonable middle ground, but others may
have different opinions.  If the consensus is to ensure accuracy or to
disable non-captive mode, I wouldn't necessarily object.

---
JINMEI, Tatuya


