BIND 10 #1062: add a base for DatabaseZoneHandle::find()

Mon Aug 8 08:45:21 UTC 2011

#1062: add a base for DatabaseZoneHandle::find()
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:  jinmei
  jinmei                             |                Status:  reviewing
                       Type:  task   |             Milestone:
                   Priority:  major  |  Sprint-20110816
                  Component:  data   |            Resolution:
  source                             |             Sensitive:  0
                   Keywords:         |           Sub-Project:  DNS
            Defect Severity:  N/A    |  Estimated Difficulty:  5.0
Feature Depending on Ticket:         |           Total Hours:  0
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------

Comment (by jinmei):

 First off, I made a few minor suggested changes to the branch.  I
 believe they are trivial and non controversial, but please check.

 '''General Discussions'''

 - I agree with changing getNextRecord() to a non const member function
   for the reason you explained.  But this would also mean find()
   shouldn't be a cost member function of DatabaseClient::Finder (and
   therefore, of ZoneFinder).  It's not really good if we give up the
   constness at that abstract level (because for other types of data
   source clients such as the in memory one we'd be able to keep them
   const), but we shouldn't pretend that it's immutable when it's
   simply hidden inside a pointer (which gives the caller a false sense
   of safety).

 - Don't we need to use loggers in database and sqlite3_connection?

 - in database.cc, I guess the exception/error handling needs to be
   revised.  If I read the current implementation correctly, if some
   data in the database is broken and libdns++ throws an exception, the
   Finder can keep an ongoing transaction (e.g., in the case of
   sqlite3, it can perform sqlite3_step(), encounter an exception, and
   exit without doing sqlite3_reset() (btw I think such a case should
   be explicitly tested).  Also, depending on the underlying database
   and the connection implementation, searchForRecords() may also
   result in an exception with some incomplete state (even though it's
   not the case for sqlite3).  So, I guess we need something like this:
   - introduce a cleanup method to DatabaseConnection
   - enclose the entire block that performs any connection operations
     with try-catch
 {{{
 DatabaseClient::Finder::find(...) {
     try {
         connection_->searchForRecords(zone_id_, name.toText());

         while (connection_->getNextRecord(columns)) {
         ...
         }
     } catch (const datasrc::DatabaseError& ex) {
         // this corresponds to a database system error such as connection
         // reset.  not sure how to react to it at this moment, but we
 should
         // probably separate this case with other isc::Exception's.
     } catch (const isc::Exception& ex) {
         // call the cleanup
         isc_throw(DataSourceError, ...);
     } catch (const std::exception& ex) {
         // call the cleanup
         throw;
     }
 }}}
   (we may want to perform the cleanup in an RAII-like manner instead
   of calling the cleanup method explicitly with the catch block)

 '''database.h/cc'''

 - maybe we want to use a plain old array instead of vector for
   columns, if we are sure it has a fixed size?  Even though we'd rely
   on hot spot caching for performance, find() is performance sensitive
   and would generally be nicer if it could be efficient.  Using a
   plain old type might also be beneficial because it's used as an
   interface with the database backend, which would often rely on
   C-based library (like libsqlite3, libmysql, etc).  On the other
   hand, it may not matter much especially because the same columns
   object will be used multiple times in find(), and once its used the
   allocated space will be reused.  So I don't have a strong opinion
   about this, but if we keep using a vector, maybe it's better to
   reserve the space on initialization:
 {{{
     std::vector<std::string> columns(4);
 }}}
   (but see also below on using the magic number).

 - (general) I think it's better to define an enum or something to
   represent the semantics of the vector used in getNextRecord(),
   rather than hardcoding the magic numbers:
 {{{
         if (columns.size() != 4) {
 ...
             const isc::dns::RRType cur_type(columns[0]);
             const isc::dns::RRTTL cur_ttl(columns[1]);
 }}}

 - I suggest removing the indentation before the beginning of lines in
   the unnamed namespace:
 {{{
 namespace {
     // Adds the given Rdata to the given RRset
 ...
 }}}
   to keep the lines shorter, especially if you don't like to rely on
   "using" directives.

 - getNextRecord() documentation in .h: since the semantics of
   "columns" vector is a public contract to add support for
   other databases (possibly by an external contributor), I think we
   should be a bit more detailed, e.g.,
 {{{
   The \c columns must have exactly four entries, and these entries
   (in the appearing order) must meet the following conditions:
   - rrtype: The type of the RR.  The text must be accepted by the
     dns::RRType() constructor.
   - ttl: The TTL of the RR.  The text must be accepted by the
     dns::RRTTL() constructor.
   ...
 }}}
  (btw, I'd call the first entry "rrtype" instead of "rdtype".  rdtype
   is a BIND 9 term, which has the notion of "rdataset", but
   protocol-wise this is not really the right term (IMO).  For the same
   reason, I personally think we should rename it in the sqlite3 schema
   accordingly before it's too late, but that's definitely beyond the
   scope of this ticket).

 - addOrCreate(): it's not clear how it's possible that type !=
 rrset->getType():
 {{{
             // make sure the type is correct
             if (type != rrset->getType()) {
 }}}
   Is it intended to exclude the case of CNAME + other type?  If so,
   I'd personally catch that case explicitly in find() and would rather
   assert() the type equality here.  But it's probably a matter of
   preference, so I don't insist.  If you want to handle it here,
   however, please clarify the expected case(s) in a comment.
   (BTW, lcov indicates this case isn't covered by tests)

 - addOrCreate(): according to lcov, this case isn't covered by tests:
 {{{
             if (ttl < rrset->getTTL()) {
                 rrset->setTTL(ttl);
             }
 }}}

 - addOrCreate(): when could rdata_str be validly an empty string?
 {{{
         if (rdata_str != "") {
 }}}
   For an RR that may have an empty RDATA?  In any case, I don't think
   we can simply ignore the case of an empty string because it may not
   be allowed depending on the RR type (actually in many cases it
   shouldn't be allowed).  It seems we should simply pass rdata_str to
   createRdata() unconditionally and let the function check whether an
   empty string is acceptable.

 - addOrCreate(): you don't need toText() for name or type:
 {{{
                 isc_throw(DataSourceError,
                           "bad rdata in database for " << name.toText() <<
 " "
                           << type.toText() << " " << ivrt.what());
 }}}

 - RRsigStore class: I'd use ConstRdataPtr instead of RdataPtr
   throughout this class (and its caller).  Unfortunately, however, it
   probably wouldn't work because RRset::addRRsig() expects a non const
   parameter.  There doesn't seem to be a reason why it needs a mutable
   RdataPtr and I suspect we should be able to change that, but that
   would be beyond the scope of this ticket.

 - addSig(): it seems the insertion logic could be much more
   simplified:
 {{{
             sigs[type_covered].push_back(sig_rdata);
 #ifdef no_need_for_this
             if (!haveSigsFor(type_covered)) {
                 sigs[type_covered] =
 std::vector<isc::dns::rdata::RdataPtr>();
             }
             sigs.find(type_covered)->second.push_back(sig_rdata);
 #endif
 }}}
   Note also that if we do this we don't need a separate method of
   haveSigsFor() any more.  (I'd also note that haveSigsFor() would
   better be a private function if we want to keep it, especially if
   and when we want to use this class publicly)

 - addSig(): if and when we want to use it publicly, I'd change
   static_cast to dynamic_cast or have the caller pass type_covered
   information and avoid casting.

 - appendSignatures(): maybe a matter of taste, but I'd rather simply
   get the iterator via find() unconditionally:
 {{{
             std::map<isc::dns::RRType,
 std::vector<isc::dns::rdata::RdataPtr> >::const_iterator found =
                 sigs.find(rrset->getType());
             if (found != sigs.end()) {
                 BOOST_FOREACH(isc::dns::rdata::RdataPtr sig,
 found->second) {
                     rrset->addRRsig(sig);
                 }
             }
 }}}
   (Although the first line looks too weird and we'd need some shortcut
   typedef).  When find() fails, the additional cost is just the call
   overhead of end(), which should be cheap.  If it succeeds we need
   the matching iterator anyway so it wouldn't be much costly
   (actually, in this case the original code performs the search twice
   and might be a bit more expensive).  And, this way, we could
   eliminate the additional haveSigsFor() method.

 - find(): what if columns[2] is not an empty string for columns[0]
   other than RRSIG?  Likewise, what if columns[2] for RRSIG is not
   equal to the actual "type covered" in the RDATA?  Actually, I
   personally this is a flaw in the schema, not an issue at this level,
   and I hope we'll fix the schema so that we won't have to bother this
   type of question.  But that'd be beyond the scope of this ticket.
   For now I suggest leaving some comments about that and doing
   nothing.

 - find(): I'd note why this is commented out or simply remove it for now:
 {{{
             //cur_sigtype(columns[2]);
 }}}

 - find(): this comment is incomplete, or the second "," should be "."?
 {{{
                 // There should be no other data, so cur_rrset should be
 empty,
 }}}
   and I suspect "cur_rrset" should be "result_rrset".

 - find(): I'd avoid using an additional variable in terms of
   preferring shorter length of methods:
 {{{
                 isc::dns::rdata::RdataPtr cur_rrsig(
                     isc::dns::rdata::createRdata(cur_type, getClass(),
                                                  columns[3]));
                 sig_store.addSig(cur_rrsig);
 }}}
   but that's probably a matter of taste.

 - find(): there's a more critical point in this code: createRdata()
   can throw an exception, and it's not caught.  But I think this
   should be addressed in a higher level context of how we generally
   handle error cases in find().  (See the higher level discussions
   above)

 '''sqlite3_connection.cc'''

 - In searchForRecords(), sqlite3_bind_text() can fail (and
   sqlite3_bind_int() may also fail in theory?), and we should catch
   the case even though it would be a rare event.

 - searchForRecords(): it assumes int for zone_id.  Since this is
   derived from the base class, we are stating all DB backend uses an
   integer to represent a specific zone.  maybe it's okay, but we may
   want to make it a bit more abstract (or we may revisit this point as
   we support other types of databases).  at this moment this is just a
   comment, not (necessarily) requesting to change the
   interface/implementation.

 - convertToPlainChar(): (I'm asking partly because I forgot:-) in
   which case can ucp be NULL?  Also, the main purpose of this helper
   function is to work around the C++ strictness on the difference
   between 'unsigned char*' and 'char *' (sqlite3 uses the former
   throughout its API, while std::string accepts the latter).  And,
   from a point of type safe purist this conversion isn't correct
   (because in theory it's not guaranteed that these two pointer types
   are convertible).  I'd note these points as comments.

 - getNextRecord(): although it should be rare and might be purely in
   theory, this method can internally throw an exception (bad_alloc).
   we should probably catch that within this method and convert it to
   DataSourceError so that the application can handle that situation
   more gracefully (e.g., exit after closing the DB connection).

 - getNextRecord(): this implementation does not provide the strong
   exception guarantee.  Considering the typical usage, I think it's
   acceptable, but I'd note the fact in the DatabaseConnection
   documentation.

 '''sqlite3_connection_unittest.cc'''

 - why do we need to include <memory>?  (strangely, we would have
   needed it before this branch because it previously used auto_ptr.
   but this branch actually removed it)

 - not a big deal especially in tests, but if I understand it correctly
   "conn" can be scoped_ptr instead of shared_ptr:
 {{{
 +    boost::shared_ptr<SQLite3Connection> conn;
 }}}
   As far as I can see the only reason why this is defined as a
   shared_ptr is because it's passed to countRecords(), but in this use
   case we can pass *conn (and countRecords() accepts it as
   SQLite3Connection&).

 - I'd examine the contents of the returned vector (not just the number
   of matching records)

 - if possible, I'd like to test the case where getNextRecord() results
   in DataSourceError().

 - if possible and making sense, I'd test searchForRecords() directly.

 - doFindTest(): toText() can be omitted here:
 {{{
     ASSERT_EQ(expected_result, result.code) << name.toText() << " " <<
 type.toText();
 }}}
   (besides, the current line is a bit too long)

 - DatabaseClientTest::find: I'd check some additional cases:
   - some possible cases were already mentioned above
   - CNAME then A (in addition to A then CNAME)
   - multiple CNAMEs
   - mixed order RRs without RRSIGs, such as "A, AAAA, then A", and try
     to find the A RRset
   - if not very hard, I'd check RDATA(s), not only the number of
     RDATAs and counts.  maybe we could use rrset[s]Check defined in
     lib/testutils/dnsmessage_test.h.
   - have createRdata() for RRSIG throw an exception (as pointed out
     for the main code the exception seems to be uncaught right now)
   - explicitly find a CNAME (which shouldn't result in
   - ZoneFinder::CNAME)
   - "sigtype for non RRSIG", sigtype mismatches RRSIG type covered
    (see the corresponding comment for find().)  This would be a
    placeholder test and would actually simply ignore the oddity for
    now.

-- 
Ticket URL: <http://bind10.isc.org/ticket/1062#comment:6>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development