BIND 10 #1062: add a base for DatabaseZoneHandle::find()

Wed Aug 10 14:04:26 UTC 2011

#1062: add a base for DatabaseZoneHandle::find()
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:  jinmei
  jinmei                             |                Status:  reviewing
                       Type:  task   |             Milestone:
                   Priority:  major  |  Sprint-20110816
                  Component:  data   |            Resolution:
  source                             |             Sensitive:  0
                   Keywords:         |           Sub-Project:  DNS
            Defect Severity:  N/A    |  Estimated Difficulty:  5.0
Feature Depending on Ticket:         |           Total Hours:  0
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------
Changes (by jelte):

 * owner:  jelte => jinmei

Comment:

 Replying to [comment:6 jinmei]:
 > First off, I made a few minor suggested changes to the branch.  I
 > believe they are trivial and non controversial, but please check.
 >

 look good, thanks

 > '''General Discussions'''
 >
 > - I agree with changing getNextRecord() to a non const member function
 >   for the reason you explained.  But this would also mean find()
 >   shouldn't be a cost member function of DatabaseClient::Finder (and
 >   therefore, of ZoneFinder).  It's not really good if we give up the
 >   constness at that abstract level (because for other types of data
 >   source clients such as the in memory one we'd be able to keep them
 >   const), but we shouldn't pretend that it's immutable when it's
 >   simply hidden inside a pointer (which gives the caller a false sense
 >   of safety).
 >

 ok, unconsted it

 > - Don't we need to use loggers in database and sqlite3_connection?
 >

 hmm, yes, completely forgot about those :/

 I added some in find(), it'll log either a successful return, or one of
 the
 errors as caught by the big try/catch introduced in this revision.

 I'm at this point not sure whether we need more lowlevel logging (in
 theory the
 problems in sqlite3 should bubble up as exceptions which are then logged.
 We may
 find that this is not the case and that we do need to go deeper).

 > - in database.cc, I guess the exception/error handling needs to be
 >   revised.  If I read the current implementation correctly, if some
 >   data in the database is broken and libdns++ throws an exception, the
 >   Finder can keep an ongoing transaction (e.g., in the case of
 >   sqlite3, it can perform sqlite3_step(), encounter an exception, and
 >   exit without doing sqlite3_reset() (btw I think such a case should
 >   be explicitly tested).  Also, depending on the underlying database
 >   and the connection implementation, searchForRecords() may also
 >   result in an exception with some incomplete state (even though it's
 >   not the case for sqlite3).  So, I guess we need something like this:
 >   - introduce a cleanup method to DatabaseConnection
 >   - enclose the entire block that performs any connection operations
 >     with try-catch
 > {{{
 > DatabaseClient::Finder::find(...) {
 >     try {
 >         connection_->searchForRecords(zone_id_, name.toText());
 >
 >         while (connection_->getNextRecord(columns)) {
 >         ...
 >         }
 >     } catch (const datasrc::DatabaseError& ex) {
 >         // this corresponds to a database system error such as
 connection
 >         // reset.  not sure how to react to it at this moment, but we
 should
 >         // probably separate this case with other isc::Exception's.
 >     } catch (const isc::Exception& ex) {
 >         // call the cleanup
 >       isc_throw(DataSourceError, ...);
 >     } catch (const std::exception& ex) {
 >         // call the cleanup
 >       throw;
 >     }
 > }}}
 >   (we may want to perform the cleanup in an RAII-like manner instead
 >   of calling the cleanup method explicitly with the catch block)
 >

 I've implemented this approach for now, doing it with RAII is probably
 better,
 but may mean some other interface changes as well, and probably an entire
 different approach to the way we use the sqlite stmt variables right now
 (although i guess some of the potential problems i see would also be
 present
 now)

 I did add some tests that make sure resetSearch() (the cleanup method) is
 called, both on completion of a call to find() and on any exception that
 is
 raised.

 > '''database.h/cc'''
 >
 > - maybe we want to use a plain old array instead of vector for
 >   columns, if we are sure it has a fixed size?  Even though we'd rely
 >   on hot spot caching for performance, find() is performance sensitive
 >   and would generally be nicer if it could be efficient.  Using a
 >   plain old type might also be beneficial because it's used as an
 >   interface with the database backend, which would often rely on
 >   C-based library (like libsqlite3, libmysql, etc).  On the other
 >   hand, it may not matter much especially because the same columns
 >   object will be used multiple times in find(), and once its used the
 >   allocated space will be reused.  So I don't have a strong opinion
 >   about this, but if we keep using a vector, maybe it's better to
 >   reserve the space on initialization:
 > {{{
 >     std::vector<std::string> columns(4);
 > }}}
 >   (but see also below on using the magic number).
 >

 Hmm, yes, a direct array is probably better, though also slightly more
 dangerous.

 Therefore I also added a second argument, the vector size. Doesn't provide
 full
 protection against passing in a smaller array, but at least it's
 something; the
 dabaseimplementation should check it.

 > - (general) I think it's better to define an enum or something to
 >   represent the semantics of the vector used in getNextRecord(),
 >   rather than hardcoding the magic numbers:
 > {{{
 >         if (columns.size() != 4) {
 > ...
 >             const isc::dns::RRType cur_type(columns[0]);
 >             const isc::dns::RRTTL cur_ttl(columns[1]);
 > }}}

 ack, done

 >
 > - I suggest removing the indentation before the beginning of lines in
 >   the unnamed namespace:
 > {{{
 > namespace {
 >     // Adds the given Rdata to the given RRset
 > ...
 > }}}
 >   to keep the lines shorter, especially if you don't like to rely on
 >   "using" directives.

 it's true, i don't :) (then again, I personally don't have much problems
 with
 wide lines, which you have probably noticed)
 anyhew removed indentation (and in one test, i found an anonymous
 namespace
 inside an anonymous namespace (we need to go deeper!), removed it)

 >
 > - getNextRecord() documentation in .h: since the semantics of
 >   "columns" vector is a public contract to add support for
 >   other databases (possibly by an external contributor), I think we
 >   should be a bit more detailed, e.g.,
 > {{{
 >   The \c columns must have exactly four entries, and these entries
 >   (in the appearing order) must meet the following conditions:
 >   - rrtype: The type of the RR.  The text must be accepted by the
 >     dns::RRType() constructor.
 >   - ttl: The TTL of the RR.  The text must be accepted by the
 >     dns::RRTTL() constructor.
 >   ...
 > }}}
 >  (btw, I'd call the first entry "rrtype" instead of "rdtype".  rdtype
 >   is a BIND 9 term, which has the notion of "rdataset", but
 >   protocol-wise this is not really the right term (IMO).  For the same
 >   reason, I personally think we should rename it in the sqlite3 schema
 >   accordingly before it's too late, but that's definitely beyond the
 >   scope of this ticket).

 ack, added documentation.

 >
 > - addOrCreate(): it's not clear how it's possible that type !=
 rrset->getType():
 > {{{
 >             // make sure the type is correct
 >             if (type != rrset->getType()) {
 > }}}
 >   Is it intended to exclude the case of CNAME + other type?  If so,

 No, CNAME+other type is already explicitely caught. This is to catch our
 own
 code in find() doing bad things :)

 EDIT; due to the missing test case you caught for CNAMEs, the 'other'
 cname
 error did indeed end up being caught by this, fixed the caller.

 >   I'd personally catch that case explicitly in find() and would rather
 >   assert() the type equality here.  But it's probably a matter of
 >   preference, so I don't insist.  If you want to handle it here,
 >   however, please clarify the expected case(s) in a comment.
 >   (BTW, lcov indicates this case isn't covered by tests)

 yes well with the current code it shouldn't be possible, and since
 addOrCreate
 is in anonymous namespace we can't directly call it (or did we have a
 clean way
 to do that?), so it's not covered by a test...

 made it an assert.

 >
 > - addOrCreate(): according to lcov, this case isn't covered by tests:
 > {{{
 >             if (ttl < rrset->getTTL()) {
 >                 rrset->setTTL(ttl);
 >             }
 > }}}
 >

 ohright. added.

 > - addOrCreate(): when could rdata_str be validly an empty string?
 > {{{
 >         if (rdata_str != "") {
 > }}}
 >   For an RR that may have an empty RDATA?  In any case, I don't think
 >   we can simply ignore the case of an empty string because it may not
 >   be allowed depending on the RR type (actually in many cases it
 >   shouldn't be allowed).  It seems we should simply pass rdata_str to
 >   createRdata() unconditionally and let the function check whether an
 >   empty string is acceptable.
 >

 ok, removed

 > - addOrCreate(): you don't need toText() for name or type:
 > {{{
 >                 isc_throw(DataSourceError,
 >                           "bad rdata in database for " << name.toText()
 << " "
 >                           << type.toText() << " " << ivrt.what());
 > }}}
 >

 ack, removed

 > - RRsigStore class: I'd use ConstRdataPtr instead of RdataPtr
 >   throughout this class (and its caller).  Unfortunately, however, it
 >   probably wouldn't work because RRset::addRRsig() expects a non const
 >   parameter.  There doesn't seem to be a reason why it needs a mutable
 >   RdataPtr and I suspect we should be able to change that, but that
 >   would be beyond the scope of this ticket.
 >

 right. Should we make that a ticket, we should add a note to update this
 code :)

 > - addSig(): it seems the insertion logic could be much more
 >   simplified:
 > {{{
 >             sigs[type_covered].push_back(sig_rdata);
 > #ifdef no_need_for_this
 >             if (!haveSigsFor(type_covered)) {
 >                 sigs[type_covered] =
 std::vector<isc::dns::rdata::RdataPtr>();
 >             }
 >             sigs.find(type_covered)->second.push_back(sig_rdata);
 > #endif
 > }}}
 >   Note also that if we do this we don't need a separate method of
 >   haveSigsFor() any more.  (I'd also note that haveSigsFor() would
 >   better be a private function if we want to keep it, especially if
 >   and when we want to use this class publicly)
 >

 ack. less code good. (originally i didn't do appendSignatures

 > - addSig(): if and when we want to use it publicly, I'd change
 >   static_cast to dynamic_cast or have the caller pass type_covered
 >   information and avoid casting.

 in this case the caller would have to cast anyway (though it's closer to
 the
 creation and probably safer. Added a note that we should do that if we
 move it
 out)

 >
 > - appendSignatures(): maybe a matter of taste, but I'd rather simply
 >   get the iterator via find() unconditionally:
 > {{{
 >             std::map<isc::dns::RRType,
 std::vector<isc::dns::rdata::RdataPtr> >::const_iterator found =
 >                 sigs.find(rrset->getType());
 >             if (found != sigs.end()) {
 >                 BOOST_FOREACH(isc::dns::rdata::RdataPtr sig,
 found->second) {
 >                     rrset->addRRsig(sig);
 >                 }
 >             }
 > }}}
 >   (Although the first line looks too weird and we'd need some shortcut
 >   typedef).  When find() fails, the additional cost is just the call
 >   overhead of end(), which should be cheap.  If it succeeds we need
 >   the matching iterator anyway so it wouldn't be much costly
 >   (actually, in this case the original code performs the search twice
 >   and might be a bit more expensive).  And, this way, we could
 >   eliminate the additional haveSigsFor() method.
 >

 ok, taken over your code. For one single local use I don't mind having no
 typedef btw.

 > - find(): what if columns[2] is not an empty string for columns[0]
 >   other than RRSIG?  Likewise, what if columns[2] for RRSIG is not
 >   equal to the actual "type covered" in the RDATA?  Actually, I
 >   personally this is a flaw in the schema, not an issue at this level,
 >   and I hope we'll fix the schema so that we won't have to bother this
 >   type of question.  But that'd be beyond the scope of this ticket.
 >   For now I suggest leaving some comments about that and doing
 >   nothing.
 >

 yes. I was wondering what to do about sigtype anyway, since we technically
 don't
 need it (though it could be a good optimization), and don't use it here
 (yet).

 which is why i left this in:

 > {{{
 >             //cur_sigtype(columns[2]);
 > }}}

 I added some comments.

 >
 > - find(): this comment is incomplete, or the second "," should be "."?
 > {{{
 >                 // There should be no other data, so cur_rrset should be
 empty,
 > }}}
 >   and I suspect "cur_rrset" should be "result_rrset".
 >

 Ack, should be "." and result_rrset. Changed.

 > - find(): I'd avoid using an additional variable in terms of
 >   preferring shorter length of methods:
 > {{{
 >                 isc::dns::rdata::RdataPtr cur_rrsig(
 >                     isc::dns::rdata::createRdata(cur_type, getClass(),
 >                                                  columns[3]));
 >                 sig_store.addSig(cur_rrsig);
 > }}}
 >   but that's probably a matter of taste.
 >

 no you are right, it can be done wihtout the var, changed.

 > - find(): there's a more critical point in this code: createRdata()
 >   can throw an exception, and it's not caught.  But I think this
 >   should be addressed in a higher level context of how we generally
 >   handle error cases in find().  (See the higher level discussions
 >   above)
 >

 Actually, while the higher-level catch should at clean this up correctly,
 i also added a specific exception here (next to InvalidRRType and
 InvalidRRTTL),
 where we still have access to the source data, for a better exception
 message.

 > '''sqlite3_connection.cc'''
 >
 > - In searchForRecords(), sqlite3_bind_text() can fail (and
 >   sqlite3_bind_int() may also fail in theory?), and we should catch
 >   the case even though it would be a rare event.
 >

 ack, done

 > - searchForRecords(): it assumes int for zone_id.  Since this is
 >   derived from the base class, we are stating all DB backend uses an
 >   integer to represent a specific zone.  maybe it's okay, but we may
 >   want to make it a bit more abstract (or we may revisit this point as
 >   we support other types of databases).  at this moment this is just a
 >   comment, not (necessarily) requesting to change the
 >   interface/implementation.
 >

 Yes I wondered about this as well. I thought about changing this to some
 abstract Handle class, but did not want to change that part of the API in
 this
 ticket. We could either make this just a simple container (for, in the
 case of
 sqlite3 and probably most sqls, an integer), but possibly this could even
 do the
 RAII for the current transaction. (though probably not, and we should keep
 that
 to searchForRecords and getNextRecord)

 > - convertToPlainChar(): (I'm asking partly because I forgot:-) in
 >   which case can ucp be NULL?  Also, the main purpose of this helper
 >   function is to work around the C++ strictness on the difference
 >   between 'unsigned char*' and 'char *' (sqlite3 uses the former
 >   throughout its API, while std::string accepts the latter).  And,
 >   from a point of type safe purist this conversion isn't correct
 >   (because in theory it's not guaranteed that these two pointer types
 >   are convertible).  I'd note these points as comments.
 >

 IIRC the return value from sqlite_column_text() can be NULL if the
 database
 field itself was NULL or if it has run out of memory. But the
 documentation
 seems to be a bit unclear as to this specific value (it does mention NULL
 for
 BLOB, but not explicitely for string).

 I've added some comments. BTW, I can also get around the conversion by
 using
 string::assign(), which makes the compiler do its magic on its own. But
 explicit
 casts are probably better :)

 > - getNextRecord(): although it should be rare and might be purely in
 >   theory, this method can internally throw an exception (bad_alloc).
 >   we should probably catch that within this method and convert it to
 >   DataSourceError so that the application can handle that situation
 >   more gracefully (e.g., exit after closing the DB connection).
 >

 done

 > - getNextRecord(): this implementation does not provide the strong
 >   exception guarantee.  Considering the typical usage, I think it's
 >   acceptable, but I'd note the fact in the DatabaseConnection
 >   documentation.
 >

 added a bit

 > '''sqlite3_connection_unittest.cc'''
 >
 > - why do we need to include <memory>?  (strangely, we would have
 >   needed it before this branch because it previously used auto_ptr.
 >   but this branch actually removed it)
 >

 oops, leftover from an earlier bugfix problem (related to the inclusion of
 <memory>). Removed.

 > - not a big deal especially in tests, but if I understand it correctly
 >   "conn" can be scoped_ptr instead of shared_ptr:
 > {{{
 > +    boost::shared_ptr<SQLite3Connection> conn;
 > }}}
 >   As far as I can see the only reason why this is defined as a
 >   shared_ptr is because it's passed to countRecords(), but in this use
 >   case we can pass *conn (and countRecords() accepts it as
 >   SQLite3Connection&).
 >

 ack, changed. Actually, for countRecords, see below.

 > - I'd examine the contents of the returned vector (not just the number
 >   of matching records)
 >

 Done. The change made getRecordCount unnecessary, so i removed it (there
 is,
 however, now a new helper that checks the columns).

 > - if possible, I'd like to test the case where getNextRecord() results
 >   in DataSourceError().
 >

 as a side-effect of adding a column-count, i had also added one check for
 that.
 But I assume you mean for the DataSourceError that was already in there :)

 I do not know of any dependable way to make sqlite3_step return an error,
 though...

 > - if possible and making sense, I'd test searchForRecords() directly.
 >

 I don't know what to test; success is implicitely tested by the
 search+getnextrecord tests. We could perhaps try to force the now-checked
 bind()
 calls to error, but I don't know how :)

 > - doFindTest(): toText() can be omitted here:
 > {{{
 >     ASSERT_EQ(expected_result, result.code) << name.toText() << " " <<
 type.toText();
 > }}}
 >   (besides, the current line is a bit too long)
 >

 ack, changed.

 > - DatabaseClientTest::find: I'd check some additional cases:
 >   - some possible cases were already mentioned above
 >   - CNAME then A (in addition to A then CNAME)

 added (badcname2.example.org)

 >   - multiple CNAMEs

 added (badcname3.example.org)

 >   - mixed order RRs without RRSIGs, such as "A, AAAA, then A", and try
 >     to find the A RRset

 added ('www2.example.org')

 >   - if not very hard, I'd check RDATA(s), not only the number of
 >     RDATAs and counts.  maybe we could use rrset[s]Check defined in
 >     lib/testutils/dnsmessage_test.h.
 >   - have createRdata() for RRSIG throw an exception (as pointed out
 >     for the main code the exception seems to be uncaught right now)

 added ('badsig.example.org')

 >   - explicitly find a CNAME (which shouldn't result in
 >   ZoneFinder::CNAME)

 added, (second test for cname.example.org)

 >   - "sigtype for non RRSIG", sigtype mismatches RRSIG type covered
 >    (see the corresponding comment for find().)  This would be a
 >    placeholder test and would actually simply ignore the oddity for
 >    now.
 >

 btw, i've removed emptyvector; it would only test the fake test
 database...

-- 
Ticket URL: <http://bind10.isc.org/ticket/1062#comment:8>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development