BIND 10 #2178: Logging for Out-of-zone data

Fri Aug 24 06:51:58 UTC 2012

#2178: Logging for Out-of-zone data
-------------------------------------+-------------------------------------
                   Reporter:  jreed  |                 Owner:  jinmei
                       Type:         |                Status:  reviewing
  defect                             |             Milestone:
                   Priority:         |  Sprint-20120904
  medium                             |            Resolution:
                  Component:  data   |             Sensitive:  0
  source                             |           Sub-Project:  DNS
                   Keywords:         |  Estimated Difficulty:  3
            Defect Severity:  N/A    |           Total Hours:  0
Feature Depending on Ticket:         |
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------

Comment (by jinmei):

 In short, I'm okay with merging the branch.  I've fixed some small
 things, and there are some other small open points, which I'd leave to
 you.  But larger discussion points occurred to me while reviewing the
 code (not only for the changes in this branch).

 Things already addressed:
 - client_list_unittest.cc didn't compile for me because the result
   class doesn't have operator<< (Does it work for a newer version of
   gtest?)  I've fixed it myself and committed it.  And, while doing
   so, I've renamed negativeResult_ to negative_result_ so it confirms
   to the variable naming convention.
 - I did some other minor editorial fixes.

 Possible minor issues:
 - `MasterLoadError` can only happen from here:
 {{{#!cpp
                         if (type == "MasterFiles") {
 finder->load(paramConf->get(*it)->stringValue());
 }}}
   so we can narrow the try block.
 - I'd log the zone name and class in the unified form here:
 {{{#!cpp
                     } catch (const isc::dns::MasterLoadError& mle) {
                         LOG_ERROR(logger, DATASRC_MASTERLOAD_ERROR)
                             .arg(mle.what());
 }}}

 About Changelog:
 I think this is primarily a fix to lib(b10-)datasrc, and the change in
 b10-auth is an effect of it that is just most visible (and maybe the
 only at this moment) to the user.  So, I'd say something like this:

 {{{
 libdatasrc: the data source client list class now ignores zone content
 problems (such as out-of-zone data) in MasterFiles type zones, instead
 of aborting the entire configuration.  It only logs an error, and all
 other zones and datasources are still loaded. The error log message
 has been improved to include the zone origin and source file name.  As
 a result of this change, b10-auth no longer exits upon encountering
 such errors.
 }}}

 Bigger issues:
 - The current error handling in ConfigurableClientList::configure
   seems quite ad hoc and policyless.  It's not clear in which case we
   (intend to) abort the configuration, accept partial error in a
   single data source, accept total error in a single data source, etc.
   What I really would like to see is a general policy about these,
   introduce higher level exceptions to represent the concept, and let
   the underlying data source (client) implementation throw them
   according to the severity of specific errors.  I think it's also
   related to your second "addition" thing.
 - Related to the previous point, catching `MasterLoadError` seems
   fragile, because it's not even clear in client_list that the data
   source uses masterLoad() for loading the zone.
 - Also related, I guess it's probably too strict to refuse loading
   completely when we see an out-of-zone name.  For example, BIND 9's
   zone parser warns about it and continue parsing.  We should probably
   do the same.

 But I don't request addressing these in this ticket.  Some of them are
 clearly beyond the scope of this ticket, and some may be too big
 anyway, and we'll substantially change the loader and parser pretty
 soon, so making big changes in this area immaturely wouldn't be a good
 idea.  It would still be a good idea to make some comments about these
 points so we can remember it later.

 Replying to [comment:3 jelte]:

 > - the config update hack in auth's main.cc (line 205) appears to be
 unnecessary; an update is immediately called anyway.

 Is this ("unnecessary") true?  My understanding is that even if the
 update callback is called it doesn't provide the default
 configuration.  Has it been fixed, or is it different for remote
 configuration?

 > - a potentially bigger problem that is shown by this ticket is that
 reconfigure() in the DataSourceConfigurator can throw, which it shouldn't,
 as it is used as a remote config callback; we should probably catch and
 log std::exception and maybe even (...) as well (in
 reconfigureInternal()).

 See above.  I agree we should revisit the error handling policy (and
 implementation).  But as for "it is used as a remote config callback",
 I guess it's part of another big problem of what we should do if
 remote config update fails in general.  Also, I think std or even
 unexpected type of exceptions should actually kill (maybe gracefully)
 the process just like in other cases.  We generally consider these
 an unrecoverable error, and rather than trying to deal with it, just
 let the process die and restart.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2178#comment:5>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development