BIND 10 #2439: update xfrin so it performs post transfer checks

Mon Jan 28 13:35:43 UTC 2013

#2439: update xfrin so it performs post transfer checks
-------------------------------------+-------------------------------------
            Reporter:  jinmei        |                        Owner:  jinmei
                Type:  task          |                       Status:  reviewing
            Priority:  medium        |                    Milestone:  Sprint-20130205
           Component:  xfrin         |                   Resolution:
            Keywords:                |                 CVSS Scoring:
           Sensitive:  0             |              Defect Severity:  N/A
         Sub-Project:  DNS           |  Feature Depending on Ticket:  loadzone-ng
Estimated Difficulty:  3             |          Add Hours to Ticket:  0
         Total Hours:  0             |                    Internal?:  0
-------------------------------------+-------------------------------------
Changes (by vorner):

 * owner:  vorner => jinmei

Comment:

 Hello

 Replying to [comment:12 jinmei]:
 > So, yes, please send emails.  Chances are we may not be so lucky to
 > get satisfiable answer, at which point I'm okay with moving forward
 > with our guess.

 I just wish the bind9 code were commented with the reasons for such
 obviously strange behaviours.

 Anyway, the email was sent on Friday and no answer yet.

 > Okay, and I think that requires a larger discussion.  For middle term,
 > I think it's relatively minor because it's less likely to receive this
 > type of broken zone data from xfrin in the first place.

 I agree that this one is minor, but I don't really like having invalid
 data in the database and keeping it there forever. Anyway, I'll try to
 initiate a discussion on the ML about this.

 > > Anyway, I don't feel a log message description is the right place for
 > > describing these kinds of differences. Maybe we should have a section in
 > > the guide somewhere for this?
 >
 > Perhaps.

 I'll ask Jeremy whether there's such a place now or whether we should add
 one.

 > First, as for the level of importance of this particular case.  Maybe
 > the difference is quite minor and subtle as we wouldn't normally see
 > this level of severe errors in the first place.  But I'd personally
 > note the fact as long as we notice it.
 >
 > Regarding the rest, I see several things to discuss.  As for why (I
 > think) we should document differences from BIND 9: because many of the
 > potential users of BIND 10 will be current BIND 9 users, and they will
 > generally expect compatible behaviors.  As you said, some level of
 > differences will be expected, but we should be responsible for making
 > them "informed" differences (and, when possible, it's better to
 > minimize differences, but that's another topic).

 Informed differences are one thing, but I'm also worried about drowning
 the users in too much information. If we write down too many little
 unimportant details, nobody will read them, and the effect will be the
 same as if we documented nothing.

 In this case, I think this will happen:
  * Most people will never get a zone rejected by XfrIn, since the master
    will perform the checks itself and won't ever send such data. Reading
    about it beforehand would therefore be a waste of time for them. Even
    if they read it, they'd probably soon forget, because the difference
    doesn't matter much when it does happen.
  * When it happens, everything important (e.g. what happened) is described
    in the log message, so the admin knows what happened and what to fix.
    What would happen in bind9 is not important at that moment, because
    they are running bind10 at the time.

 I don't think anybody relies on the specific bind9 behaviour, because this
 is such a rare case, and if it happened to someone often, they'd probably
 try to prevent it earlier than at xfr time.

 So, I'm not against documenting the difference, but I guess we'll have
 many more differences that are more interesting and not documented now.

 > But these opinions of mine may not be shared in the project.  Maybe we
 > should discuss it at the team call.

 I'll add the topic once there's an etherpad page.

 > {{{#!python
 >             # FIXME: Why is this .info? Even the messageID contains
 >             # "ERROR".
 > }}}
 >   because this is not something you can (always) fix yourself.  In
 >   general, we don't log protocol errors on incoming data because
 >   otherwise they can be too noisy (while not easily be fixed by the
 >   admin that sees the message) and can hide other critical issues.
 >   I believe it's a widely adopted convention (although I'm afraid you
 >   don't like to follow widely adopted conventions:-).  But, on a
 >   closer look at the BIND 9 implementation, I found it log these types
 >   of event at the error level...hmm, maybe the rationale here is that
 >   xfrin doesn't happen too often in the first place and the sender is
 >   generally limited, predictable, and even often controllable.  We
 >   should probably discuss it at the dev list and/or the team call.

 I'm not against conventions. But if I look at the code and see a message
 marked as „ERROR“ logged at the „INFO“ level, I suspect something is
 wrong, since that is inconsistent. I think that if we're explicitly
 breaking code consistency, even because of a convention, it needs to be
 commented, because the reason is not clear from the code. Ten years from
 now, when someone starts writing bind11, they'll send emails asking why we
 do such strange things, the same as we send emails asking why bind9 does
 what it does.

 So, I'll ask tomorrow on the call. Then I'll either change the log level
 or add the comment.
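
 If the level stays as it is, the comment could look roughly like the
 sketch below. This is only an illustration using the standard Python
 logging module, not the isc.log API that xfrin uses; the message ID
 XFRIN_EXAMPLE_PROTOCOL_ERROR and the function report_protocol_error are
 made-up names, and the stated rationale is the one given in the quoted
 comment above.

```python
import logging

logger = logging.getLogger("xfrin_example")

# Hypothetical message ID, used for illustration only; the real ID in
# xfrin is not shown in this ticket.
XFRIN_EXAMPLE_PROTOCOL_ERROR = "XFRIN_EXAMPLE_PROTOCOL_ERROR"


def report_protocol_error(zone, master, reason):
    """Log a protocol error found in data received from the master."""
    # Deliberately logged at INFO although the message ID says "ERROR":
    # the fault is in incoming data, so the admin who sees the message
    # often cannot fix it, and logging it at a higher level could be
    # noisy enough to hide other critical issues.
    logger.info("%s zone=%s master=%s: %s",
                XFRIN_EXAMPLE_PROTOCOL_ERROR, zone, master, reason)
```

 With a comment like this, the mismatch between the ID and the level is
 at least explained at the place where a reader would notice it.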

 > Were these previous comments addressed?
 > - Confirm warnings (by themselves) don't result in rejection.

 This wasn't addressed in the unit tests (with the mock check_zone), but it
 was in the lettuce tests. I have now added calls to the callbacks in the
 mock to confirm that they don't prevent the zone from being used, which is
 more or less the same as testing that a warning doesn't make the zone
 unusable. Checking the behaviour of check_zone itself is out of scope for
 the xfrin unit tests.
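
 The idea of that test, in a simplified and self-contained form, is
 sketched below. This is not the actual xfrin test code: the function
 post_transfer_check, both mocks, and the reduced check_zone signature
 (zone name plus a pair of callbacks) are assumptions made for the
 example.

```python
def post_transfer_check(zone_name, check_zone, log):
    """Hypothetical stand-in for the post-transfer step in xfrin.

    Returns True if the transferred zone may be used, False if it must
    be rejected.
    """
    errors = []
    warnings = []
    # The checker reports problems through the two callbacks; only its
    # return value decides whether the zone is accepted.
    ok = check_zone(zone_name, (errors.append, warnings.append))
    for reason in warnings:
        log("warning", zone_name, reason)
    for reason in errors:
        log("error", zone_name, reason)
    return ok


def mock_check_zone_warn_only(zone_name, callbacks):
    """Mock checker: emits a warning but still accepts the zone."""
    error_cb, warn_cb = callbacks
    warn_cb("example warning")
    return True


def mock_check_zone_error(zone_name, callbacks):
    """Mock checker: emits a warning and an error, and rejects the zone."""
    error_cb, warn_cb = callbacks
    warn_cb("example warning")
    error_cb("example error")
    return False
```

 A test then only needs to assert that the first mock leaves the zone
 accepted while the second one gets it rejected, without depending on what
 the real check_zone considers a warning or an error.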

 > - IXFR case

 This was addressed, in `test_ixfr_response_fail_validation`.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2439#comment:14>
BIND 10 Development <http://bind10.isc.org>