BIND 10 #2439: update xfrin so it performs post transfer checks

Fri Jan 25 21:16:09 UTC 2013

#2439: update xfrin so it performs post transfer checks
-------------------------------------+-------------------------------------
            Reporter:  jinmei        |                        Owner:
                Type:  task          |  jinmei
            Priority:  medium        |                       Status:
           Component:  xfrin         |  reviewing
            Keywords:                |                    Milestone:
           Sensitive:  0             |  Sprint-20130205
         Sub-Project:  DNS           |                   Resolution:
Estimated Difficulty:  3             |                 CVSS Scoring:
         Total Hours:  0             |              Defect Severity:  N/A
                                     |  Feature Depending on Ticket:
                                     |  loadzone-ng
                                     |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------

Comment (by jinmei):

 Replying to [comment:11 vorner]:

 > > - I've noticed a slight gap from BIND 9 (based on code inspection, so
 > >   maybe I'm wrong): [...]
 >
 > Well, I spent something like an hour trying to get through all the
 indirect
 > calls, macros and gotos of bind9. After that time I gave up, deciding I
 have no
 > clue what is done with the database or with the in-memory image.
 >
 > I'll write an email about that. But I suspect it was just easier
 solution for bind9, while this is easier for us. At least, I don't think
 we have a way to store it in the database and not use it now and storing
 the wrong one seems incorrect. We probably could wipe the whole zone out,
 but that doesn't feel right either. And it doesn't matter much. Anyway,
 having the db version and the in-memory version inconsistent seems like a
 bad idea.

 I admit the BIND 9 implementation is quite complicated and sometimes
 we have to just give up spending more time for understanding it.  It's
 also quite possible that we can't get any response even if we ask for
 clarification from the BIND 9 developers.  Still, I'd like to make a
 reasonable amount of such efforts rather than just guessing what/why
 it does.  Of course, not everything that BIND 9 does would make sense,
 and sometimes we may just find our guess is correct.  But since BIND 9
 has a long history of addressing real world subtle issues and has been
 updated based on such accumulated knowledge, IMO it's not only stupid
 but also dangerous not try to learn from them.

 So, yes, please send emails.  Chances are we may not be so lucky to
 get satisfiable answer, at which point I'm okay with moving forward
 with our guess.

 > > - check at least one rejection case with sqlite3.  it's redundant in
 > >   some sense, but sometimes we (I) find real-world cases have an issue
 > >   that aren't revealed just with mocks.
 >
 > I did these as lettuce tests. And I did find a bug in xfrin, but it
 seems unrelated to the new changes, so I'll create a new ticket for it.
 When the download fails (tested with validation error, but I think it
 happens if it fails for other reason too) and the zone didn't exist
 before, an empty zone (that doesn't work) is created and makes the server
 SERVFAIL on query. We should either roll back (eg. remove the zone again)
 like in b10-loadzone or restart the discussion about interface for
 creating zones, and provide a way to create the zone in the same
 transaction as filling in the data, so we can just rollback the
 transaction as a whole.

 Okay, and I think that requires a larger discussion.  For middle term,
 I think it's relatively minor because it's less likely to receive this
 type of broken zone data from xfrin in the first place.

 > > Also, wherever we note that, I'd also describe some more things:
 > > - depending on what to do with the gap from BIND 9 discussed above,
 > >   explain the difference from BIND 9 and the rationale.
 > > - related, although not directly related to this task: there's another
 > >   gap from BIND 9: we don't commit IXFR unless all IXFR responses are
 > >   processed.  So there's another type of difference in the DB state
 > >   if IXFR is rejected in the middle of it.  I'd explain the difference
 > >   and the rationale here, too.
 >
 > This one was discussed and was decided we need to do it this way,
 because there might be TSIG-unsigned messages in the middle and we can't
 really commit them until the whole session is properly signed. I don't
 know if bind9 supports the mode where some messages in the middle aren't
 signed or how it is solved there.
 >
 > Anyway, I don't feel a log message description is the right place for
 describing these kinds of differences. Maybe we should have a section in
 the guide somewhere for this?

 Perhaps.

 > Also, is that difference important enough? And, at how many places does
 bind9 describe differences from bind8? I mean, we are creating new, stand-
 alone product, not a mere substitute for bind9. In that sense, I'm not
 against describing differences, but some level of differences could be
 expected. Also, if doing so, it would make sense to describe differences
 to NSD or Knot or other servers.

 First, as for the level of importance of this particular case.  Maybe
 the difference is quite minor and subtle as we wouldn't normally see
 this level of severe errors in the first place.  But I'd personally
 note the fact as long as we notice it.

 Regarding the rest, I see several things to discuss.  As for why (I
 think) we should document differences from BIND 9: because many of the
 potential users of BIND 10 will be current BIND 9 users, and they will
 generally expect compatible behaviors.  As you said, some level of
 differences will be expected, but we should be responsible for making
 them "informed" differences (and, when possible, it's better to
 minimize differences, but that's another topic).

 Secondly, as for whether BIND 9 documents differences from BIND 8: it
 sometimes does.  See, for example, the statistics section of BIND 9
 ARM.  But that actually doesn't matter to me; in my understanding one
 background motivation of BIND 10 against BIND 9 was BIND 9 often lacks
 transparency, including in that of documentation about why it behavior
 that particular way (I thought Shane talked about it in a public
 interview or something, but I couldn't find a link).  So, IMO, this is
 something we should do even if BIND 9 doesn't against BIND 8.

 Thirdly, as for differences with other implementations: yes, I think
 that's useful, and I sometimes document such differences myself.  See
 the doxygen comment for the `TSIGRecord` class of libdns++.  But
 that's relatively less important based on why I think it's important
 to describe differences with BIND 9 as I explained above.

 But these opinions of mine may not be shared in the project.  Maybe we
 should discuss it at the team call.

 > > - As for this:
 > > {{{#!python
 > >         # Any idea why python doesn't agree with inheriting from
 > >         # dns.RRsetCollection?
 > > }}}
 > >   I don't know what you wanted to do by inheriting from
 > >   `dns.RRsetCollection` (not from the base), but if it's really
 > >   necessary you should specify `Py_TPFLAGS_BASETYPE` in the `tp_flags`
 > >   field of the corresponding `PyTypeObject`.  See the difference of
 > >   this member between `rrset_collection_base_type` and
 > >   `rrset_collection_type`.  But in any event, in this particular case
 > >   I don't see the need for that trick: the current mock seems to be
 > >   sufficient.
 >
 > Actually, I was slightly confused (I'm still not used to calling the
 base class „Base“ in its name, I'm used to calling the base class by
 generic name and then differing the implementations) and wrongly tried to
 used that one. And I was surprised by the error message and because I
 didn't know about the existence of the flag, so I guessed all the wrapper
 classes are not possible to inherit. I did use the „Base“ one now, which
 acts the way I want (with a little catch that I had to create the
 constructor and not call the inherited one).

 Yeah, actually, the naming convention isn't consistent throughout our
 code, and that's probably one source of confusion.  Ideally we should
 make them consistent, but I'm afraid that will be given lower
 priority.

 Now, comments on the revised branch:

 '''xfrin.py.in'''
 - docstring for XfrinZoneError: "contain invalid data" doesn't sound
   very accurate in that the error includes things like *missing*,
   rather than contain, NS (and actually that's the only serious error
   for now).
 {{{
     An exception raised when the received zone contain invalid data.
 }}}
   I'd say something like "when the received zone is considered
   sufficiently broken".
 - not related to this ticket, but I noticed there's a "TBD".  it would
   be nice to fix it at this opportunity:
 {{{
 class XfrinZoneUptodate(Exception):
     '''TBD
     '''
     pass
 }}}
 - about this:
 {{{#!python
             # FIXME: Why is this .info? Even the messageID contains
 "ERROR".
 }}}
   because this is not something you can (always) fix yourself.  In
   general, we don't log protocol errors on incoming data because
   otherwise they can be too noisy (while not easily be fixed by the
   admin that sees the message) and can hide other critical issues.
   I believe it's a widely adopted convention (although I'm afraid you
   don't like to follow widely adopted conventions:-).  But, on a
   closer look at the BIND 9 implementation, I found it log these types
   of event at the error level...hmm, maybe the rationale here is that
   xfrin doesn't happen too often in the first place and the sender is
   generally limited, predictable, and even often controllable.  We
   should probably discuss it at the dev list and/or the team call.

 '''xfrin_test.py'''

 Were these previous comments addressed?
 - Confirm warnings (by themselves) don't result in rejection.
 - IXFR case
 I think these should be done in the unittest level.

 '''xfrin_messages.mes'''

 - %1 is missing.  I also noticed this isn't logged via unit tests.
 {{{
 % XFRIN_INVALID_ZONE_DATA zone %2 received from %3 contains invalid data
 }}}

 '''diff_tests.py'''

 - This should be `super().__init__()`
 {{{
             so we need to provide our own to shadow it -- and make sure
             not to call the parent().__init__().
 }}}
 - should this be `isinstance(collection, self.Collection)`?
 {{{#!python
         self.assertTrue(collection, self.Collection)

 }}}
   (I didn't notice that in the first review)

-- 
Ticket URL: <http://bind10.isc.org/ticket/2439#comment:12>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development