BIND 10 #2439: update xfrin so it performs post transfer checks
BIND 10 Development
do-not-reply at isc.org
Fri Jan 25 21:16:09 UTC 2013
#2439: update xfrin so it performs post transfer checks
-------------------------------------+-------------------------------------
Reporter: jinmei | Owner:
Type: task | jinmei
Priority: medium | Status:
Component: xfrin | reviewing
Keywords: | Milestone:
Sensitive: 0 | Sprint-20130205
Sub-Project: DNS | Resolution:
Estimated Difficulty: 3 | CVSS Scoring:
Total Hours: 0 | Defect Severity: N/A
| Feature Depending on Ticket:
| loadzone-ng
| Add Hours to Ticket: 0
| Internal?: 0
-------------------------------------+-------------------------------------
Comment (by jinmei):
Replying to [comment:11 vorner]:
> > - I've noticed a slight gap from BIND 9 (based on code inspection, so
> > maybe I'm wrong): [...]
>
> Well, I spent something like an hour trying to get through all the
indirect
> calls, macros and gotos of bind9. After that time I gave up, deciding I
have no
> clue what is done with the database or with the in-memory image.
>
> I'll write an email about that. But I suspect it was just easier
solution for bind9, while this is easier for us. At least, I don't think
we have a way to store it in the database and not use it now and storing
the wrong one seems incorrect. We probably could wipe the whole zone out,
but that doesn't feel right either. And it doesn't matter much. Anyway,
having the db version and the in-memory version inconsistent seems like a
bad idea.
I admit the BIND 9 implementation is quite complicated and sometimes
we have to just give up spending more time for understanding it. It's
also quite possible that we can't get any response even if we ask for
clarification from the BIND 9 developers. Still, I'd like to make a
reasonable amount of such efforts rather than just guessing what/why
it does. Of course, not everything that BIND 9 does would make sense,
and sometimes we may just find our guess is correct. But since BIND 9
has a long history of addressing real world subtle issues and has been
updated based on such accumulated knowledge, IMO it's not only stupid
but also dangerous not try to learn from them.
So, yes, please send emails. Chances are we may not be so lucky to
get satisfiable answer, at which point I'm okay with moving forward
with our guess.
> > - check at least one rejection case with sqlite3. it's redundant in
> > some sense, but sometimes we (I) find real-world cases have an issue
> > that aren't revealed just with mocks.
>
> I did these as lettuce tests. And I did find a bug in xfrin, but it
seems unrelated to the new changes, so I'll create a new ticket for it.
When the download fails (tested with validation error, but I think it
happens if it fails for other reason too) and the zone didn't exist
before, an empty zone (that doesn't work) is created and makes the server
SERVFAIL on query. We should either roll back (eg. remove the zone again)
like in b10-loadzone or restart the discussion about interface for
creating zones, and provide a way to create the zone in the same
transaction as filling in the data, so we can just rollback the
transaction as a whole.
Okay, and I think that requires a larger discussion. For middle term,
I think it's relatively minor because it's less likely to receive this
type of broken zone data from xfrin in the first place.
> > Also, wherever we note that, I'd also describe some more things:
> > - depending on what to do with the gap from BIND 9 discussed above,
> > explain the difference from BIND 9 and the rationale.
> > - related, although not directly related to this task: there's another
> > gap from BIND 9: we don't commit IXFR unless all IXFR responses are
> > processed. So there's another type of difference in the DB state
> > if IXFR is rejected in the middle of it. I'd explain the difference
> > and the rationale here, too.
>
> This one was discussed and was decided we need to do it this way,
because there might be TSIG-unsigned messages in the middle and we can't
really commit them until the whole session is properly signed. I don't
know if bind9 supports the mode where some messages in the middle aren't
signed or how it is solved there.
>
> Anyway, I don't feel a log message description is the right place for
describing these kinds of differences. Maybe we should have a section in
the guide somewhere for this?
Perhaps.
> Also, is that difference important enough? And, at how many places does
bind9 describe differences from bind8? I mean, we are creating new, stand-
alone product, not a mere substitute for bind9. In that sense, I'm not
against describing differences, but some level of differences could be
expected. Also, if doing so, it would make sense to describe differences
to NSD or Knot or other servers.
First, as for the level of importance of this particular case. Maybe
the difference is quite minor and subtle as we wouldn't normally see
this level of severe errors in the first place. But I'd personally
note the fact as long as we notice it.
Regarding the rest, I see several things to discuss. As for why (I
think) we should document differences from BIND 9: because many of the
potential users of BIND 10 will be current BIND 9 users, and they will
generally expect compatible behaviors. As you said, some level of
differences will be expected, but we should be responsible for making
them "informed" differences (and, when possible, it's better to
minimize differences, but that's another topic).
Secondly, as for whether BIND 9 documents differences from BIND 8: it
sometimes does. See, for example, the statistics section of BIND 9
ARM. But that actually doesn't matter to me; in my understanding one
background motivation of BIND 10 against BIND 9 was BIND 9 often lacks
transparency, including in that of documentation about why it behavior
that particular way (I thought Shane talked about it in a public
interview or something, but I couldn't find a link). So, IMO, this is
something we should do even if BIND 9 doesn't against BIND 8.
Thirdly, as for differences with other implementations: yes, I think
that's useful, and I sometimes document such differences myself. See
the doxygen comment for the `TSIGRecord` class of libdns++. But
that's relatively less important based on why I think it's important
to describe differences with BIND 9 as I explained above.
But these opinions of mine may not be shared in the project. Maybe we
should discuss it at the team call.
> > - As for this:
> > {{{#!python
> > # Any idea why python doesn't agree with inheriting from
> > # dns.RRsetCollection?
> > }}}
> > I don't know what you wanted to do by inheriting from
> > `dns.RRsetCollection` (not from the base), but if it's really
> > necessary you should specify `Py_TPFLAGS_BASETYPE` in the `tp_flags`
> > field of the corresponding `PyTypeObject`. See the difference of
> > this member between `rrset_collection_base_type` and
> > `rrset_collection_type`. But in any event, in this particular case
> > I don't see the need for that trick: the current mock seems to be
> > sufficient.
>
> Actually, I was slightly confused (I'm still not used to calling the
base class „Base“ in its name, I'm used to calling the base class by
generic name and then differing the implementations) and wrongly tried to
used that one. And I was surprised by the error message and because I
didn't know about the existence of the flag, so I guessed all the wrapper
classes are not possible to inherit. I did use the „Base“ one now, which
acts the way I want (with a little catch that I had to create the
constructor and not call the inherited one).
Yeah, actually, the naming convention isn't consistent throughout our
code, and that's probably one source of confusion. Ideally we should
make them consistent, but I'm afraid that will be given lower
priority.
Now, comments on the revised branch:
'''xfrin.py.in'''
- docstring for XfrinZoneError: "contain invalid data" doesn't sound
very accurate in that the error includes things like *missing*,
rather than contain, NS (and actually that's the only serious error
for now).
{{{
An exception raised when the received zone contain invalid data.
}}}
I'd say something like "when the received zone is considered
sufficiently broken".
- not related to this ticket, but I noticed there's a "TBD". it would
be nice to fix it at this opportunity:
{{{
class XfrinZoneUptodate(Exception):
'''TBD
'''
pass
}}}
- about this:
{{{#!python
# FIXME: Why is this .info? Even the messageID contains
"ERROR".
}}}
because this is not something you can (always) fix yourself. In
general, we don't log protocol errors on incoming data because
otherwise they can be too noisy (while not easily be fixed by the
admin that sees the message) and can hide other critical issues.
I believe it's a widely adopted convention (although I'm afraid you
don't like to follow widely adopted conventions:-). But, on a
closer look at the BIND 9 implementation, I found it log these types
of event at the error level...hmm, maybe the rationale here is that
xfrin doesn't happen too often in the first place and the sender is
generally limited, predictable, and even often controllable. We
should probably discuss it at the dev list and/or the team call.
'''xfrin_test.py'''
Were these previous comments addressed?
- Confirm warnings (by themselves) don't result in rejection.
- IXFR case
I think these should be done in the unittest level.
'''xfrin_messages.mes'''
- %1 is missing. I also noticed this isn't logged via unit tests.
{{{
% XFRIN_INVALID_ZONE_DATA zone %2 received from %3 contains invalid data
}}}
'''diff_tests.py'''
- This should be `super().__init__()`
{{{
so we need to provide our own to shadow it -- and make sure
not to call the parent().__init__().
}}}
- should this be `isinstance(collection, self.Collection)`?
{{{#!python
self.assertTrue(collection, self.Collection)
}}}
(I didn't notice that in the first review)
--
Ticket URL: <http://bind10.isc.org/ticket/2439#comment:12>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list