BIND 10 #2252: Implement counters into Xfrin (1/3)
BIND 10 Development
do-not-reply at isc.org
Thu Apr 11 05:08:40 UTC 2013
#2252: Implement counters into Xfrin (1/3)
-------------------------------------+-------------------------------------
Reporter: naokikambe | Owner:
Type: enhancement | jinmei
Priority: medium | Status:
Component: xfrin | reviewing
Keywords: | Milestone:
Sensitive: 0 | Sprint-20130423
Sub-Project: DNS | Resolution:
Estimated Difficulty: 5 | CVSS Scoring:
Total Hours: 0 | Defect Severity: N/A
| Feature Depending on Ticket:
| Add Hours to Ticket: 0
| Internal?: 0
-------------------------------------+-------------------------------------
Changes (by naokikambe):
* owner: naokikambe => jinmei
Comment:
Hello,
Replying to [comment:20 jinmei]:
> Please read tests/lettuce/configurations/DO_NOT_USE_127.0.0.1:47807
> and adjusted the configuration. That's seemingly the reason for the
> failure I've seen.
Sorry, I missed the notes. I changed the port number to 47809. Could you
try the lettuce test again? The commit is:
{{{
71e0b2b [2252] use 127.0.0.1:47809 as address and port of the primary
server in lettuce
}}}
Replying to [comment:22 jinmei]:
> > {{{
> > ff8d95f [2252] update descriptions
> > }}}
> > However I don't think this could be so confusing for users. Because I
think users could see whether a real XFR is started from other counters
like axfrreqv4 or ixfrreqv4.
>
> They *could*, but it's inconvenient and error prone. That's the whole
> point. I can easily imagine a naive user writes (buggy) monitoring
> code like this:
> {{{#!python
> def check_xfr_status():
> # retrieve last_axfr_duration using cmdctl interface or stats_httpd
> last_duration = get_last_axfr_duration()
> if last_duration == 0: # BUG: it should also check axfrreqv6 and
axfrreqv4
> # statistics insist there has been no xfr-in successful attempt;
we know
> # the primary server has responded to xfr request. that should
mean
> # an evil attack. We need to retaliate immediately.
> launch_nuclear_missiles()
> }}}
> Sure, it's the user's fault that he/she doesn't read the documentation
> carefully and triggers a nuclear war as a result. But we know people
> don't read document, right (see my previous ticket comment, btw)?
>
> So, IMO, overloading like this simply means we are causing troubles
> for users due to our laziness.
(I would be so happy if such a user used statistics counters much as
above. But in fact, ..)
I think the user probably needs another trigger like `axfrreqv4` or
`axfrreqv6` or bind10 log or the primary server's log or something.
Because he/she must realize that the script would not work at first.
`last_axfr_duration` must keep being 0.0 until the first xfr is done. Also
in a real case, it would be a very rare that a real xfr is done in less
than a microsecond. Can we fix the issue on another ticket after this
ticket? Anyway ..
>
> In any case, if this is still not convincing and you think overloading
> is better, the latest description doesn't parse well to me:
>
> {{{
> Duration of the last IXFR. 0.0 means no successful
IXFR done
> in greater than or equal to a microsecond. If a
started
> timer is never stopped because of failure, start
time of
> duration will be reset next time.
> }}}
>
> Also, it's awkward to see concepts like a "timer" here because it's
> just an implementation detail.
I tried to revise the descriptions. Could you take a look at the commit?
{{{
0732633 [2252] update descriptions of last_ixfr_duration and
last_axfr_duration
}}}
>
> > > > > > > '''xfrin.py.in'''
>
> > Thank you for creating the ticket. I meant that "in future" was "in
near future" at that time. Sorry for confusion. I also think this issue is
serious. So shall we start #2884 after this ticket? Anyway, according to
this, I've updated description for avoiding confusion from users. Could
you see the following commit?
> > {{{
> > 20df0b0 [2252] update descriptions of zonename
> > }}}
>
> I'd say "e.g." before (IN, CH, HS).
That's right. I've revised at:
{{{
a50e0a4 [2252] revise description of zonename (add 'e.g.' before the zone
classes)
}}}
> > > - some class attributes (ipver, name_to_counter, master_addrinfo),
> > > are introduced while they are essentially instance attributes.
> > > using class attributes when they don't have to be so is generally
> > > not a good practice because change in one test instance could
affect
> > > another test. (This is also related to the issue that led to
> > > #2883). I suggest making them instance attributes.
> >
> > Hmm..., sorry, I cannot still understand what you worry about so much.
For example, if we intentionally change the class attribute 'ipver' to
'v7' inside of the test case like the following:
>
> First, this should be considered a principle matter. I hope we agree
> that global objects/variable, etc, are bad in general. singleton or
> class attributes are just modernized form of global, so, their use
> should generally be considered bad, and must be justified with a very
> strong reason if that's inevitable.
>
> In this specific case, completely overriding `ipver` actually creates
> an instance variable so the problem wouldn't be apparent. updating
> name_to_counter dict is probably a better example. But, again, this
> is basically a principle matter; the fact that it uses global without
> a reason is itself a problem.
We cannot cause such a problem as well by updating either
`name_to_counter` or `master_addrinfo` in that case. Because both of them
were defined as a tuple. I'd not like to disagree with using a global
variable. But I'm still not sure about difference between using a pure
global variable and using a class-scope reference.
Anyway I've removed them and changed to instance objects or getter
properties. A getter property cannot be changed to another reference
without a setter. They must be treated as read-only. The commit is:
{{{
084f1f5 [2252] remove some constant definitions from the class scope
}}}
>
> > BTW on my environment, checking initial statistics of Xfrout sometimes
failed. I suspect this is because a socket of Xfrout is already opened
before b10-stats asks to b10-xfrout. So I added exceptional cases to
checking. Could you see this change?
> >
> > {{{
> > d7885ec [2252] add an exceptional case when checking initial
statistics of Xfrout
> > }}}
>
> I'm not sure about this as it didn't happen for me. But, in general,
> I'd try to figure out how that actually happened, and then address the
> issue either by fixing the real issue or (if it's not easily
> avoidable) introducing a work around like this.
I'm not really sure but in fact I couldn't reproduce this issue. I
couldn't investigate further. So I've reverted such exceptional cases. If
the issue happens again, I would try to figure out the reason. The commit
is:
{{{
61ec72f [2252] remove exceptional cases when checking initial statistics
of Xfrout
}}}
>
> Specific code comments:
>
> '''xfrin_test.py'''
>
> - I'd define these as "private" by using double-underscore:
> `_orig_datetime`, `_orig_start_timer`, `_const_sec`.
I couldn't do only for `_const_sec` because I realized it was referred
from another child class. The commit is:
{{{
04e248b [2252] make the references class-local: _orig_datetime,
_orig_start_timer
}}}
> - I'd use camel convention for `fakedatatime`. I also suspect 'data'
> should be 'date'. So it would be:
> {{{#!python
> class FakeDateTime:
> @classmethod
> def now(cls): return time2
> }}}
I've revised as you suggested. Thanks. The commit is:
{{{
1cef659 [2252] update a fake class name to camel convention including a
typo fix
}}}
> - test_axfrreq_xfrfail: it only covers a subset of failure cases.
> since we don't have to reproduce the exact xfr processing in this
> context (which is done in tests for the protocol itself), I'd
> simplify the test by stealing a method and broaden the coverage:
> {{{#!python
> def test_axfrreq_xfrfail(self):
> """DON'T FORGET DOCUMENT IT"""
> self._check_init_statistics()
> count = 0
> for ex in [XfrinZoneError, XfrinProtocolError, XfrinException,
> Exception]:
> def exception_raiser():
> raise ex()
> self.conn._handle_xfrin_responses = exception_raiser
> self.assertEqual(self.conn.do_xfrin(False), XFRIN_FAIL)
> count += 1
> self._check_updated_statistics({'axfrreq' + self.ipver:
count,
> 'xfrfail': count})
> }}}
>
> - test_ixfrreq_xfrfail: first, same comment as axfrreq_xfrfail
> applies. secondly (while it may make it moot), it's not obvious
> why xfr is expected to fail in this case from the test code.
> Third, note that ixfr can result in `XfrinZoneUptodate` exception.
> Is it okay to consider it a failure (see also below)?
I've revised the two test cases by following your instruction. Thanks.
I've also added comments. Could you check that? The commit is:
{{{
5db9144 [2252] check an xfrfail incremented with multiple exceptions
}}}
>
> '''xfrin.py.in'''
>
> - In this revised code, if IXFR results in `XfrinZoneUptodate` xfrfail
> is counted. Is it intentional, and if so, is it reasonable? From a
> quick look, BIND 9 seems to consider it a success. In any case,
> this should be clarified in the documentation: what exactly
> "succeed/fail" mean.
I revised the codes to treat an xfr request as success in a case of raised
`XfrinZoneUptodate`. For consistency between BIND 9 and 10, I think we
should define this case as success. In fact, `do_xfrin()` already defines
so. The commit is:
{{{
8544c8e [2252] treat an xfr request as success with an XfrinZoneUptodate
raised
}}}
Regards,
--
Ticket URL: <http://bind10.isc.org/ticket/2252#comment:23>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list