BIND 10 #2252: Implement counters into Xfrin (1/3)
BIND 10 Development
do-not-reply at isc.org
Sat Apr 6 02:07:13 UTC 2013
#2252: Implement counters into Xfrin (1/3)
-------------------------------------+-------------------------------------
Reporter: naokikambe | Owner:
Type: enhancement | naokikambe
Priority: medium | Status:
Component: xfrin | reviewing
Keywords: | Milestone:
Sensitive: 0 | Sprint-20130423
Sub-Project: DNS | Resolution:
Estimated Difficulty: 5 | CVSS Scoring:
Total Hours: 0 | Defect Severity: N/A
| Feature Depending on Ticket:
| Add Hours to Ticket: 0
| Internal?: 0
-------------------------------------+-------------------------------------
Comment (by jinmei):
Replying to [comment:19 naokikambe]:
> > > > > > - as for time_to_ixfr, what if there's never been an xfr for
the zone?
> > > > > It shows still the default value, 0.0.
> > > >
> > > > It seems to be suboptimal in that using a magic number (especially
in
> > > > the valid range - [...]
> > >
> > > I updated descriptions about the default value `0.0`. Could you see
the following commit?
> > > {{{
> > > 26a9ee9 [2252] update descriptions of latest_ixfr_duration and
latest_axfr_duration about their default values
> > > }}}
> >
> > In that case this description isn't accurate. It should be something
> > like:
> >
> > {{{
> > <listitem><simpara>
> > Duration of the latest IXFR in seconds. 0.0
means it was
> > shorter than 0.05 seconds or there was no
> > successful IXFR.
> > </simpara></listitem>
> > }}}
> >
> > and, as this description suggests, it's generally a bad idea to
> > overload a specific value that way; such ambiguity will eventually
> > bite us(ers). Better approach is to actually revise the code to
> > eliminate the ambiguity. Do you still think the overloading is the
> > way to go?
>
> I believe the timer has microsecond accuracy on the most platforms. So I
think it is not 0.05 but 0.000001. I've update description at the
following commit according to this.
Hmm, I don't remember why I referred to 0.05. Maybe from the
representation of "0.0" I assumed that it would be represented as
`%.1f`.
> {{{
> ff8d95f [2252] update descriptions
> }}}
> However I don't think this could be so confusing for users. Because I
think users could see whether a real XFR is started from other counters
like axfrreqv4 or ixfrreqv4.
They *could*, but it's inconvenient and error prone. That's the whole
point. I can easily imagine a naive user writes (buggy) monitoring
code like this:
{{{#!python
def check_xfr_status():
# retrieve last_axfr_duration using cmdctl interface or stats_httpd
last_duration = get_last_axfr_duration()
if last_duration == 0: # BUG: it should also check axfrreqv6 and
axfrreqv4
# statistics insist there has been no xfr-in successful attempt;
we know
# the primary server has responded to xfr request. that should
mean
# an evil attack. We need to retaliate immediately.
launch_nuclear_missiles()
}}}
Sure, it's the user's fault that he/she doesn't read the documentation
carefully and triggers a nuclear war as a result. But we know people
don't read document, right (see my previous ticket comment, btw)?
So, IMO, overloading like this simply means we are causing troubles
for users due to our laziness.
In any case, if this is still not convincing and you think overloading
is better, the latest description doesn't parse well to me:
{{{
Duration of the last IXFR. 0.0 means no successful
IXFR done
in greater than or equal to a microsecond. If a
started
timer is never stopped because of failure, start
time of
duration will be reset next time.
}}}
Also, it's awkward to see concepts like a "timer" here because it's
just an implementation detail.
> > > > > > '''xfrin.py.in'''
> Thank you for creating the ticket. I meant that "in future" was "in near
future" at that time. Sorry for confusion. I also think this issue is
serious. So shall we start #2884 after this ticket? Anyway, according to
this, I've updated description for avoiding confusion from users. Could
you see the following commit?
> {{{
> 20df0b0 [2252] update descriptions of zonename
> }}}
I'd say "e.g." before (IN, CH, HS).
> > > > > > - do_xfrin: stop_timer wouldn't be called if exception is
raised
> > > > > > during the process.
[...]
> > By "document", I (probably ... not quite remember after some period
> > since last comment cycle) meant the man page or something, not (only)
> > the code comment.
>
> I also wrote a comment on the man page. Could you see the commit?
> {{{
> 4d5c20e [2252] update descriptions of last_*xfr_duration
> }}}
See above. We need to explain it without using implementation
specific terms.
> > I'm not sure if I don't understand this: "I guess the timer wouldn't
> > probably started yet in case of `XfrinZoneUptodate`". You mean it
> > should have come from `_check_soa_serial`? But that can also happen
> > in IXFR.
>
> Yes, I misread the code. The condition can be changed in case of IXFR as
you mentioned.
I'm not sure if that's reasonable. See below.
> > - some class attributes (ipver, name_to_counter, master_addrinfo),
> > are introduced while they are essentially instance attributes.
> > using class attributes when they don't have to be so is generally
> > not a good practice because change in one test instance could affect
> > another test. (This is also related to the issue that led to
> > #2883). I suggest making them instance attributes.
>
> Hmm..., sorry, I cannot still understand what you worry about so much.
For example, if we intentionally change the class attribute 'ipver' to
'v7' inside of the test case like the following:
First, this should be considered a principle matter. I hope we agree
that global objects/variable, etc, are bad in general. singleton or
class attributes are just modernized form of global, so, their use
should generally be considered bad, and must be justified with a very
strong reason if that's inevitable.
In this specific case, completely overriding `ipver` actually creates
an instance variable so the problem wouldn't be apparent. updating
name_to_counter dict is probably a better example. But, again, this
is basically a principle matter; the fact that it uses global without
a reason is itself a problem.
> > - this synchronization failed
> > {{{
> > Then wait for new bind10 stderr message XFRIN_XFR_TRANSFER_STARTED
> > }}}
> > even with Jelte's patch. On master, I've never seen this happen if
> > that patch is applied.
>
> Hmm..., sorry, I'm not sure about this. I have merged the newer master
on to the branch. Could you try lettuce test again? If you still have same
problem, please let me know.
I think I figured it out. Please see the previous comment.
> BTW on my environment, checking initial statistics of Xfrout sometimes
failed. I suspect this is because a socket of Xfrout is already opened
before b10-stats asks to b10-xfrout. So I added exceptional cases to
checking. Could you see this change?
>
> {{{
> d7885ec [2252] add an exceptional case when checking initial statistics
of Xfrout
> }}}
I'm not sure about this as it didn't happen for me. But, in general,
I'd try to figure out how that actually happened, and then address the
issue either by fixing the real issue or (if it's not easily
avoidable) introducing a work around like this.
Specific code comments:
'''xfrin_test.py'''
- I'd define these as "private" by using double-underscore:
`_orig_datetime`, `_orig_start_timer`, `_const_sec`.
- I'd use camel convention for `fakedatatime`. I also suspect 'data'
should be 'date'. So it would be:
{{{#!python
class FakeDateTime:
@classmethod
def now(cls): return time2
}}}
- test_axfrreq_xfrfail: it only covers a subset of failure cases.
since we don't have to reproduce the exact xfr processing in this
context (which is done in tests for the protocol itself), I'd
simplify the test by stealing a method and broaden the coverage:
{{{#!python
def test_axfrreq_xfrfail(self):
"""DON'T FORGET DOCUMENT IT"""
self._check_init_statistics()
count = 0
for ex in [XfrinZoneError, XfrinProtocolError, XfrinException,
Exception]:
def exception_raiser():
raise ex()
self.conn._handle_xfrin_responses = exception_raiser
self.assertEqual(self.conn.do_xfrin(False), XFRIN_FAIL)
count += 1
self._check_updated_statistics({'axfrreq' + self.ipver: count,
'xfrfail': count})
}}}
- test_ixfrreq_xfrfail: first, same comment as axfrreq_xfrfail
applies. secondly (while it may make it moot), it's not obvious
why xfr is expected to fail in this case from the test code.
Third, note that ixfr can result in `XfrinZoneUptodate` exception.
Is it okay to consider it a failure (see also below)?
'''xfrin.py.in'''
- In this revised code, if IXFR results in `XfrinZoneUptodate` xfrfail
is counted. Is it intentional, and if so, is it reasonable? From a
quick look, BIND 9 seems to consider it a success. In any case,
this should be clarified in the documentation: what exactly
"succeed/fail" mean.
--
Ticket URL: <http://bind10.isc.org/ticket/2252#comment:22>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list