BIND 10 #2252: Implement counters into Xfrin (1/3)

Thu Apr 11 05:08:40 UTC 2013

#2252: Implement counters into Xfrin (1/3)
-------------------------------------+-------------------------------------
            Reporter:  naokikambe    |                        Owner:
                Type:  enhancement   |  jinmei
            Priority:  medium        |                       Status:
           Component:  xfrin         |  reviewing
            Keywords:                |                    Milestone:
           Sensitive:  0             |  Sprint-20130423
         Sub-Project:  DNS           |                   Resolution:
Estimated Difficulty:  5             |                 CVSS Scoring:
         Total Hours:  0             |              Defect Severity:  N/A
                                     |  Feature Depending on Ticket:
                                     |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------
Changes (by naokikambe):

 * owner:  naokikambe => jinmei

Comment:

 Hello,

 Replying to [comment:20 jinmei]:
 > Please read tests/lettuce/configurations/DO_NOT_USE_127.0.0.1:47807
 > and adjusted the configuration.  That's seemingly the reason for the
 > failure I've seen.

 Sorry, I missed the notes. I changed the port number to 47809. Could you
 try the lettuce test again? The commit is:
 {{{
 71e0b2b [2252] use 127.0.0.1:47809 as address and port of the primary
 server in lettuce
 }}}

 Replying to [comment:22 jinmei]:
 > > {{{
 > > ff8d95f [2252] update descriptions
 > > }}}
 > > However I don't think this could be so confusing for users. Because I
 think users could see whether a real XFR is started from other counters
 like axfrreqv4 or ixfrreqv4.
 >
 > They *could*, but it's inconvenient and error prone.  That's the whole
 > point.  I can easily imagine a naive user writes (buggy) monitoring
 > code like this:
 > {{{#!python
 > def check_xfr_status():
 >     # retrieve last_axfr_duration using cmdctl interface or stats_httpd
 >     last_duration = get_last_axfr_duration()
 >     if last_duration == 0: # BUG: it should also check axfrreqv6 and
 axfrreqv4
 >         # statistics insist there has been no xfr-in successful attempt;
 we know
 >         # the primary server has responded to xfr request.  that should
 mean
 >         # an evil attack.  We need to retaliate immediately.
 >         launch_nuclear_missiles()
 > }}}
 > Sure, it's the user's fault that he/she doesn't read the documentation
 > carefully and triggers a nuclear war as a result.  But we know people
 > don't read document, right (see my previous ticket comment, btw)?
 >
 > So, IMO, overloading like this simply means we are causing troubles
 > for users due to our laziness.
 (I would be so happy if such a user used statistics counters much as
 above. But in fact, ..)
 I think the user probably needs another trigger like `axfrreqv4` or
 `axfrreqv6` or bind10 log or the primary server's log or something.
 Because he/she must realize that the script would not work at first.
 `last_axfr_duration` must keep being 0.0 until the first xfr is done. Also
 in a real case, it would be a very rare that a real xfr is done in less
 than a microsecond. Can we fix the issue on another ticket after this
 ticket? Anyway ..

 >
 > In any case, if this is still not convincing and you think overloading
 > is better, the latest description doesn't parse well to me:
 >
 > {{{
 >                       Duration of the last IXFR. 0.0 means no successful
 IXFR done
 >                       in greater than or equal to a microsecond. If a
 started
 >                       timer is never stopped because of failure, start
 time of
 >                       duration will be reset next time.
 > }}}
 >
 > Also, it's awkward to see concepts like a "timer" here because it's
 > just an implementation detail.

 I tried to revise the descriptions. Could you take a look at the commit?
 {{{
 0732633 [2252] update descriptions of last_ixfr_duration and
 last_axfr_duration
 }}}

 >
 > > > > > > > '''xfrin.py.in'''
 >
 > > Thank you for creating the ticket. I meant that "in future" was "in
 near future" at that time. Sorry for confusion. I also think this issue is
 serious. So shall we start #2884 after this ticket? Anyway, according to
 this, I've updated description for avoiding confusion from users. Could
 you see the following commit?
 > > {{{
 > > 20df0b0 [2252] update descriptions of zonename
 > > }}}
 >
 > I'd say "e.g." before (IN, CH, HS).

 That's right. I've revised at:
 {{{
 a50e0a4 [2252] revise description of zonename (add 'e.g.' before the zone
 classes)
 }}}

 > > > - some class attributes (ipver, name_to_counter, master_addrinfo),
 > > >   are introduced while they are essentially instance attributes.
 > > >   using class attributes when they don't have to be so is generally
 > > >   not a good practice because change in one test instance could
 affect
 > > >   another test.  (This is also related to the issue that led to
 > > >   #2883).  I suggest making them instance attributes.
 > >
 > > Hmm..., sorry, I cannot still understand what you worry about so much.
 For example, if we intentionally change the class attribute 'ipver' to
 'v7' inside of the test case like the following:
 >
 > First, this should be considered a principle matter.  I hope we agree
 > that global objects/variable, etc, are bad in general.  singleton or
 > class attributes are just modernized form of global, so, their use
 > should generally be considered bad, and must be justified with a very
 > strong reason if that's inevitable.
 >
 > In this specific case, completely overriding `ipver` actually creates
 > an instance variable so the problem wouldn't be apparent.  updating
 > name_to_counter dict is probably a better example.  But, again, this
 > is basically a principle matter; the fact that it uses global without
 > a reason is itself a problem.

 We cannot cause such a problem as well by updating either
 `name_to_counter` or `master_addrinfo` in that case. Because both of them
 were defined as a tuple. I'd not like to disagree with using a global
 variable. But I'm still not sure about difference between using a pure
 global variable and using a class-scope reference.

 Anyway I've removed them and changed to instance objects or getter
 properties. A getter property cannot be changed to another reference
 without a setter. They must be treated as read-only. The commit is:
 {{{
 084f1f5 [2252] remove some constant definitions from the class scope
 }}}

 >
 > > BTW on my environment, checking initial statistics of Xfrout sometimes
 failed. I suspect this is because a socket of Xfrout is already opened
 before b10-stats asks to b10-xfrout. So I added exceptional cases to
 checking. Could you see this change?
 > >
 > > {{{
 > > d7885ec [2252] add an exceptional case when checking initial
 statistics of Xfrout
 > > }}}
 >
 > I'm not sure about this as it didn't happen for me.  But, in general,
 > I'd try to figure out how that actually happened, and then address the
 > issue either by fixing the real issue or (if it's not easily
 > avoidable) introducing a work around like this.

 I'm not really sure but in fact I couldn't reproduce this issue. I
 couldn't investigate further. So I've reverted such exceptional cases. If
 the issue happens again, I would try to figure out the reason. The commit
 is:
 {{{
 61ec72f [2252] remove exceptional cases when checking initial statistics
 of Xfrout
 }}}

 >
 > Specific code comments:
 >
 > '''xfrin_test.py'''
 >
 > - I'd define these as "private" by using double-underscore:
 >   `_orig_datetime`, `_orig_start_timer`, `_const_sec`.

 I couldn't do only for `_const_sec` because I realized it was referred
 from another child class. The commit is:
 {{{
 04e248b [2252] make the references class-local: _orig_datetime,
 _orig_start_timer
 }}}

 > - I'd use camel convention for `fakedatatime`.  I also suspect 'data'
 >   should be 'date'.  So it would be:
 > {{{#!python
 >         class FakeDateTime:
 >             @classmethod
 >             def now(cls): return time2
 > }}}
 I've revised as you suggested. Thanks. The commit is:
 {{{
 1cef659 [2252] update a fake class name to camel convention including a
 typo fix
 }}}
 > - test_axfrreq_xfrfail: it only covers a subset of failure cases.
 >   since we don't have to reproduce the exact xfr processing in this
 >   context (which is done in tests for the protocol itself), I'd
 >   simplify the test by stealing a method and broaden the coverage:
 > {{{#!python
 >     def test_axfrreq_xfrfail(self):
 >         """DON'T FORGET DOCUMENT IT"""
 >         self._check_init_statistics()
 >         count = 0
 >         for ex in [XfrinZoneError, XfrinProtocolError, XfrinException,
 >                    Exception]:
 >             def exception_raiser():
 >                 raise ex()
 >             self.conn._handle_xfrin_responses = exception_raiser
 >             self.assertEqual(self.conn.do_xfrin(False), XFRIN_FAIL)
 >             count += 1
 >             self._check_updated_statistics({'axfrreq' + self.ipver:
 count,
 >                                             'xfrfail': count})
 > }}}
 >
 > - test_ixfrreq_xfrfail: first, same comment as axfrreq_xfrfail
 >   applies.  secondly (while it may make it moot), it's not obvious
 >   why xfr is expected to fail in this case from the test code.
 >   Third, note that ixfr can result in `XfrinZoneUptodate` exception.
 >   Is it okay to consider it a failure (see also below)?

 I've revised the two test cases by following your instruction. Thanks.
 I've also added comments. Could you check that? The commit is:
 {{{
 5db9144 [2252] check an xfrfail incremented with multiple exceptions
 }}}

 >
 > '''xfrin.py.in'''
 >
 > - In this revised code, if IXFR results in `XfrinZoneUptodate` xfrfail
 >   is counted.  Is it intentional, and if so, is it reasonable?  From a
 >   quick look, BIND 9 seems to consider it a success.  In any case,
 >   this should be clarified in the documentation: what exactly
 >   "succeed/fail" mean.

 I revised the codes to treat an xfr request as success in a case of raised
 `XfrinZoneUptodate`. For consistency between BIND 9 and 10, I think we
 should define this case as success. In fact, `do_xfrin()` already defines
 so. The commit is:
 {{{
 8544c8e [2252] treat an xfr request as success with an XfrinZoneUptodate
 raised
 }}}

 Regards,

-- 
Ticket URL: <http://bind10.isc.org/ticket/2252#comment:23>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development