BIND 10 #2252: Implement counters into Xfrin (1/3)

BIND 10 Development do-not-reply at isc.org
Sat Apr 6 02:07:13 UTC 2013


#2252: Implement counters into Xfrin (1/3)
-------------------------------------+-------------------------------------
            Reporter:  naokikambe    |                        Owner:
                Type:  enhancement   |  naokikambe
            Priority:  medium        |                       Status:
           Component:  xfrin         |  reviewing
            Keywords:                |                    Milestone:
           Sensitive:  0             |  Sprint-20130423
         Sub-Project:  DNS           |                   Resolution:
Estimated Difficulty:  5             |                 CVSS Scoring:
         Total Hours:  0             |              Defect Severity:  N/A
                                     |  Feature Depending on Ticket:
                                     |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------

Comment (by jinmei):

 Replying to [comment:19 naokikambe]:

 > > > > > > - as for time_to_ixfr, what if there's never been an xfr for
 the zone?
 > > > > > It shows still the default value, 0.0.
 > > > >
 > > > > It seems to be suboptimal in that using a magic number (especially
 in
 > > > > the valid range - [...]
 > > >
 > > > I updated descriptions about the default value `0.0`. Could you see
 the following commit?
 > > > {{{
 > > > 26a9ee9 [2252] update descriptions of latest_ixfr_duration and
 latest_axfr_duration about their default values
 > > > }}}
 > >
 > > In that case this description isn't accurate.  It should be something
 > > like:
 > >
 > > {{{
 > >                     <listitem><simpara>
 > >                       Duration of the latest IXFR in seconds. 0.0
 means it was
 > >                       shorter than 0.05 seconds or there was no
 > >                       successful IXFR.
 > >                     </simpara></listitem>
 > > }}}
 > >
 > > and, as this description suggests, it's generally a bad idea to
 > > overload a specific value that way; such ambiguity will eventually
 > > bite us(ers).  Better approach is to actually revise the code to
 > > eliminate the ambiguity.  Do you still think the overloading is the
 > > way to go?
 >
 > I believe the timer has microsecond accuracy on the most platforms. So I
 think it is not 0.05 but 0.000001. I've update description at the
 following commit according to this.

 Hmm, I don't remember why I referred to 0.05.  Maybe from the
 representation of "0.0" I assumed that it would be represented as
 `%.1f`.

 > {{{
 > ff8d95f [2252] update descriptions
 > }}}
 > However I don't think this could be so confusing for users. Because I
 think users could see whether a real XFR is started from other counters
 like axfrreqv4 or ixfrreqv4.

 They *could*, but it's inconvenient and error prone.  That's the whole
 point.  I can easily imagine a naive user writes (buggy) monitoring
 code like this:
 {{{#!python
 def check_xfr_status():
     # retrieve last_axfr_duration using cmdctl interface or stats_httpd
     last_duration = get_last_axfr_duration()
     if last_duration == 0: # BUG: it should also check axfrreqv6 and
 axfrreqv4
         # statistics insist there has been no xfr-in successful attempt;
 we know
         # the primary server has responded to xfr request.  that should
 mean
         # an evil attack.  We need to retaliate immediately.
         launch_nuclear_missiles()
 }}}
 Sure, it's the user's fault that he/she doesn't read the documentation
 carefully and triggers a nuclear war as a result.  But we know people
 don't read document, right (see my previous ticket comment, btw)?

 So, IMO, overloading like this simply means we are causing troubles
 for users due to our laziness.

 In any case, if this is still not convincing and you think overloading
 is better, the latest description doesn't parse well to me:

 {{{
                       Duration of the last IXFR. 0.0 means no successful
 IXFR done
                       in greater than or equal to a microsecond. If a
 started
                       timer is never stopped because of failure, start
 time of
                       duration will be reset next time.
 }}}

 Also, it's awkward to see concepts like a "timer" here because it's
 just an implementation detail.

 > > > > > > '''xfrin.py.in'''

 > Thank you for creating the ticket. I meant that "in future" was "in near
 future" at that time. Sorry for confusion. I also think this issue is
 serious. So shall we start #2884 after this ticket? Anyway, according to
 this, I've updated description for avoiding confusion from users. Could
 you see the following commit?
 > {{{
 > 20df0b0 [2252] update descriptions of zonename
 > }}}

 I'd say "e.g." before (IN, CH, HS).

 > > > > > > - do_xfrin: stop_timer wouldn't be called if exception is
 raised
 > > > > > >   during the process.
 [...]
 > > By "document", I (probably ... not quite remember after some period
 > > since last comment cycle) meant the man page or something, not (only)
 > > the code comment.
 >
 > I also wrote a comment on the man page. Could you see the commit?
 > {{{
 > 4d5c20e [2252] update descriptions of last_*xfr_duration
 > }}}

 See above.  We need to explain it without using implementation
 specific terms.

 > > I'm not sure if I don't understand this: "I guess the timer wouldn't
 > > probably started yet in case of `XfrinZoneUptodate`".  You mean it
 > > should have come from `_check_soa_serial`?  But that can also happen
 > > in IXFR.
 >
 > Yes, I misread the code. The condition can be changed in case of IXFR as
 you mentioned.

 I'm not sure if that's reasonable.  See below.

 > > - some class attributes (ipver, name_to_counter, master_addrinfo),
 > >   are introduced while they are essentially instance attributes.
 > >   using class attributes when they don't have to be so is generally
 > >   not a good practice because change in one test instance could affect
 > >   another test.  (This is also related to the issue that led to
 > >   #2883).  I suggest making them instance attributes.
 >
 > Hmm..., sorry, I cannot still understand what you worry about so much.
 For example, if we intentionally change the class attribute 'ipver' to
 'v7' inside of the test case like the following:

 First, this should be considered a principle matter.  I hope we agree
 that global objects/variable, etc, are bad in general.  singleton or
 class attributes are just modernized form of global, so, their use
 should generally be considered bad, and must be justified with a very
 strong reason if that's inevitable.

 In this specific case, completely overriding `ipver` actually creates
 an instance variable so the problem wouldn't be apparent.  updating
 name_to_counter dict is probably a better example.  But, again, this
 is basically a principle matter; the fact that it uses global without
 a reason is itself a problem.

 > > - this synchronization failed
 > > {{{
 > >     Then wait for new bind10 stderr message XFRIN_XFR_TRANSFER_STARTED
 > > }}}
 > >   even with Jelte's patch.  On master, I've never seen this happen if
 > >   that patch is applied.
 >
 > Hmm..., sorry, I'm not sure about this. I have merged the newer master
 on to the branch. Could you try lettuce test again? If you still have same
 problem, please let me know.

 I think I figured it out.  Please see the previous comment.

 > BTW on my environment, checking initial statistics of Xfrout sometimes
 failed. I suspect this is because a socket of Xfrout is already opened
 before b10-stats asks to b10-xfrout. So I added exceptional cases to
 checking. Could you see this change?
 >
 > {{{
 > d7885ec [2252] add an exceptional case when checking initial statistics
 of Xfrout
 > }}}

 I'm not sure about this as it didn't happen for me.  But, in general,
 I'd try to figure out how that actually happened, and then address the
 issue either by fixing the real issue or (if it's not easily
 avoidable) introducing a work around like this.

 Specific code comments:

 '''xfrin_test.py'''

 - I'd define these as "private" by using double-underscore:
   `_orig_datetime`, `_orig_start_timer`, `_const_sec`.
 - I'd use camel convention for `fakedatatime`.  I also suspect 'data'
   should be 'date'.  So it would be:
 {{{#!python
         class FakeDateTime:
             @classmethod
             def now(cls): return time2
 }}}
 - test_axfrreq_xfrfail: it only covers a subset of failure cases.
   since we don't have to reproduce the exact xfr processing in this
   context (which is done in tests for the protocol itself), I'd
   simplify the test by stealing a method and broaden the coverage:
 {{{#!python
     def test_axfrreq_xfrfail(self):
         """DON'T FORGET DOCUMENT IT"""
         self._check_init_statistics()
         count = 0
         for ex in [XfrinZoneError, XfrinProtocolError, XfrinException,
                    Exception]:
             def exception_raiser():
                 raise ex()
             self.conn._handle_xfrin_responses = exception_raiser
             self.assertEqual(self.conn.do_xfrin(False), XFRIN_FAIL)
             count += 1
             self._check_updated_statistics({'axfrreq' + self.ipver: count,
                                             'xfrfail': count})
 }}}

 - test_ixfrreq_xfrfail: first, same comment as axfrreq_xfrfail
   applies.  secondly (while it may make it moot), it's not obvious
   why xfr is expected to fail in this case from the test code.
   Third, note that ixfr can result in `XfrinZoneUptodate` exception.
   Is it okay to consider it a failure (see also below)?

 '''xfrin.py.in'''

 - In this revised code, if IXFR results in `XfrinZoneUptodate` xfrfail
   is counted.  Is it intentional, and if so, is it reasonable?  From a
   quick look, BIND 9 seems to consider it a success.  In any case,
   this should be clarified in the documentation: what exactly
   "succeed/fail" mean.

-- 
Ticket URL: <http://bind10.isc.org/ticket/2252#comment:22>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development


More information about the bind10-tickets mailing list