BIND 10 #2252: Implement counters into Xfrin (1/3)

BIND 10 Development do-not-reply at isc.org
Sat Mar 16 02:36:43 UTC 2013


#2252: Implement counters into Xfrin (1/3)
-------------------------------------+-------------------------------------
            Reporter:  naokikambe    |                        Owner:
                Type:  enhancement   |  jinmei
            Priority:  medium        |                       Status:
           Component:  xfrin         |  reviewing
            Keywords:                |                    Milestone:
           Sensitive:  0             |  Sprint-20130319
         Sub-Project:  DNS           |                   Resolution:
Estimated Difficulty:  5             |                 CVSS Scoring:
         Total Hours:  0             |              Defect Severity:  N/A
                                     |  Feature Depending on Ticket:
                                     |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------

Comment (by jinmei):

 Replying to [comment:13 naokikambe]:

 > > '''general'''
 > >
 > > - Don't we need a changelog entry?
 > Of course, we need. Please see [comment:2 this].

 Oops, sorry, I overlooked it.  On the proposed text, I suspect
 enumerating all counters is a bit too verbose.  I'd simplify it to,
 e.g.:

 {{{
 nnn.    [func]          naokikambe
         Added Xfrin statistics items such as the number of successful
         transfers.  These are per-zone type counters.  Their values can be
         obtained with zone names by invoking "Stats show Xfrin" via
 bindctl
         while Xfrin is running.
         (Trac #2252, git TBD)
 }}}

 > > - after struggling with figuring out how `XfrinConnection` statistics
 > >   work I realized it relied on a global (singleton) statistics object
 > >   for the module.  I don't know what kind of prior discussion brought
 > >   that decision, but I still think that's a bad practice. [...]
 >
 > Sorry, but I can't understand well whether there are such harmful things
 from the current code. However, regarding tests of the statistics library,
 they work well so far on the build farm as long as I see. Can we consider
 about that if some problem occurs which you worry about in the future?

 It's more about design matter so the harm may not be so obvious or
 even a subjective matter.  I'm okay with excluding it for this ticket,
 but I'd argue this should be at least discussed (and if we agree)
 resolved before using the current design further.  So I'd suggest
 creating a ticket to discuss and fix it, and do it before the
 subsequent counter tickets for xfrin or others.

 '''b10-xfrin.xml'''

 - about latest_*xfr_duration: I missed the currently ongoing one
   doesn't show up in the statistics.  In that case
   "last_*xfr_duration" would be better than latest.

 > > - as for time_to_ixfr, what if there's never been an xfr for the zone?
 > It shows still the default value, 0.0.

 It seems to be suboptimal in that using a magic number (especially in
 the valid range - 0.0 should still be possible if everything is done
 super quickly) to present a kind of "None" is always a source of
 trouble.  I'd consider something that can explicitly mean "no
 completed xfrs".  And, especially if we use a magic value, document it
 somewhere.

 > > '''xfrin.py.in'''
 > >
 > > - get_ipver_str: I suggest making it "protected" by renaming
 > >   `_get_ipver_str`.  I'd also clarify it's mostly private except for
 > >   tests (which is my understanding).
 > I made it protected as you mentioned. Could you see the following
 commit?
 > {{{
 > a1b3cfd [2252] rename the helper method name
 > }}}

 By "clarify" I meant update the docstring too.

 > > - get_ipver_str: what if family is neither INET6 nor INET?
 > I considered the case where `None` value is returned. Could you see the
 following commit?
 > {{{
 > 4de51f8 [2252] care for the socket family which is neither INET6 nor
 INET
 > }}}

 It's mostly obvious that None is returned, but the point is whether we
 should expect it to happen or treat it as an error.  I suspect our
 implementation should have other cases, so it seems to me more
 reasonable to throw an exception in that case.

 > > - `_check_soa_serial`: shouldn't this be distinguished per name and
 > >   class? both class IN and CH can have "example.com", for example.
 > > {{{#!python
 > >         # count soaoutv4 or soaoutv6 requests
 > >         self._counters.inc('zones', self._zone_name.to_text(),
 > >                            'soaout' + self.get_ipver_str())
 > > }}}
 > >   This comment applies to all other counters and timers.
 > Probably should be considered. As an example, we might need to introduce
 a `classes` layer over the current `zones` layer in the statistics data
 structure. But this is too big to handle in this ticket. Could we consider
 in other ticket?

 I'm okay with differing it, but it seems to be a quite fundamental
 design error.  So I'd argue that should be fixed before we rely on the
 current format further.  The longer we postpone it, the more painful
 it is to fix.

 > > - do_xfrin: stop_timer wouldn't be called if exception is raised
 > >   during the process.
 > I think it is okay that the timer count would be discarded if exception
 is raised.

 Is it really discarded?  Shouldn't it be left until next time we set
 the timer for the same zone?   If it's left, that doesn't seem to be
 clear.  Also, if the intent is to only count the duration of
 successfully completed transfers (even excluding the case of
 `XfrinZoneUptodate`), I believe that should be documented.

 > > - do_xfrin: maybe a matter of taste, but the `final` clause could be
 > >   simplified:
 > > {{{#!python
 > >             counter_dict = {XFRIN_OK: 'xfrsuccess', XFRIN_FAIL,
 'xfrfail'}
 > >             self._counters.inc('zones', self._zone_name.to_text(),
 > >                                counter_dict[ret])
 > > }}}
 > I revised the code as you proposed. Could you see the following commit?
 > {{{
 > 2cfdda6 [2252] simplify complicated if-elif lines using a dictionary
 > }}}

 Why does it check if `ret` is in counter_dict?
 {{{#!python
             counter_dict = {XFRIN_OK: 'xfrsuccess', XFRIN_FAIL: 'xfrfail'}
             if ret in counter_dict:
                 self._counters.inc('zones', self._zone_name.to_text(),
                                    counter_dict[ret])
 }}}

 Is there a case where this `if` doesn't hold?

 > > - command_handler: why do we bother to log it here?
 [...]
 > >   I'm not necessarily opposed to logging this event per se, but it's
 > >   awkward to see only this command is logged especially if it's at a
 > >   debug level.
 > The message is used in the lettuce tests. but it's too annoying, so I
 removed `{ poll-interval: 1}` from stats config in lettuce. Could you see
 the following commit?
 > {{{
 > 4d94e85 [2252] remove poll-interval from stats config
 > }}}

 It doesn't really answer my point.  Imagine someone reviews this
 code.  It'd be quite awkward only this case of command has a debug
 log.  I'd rather generalize it with a command line parameter or add
 log messages to all command separately (former is probably better
 because the latter approach isn't very much scalable).

 > > '''xfrin_test.py'''
 > >
 > > - in general, I see many similar patters repeated like this:
 > > - test_soacheck doesn't confirm soaoutv6 is incremented.
 > > - test_do_xfrin: likewise.  it's redundant, and yet incomplete.

 Regarding these, I guess we can even more unified them, and there are
 still missing cases that are not tested: at least ixfrreqv4
 and ixfrreqv6.  If I were to implement the test, I'd introduce a
 separate test case, not extending an existing do_xfrin test, and only
 concentrate on statistics changes.  So, I'd, for example, steal
 `_send_query` or `_handle_xfrin_responses` (because what it does
 doesn't matter in the statistics tests), and only checks statistics
 counter values after calling the tweaked do_xfrin.  I'd parameterize
 everything so it's clearer all counters are actually tested without
 much duplicates. (am I clear?).

 > I included in the above commits.
 >
 > > - isn't this mostly trivial?
 > > {{{#!python
 > >         self.assertGreaterEqual(self.conn._counters.get('zones',
 > >
 TEST_ZONE_NAME_STR,
 > >
 'time_to_axfr'),
 > >                                 0.0)
 > > }}}
 [...]
 > Thank you for implying. but regarding this part, I changed it to a bit
 more accurate check by using `assertAlmostEqual`. Could you see the
 following commit?
 > {{{
 > fcf8b9d [2252] replace assertGreaterEqual with assertAlmostEqual
 > }}}

 It doesn't seem to be improved much.  If we think a complete tests are
 too much for this task, I'd rather remove this counter and complete
 the implementation and tests separately.  Otherwise, I'll do a better
 test covering various cases such as not counting it on exception, etc.
 What I don't like to do is to allow poorly test code to be merged with
 the excuse of "testing it is too heavy".

 '''lettuce tests'''

 - First off, test failed.  I thought it might be related to the issue
   of #2790, but at least this one doesn't seem to be related to it:
 {{{
     When I query statistics zones of bind10 module Xfrin with cmdctl
 # features/terrain/bind10_control.py:367
     The statistics counters are 0 in category .Xfrin.zones except for the
 follow    The statistics counters are 0 in category .Xfrin.zones except
 for the following items                           #
 features/terrain/bind10_control.py:383
       | item_name                         | item_value | min_value |
       | _SERVER_.soaoutv6                 | 1          |           |
       | _SERVER_.axfrreqv6                | 1          |           |
       | _SERVER_.xfrsuccess               | 1          |           |
       | _SERVER_.latest_axfr_duration     |            | 0.0       |
       | example.org..soaoutv6             | 1          |           |
       | example.org..axfrreqv6            | 1          |           |
       | example.org..xfrsuccess           | 1          |           |
       | example.org..latest_axfr_duration |            | 0.0       |
     Traceback (most recent call last):
       File "/Library/Python/2.7/site-packages/lettuce/core.py", line 125,
 in __call__
         ret = self.function(self.step, *args, **kw)
       File
 "/Users/jinmei/src/isc/git/bind10-2252/tests/lettuce/features/terrain/bind10_control.py",
 line 419, in check_statistics_items
         (name, found, value)
     AssertionError: Statistics item .Xfrin.zones._SERVER_.soaoutv6 has
 unexpected value 2 (expect 1)
 }}}
   and some others still fail even if I applied the patch Jelte
   mentioned.

 - check_statistics_items: it seems that the docstring has to be
   updated.

 - check_init_statistics: if it's xfr specific, it seems it should be
   named accordingly.  Also, it seems to have been somewhere other than
   bind10_control.py (from a quick look it's for more generic modules;
   xfrs are too specific in that sense).

 - check_init_statistics: I'm not sure if these are 100% correct:
 {{{#!python
         step.given(check_str)
 ...
         step.given(check_str)
 }}}
   check_str would contain a trailing '%s' at this point, correct?  if
   so, is that intentional?

 - xfrin_notify_handling.feature: I don't understand the purpose of this
 test:
 {{{
     check initial statistics for Xfrout
     When I query statistics socket of bind10 module Xfrout with cmdctl
 port 47804
     The statistics counters are 0 in category .Xfrout.socket.unixdomain
 except for the following items
       | item_name | min_value | max_value |
       | open      |         0 |         1 |
 }}}
   why xfrout test in an "xfrin feature"?  why not a specific expected
   value but min-max?  (I then noticed there are other xfrout tests
   here...why in xfrin tests?)

 - it's not obvious why these are specified with "min".  please add
   comments on why.
 {{{
     The statistics counters are 0 in category .Xfrout.zones except for the
 following items
       | item_name                | item_value | min_value |
       | _SERVER_.notifyoutv4     |          5 |           |
       | _SERVER_.xfrrej          |            |         1 |
       | example.org..notifyoutv4 |          5 |           |
       | example.org..xfrrej      |            |         1 |
 }}}
 - same for this:
 {{{
       | item_name              | min_value | max_value |
       | _SERVER_.soaoutv4      |         1 |         3 |
       | _SERVER_.axfrreqv4     |         1 |         3 |
 ...
 }}}

-- 
Ticket URL: <http://bind10.isc.org/ticket/2252#comment:14>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development


More information about the bind10-tickets mailing list