BIND 10 #2252: Implement counters into Xfrin (1/3)

BIND 10 Development do-not-reply at isc.org
Thu Mar 21 11:15:39 UTC 2013


#2252: Implement counters into Xfrin (1/3)
-------------------------------------+-------------------------------------
            Reporter:  naokikambe    |                        Owner:
                Type:  enhancement   |  jinmei
            Priority:  medium        |                       Status:
           Component:  xfrin         |  reviewing
            Keywords:                |                    Milestone:
           Sensitive:  0             |  Sprint-20130402
         Sub-Project:  DNS           |                   Resolution:
Estimated Difficulty:  5             |                 CVSS Scoring:
         Total Hours:  0             |              Defect Severity:  N/A
                                     |  Feature Depending on Ticket:
                                     |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------
Changes (by naokikambe):

 * owner:  naokikambe => jinmei


Comment:

 Replying to [comment:14 jinmei]:
 > Oops, sorry, I overlooked it.  On the proposed text, I suspect
 > enumerating all counters is a bit too verbose.  I'd simplify it to,
 > e.g.:

 Thank you! That sounds good to me.

 > It's more about design matter so the harm may not be so obvious or
 > even a subjective matter.  I'm okay with excluding it for this ticket,
 > but I'd argue this should be at least discussed (and if we agree)
 > resolved before using the current design further.  So I'd suggest
 > creating a ticket to discuss and fix it, and do it before the
 > subsequent counter tickets for xfrin or others.

 I'm okay with redesigning if the current library has some problem. But
 what ticket should I create? What contents should I write in the ticket?
 Sorry, I couldn't understand well about problems. Could you create it
 yourself?

 >
 > '''b10-xfrin.xml'''
 >
 > - about latest_*xfr_duration: I missed the currently ongoing one
 >   doesn't show up in the statistics.  In that case
 >   "last_*xfr_duration" would be better than latest.
 >
 > > > - as for time_to_ixfr, what if there's never been an xfr for the
 zone?
 > > It shows still the default value, 0.0.
 >
 > It seems to be suboptimal in that using a magic number (especially in
 > the valid range - 0.0 should still be possible if everything is done
 > super quickly) to present a kind of "None" is always a source of
 > trouble.  I'd consider something that can explicitly mean "no
 > completed xfrs".  And, especially if we use a magic value, document it
 > somewhere.

 I updated descriptions about the default value `0.0`. Could you see the
 following commit?
 {{{
 26a9ee9 [2252] update descriptions of latest_ixfr_duration and
 latest_axfr_duration about their default values
 }}}

 > > > '''xfrin.py.in'''
 > > >
 > > > - get_ipver_str: I suggest making it "protected" by renaming
 > > >   `_get_ipver_str`.  I'd also clarify it's mostly private except for
 > > >   tests (which is my understanding).
 > > I made it protected as you mentioned. Could you see the following
 commit?
 > > {{{
 > > a1b3cfd [2252] rename the helper method name
 > > }}}
 >
 > By "clarify" I meant update the docstring too.

 I updated the docstring. Could you see this?
 {{{
 ad8e7ea [2252] update docstring of _get_ipver_str() about accessing
 }}}

 > > > - get_ipver_str: what if family is neither INET6 nor INET?
 > > I considered the case where `None` value is returned. Could you see
 the following commit?
 > > {{{
 > > 4de51f8 [2252] care for the socket family which is neither INET6 nor
 INET
 > > }}}
 >
 > It's mostly obvious that None is returned, but the point is whether we
 > should expect it to happen or treat it as an error.  I suspect our
 > implementation should have other cases, so it seems to me more
 > reasonable to throw an exception in that case.

 I updated the method and the related test to raise an exception. Could see
 the following commit for details?
 {{{
 d1d37e1 [2252] _get_ipver_str() raises a ValueError exception on address
 families other than AF_INET or AF_INET6
 }}}

 > > > - `_check_soa_serial`: shouldn't this be distinguished per name and
 > > >   class? both class IN and CH can have "example.com", for example.
 > > > {{{#!python
 > > >         # count soaoutv4 or soaoutv6 requests
 > > >         self._counters.inc('zones', self._zone_name.to_text(),
 > > >                            'soaout' + self.get_ipver_str())
 > > > }}}
 > > >   This comment applies to all other counters and timers.
 > > Probably should be considered. As an example, we might need to
 introduce a `classes` layer over the current `zones` layer in the
 statistics data structure. But this is too big to handle in this ticket.
 Could we consider in other ticket?
 >
 > I'm okay with differing it, but it seems to be a quite fundamental
 > design error.  So I'd argue that should be fixed before we rely on the
 > current format further.  The longer we postpone it, the more painful
 > it is to fix.

 The current data structure is already implemented in Xfrout. I think we
 should repair altogether at a time in future.

 > > > - do_xfrin: stop_timer wouldn't be called if exception is raised
 > > >   during the process.
 > > I think it is okay that the timer count would be discarded if
 exception is raised.
 >
 > Is it really discarded?  Shouldn't it be left until next time we set
 > the timer for the same zone?   If it's left, that doesn't seem to be
 > clear.  Also, if the intent is to only count the duration of
 > successfully completed transfers (even excluding the case of
 > `XfrinZoneUptodate`), I believe that should be documented.

 Strictly speaking, that would be discarded when the next start time is
 set. I guess the timer wouldn't probably started yet in case of
 `XfrinZoneUptodate`. I described about discarding at the following commit.
 {{{
 dc2960e [2252] add notes about the case that the timer is already started
 but not yet stopped
 }}}

 > > > - do_xfrin: maybe a matter of taste, but the `final` clause could be
 > > >   simplified:
 > > > {{{#!python
 > > >             counter_dict = {XFRIN_OK: 'xfrsuccess', XFRIN_FAIL,
 'xfrfail'}
 > > >             self._counters.inc('zones', self._zone_name.to_text(),
 > > >                                counter_dict[ret])
 > > > }}}
 > > I revised the code as you proposed. Could you see the following
 commit?
 > > {{{
 > > 2cfdda6 [2252] simplify complicated if-elif lines using a dictionary
 > > }}}
 >
 > Why does it check if `ret` is in counter_dict?
 > {{{#!python
 >             counter_dict = {XFRIN_OK: 'xfrsuccess', XFRIN_FAIL:
 'xfrfail'}
 >             if ret in counter_dict:
 >                 self._counters.inc('zones', self._zone_name.to_text(),
 >                                    counter_dict[ret])
 > }}}

 I realized that we don't need to take care of any other status than
 `XFRIN_OK` and `XFRIN_FAIL`. I misread the code. I revised at the
 following commit.
 {{{
 5ac7de7 [2252] remove the if condition because we don't need to consider
 any other status than XFRIN_OK and XFRIN_FAIL
 }}}

 > Is there a case where this `if` doesn't hold?
 >
 > > > - command_handler: why do we bother to log it here?
 > [...]
 > > >   I'm not necessarily opposed to logging this event per se, but it's
 > > >   awkward to see only this command is logged especially if it's at a
 > > >   debug level.
 > > The message is used in the lettuce tests. but it's too annoying, so I
 removed `{ poll-interval: 1}` from stats config in lettuce. Could you see
 the following commit?
 > > {{{
 > > 4d94e85 [2252] remove poll-interval from stats config
 > > }}}
 >
 > It doesn't really answer my point.  Imagine someone reviews this
 > code.  It'd be quite awkward only this case of command has a debug
 > log.  I'd rather generalize it with a command line parameter or add
 > log messages to all command separately (former is probably better
 > because the latter approach isn't very much scalable).

 I generalized the log id not to be specific to the `getstats` command.
 Could you see the following commit?
 {{{
 b0f2aca [2252] replace the log message XFRIN_RECEIVED_COMMAND to the more
 general id XFRIN_RECEIVED_COMMAND with the com
 }}}

 > > > '''xfrin_test.py'''
 > > >
 > > > - in general, I see many similar patters repeated like this:
 > > > - test_soacheck doesn't confirm soaoutv6 is incremented.
 > > > - test_do_xfrin: likewise.  it's redundant, and yet incomplete.
 >
 > Regarding these, I guess we can even more unified them, and there are
 > still missing cases that are not tested: at least ixfrreqv4
 > and ixfrreqv6.  If I were to implement the test, I'd introduce a
 > separate test case, not extending an existing do_xfrin test, and only
 > concentrate on statistics changes.  So, I'd, for example, steal
 > `_send_query` or `_handle_xfrin_responses` (because what it does
 > doesn't matter in the statistics tests), and only checks statistics
 > counter values after calling the tweaked do_xfrin.  I'd parameterize
 > everything so it's clearer all counters are actually tested without
 > much duplicates. (am I clear?).

 I removed once all statistics related tests from the existing Xfrin tests.
 And I added them again as new tests. They are based on
 `TestXfrinConnection`. So the change becomes big. Could you see the
 following commits?
 {{{
 bae4e49 [2252] remove checks of statistics counters from the existing test
 classes unrelated to statistics tests
 208c318 [2252] add tests dedicated to statistics
 }}}

 > > I included in the above commits.
 > >
 > > > - isn't this mostly trivial?
 > > > {{{#!python
 > > >         self.assertGreaterEqual(self.conn._counters.get('zones',
 > > >
 TEST_ZONE_NAME_STR,
 > > >
 'time_to_axfr'),
 > > >                                 0.0)
 > > > }}}
 > [...]
 > > Thank you for implying. but regarding this part, I changed it to a bit
 more accurate check by using `assertAlmostEqual`. Could you see the
 following commit?
 > > {{{
 > > fcf8b9d [2252] replace assertGreaterEqual with assertAlmostEqual
 > > }}}
 >
 > It doesn't seem to be improved much.  If we think a complete tests are
 > too much for this task, I'd rather remove this counter and complete
 > the implementation and tests separately.  Otherwise, I'll do a better
 > test covering various cases such as not counting it on exception, etc.
 > What I don't like to do is to allow poorly test code to be merged with
 > the excuse of "testing it is too heavy".

 I tried to change the above assertion to more accurate one by using
 `assertGreaterEqual`. It compares at 3 decimal places. This change is
 included in the above commits.

 > '''lettuce tests'''
 >
 > - First off, test failed.  I thought it might be related to the issue
 >   of #2790, but at least this one doesn't seem to be related to it:
 > {{{
 >     When I query statistics zones of bind10 module Xfrin with cmdctl
 # features/terrain/bind10_control.py:367
 >     The statistics counters are 0 in category .Xfrin.zones except for
 the follow    The statistics counters are 0 in category .Xfrin.zones
 except for the following items                           #
 features/terrain/bind10_control.py:383
 >       | item_name                         | item_value | min_value |
 >       | _SERVER_.soaoutv6                 | 1          |           |
 >       | _SERVER_.axfrreqv6                | 1          |           |
 >       | _SERVER_.xfrsuccess               | 1          |           |
 >       | _SERVER_.latest_axfr_duration     |            | 0.0       |
 >       | example.org..soaoutv6             | 1          |           |
 >       | example.org..axfrreqv6            | 1          |           |
 >       | example.org..xfrsuccess           | 1          |           |
 >       | example.org..latest_axfr_duration |            | 0.0       |
 >     Traceback (most recent call last):
 >       File "/Library/Python/2.7/site-packages/lettuce/core.py", line
 125, in __call__
 >         ret = self.function(self.step, *args, **kw)
 >       File
 "/Users/jinmei/src/isc/git/bind10-2252/tests/lettuce/features/terrain/bind10_control.py",
 line 419, in check_statistics_items
 >         (name, found, value)
 >     AssertionError: Statistics item .Xfrin.zones._SERVER_.soaoutv6 has
 unexpected value 2 (expect 1)
 > }}}
 >   and some others still fail even if I applied the patch Jelte
 >   mentioned.

 Even if I didn't apply the Jelte's patch, the related tests passed on my
 environment. And I haven't changed these thresholds since the first time
 the codes could be reviewed. Did that always fail on your environment? Can
 I see the full logs of lettuce tests and stderr? Depending on the results,
 we might need to reconsider the thresholds.

 > - check_statistics_items: it seems that the docstring has to be
 >   updated.

 I updated the docstring. Could you see the following commit?
 {{{
 be426ec [2252] update description of check_statistics_items() about
 min_value and max_value
 }}}

 > - check_init_statistics: if it's xfr specific, it seems it should be
 >   named accordingly.  Also, it seems to have been somewhere other than
 >   bind10_control.py (from a quick look it's for more generic modules;
 >   xfrs are too specific in that sense).

 I generalized it not to be specific to Xfrin nor Xfrout. Could you see the
 following commits?
 {{{
 c37fd56 [2252] generalize the module name as an argument of
 check_init_statistics()
 0eec3b5 [2252] add a port number of cmdctl as an option of
 check_init_statistics()
 }}}

 > - check_init_statistics: I'm not sure if these are 100% correct:
 > {{{#!python
 >         step.given(check_str)
 > ...
 >         step.given(check_str)
 > }}}
 >   check_str would contain a trailing '%s' at this point, correct?  if
 >   so, is that intentional?

 That was wrong tests. I updated at the above commits. Thank you for
 suggestion.

 > - xfrin_notify_handling.feature: I don't understand the purpose of this
 test:
 > {{{
 >     check initial statistics for Xfrout
 >     When I query statistics socket of bind10 module Xfrout with cmdctl
 port 47804
 >     The statistics counters are 0 in category .Xfrout.socket.unixdomain
 except for the following items
 >       | item_name | min_value | max_value |
 >       | open      |         0 |         1 |
 > }}}

 I realized that we don't really need to test them. Initial statistics are
 already checked just before the test. I removed them at the following
 commit. Thank you for suggestion.
 {{{
 c11f993 [2252] remove several initial checks for unixdomain socket
 counters of Xfrout
 }}}

 >   why xfrout test in an "xfrin feature"?  why not a specific expected
 >   value but min-max?  (I then noticed there are other xfrout tests
 >   here...why in xfrin tests?)

 It might be a feature test specific to Xfrin before, but now it includes
 various tests including statistics counter tests. So I updated the
 description in the header of `xfrin_notify_handling.feature` file. Could
 you see the following commit?
 {{{
 b448dab [2252] update description of xfrin_notify_handling.feature
 }}}

 > - it's not obvious why these are specified with "min".  please add
 >   comments on why.
 > {{{
 >     The statistics counters are 0 in category .Xfrout.zones except for
 the following items
 >       | item_name                | item_value | min_value |
 >       | _SERVER_.notifyoutv4     |          5 |           |
 >       | _SERVER_.xfrrej          |            |         1 |
 >       | example.org..notifyoutv4 |          5 |           |
 >       | example.org..xfrrej      |            |         1 |
 > }}}

 We cannot expected these rejection counters to be specific values. I add
 notes about the reason. Please see the following commit?
 {{{
 9344c41 [2252] set the expected max value of rejection counters to 3
 }}}

 > - same for this:
 > {{{
 >       | item_name              | min_value | max_value |
 >       | _SERVER_.soaoutv4      |         1 |         3 |
 >       | _SERVER_.axfrreqv4     |         1 |         3 |
 > ...
 > }}}

 On my environment, they cannot be expected to a specific value. They seem
 to depend on timing and environment. I added notes about this. Please see
 the following commit?
 {{{
 14ee53d [2252] add notes about counters of Xfrin: soaoutv4, axfrreqv4,
 soaoutv6, axfrreqv6, and xfrfail
 }}}

 Regards,

-- 
Ticket URL: <http://bind10.isc.org/ticket/2252#comment:16>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development


More information about the bind10-tickets mailing list