BIND 10 #2252: Implement counters into Xfrin (1/3)
BIND 10 Development
do-not-reply at isc.org
Thu Mar 21 11:15:39 UTC 2013
#2252: Implement counters into Xfrin (1/3)
-------------------------------------+-------------------------------------
Reporter: naokikambe | Owner:
Type: enhancement | jinmei
Priority: medium | Status:
Component: xfrin | reviewing
Keywords: | Milestone:
Sensitive: 0 | Sprint-20130402
Sub-Project: DNS | Resolution:
Estimated Difficulty: 5 | CVSS Scoring:
Total Hours: 0 | Defect Severity: N/A
| Feature Depending on Ticket:
| Add Hours to Ticket: 0
| Internal?: 0
-------------------------------------+-------------------------------------
Changes (by naokikambe):
* owner: naokikambe => jinmei
Comment:
Replying to [comment:14 jinmei]:
> Oops, sorry, I overlooked it. On the proposed text, I suspect
> enumerating all counters is a bit too verbose. I'd simplify it to,
> e.g.:
Thank you! That sounds good to me.
> It's more about design matter so the harm may not be so obvious or
> even a subjective matter. I'm okay with excluding it for this ticket,
> but I'd argue this should be at least discussed (and if we agree)
> resolved before using the current design further. So I'd suggest
> creating a ticket to discuss and fix it, and do it before the
> subsequent counter tickets for xfrin or others.
I'm okay with redesigning if the current library has some problem. But
what ticket should I create? What contents should I write in the ticket?
Sorry, I couldn't understand well about problems. Could you create it
yourself?
>
> '''b10-xfrin.xml'''
>
> - about latest_*xfr_duration: I missed the currently ongoing one
> doesn't show up in the statistics. In that case
> "last_*xfr_duration" would be better than latest.
>
> > > - as for time_to_ixfr, what if there's never been an xfr for the
zone?
> > It shows still the default value, 0.0.
>
> It seems to be suboptimal in that using a magic number (especially in
> the valid range - 0.0 should still be possible if everything is done
> super quickly) to present a kind of "None" is always a source of
> trouble. I'd consider something that can explicitly mean "no
> completed xfrs". And, especially if we use a magic value, document it
> somewhere.
I updated descriptions about the default value `0.0`. Could you see the
following commit?
{{{
26a9ee9 [2252] update descriptions of latest_ixfr_duration and
latest_axfr_duration about their default values
}}}
> > > '''xfrin.py.in'''
> > >
> > > - get_ipver_str: I suggest making it "protected" by renaming
> > > `_get_ipver_str`. I'd also clarify it's mostly private except for
> > > tests (which is my understanding).
> > I made it protected as you mentioned. Could you see the following
commit?
> > {{{
> > a1b3cfd [2252] rename the helper method name
> > }}}
>
> By "clarify" I meant update the docstring too.
I updated the docstring. Could you see this?
{{{
ad8e7ea [2252] update docstring of _get_ipver_str() about accessing
}}}
> > > - get_ipver_str: what if family is neither INET6 nor INET?
> > I considered the case where `None` value is returned. Could you see
the following commit?
> > {{{
> > 4de51f8 [2252] care for the socket family which is neither INET6 nor
INET
> > }}}
>
> It's mostly obvious that None is returned, but the point is whether we
> should expect it to happen or treat it as an error. I suspect our
> implementation should have other cases, so it seems to me more
> reasonable to throw an exception in that case.
I updated the method and the related test to raise an exception. Could see
the following commit for details?
{{{
d1d37e1 [2252] _get_ipver_str() raises a ValueError exception on address
families other than AF_INET or AF_INET6
}}}
> > > - `_check_soa_serial`: shouldn't this be distinguished per name and
> > > class? both class IN and CH can have "example.com", for example.
> > > {{{#!python
> > > # count soaoutv4 or soaoutv6 requests
> > > self._counters.inc('zones', self._zone_name.to_text(),
> > > 'soaout' + self.get_ipver_str())
> > > }}}
> > > This comment applies to all other counters and timers.
> > Probably should be considered. As an example, we might need to
introduce a `classes` layer over the current `zones` layer in the
statistics data structure. But this is too big to handle in this ticket.
Could we consider in other ticket?
>
> I'm okay with differing it, but it seems to be a quite fundamental
> design error. So I'd argue that should be fixed before we rely on the
> current format further. The longer we postpone it, the more painful
> it is to fix.
The current data structure is already implemented in Xfrout. I think we
should repair altogether at a time in future.
> > > - do_xfrin: stop_timer wouldn't be called if exception is raised
> > > during the process.
> > I think it is okay that the timer count would be discarded if
exception is raised.
>
> Is it really discarded? Shouldn't it be left until next time we set
> the timer for the same zone? If it's left, that doesn't seem to be
> clear. Also, if the intent is to only count the duration of
> successfully completed transfers (even excluding the case of
> `XfrinZoneUptodate`), I believe that should be documented.
Strictly speaking, that would be discarded when the next start time is
set. I guess the timer wouldn't probably started yet in case of
`XfrinZoneUptodate`. I described about discarding at the following commit.
{{{
dc2960e [2252] add notes about the case that the timer is already started
but not yet stopped
}}}
> > > - do_xfrin: maybe a matter of taste, but the `final` clause could be
> > > simplified:
> > > {{{#!python
> > > counter_dict = {XFRIN_OK: 'xfrsuccess', XFRIN_FAIL,
'xfrfail'}
> > > self._counters.inc('zones', self._zone_name.to_text(),
> > > counter_dict[ret])
> > > }}}
> > I revised the code as you proposed. Could you see the following
commit?
> > {{{
> > 2cfdda6 [2252] simplify complicated if-elif lines using a dictionary
> > }}}
>
> Why does it check if `ret` is in counter_dict?
> {{{#!python
> counter_dict = {XFRIN_OK: 'xfrsuccess', XFRIN_FAIL:
'xfrfail'}
> if ret in counter_dict:
> self._counters.inc('zones', self._zone_name.to_text(),
> counter_dict[ret])
> }}}
I realized that we don't need to take care of any other status than
`XFRIN_OK` and `XFRIN_FAIL`. I misread the code. I revised at the
following commit.
{{{
5ac7de7 [2252] remove the if condition because we don't need to consider
any other status than XFRIN_OK and XFRIN_FAIL
}}}
> Is there a case where this `if` doesn't hold?
>
> > > - command_handler: why do we bother to log it here?
> [...]
> > > I'm not necessarily opposed to logging this event per se, but it's
> > > awkward to see only this command is logged especially if it's at a
> > > debug level.
> > The message is used in the lettuce tests. but it's too annoying, so I
removed `{ poll-interval: 1}` from stats config in lettuce. Could you see
the following commit?
> > {{{
> > 4d94e85 [2252] remove poll-interval from stats config
> > }}}
>
> It doesn't really answer my point. Imagine someone reviews this
> code. It'd be quite awkward only this case of command has a debug
> log. I'd rather generalize it with a command line parameter or add
> log messages to all command separately (former is probably better
> because the latter approach isn't very much scalable).
I generalized the log id not to be specific to the `getstats` command.
Could you see the following commit?
{{{
b0f2aca [2252] replace the log message XFRIN_RECEIVED_COMMAND to the more
general id XFRIN_RECEIVED_COMMAND with the com
}}}
> > > '''xfrin_test.py'''
> > >
> > > - in general, I see many similar patters repeated like this:
> > > - test_soacheck doesn't confirm soaoutv6 is incremented.
> > > - test_do_xfrin: likewise. it's redundant, and yet incomplete.
>
> Regarding these, I guess we can even more unified them, and there are
> still missing cases that are not tested: at least ixfrreqv4
> and ixfrreqv6. If I were to implement the test, I'd introduce a
> separate test case, not extending an existing do_xfrin test, and only
> concentrate on statistics changes. So, I'd, for example, steal
> `_send_query` or `_handle_xfrin_responses` (because what it does
> doesn't matter in the statistics tests), and only checks statistics
> counter values after calling the tweaked do_xfrin. I'd parameterize
> everything so it's clearer all counters are actually tested without
> much duplicates. (am I clear?).
I removed once all statistics related tests from the existing Xfrin tests.
And I added them again as new tests. They are based on
`TestXfrinConnection`. So the change becomes big. Could you see the
following commits?
{{{
bae4e49 [2252] remove checks of statistics counters from the existing test
classes unrelated to statistics tests
208c318 [2252] add tests dedicated to statistics
}}}
> > I included in the above commits.
> >
> > > - isn't this mostly trivial?
> > > {{{#!python
> > > self.assertGreaterEqual(self.conn._counters.get('zones',
> > >
TEST_ZONE_NAME_STR,
> > >
'time_to_axfr'),
> > > 0.0)
> > > }}}
> [...]
> > Thank you for implying. but regarding this part, I changed it to a bit
more accurate check by using `assertAlmostEqual`. Could you see the
following commit?
> > {{{
> > fcf8b9d [2252] replace assertGreaterEqual with assertAlmostEqual
> > }}}
>
> It doesn't seem to be improved much. If we think a complete tests are
> too much for this task, I'd rather remove this counter and complete
> the implementation and tests separately. Otherwise, I'll do a better
> test covering various cases such as not counting it on exception, etc.
> What I don't like to do is to allow poorly test code to be merged with
> the excuse of "testing it is too heavy".
I tried to change the above assertion to more accurate one by using
`assertGreaterEqual`. It compares at 3 decimal places. This change is
included in the above commits.
> '''lettuce tests'''
>
> - First off, test failed. I thought it might be related to the issue
> of #2790, but at least this one doesn't seem to be related to it:
> {{{
> When I query statistics zones of bind10 module Xfrin with cmdctl
# features/terrain/bind10_control.py:367
> The statistics counters are 0 in category .Xfrin.zones except for
the follow The statistics counters are 0 in category .Xfrin.zones
except for the following items #
features/terrain/bind10_control.py:383
> | item_name | item_value | min_value |
> | _SERVER_.soaoutv6 | 1 | |
> | _SERVER_.axfrreqv6 | 1 | |
> | _SERVER_.xfrsuccess | 1 | |
> | _SERVER_.latest_axfr_duration | | 0.0 |
> | example.org..soaoutv6 | 1 | |
> | example.org..axfrreqv6 | 1 | |
> | example.org..xfrsuccess | 1 | |
> | example.org..latest_axfr_duration | | 0.0 |
> Traceback (most recent call last):
> File "/Library/Python/2.7/site-packages/lettuce/core.py", line
125, in __call__
> ret = self.function(self.step, *args, **kw)
> File
"/Users/jinmei/src/isc/git/bind10-2252/tests/lettuce/features/terrain/bind10_control.py",
line 419, in check_statistics_items
> (name, found, value)
> AssertionError: Statistics item .Xfrin.zones._SERVER_.soaoutv6 has
unexpected value 2 (expect 1)
> }}}
> and some others still fail even if I applied the patch Jelte
> mentioned.
Even if I didn't apply the Jelte's patch, the related tests passed on my
environment. And I haven't changed these thresholds since the first time
the codes could be reviewed. Did that always fail on your environment? Can
I see the full logs of lettuce tests and stderr? Depending on the results,
we might need to reconsider the thresholds.
> - check_statistics_items: it seems that the docstring has to be
> updated.
I updated the docstring. Could you see the following commit?
{{{
be426ec [2252] update description of check_statistics_items() about
min_value and max_value
}}}
> - check_init_statistics: if it's xfr specific, it seems it should be
> named accordingly. Also, it seems to have been somewhere other than
> bind10_control.py (from a quick look it's for more generic modules;
> xfrs are too specific in that sense).
I generalized it not to be specific to Xfrin nor Xfrout. Could you see the
following commits?
{{{
c37fd56 [2252] generalize the module name as an argument of
check_init_statistics()
0eec3b5 [2252] add a port number of cmdctl as an option of
check_init_statistics()
}}}
> - check_init_statistics: I'm not sure if these are 100% correct:
> {{{#!python
> step.given(check_str)
> ...
> step.given(check_str)
> }}}
> check_str would contain a trailing '%s' at this point, correct? if
> so, is that intentional?
That was wrong tests. I updated at the above commits. Thank you for
suggestion.
> - xfrin_notify_handling.feature: I don't understand the purpose of this
test:
> {{{
> check initial statistics for Xfrout
> When I query statistics socket of bind10 module Xfrout with cmdctl
port 47804
> The statistics counters are 0 in category .Xfrout.socket.unixdomain
except for the following items
> | item_name | min_value | max_value |
> | open | 0 | 1 |
> }}}
I realized that we don't really need to test them. Initial statistics are
already checked just before the test. I removed them at the following
commit. Thank you for suggestion.
{{{
c11f993 [2252] remove several initial checks for unixdomain socket
counters of Xfrout
}}}
> why xfrout test in an "xfrin feature"? why not a specific expected
> value but min-max? (I then noticed there are other xfrout tests
> here...why in xfrin tests?)
It might be a feature test specific to Xfrin before, but now it includes
various tests including statistics counter tests. So I updated the
description in the header of `xfrin_notify_handling.feature` file. Could
you see the following commit?
{{{
b448dab [2252] update description of xfrin_notify_handling.feature
}}}
> - it's not obvious why these are specified with "min". please add
> comments on why.
> {{{
> The statistics counters are 0 in category .Xfrout.zones except for
the following items
> | item_name | item_value | min_value |
> | _SERVER_.notifyoutv4 | 5 | |
> | _SERVER_.xfrrej | | 1 |
> | example.org..notifyoutv4 | 5 | |
> | example.org..xfrrej | | 1 |
> }}}
We cannot expected these rejection counters to be specific values. I add
notes about the reason. Please see the following commit?
{{{
9344c41 [2252] set the expected max value of rejection counters to 3
}}}
> - same for this:
> {{{
> | item_name | min_value | max_value |
> | _SERVER_.soaoutv4 | 1 | 3 |
> | _SERVER_.axfrreqv4 | 1 | 3 |
> ...
> }}}
On my environment, they cannot be expected to a specific value. They seem
to depend on timing and environment. I added notes about this. Please see
the following commit?
{{{
14ee53d [2252] add notes about counters of Xfrin: soaoutv4, axfrreqv4,
soaoutv6, axfrreqv6, and xfrfail
}}}
Regards,
--
Ticket URL: <http://bind10.isc.org/ticket/2252#comment:16>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list