BIND 10 #2796: Add a counter for queries with RD=1
BIND 10 Development
do-not-reply at isc.org
Wed May 29 17:44:00 UTC 2013
#2796: Add a counter for queries with RD=1
-------------------------------------+-------------------------------------
Reporter: y-aharen | Owner:
Type: enhancement | jinmei
Priority: medium | Status:
Component: b10-auth | reviewing
Keywords: | Milestone:
Sensitive: 0 | Sprint-20130611
Sub-Project: DNS | Resolution:
Estimated Difficulty: 4 | CVSS Scoring:
Total Hours: 0 | Defect Severity: N/A
| Feature Depending on Ticket:
| Add Hours to Ticket: 0
| Internal?: 0
-------------------------------------+-------------------------------------
Comment (by jinmei):
Replying to [comment:8 y-aharen]:
> > '''!ChangeLog'''
> >
> > - I'd at least explain what this counter means.
> I added an explanation (git 1ad0a17).
>
> > '''general'''
> >
> > - Is there a reason for limiting this counter for queries?
> Yes. To monitor an authoritative server, I think it is interesting
> to count 'queries' with RD=1. They are "recursive queries": normally
> they will not be reached to authoritative servers.
But in the normal case I suspect sensible requests of any type to
authoritative only servers, not only queries, don't have RD=1. So, in
that sense it'd be "interesting" to count any such abnormal requests.
If the reason for the limit is that queries with RD=1 are quite likely
from stub resolvers (if not some deviant recursive servers) and would
be of particular interest (e.g. the server address may be listed in
some /etc/resolv.conf), that may make sense. But it's not really
obvious to me, and I'd like to see it explained somewhere, like in the
spec description or man page.
> > '''statistics.cc.pre'''
> >
> > - (If there's a reason for the limitation) I'd compare opcode as an
> > Opcode object. It should be much better than using a magic number:
> > {{{#!cpp
> > if (code == 0) {
> > }}}
> I updated the code to use a constant instead of a magic number (git
89b1ca9).
> I didn't use the Opcode object to reuse a numeric code; if it's not
good,
> I'll update the code as you suggested.
In general, I'd prefer relying on higher abstraction if only because
of type safety. The major drawback of abstraction is performance, but
in this case the code should be mostly overhead free. But for this
particular case I don't necessarily insist on updating it that way;
I'd leave it to you. In any case, I'd remove this
now-mostly-redundant comment:
{{{#!cpp
// Opcode = 0: Query
if (code == Opcode::QUERY_CODE) {
}}}
(actually it could be even harmful by mentioning a specific magic
number).
> > '''lettuce features'''
> >
> > - What's the purpose of removing these? They don't seem to be related
> > to this branch:
> > {{{#!diff
> > - # make sure Auth module replied to the command
> > - And wait for new bind10 stderr message CC_REPLY
> > - # make sure the response is for 'getstats'
> > - And wait for new bind10 stderr message v4
> > }}}
> I've noticed they are not needed with #1938. For occasional test
failures
> in the past, I thought it's because there were no checks for the
response to
> 'getstats' (and I added them). However, the real cause of the failures
are an
> unpredictable behaviour of "only_new". These four lines made the test
stable
> as a side effect, but they're not necessary with #1938.
I don't understand how (the clarification of 'new' in) #1938 made it
safe. Previously the sequence was something like this:
{{{
When I wait for new bind10 stderr message
STATS_SEND_STATISTICS_REQUEST
# make sure Auth module receives a command
And wait for new bind10 stderr message AUTH_RECEIVED_COMMAND
# make sure Auth module replied to the command
And wait for new bind10 stderr message CC_REPLY
# make sure the response is for 'getstats'
And wait for new bind10 stderr message v4
}}}
and this branch removed the last 4 lines. How does #1938 allow it?
"only_new" is still unpredictable regarding the ordering between log
messages from different processes. So, for example, it's not
guaranteed that AUTH_RECEIVED_COMMAND happened after
STATS_SEND_STATISTICS_REQUEST. If you thought it guaranteed that and
that's the reason for the change, we can't do it. If you changed it
for a different reason, please explain (as I still don't understand
it).
(btw, if we talk about the "real cause", the more essential problem is
that we assume b10-stats received the response based on the behavior
of b10-auth. What we really need is to make a more reliable way to
confirm that b10-stats received the response from its own log message).
--
Ticket URL: <http://bind10.isc.org/ticket/2796#comment:9>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list