BIND 10 #2796: Add a counter for queries with RD=1

Wed May 29 17:44:00 UTC 2013

#2796: Add a counter for queries with RD=1
-------------------------------------+-------------------------------------
            Reporter:  y-aharen      |                        Owner:
                Type:  enhancement   |  jinmei
            Priority:  medium        |                       Status:
           Component:  b10-auth      |  reviewing
            Keywords:                |                    Milestone:
           Sensitive:  0             |  Sprint-20130611
         Sub-Project:  DNS           |                   Resolution:
Estimated Difficulty:  4             |                 CVSS Scoring:
         Total Hours:  0             |              Defect Severity:  N/A
                                     |  Feature Depending on Ticket:
                                     |          Add Hours to Ticket:  0
                                     |                    Internal?:  0
-------------------------------------+-------------------------------------

Comment (by jinmei):

 Replying to [comment:8 y-aharen]:

 > > '''!ChangeLog'''
 > >
 > > - I'd at least explain what this counter means.
 > I added an explanation (git 1ad0a17).
 >
 > > '''general'''
 > >
 > > - Is there a reason for limiting this counter for queries?
 > Yes. To monitor an authoritative server, I think it is interesting
 > to count 'queries' with RD=1. They are "recursive queries": normally
 > they will not be reached to authoritative servers.

 But in the normal case I suspect sensible requests of any type to
 authoritative only servers, not only queries, don't have RD=1.  So, in
 that sense it'd be "interesting" to count any such abnormal requests.

 If the reason for the limit is that queries with RD=1 are quite likely
 from stub resolvers (if not some deviant recursive servers) and would
 be of particular interest (e.g. the server address may be listed in
 some /etc/resolv.conf), that may make sense.  But it's not really
 obvious to me, and I'd like to see it explained somewhere, like in the
 spec description or man page.

 > > '''statistics.cc.pre'''
 > >
 > > - (If there's a reason for the limitation) I'd compare opcode as an
 > >   Opcode object.  It should be much better than using a magic number:
 > > {{{#!cpp
 > >         if (code == 0) {
 > > }}}
 > I updated the code to use a constant instead of a magic number (git
 89b1ca9).
 > I didn't use the Opcode object to reuse a numeric code; if it's not
 good,
 > I'll update the code as you suggested.

 In general, I'd prefer relying on higher abstraction if only because
 of type safety.  The major drawback of abstraction is performance, but
 in this case the code should be mostly overhead free.  But for this
 particular case I don't necessarily insist on updating it that way;
 I'd leave it to you.  In any case, I'd remove this
 now-mostly-redundant comment:
 {{{#!cpp
         // Opcode = 0: Query
         if (code == Opcode::QUERY_CODE) {
 }}}
 (actually it could be even harmful by mentioning a specific magic
 number).

 > > '''lettuce features'''
 > >
 > > - What's the purpose of removing these?  They don't seem to be related
 > >   to this branch:
 > > {{{#!diff
 > > -        # make sure Auth module replied to the command
 > > -        And wait for new bind10 stderr message CC_REPLY
 > > -        # make sure the response is for 'getstats'
 > > -        And wait for new bind10 stderr message v4
 > > }}}
 > I've noticed they are not needed with #1938. For occasional test
 failures
 > in the past, I thought it's because there were no checks for the
 response to
 > 'getstats' (and I added them). However, the real cause of the failures
 are an
 > unpredictable behaviour of "only_new". These four lines made the test
 stable
 > as a side effect, but they're not necessary with #1938.

 I don't understand how (the clarification of 'new' in) #1938 made it
 safe.  Previously the sequence was something like this:
 {{{
         When I wait for new bind10 stderr message
 STATS_SEND_STATISTICS_REQUEST
         # make sure Auth module receives a command
         And wait for new bind10 stderr message AUTH_RECEIVED_COMMAND
         # make sure Auth module replied to the command
         And wait for new bind10 stderr message CC_REPLY
         # make sure the response is for 'getstats'
         And wait for new bind10 stderr message v4
 }}}
 and this branch removed the last 4 lines.  How does #1938 allow it?
 "only_new" is still unpredictable regarding the ordering between log
 messages from different processes.  So, for example, it's not
 guaranteed that AUTH_RECEIVED_COMMAND happened after
 STATS_SEND_STATISTICS_REQUEST.  If you thought it guaranteed that and
 that's the reason for the change, we can't do it.  If you changed it
 for a different reason, please explain (as I still don't understand
 it).

 (btw, if we talk about the "real cause", the more essential problem is
 that we assume b10-stats received the response based on the behavior
 of b10-auth.  What we really need is to make a more reliable way to
 confirm that b10-stats received the response from its own log message).

-- 
Ticket URL: <http://bind10.isc.org/ticket/2796#comment:9>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development