BIND 10 #2157: Create an interface to pass statistics counters in Auth module
BIND 10 Development
do-not-reply at isc.org
Wed Dec 12 10:50:54 UTC 2012
#2157: Create an interface to pass statistics counters in Auth module
-------------------------------------+-------------------------------------
Reporter: y-aharen | Owner:
Type: enhancement | jinmei
Priority: medium | Status:
Component: b10-auth | reviewing
Keywords: | Milestone:
Sensitive: 0 | Sprint-20121218
Sub-Project: DNS | Resolution:
Estimated Difficulty: 8 | CVSS Scoring:
Total Hours: 0 | Defect Severity: N/A
| Feature Depending on Ticket:
| Add Hours to Ticket: 0
| Internal?: 0
-------------------------------------+-------------------------------------
Changes (by y-aharen):
* owner: y-aharen => jinmei
Comment:
Hello,
Replying to [comment:36 jinmei]:
> I'm still working on it, but giving intermediate feedback.
>
> '''statistics.h,cc'''
>
> > > - `QRAttributes::setQueryOpCode` why does it take a (signed)
integer?
> > > what if it's a bogus value like -1 or 16? isn't it better to take
> > > an `Opcode` object itself? In any case the same question applies
> > > to req_opcode_: why it's an int?
> > Passing isc::dns::OpCode introduces more dependencies in stats code,
> > so I would avoid it.
>
> But it already depends on `dns::Message`. Since it implicitly means
> depending on `dns::Opcode`, the explicit reference doesn't really
> introduce any dependency. Besides, in general, it actually seems to
> be pretty natural that statistics counters for b10-auth use some
> concepts of libdns++. And,
>
> > However, it should be at least unsigned. Should we
> > introduce an assertion check?
>
> if we pass `Opcode` we don't need to worry about invalid values. It's
> also more type safe; the current interface is susceptible to bugs like
> this:
>
> {{{#!cpp
> attr.setRequestOpCode(rcode.getCode()); // actually meant opcode
> }}}
I understood. I updated my branch as you've suggested. I use RESERVED15
for initial opcode. (git b65007a)
> > > - `QRAttributes::setQuerySig()` this interface seems to be a bit ad
> > > hoc. how would it be used if and when we support SIG(0)?
> > Sorry if I misunderstood your question, is_sig0 will be used for
SIG(0).
>
> (One of) my points is that it would look awkward to pass "tsig_error"
> if it's not TSIG signed (that could happen if/when we support SIG(0)).
> I'd separate interfaces for the TSIG specific case and for the SIG(0)
> specific case.
OK, I removed SIG(0) related code from MessageAttributes. TSIG(0) counter
is not supported as noted. (git 1e68893)
> > > - setQueryIPVersion/setQueryTransportProtocol: I'd like to see
> > > expected values for these in the documentation. And if unexpected
> > > values are given?
> > I added notes that describes expected values. If unexpected value is
given,
> > the counter will not be incremented.
>
> So we cannot detect a bug that passes an invalid value unless we
> explicitly write tests for such cases. That doesn't seem to be a good
> interface.
I added enum definitions for IP versions and transport protocols
(git c216522).
> > > - I'd explicitly include statistics/counter.h from statistics.cc
> > > as the cc file directly uses its definition (`Counter`).
> > The header file must be included from statistics.h as a definition
> > for class isc::auth::statistics::Counters requires
> > isc::statistics::Counter.
> > {{{#!cpp
> > 230 class Counters : boost::noncopyable {
> > 231 private:
> > 232 // counter for DNS message attributes
> > 233 isc::statistics::Counter server_msg_counter_;
> > }}}
>
> I know, but that's an indirect dependency. Once we change the
> internal details of `Counters` it's possible the indirect inclusion
> will be gone.
OK, I understood the problem. I added #include to .cc. (git 12ca269)
> > > - why does `Counter` store the value as int? Do we expect negative
> > > values? And are int (often 32-bit) enough (I don't think so btw)?
> > isc::statistics::Counter stores the value as 'unsigned' int. We don't
> > expect negative values. However, isc::data::IntElement only accepts
signed
> > values (long) as I read the document for the class. I assume signed
long in
> > most of environments can represent maximum value of unsigned int but
> > cannot of unsigned long. Also, wiki:StatsModule implicitly assumes
that
> > counters are 32-bit according to MIB structure.
>
> Admitting it's beyond the scope of this task, I think (unsigned)
> 32-bit integers are way too small for modern statistics counters.
> A 32-bit counter will overflow within a day if we increment it at the
> rate 100K/sec (and for high performance DNS servers 100Kqps is not
> that big). We really need to revise the MIB definition, and extend
> related interfaces so we can handle 64 bit integers.
Maybe it's beyond the scope of this task. To avoid confusion, I'll leave
this. At the time when the definition is extended, Counter should store
64-bit value and maybe UnsignedIntElement will be needed.
> > > - is this assumption correct?
> > > {{{#!cpp
> > > // assume response is TSIG signed if request is TSIG signed
> > > }}}
> > I think yes, according to auth_srv.cc:
> > {{{#!cpp
> > 668 if (tsig_context.get() != NULL) {
> > 669 message.toWire(renderer_, *tsig_context);
> > 670 } else {
> > 671 message.toWire(renderer_);
> > 672 }
> > }}}
>
> This doesn't seem to be so (e.g.) in this case:
>
> {{{#!cpp
> } catch (const Exception& ex) {
> LOG_ERROR(auth_logger, AUTH_PROCESS_FAIL).arg(ex.what());
> makeErrorMessage(renderer_, message, buffer, Rcode::SERVFAIL());
> return (true);
> }
> }}}
OK, I updated my branch (git 3c564a1).
> Some other comments on the revised version follow:
>
> - `MessageAttributes` constructor: it should now be exception free.
> I've updated the doygen comment.
OK, thank you.
> - `set/getRequestEDNS0` brief descriptions don't seem to make sense.
> especially for set, and for get it's not attribute"s" (plural).
> there are some other awkward brief descriptions. Please re-read
> and fix them.
OK, I fixed them (git dd62fb4).
> - `MessageAttributes` get method: `const` isn't necessary for the
> parameters.
Sorry I couldn't get it: does it mean the return value of
getRequestOpCode() isn't necessary to be const?
> - `Counters::get()`: AFAIC, return type should be able to be
> `ConstElementPtr`.
OK, I made it const (git f04b2eb).
> - can't opcode_to_msgcounter, etc, be in unnamed namespace?
I think it can't be since they are also used in testcases.
> - nit: why is this all capitalized?
> {{{#!cpp
> MSG_OPCODE_STATUS, // Opcode = 2: STATUS
>
> }}}
> same for BADVERS.
I updated them per <http://www.iana.org/assignments/dns-parameters/dns-
parameters.xml>.
BADVERS is registered as all capitalized. (git 74a034d)
Replying to [comment:37 jinmei]:
> Still working on it, but dumping another round of intermediate comments.
>
> Replying to [comment:35 y-aharen]:
>
> > > '''gen-statistics_items.py'''
>
> > > - it misses tests. is it okay? for example, can we be sure it's
robust
> > > against broken input? (gen-rdatacode is a bad precedent, but let's
> > > not just use it as an excuse for making another bad practice).
> > I think it can be covered with C++ compiler and the other tests since
> > the output of the script becomes the input of C++ parser and/or the
other
> > tests.
>
> It doesn't only produce C++ code, and, in fact, there seem to be some
> errors in XML output (see specific comments). It's not good, but
> we've spent too much time on this ticket, so we should probably live
> with(out) it...but, as a general matter, please do not easily conclude
> "we don't need a test for this". It often turns out to be wrong.
OK, I added a test for XML output (git 40f93b0). The error should be
covered
by the test.
> > > - This description doesn't seem to match the implementation. It's
> > > counted for all versions of EDNS (what does BIND 9 do?).
> > > {{{
> > > edns0 QR_REQUEST_EDNS0 Number of requests with
EDNS(0) received by the b10-auth server.
> > > }}}
> > The implementation was wrong and I fixed it.
> > In BIND 9, it only counts for EDNS version 0 requests.
>
> On a closer look, I guess we should actually count all versions of
> EDNS. See the code comment.
May I do it in the separate ticket with updating the wiki page? I guess it
requires some discussion.
> > > - this description also applies to the "referral" case.
> > > {{{
> > > qrynxrrset QR_QRYNXRRSET Number of queries
received by the b10-auth server resulted in NOERROR but the number of
answer RR == 0.
> > > }}}
> > Corrected as described above.
>
> I don't see any change on this.
I misunderstood you've pointed it didn't counted not only for "queries"
(and it was fixed).
The description comes from BIND 9. Is it OK to mention AA bit? Or another
approach to describe it (and QryReferral)?
> > > - why does this explicitly say "authoritative"?
> > > {{{
> > > authqryrej QR_QRYREJECT Number of
authoritative queries rejected by the b10-auth server.
> > > }}}
> > I haven't checked RD flag is off. I added a check and it's
"authoritative".
>
> I don't know if we "resolve" this point by adding the check. In that
> sense, why don't we count number of rejects for queries with RD being
> on? BIND 9 counts both, but that's because it's a hybrid server. In
> the case of b10-auth, I'd simply count all queries resulting in
> REFUSED in a single counter.
>
> For that matter, it may be interesting to count number of queries with
> RD on for b10-auth.
May I do it in the separate ticket with updating the wiki page? I guess it
requires some discussion.
> '''statistics.h'''
>
> - this comment for `Counters::inc` doesn't seem to be correct
> {{{#!cpp
> /// This method is mostly exception free. But it may still throw a
> /// standard exception if memory allocation fails inside the method.
> }}}
I think it's correct; it can throw std::bad_alloc from
isc::statistics::Counter::Counter().
> '''auth_srv.cc'''
>
> - now that the `setRequestEDNS0` implies it's EDNS version 0, I
> suspect we also need to check its version:
> {{{#!cpp
> - impl_->stats_attrs_.setQueryEDNS(true,
> - edns->getVersion() !=
0);
> + impl_->stats_attrs_.setRequestEDNS0(true);
> }}}
> note that previously it was called `setQueryEDNS` (no '0') so it
> looked awkward to check the version. note also that libdns++
> doesn't ensure it's always version 0; it just throws if the version
> number is "unknown". right now it just happens to mean 0 and only
> 0, it's not guaranteed in terms of API. Come to think of it, I'm
> not sure if we want to differentiate EDNS0 and other versions within
> b10-auth; the design decision of libdns++ is to make the user of it
> agnostic about which versions of EDNS0 are supported. So, in auth
> statistics it seems to make more sense to handle any EDNS versions
> (but in reality it always means version 0, so behavior-wise it's
> still compatible with BIND 9).
May I do it in the separate ticket with updating the wiki page? I guess it
requires some discussion.
> > > - Why is `stats_attrs` defined as part of `AuthSrvImpl`?
>
> > It's for optimization similar to isc::auth::Query. It improved the
> > overhead to around 3% with other optimizations as mentioned in
bind10-dev
> > list.
> > However, using std::bitset saves some memory, so I updated the branch.
>
> Does that particular change (defining it in `AuthSrvImpl`) really
> contribute to the performance improvement? It's hard to believe to
> me, especially with the latest branch version: operational-wise the
> overhead should basically be identical: if we define it as temporary,
> the overhead is to call the constructor (the destructor is no-op, so
> the compiler will optimize it out); if we define it in the class,
> the overhead is to call reset() at the end. And these methods do the
> same thing. I wouldn't even be surprised if the temporary version is
> faster, because the struct is not pretty compact and it's more likely
> contained in the data cache with other local variables (although I
> also suspect such effect is marginal in the context of the total
> response performance).
>
> Assuming the performance difference is marginal, my main concern with
> the current approach is that it introduces the additional state. So
> we need to ensure it's always cleaned up for every query (or always
> re-initialized at the start). The current implementation does that
> by calling reset() at the end of the processing, but since the code is
> complicated it's not immediately clear if it covers all cases (isn't
> there really missing cleanup case?). Even if it's currently correct,
> this approach is susceptible to future code changes; if and when we
> add an "intermediate exit", it could then be a point of missing cleanup.
>
> So, unless this particular change really improves the performance, I'd
> generally avoid this approach.
OK, I updated my branch (git 2afe338).
I've compared query rate with query_bench using root zone, 50% of
non-existence queries. It shows 1.72% of performance drop.
> '''gen-statisticsitems.py'''
>
> - this is necessary?
> {{{#!python
> global item_list
> }}}
> (there are several cases of this). according to the way item_list
> is used, the `global` declaration doesn't seem to be necessary to me.
OK, I removed them (git 9e0f9e4).
> - using `with` is probably a bit better than explicit open-close.
> {{{#!python
> #item_definition = open(items_definition_file, 'r')
> with open(items_definition_file, 'r') as item_definition:
> # (no explicit close() in this block)
> }}}
> Same for other cases.
OK, I updated my branch as you've suggested (git 9e0f9e4).
> - import_definitions: I'm afraid it's not obvious what this does and
> needs a comment:
> {{{#!python
> if element[0] == '':
> element.pop(0)
> }}}
OK, I added a note for parsing (git 9e0f9e4).
> - it's not obvious why using '+' here. Please add comment.
> {{{#!python
> stats_pre_json = \
> json.loads(stats_pre.read().replace('@@LOCAL'+'STATEDIR@@',
> localstatedir))
> }}}
It's to avoid substitution of the keyword. I added a note (git 9e0f9e4).
> - generate_specfile: this doesn't seem to be correct:
> {{{
> This method reads the content of skeleton and replaces
> <!-- ### STATISTICS DATA PLACEHOLDER ### --> with statistics items
> definition.
> }}}
OK, updated the doc (git 9e0f9e4).
> - generate_docfile: its convert_list isn't trivial. Please add more
> comment/doc.
OK, I added comment/doc (git 9e0f9e4).
> - generate_docfile: at least for me it produces garbage:
> {{{
> b'<variablelist><varlistentry><term>request.v4</term>...'
> }}}
> This also suggests why we need a test.
I couldn't reproduce it with Python 3.1 and Python 3.3.
Could you please let me know your environment?
> - generate_docfile: also, it's better if the output is more readable
> with newlines.
It's now pretty-printed (git 9e0f9e4).
> - generate_docfile: what's the purpose of this line?
> {{{#!python
> para_tail = command
> }}}
It is a garbage, removed it (git 9e0f9e4).
> - generate_docfile: with this implementation it will have a redundant
> space at the end of simpara.
I fixed it (git 9e0f9e4).
> - generate_cxx: if I understand it correctly, its convert_list()
> doesn't work correctly only up to 1 recursion.
I think it should work. Anyway, if it doesn't work, unittest, systest
and/or lettuce fails.
Replying to [comment:38 jinmei]:
> '''auth_srv_unittest.cc'''
>
> - Generally, added tests look pretty good, but it repeats the same
> pattern of code many times. In particular, since the condition of
> the initial counters are the same (in most if not all cases), I
> think we should unify it into a single helper method (where we check
> all counters).
> - The initialization of stats_init is repeated many times. Isn't it
> better to define it as a test class member and initialize it in the
> constructor?
OK, I unified them into a single helper method and it is called in
the constructor (git 8cdb5d8).
> - `AXFRSuccess` test: this makes me wonder whether we should count
> "forwarded" queries (and/or requests), although it'd be beyond the
> scope of this ticket.
May I do it in the separate ticket with updating the wiki page?
> - It would also be good if we simplify and unify the verbose pattern
> seen in (e.g.) `queryCounterUDPNormal`.
OK, I simplified it (git 47378a9).
> - I've checked this, but is it comprehensive? i.e., is there a
> counter that should be incremented in some scenario but not checked
> in the test?
There was; a test for the response is truncated or not. The test shows
it doesn't work. It's because RendererHolder clears the state of
MessageRenderer. I modified the branch to check whether the message is
truncated or not in the destructor of RendererHolder (git 494784a).
> '''statistics_unittest.cc'''
>
> Like auth_srv test, it would be better if we unify same/similar
patterns.
I updated my branch (git 988a0f0). It is difficult to unify tests for
compound attributes as they're complex and it helps to understand what
it tests.
> '''Other'''
>
> These files are updated in the revised version. Am I expected to
> review these?
>
> - src/lib/statistics/counter.h
> - src/lib/statistics/counter_dict.h
> - tests/lettuce/configurations/example.org.inmem.config
> - tests/lettuce/features/queries.feature
> - tests/lettuce/features/terrain/bind10_control.py
> - tests/system/bindctl/tests.sh
Yes, could you please review them?
Thanks,
--
Ticket URL: <http://bind10.isc.org/ticket/2157#comment:41>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list