BIND 10 #762: Conversion of xfrout to use the new logging interface

Fri Jun 24 16:08:32 UTC 2011

#762: Conversion of xfrout to use the new logging interface
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:  jinmei
  stephen                            |                Status:  reviewing
                       Type:         |             Milestone:
  enhancement                        |  Sprint-20110628
                   Priority:  major  |            Resolution:
                  Component:         |             Sensitive:  0
  logging                            |           Sub-Project:  DNS
                   Keywords:         |  Estimated Difficulty:  0.0
            Defect Severity:  N/A    |           Total Hours:  0
Feature Depending on Ticket:         |
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------
Changes (by jelte):

 * owner:  jelte => jinmei

Comment:

 Replying to [comment:5 jinmei]:
 > '''General'''
 >
 > I've noticed some IDs in .mes are not used in .py and some other IDs
 > in .py are not defined in .mes.  Some specific cases are noted below,
 > but that's not a comprehensive check.  I guess we need a generic
 > checker script to identify these types of errors.  Also, ideally (as
 > also commented below), we should have test cases for log messages to
 > check whether intended log messages are really produced.

 Yeah, you're not the only one to think we need such a checking script, see
 #1055. Regarding testing log output, I also think it would be useful, but
 not as separate tests. I think perhaps we should extend our framework (or
 utility calls), to easily add 'was-this-logged?' tests within existing
 tests (as in theory the tests should cover all code anyway). I.e. in badly
 named pseudocode:
 {{{
 TEST(Foo, FooBar) {
         doFoo(goodval);
         checkLogged("Foo successfully completed");
 }

 TEST(Foo, FooBarFailed) {
         doFoo(badval):
         checkLogged("Foo failed! Error: bad value");
 }
 }}}
 Or something (it should probably include severity and levels etc, but this
 is just an example)

 If we then want to make it exhaustive, perhaps we can even try to get the
 tests to error if anything was logged that was not checked. This all is
 indeed outside the scope of this ticket, but definitely worth considering
 IMO.

 >
 > '''Specifics'''
 > - XFROUT_IMPORT_DNS: this can also fail due to failure of loading
 >   libutil..?  Also, it would be nicer if the detailed explanation
 >   could provide some more details of "a PYTHONPATH problem" and of how
 >   it can be fixed.
 >

 I've changed it to this:
 % XFROUT_IMPORT error importing python module: %1
 There was an error importing a python module. One of the modules needed
 by xfrout could not be found. This suggests that either some libraries
 are missing on the system, or the PYTHONPATH variable is not correct.
 The specific place where this library needs to be may depend on your
 system and your specific installation.

 > - XFROUT_HANDLE_QUERY: should %e be %1?  If so, it also seems to
 >   suggest that ideally we should have tests to check if expected log
 >   messages are produced (although it may not be easy, and would
 >   probably be beyond the scope of this ticket).
 >

 Ack, see above.

 > - XFROUT_PARSE_QUERY
 >   - "to xfrout" may be redundant?

 ok

 >   - "in principle" this should not happen except for really unexpected
 >     events like memory allocation failure because b10-auth should have
 >     handled bogus packets.

 or if the python library somehow manages to disagree with the c++ library
 of what constitutes a correct query ;)

 Added this text:
 In general, this should only occur for unexpected problems like
 memory allocation failures, as the query should already have been
 parsed by the b10-auth daemon, before it was passed here.

 (note that should we even come up with an 'optimization' where only the
 header is parsed before it is sent on this text would not be true anymore)

 >
 > - XFROUT_AXFR_TRANSFER_FAILED
 >  - maybe we should print RR class, too, as the commented-out original
 >    log tried to do (but if we do, don't hardcode it please:-)?

 hmm, now that you mention it, the code only looks at the name, not the
 class... I've extracted the requested class as well, but made checking the
 class in the code an added TODO (there are more of those :p)

 >  - according to the code, "REFUSED" happens the xfrout server reaches
 >    its max-transfer-out quota.  "not allowed" (as written in the
 >    detailed description) is not necessarily wrong, but to me this term
 >    sounds more like refusal due to ACL
 >

 Yes, actually, my message is partly incomplete (the max transfers thing),
 and partly ahead of time. I've changed it to

 (the limit of simultaneous outgoing AXFR
 transfers, as specified by the configuration value
 Xfrout/max_transfers_out, has been reached).

 And moved the original explanation to a commented-out value.

 > - XFROUT_AXFR_TRANSFER_STARTED
 >  - print RR class?
 >  - should we want to log the "END" event as the original
 >    (commented-out) log tried?

 ok, ok

 >
 > - XFROUT_AXFR_TRANSFER_ERROR
 >  - print RR class?

 ok

 >
 > - XFROUT_STOPPING: this is a specific event happening in the middle of
 >   transfer.  I'd clarify that fact.

 eek. Good point, updated text. But I think we should actually complete
 ongoing transfers before shutting down (but not accept new ones)

 >
 > - XFROUT_REQUEST_FETCH
 >   - This doesn't seem to be defined. (another case that would suggest
 >     the need for tests...)

 and that unit tests are not exhaustive...
 should have been FETCH_REQUEST, not REQUEST_FETCH

 >   - since this is an error event, I'd name it accordingly (FETCH seems
 >     like a normal event).
 >

 added _ERROR

 > - XFROUT_SOCKET_SELECT
 >   - need %1?
 >   - better name to indicate it's an error?

 added

 >   - does this really indicate "an error in the internal communication
 with b10-auth"?

 is my understanding that this is what this part of the code does (read
 fd's from b10-auth) wrong?

 >
 > - XFROUT_PROCESS_REQUEST
 >   - I suspect "client_address" here means b10-auth ('s unix domain
 >     socket?) and is mostly meaningless.

 hmm. the naming had me fooled then, what address is there to be on a
 domain socket? removed it.

 >   - better name to indicate it's an error?..hmm...this point seems to
 >     be common for many others, so I won't repeat.  it at least looks
 >     awkward that some error logs are named xxx_ERROR and others
 >     aren't.
 >

 yes, at some point i guess i decided that we have more than enough
 redundancy in log output, and adding _ERROR would usually mean that the
 word 'error' is printed 3 times in one log message;
 <time> ERROR [Foo.bar] FOO_BAR_ERROR error in foo while trying bar:
 <reason>

 But as it would seem i'm the only one who doesn't like this redundancy
 i'll just add ERROR to the message identifiers where it isn't clear.

 > - XFROUT_RECEIVE_FILE_DESCRIPTOR: I'm not sure if the description is
 >   accurate, but for now I'm okay with it.
 >
 > - XFROUT_UNIX_SOCKET_FILE_IN_USE
 >  - need %1?

 yep

 >  - I don't understand this: "This xfrout daemon will now shut down."
 >    Is "this xfrout daemon" the other process?  The daemon that emits
 >    this message actual exits (if that means 'shut down').

 oh, i used 'this' to make clear that it is the one printing the message
 that gives up (as opposed to 'that', which would be the already-running
 one, or 'the', which would be ambiguous). What about this?

 While starting up, the xfrout daemon tried to clear the unix domain
 socket needed for contacting the b10-auth daemon to pass requests
 on, but the file is in use. The most likely cause is that another
 xfrout daemon process is still running. This xfrout daemon (the one
 printing this message) will not start.

 >
 > - Is the log ID correct here?
 > {{{
 >         try:
 >             os.unlink(self._sock_file)
 >         except Exception as e:
 >             logger.error(XFROUT_REMOVE_UNIX_SOCKET_FILE, str(e))
 > }}}
 >   Note that XFROUT_REMOVE_UNIX_SOCKET_FILE should take two args.
 >
 > - XFROUT_REMOVE_OLD_UNIX_SOCKET_FILE: doesn't seem to be used in .py.
 >

 nope, the first should indeed have _OLD_ in it. Yet another reason for
 that checker.

 > - XFROUT_CC_SESSION_TIMEOUT_ERROR: for the detailed description, "The
 >   most likely cause it that..." should be "The most likely cause of it
 >   that..."?

 should have been 'the most likely cause is that', fixed.

-- 
Ticket URL: <http://bind10.isc.org/ticket/762#comment:7>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development