BIND 10 #762: Conversion of xfrout to use the new logging interface
BIND 10 Development
do-not-reply at isc.org
Fri Jun 24 16:08:32 UTC 2011
#762: Conversion of xfrout to use the new logging interface
-------------------------------------+-------------------------------------
Reporter: | Owner: jinmei
stephen | Status: reviewing
Type: | Milestone:
enhancement | Sprint-20110628
Priority: major | Resolution:
Component: | Sensitive: 0
logging | Sub-Project: DNS
Keywords: | Estimated Difficulty: 0.0
Defect Severity: N/A | Total Hours: 0
Feature Depending on Ticket: |
Add Hours to Ticket: 0 |
Internal?: 0 |
-------------------------------------+-------------------------------------
Changes (by jelte):
* owner: jelte => jinmei
Comment:
Replying to [comment:5 jinmei]:
> '''General'''
>
> I've noticed some IDs in .mes are not used in .py and some other IDs
> in .py are not defined in .mes. Some specific cases are noted below,
> but that's not a comprehensive check. I guess we need a generic
> checker script to identify these types of errors. Also, ideally (as
> also commented below), we should have test cases for log messages to
> check whether intended log messages are really produced.
Yeah, you're not the only one to think we need such a checking script, see
#1055. Regarding testing log output, I also think it would be useful, but
not as separate tests. I think perhaps we should extend our framework (or
utility calls), to easily add 'was-this-logged?' tests within existing
tests (as in theory the tests should cover all code anyway). I.e. in badly
named pseudocode:
{{{
TEST(Foo, FooBar) {
doFoo(goodval);
checkLogged("Foo successfully completed");
}
TEST(Foo, FooBarFailed) {
doFoo(badval):
checkLogged("Foo failed! Error: bad value");
}
}}}
Or something (it should probably include severity and levels etc, but this
is just an example)
If we then want to make it exhaustive, perhaps we can even try to get the
tests to error if anything was logged that was not checked. This all is
indeed outside the scope of this ticket, but definitely worth considering
IMO.
>
> '''Specifics'''
> - XFROUT_IMPORT_DNS: this can also fail due to failure of loading
> libutil..? Also, it would be nicer if the detailed explanation
> could provide some more details of "a PYTHONPATH problem" and of how
> it can be fixed.
>
I've changed it to this:
% XFROUT_IMPORT error importing python module: %1
There was an error importing a python module. One of the modules needed
by xfrout could not be found. This suggests that either some libraries
are missing on the system, or the PYTHONPATH variable is not correct.
The specific place where this library needs to be may depend on your
system and your specific installation.
> - XFROUT_HANDLE_QUERY: should %e be %1? If so, it also seems to
> suggest that ideally we should have tests to check if expected log
> messages are produced (although it may not be easy, and would
> probably be beyond the scope of this ticket).
>
Ack, see above.
> - XFROUT_PARSE_QUERY
> - "to xfrout" may be redundant?
ok
> - "in principle" this should not happen except for really unexpected
> events like memory allocation failure because b10-auth should have
> handled bogus packets.
or if the python library somehow manages to disagree with the c++ library
of what constitutes a correct query ;)
Added this text:
In general, this should only occur for unexpected problems like
memory allocation failures, as the query should already have been
parsed by the b10-auth daemon, before it was passed here.
(note that should we even come up with an 'optimization' where only the
header is parsed before it is sent on this text would not be true anymore)
>
> - XFROUT_AXFR_TRANSFER_FAILED
> - maybe we should print RR class, too, as the commented-out original
> log tried to do (but if we do, don't hardcode it please:-)?
hmm, now that you mention it, the code only looks at the name, not the
class... I've extracted the requested class as well, but made checking the
class in the code an added TODO (there are more of those :p)
> - according to the code, "REFUSED" happens the xfrout server reaches
> its max-transfer-out quota. "not allowed" (as written in the
> detailed description) is not necessarily wrong, but to me this term
> sounds more like refusal due to ACL
>
Yes, actually, my message is partly incomplete (the max transfers thing),
and partly ahead of time. I've changed it to
(the limit of simultaneous outgoing AXFR
transfers, as specified by the configuration value
Xfrout/max_transfers_out, has been reached).
And moved the original explanation to a commented-out value.
> - XFROUT_AXFR_TRANSFER_STARTED
> - print RR class?
> - should we want to log the "END" event as the original
> (commented-out) log tried?
ok, ok
>
> - XFROUT_AXFR_TRANSFER_ERROR
> - print RR class?
ok
>
> - XFROUT_STOPPING: this is a specific event happening in the middle of
> transfer. I'd clarify that fact.
eek. Good point, updated text. But I think we should actually complete
ongoing transfers before shutting down (but not accept new ones)
>
> - XFROUT_REQUEST_FETCH
> - This doesn't seem to be defined. (another case that would suggest
> the need for tests...)
and that unit tests are not exhaustive...
should have been FETCH_REQUEST, not REQUEST_FETCH
> - since this is an error event, I'd name it accordingly (FETCH seems
> like a normal event).
>
added _ERROR
> - XFROUT_SOCKET_SELECT
> - need %1?
> - better name to indicate it's an error?
added
> - does this really indicate "an error in the internal communication
with b10-auth"?
is my understanding that this is what this part of the code does (read
fd's from b10-auth) wrong?
>
> - XFROUT_PROCESS_REQUEST
> - I suspect "client_address" here means b10-auth ('s unix domain
> socket?) and is mostly meaningless.
hmm. the naming had me fooled then, what address is there to be on a
domain socket? removed it.
> - better name to indicate it's an error?..hmm...this point seems to
> be common for many others, so I won't repeat. it at least looks
> awkward that some error logs are named xxx_ERROR and others
> aren't.
>
yes, at some point i guess i decided that we have more than enough
redundancy in log output, and adding _ERROR would usually mean that the
word 'error' is printed 3 times in one log message;
<time> ERROR [Foo.bar] FOO_BAR_ERROR error in foo while trying bar:
<reason>
But as it would seem i'm the only one who doesn't like this redundancy
i'll just add ERROR to the message identifiers where it isn't clear.
> - XFROUT_RECEIVE_FILE_DESCRIPTOR: I'm not sure if the description is
> accurate, but for now I'm okay with it.
>
> - XFROUT_UNIX_SOCKET_FILE_IN_USE
> - need %1?
yep
> - I don't understand this: "This xfrout daemon will now shut down."
> Is "this xfrout daemon" the other process? The daemon that emits
> this message actual exits (if that means 'shut down').
oh, i used 'this' to make clear that it is the one printing the message
that gives up (as opposed to 'that', which would be the already-running
one, or 'the', which would be ambiguous). What about this?
While starting up, the xfrout daemon tried to clear the unix domain
socket needed for contacting the b10-auth daemon to pass requests
on, but the file is in use. The most likely cause is that another
xfrout daemon process is still running. This xfrout daemon (the one
printing this message) will not start.
>
> - Is the log ID correct here?
> {{{
> try:
> os.unlink(self._sock_file)
> except Exception as e:
> logger.error(XFROUT_REMOVE_UNIX_SOCKET_FILE, str(e))
> }}}
> Note that XFROUT_REMOVE_UNIX_SOCKET_FILE should take two args.
>
> - XFROUT_REMOVE_OLD_UNIX_SOCKET_FILE: doesn't seem to be used in .py.
>
nope, the first should indeed have _OLD_ in it. Yet another reason for
that checker.
> - XFROUT_CC_SESSION_TIMEOUT_ERROR: for the detailed description, "The
> most likely cause it that..." should be "The most likely cause of it
> that..."?
should have been 'the most likely cause is that', fixed.
--
Ticket URL: <http://bind10.isc.org/ticket/762#comment:7>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list