BIND 10 #762: Conversion of xfrout to use the new logging interface
BIND 10 Development
do-not-reply at isc.org
Mon Jun 27 18:27:48 UTC 2011
#762: Conversion of xfrout to use the new logging interface
-------------------------------------+-------------------------------------
Reporter: | Owner: jinmei
stephen | Status: reviewing
Type: | Milestone:
enhancement | Sprint-20110628
Priority: major | Resolution:
Component: | Sensitive: 0
logging | Sub-Project: DNS
Keywords: | Estimated Difficulty: 2.0
Defect Severity: N/A | Total Hours: 0
Feature Depending on Ticket: |
Add Hours to Ticket: 0 |
Internal?: 0 |
-------------------------------------+-------------------------------------
Comment (by jinmei):
Replying to [comment:7 jelte]:
The revised version looks mostly okay except two things:
- there was one minor grammar error in the description. I directly
fixed it and pushed it.
- The XFROUT_REMOVE_UNIX_SOCKET_FILE issue doesn't seem to be solved
(see the end of this comment message)
Also, I suggest performing manual checks (for now) about the
consistency of .mes and xfrout.py(.in), i.e., all defined ID is used
and all used IDs are defined. I've not done that comprehensive check
myself.
Other points are minor, so please feel free to address or ignore them
as you like, and, when done, feel free to merge.
> > '''General'''
> >
> > I've noticed some IDs in .mes are not used in .py and some other IDs
> > in .py are not defined in .mes. Some specific cases are noted below,
> > but that's not a comprehensive check. I guess we need a generic
> > checker script to identify these types of errors. Also, ideally (as
> > also commented below), we should have test cases for log messages to
> > check whether intended log messages are really produced.
>
> Yeah, you're not the only one to think we need such a checking script,
see #1055.
[...]
> Or something (it should probably include severity and levels etc, but
this is just an example)
>
> If we then want to make it exhaustive, perhaps we can even try to get
the tests to error if anything was logged that was not checked. This all
is indeed outside the scope of this ticket, but definitely worth
considering IMO.
These seem to be good ideas (aside from how exactly we implement them,
and on further thought we may thing these are overkilling). In any
case that's beyond the scope of this ticket. I see some (or all?)
points are covered in #1055. If #1055 would cover everything, we are
done with that. If there are some other things, I'd suggest adding
them as a comment to #1055 or creating a new ticket.
> > - XFROUT_PARSE_QUERY
> > - "in principle" this should not happen except for really unexpected
> > events like memory allocation failure because b10-auth should have
> > handled bogus packets.
>
> or if the python library somehow manages to disagree with the c++
library of what constitutes a correct query ;)
>
> Added this text:
> In general, this should only occur for unexpected problems like
> memory allocation failures, as the query should already have been
> parsed by the b10-auth daemon, before it was passed here.
>
> (note that should we even come up with an 'optimization' where only the
header is parsed before it is sent on this text would not be true anymore)
Right. I'm not sure whether we should prepare for that in this
context. We might implicitly clarify that by saying "in the current
implementation" or something. I'd leave it to you.
> > - XFROUT_AXFR_TRANSFER_FAILED
> > - maybe we should print RR class, too, as the commented-out original
> > log tried to do (but if we do, don't hardcode it please:-)?
>
> hmm, now that you mention it, the code only looks at the name, not the
class... I've extracted the requested class as well, but made checking the
class in the code an added TODO (there are more of those :p)
OK.
> > - XFROUT_AXFR_TRANSFER_STARTED
> > - print RR class?
> > - should we want to log the "END" event as the original
> > (commented-out) log tried?
>
> ok, ok
As for XFROUT_AXFR_TRANSFER_DONE, it seems this could be logged when
the transfer was incompletely terminated due to a shutdown while the
log description says:
{{{
+The transfer of the given zone has been completed successfully.
}}}
I'd say, e.g., "...has been successfully completed or aborted
halfway due to a shutdown event".
> > - XFROUT_STOPPING: this is a specific event happening in the middle of
> > transfer. I'd clarify that fact.
>
> eek. Good point, updated text. But I think we should actually complete
ongoing transfers before shutting down (but not accept new ones)
OK about the change. As for whether to complete it, I personally
think aborting it is better because it could take longer (e.g. for
several minutes in some bad cases), and when I perform "shutdown", I'd
expect the system is terminated (e.g. so I can restart it again) quite
soon. But in any case, that's a different topic (if you want to
change the behavior, please file a new ticket).
> > - XFROUT_SOCKET_SELECT
> > - does this really indicate "an error in the internal communication
with b10-auth"?
>
> is my understanding that this is what this part of the code does (read
fd's from b10-auth) wrong?
Or a "socketpair" within xfrout to signal a shutdown. But my question
is different: if python's select.select() inherits the semantics of
the standard select(2) system call, an error from the call should be
very local, such as passing an invalid FD or temporary memory
allocation failure, rather than a communication error, which should be
signaled via a "successful" select() call followed by a error result
in subsequent read/write operation. If my understanding is correct,
I'd replace "This most likely points to an error in the internal
communication with b10-auth." to, e.g. "This should be a result of
rare local error such as memory allocation failure and shouldn't
happen under normal conditions."
> > - better name to indicate it's an error?..hmm...this point seems to
> > be common for many others, so I won't repeat. it at least looks
> > awkward that some error logs are named xxx_ERROR and others
> > aren't.
>
> yes, at some point i guess i decided that we have more than enough
redundancy in log output, and adding _ERROR would usually mean that the
word 'error' is printed 3 times in one log message;
> <time> ERROR [Foo.bar] FOO_BAR_ERROR error in foo while trying bar:
<reason>
>
> But as it would seem i'm the only one who doesn't like this redundancy
i'll just add ERROR to the message identifiers where it isn't clear.
I don't like redundancy (in general:-). My main point in this case
was the inconsistency (I actually suggested adding _ERROR, but at that
time I didn't consider the redundancy that much). As for redundancy,
I see the point on one hand. I've been also a bit concerned about
possible inconsistency between the defined level and description due
to future changes to the level. For example, in the above example, we
might change our mind and make the level to "info", by changing
"logger.error" to "logger.info". That would implicitly request
changes to the log description (and ideally the ID also for
consistency), but I can easily imagine the situation we forget it.
So, it would be nicer if we only have a single definition point on the
log level, while still naming log IDs reasonably. But, like other
points in this ticket, it would be beyond the scope of this ticket.
So I'm okay with the current code for this specific case.
> > - I don't understand this: "This xfrout daemon will now shut down."
> > Is "this xfrout daemon" the other process? The daemon that emits
> > this message actual exits (if that means 'shut down').
>
> oh, i used 'this' to make clear that it is the one printing the message
that gives up (as opposed to 'that', which would be the already-running
one, or 'the', which would be ambiguous). What about this?
>
> While starting up, the xfrout daemon tried to clear the unix domain
> socket needed for contacting the b10-auth daemon to pass requests
> on, but the file is in use. The most likely cause is that another
> xfrout daemon process is still running. This xfrout daemon (the one
> printing this message) will not start.
OK.
> > - Is the log ID correct here?
> > {{{
> > try:
> > os.unlink(self._sock_file)
> > except Exception as e:
> > logger.error(XFROUT_REMOVE_UNIX_SOCKET_FILE, str(e))
> > }}}
> > Note that XFROUT_REMOVE_UNIX_SOCKET_FILE should take two args.
> >
> > - XFROUT_REMOVE_OLD_UNIX_SOCKET_FILE: doesn't seem to be used in .py.
>
> nope, the first should indeed have _OLD_ in it. Yet another reason for
that checker.
This doesn't seem to be fixed.
--
Ticket URL: <http://bind10.isc.org/ticket/762#comment:9>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list