BIND 10 #1518: 0 byte rotated log files when changing settings
BIND 10 Development
do-not-reply at isc.org
Thu Jan 5 15:30:56 UTC 2012
#1518: 0 byte rotated log files when changing settings
-------------------------------------+-------------------------------------
Reporter: jreed | Owner: UnAssigned
Type: | Status: reviewing
defect | Milestone:
Priority: | Sprint-20120110
critical | Resolution:
Component: | Sensitive: 0
logging | Sub-Project: Core
Keywords: | Estimated Difficulty: 7 (but see
Defect Severity: High | comment)
Feature Depending on Ticket: none | Total Hours: 0
Add Hours to Ticket: 0 |
Internal?: 0 |
-------------------------------------+-------------------------------------
Changes (by stephen):
* owner: stephen => UnAssigned
* status: assigned => reviewing
Comment:
Firstly, the problem is not one of lost data. What is happening is that
when the log settings are changed, a small number of new log files are
created with a size of zero (or a few hundred bytes). If the logging is
configured such that only a small number of files are retained, these
zero-length log files take up a number of that limit and older files are
deleted. (Upping the maximum number of versions to something like 20 and
looking at the i-node numbers before and after changing the log settings
shows what is happening.)
The reason why these files are created is not completely clear, but it is
to do with the fact that there are multiple processes logging to the same
file. Currently when logging settings are changed, the logging code
clears all appenders (the component that writes to the file) and sets up
new ones. The effect of this is that for each process, all log files it
has open are closed and then re-opened.
Ideally each process should reopen the same file as it closed. However, I
speculate that when a BIND 10 process attempts to reopen the log file,
sometimes the operation fails and for some reason a file rollover is
triggered. (A file rollover is where old log files are renamed with a
higher a version number and a new current log file opened.) Since there
are several BIND 10 processes, there can be several rollovers, hence the
empty files. Looking at the files opened by the BIND 10 processes
confirms that view - after a logging reconfiguration we can end up with
different BIND 10 processes pointing to different versions of the log
file. (Processes do not necessarily all end up pointing to different log
files - two or three processes may end up pointing to the same file.)
This observation also explains why some files are zero length and some
only a few hundred bytes - it depends on whether the process in question
generated a log message when it received a configuration change: if so,
the message was logged to the newly-opened log file.
So we have one problem with a change of logging configuration. However,
thinking about it, even if the reconfiguration did not generate redundant
files, we would have still have a problem when a file rollover takes
place. To illustrate this, suppose we have processes Alpha and Beta both
writing to the file "out.log". Suppose also that the file has reached its
maximum size. When Alpha next writes to the log file, the size of the
file is checked. Since it has reached its maximum size, Alpha closes the
file and performs the rollover: "out.log" is renamed "out.log.1", a new
"out.log" is created, and Alpha writes the message to that. During this
process, Beta still has the same file open, so after the rename, it is
"out.log.1" that Beta has open. If Beta now writes to the log file, the
same thing happens. The check on the file size causes the file to be
closed and the rollover to take place: "out.log.1" is renamed "out.log.2",
"out.log" is renamed "out.log.1" and Beta opens "out.log" and writes the
message to it. So although we started with both Alpha and Beta logging to
the same file, we end up with Alpha and Beta logging to different files.
(I don't think we have seen this yet, but only because tests tend to
concentrate on a single component at a time.)
In summary, we have two problems with the rotating log files if all
processes are set to log to the same file:
1. When logging is reconfigured, instead of processes closing the current
log file and reopening it, sometimes the BIND 10 processes can end up
opening different log files.
2. Even if (1) were not a problem, we can expect processes to end up
logging to different log files when the log file size reaches the maximum
limit.
'''Suggestions'''[[BR]]
1. I think we will need to enable log4cplus debugging output to chase down
the first problem. (It seems to be on by default but output is written to
stderr, which is not captured by processes started by Boss.) It may be
that the problem is just that a number of processes attempt to open the
same file together and the error causes the rollover. A small random wait
may be all that is required.
2. I think the second problem could be mitigated by using a class derived
from the log4cplus !RollingFileAppender that overrides some of its
methods. When a rollover happens, instead of closing the current file,
rolling the file names and opening a file, it should close the file, open
what it thinks should be the current file and try to write again. Only if
this file exists and is at maximum size should the rollover be performed.
We might also want to consider the idea of a lock file to prevent two
processes trying to rollover a particular log file at the same time. (In
fact, use of a lock file might well solve the first problem as well.)
''I have put this for review in case anyone else has an
insight/suggestions for this. Before the next sprint planning meeting I
will close it and create tickets for the agreed solutions.''
--
Ticket URL: <http://bind10.isc.org/ticket/1518#comment:6>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list