BIND 10 #2790: Lettuce tests timing & missed messages
BIND 10 Development
do-not-reply at isc.org
Mon Mar 11 11:59:22 UTC 2013
#2790: Lettuce tests timing & missed messages
-------------------------------------+-------------------------------------
Reporter: vorner | Owner:
Type: defect | Status: new
Priority: medium | Milestone: Sprint-20130319
Component: Unclassified |
Keywords: | Resolution:
Sensitive: 0 | CVSS Scoring:
Sub-Project: Core | Defect Severity: N/A
Estimated Difficulty: 9 | Feature Depending on Ticket:
Total Hours: 0 | Add Hours to Ticket: 0
| Internal?: 0
-------------------------------------+-------------------------------------
Comment (by jelte):
Some additional results from experiments and discussion:
The unbuffered stderr appears to be the culprit here. When I repeatedly
run one specific test, it fails once every 10-20 runs or so. After I
changed the logging for that test to go to stdout instead of stderr, it
ran successfully for 500+ runs.
Looking at the log4cplus code, it does several writes per log message
(how many depends on the parts of the format).
I tried setting stderr to line-buffering, but this did not change
anything; either the call is ignored or one of the libraries resets it.
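For illustration only, this is what line buffering is supposed to achieve,
as a minimal Python sketch. It is not the C stdio call that was tried
above, and `line_buffered_stream` and `pending_bytes` are hypothetical
helpers: the pieces of a message are held back until a newline completes
the line, so a reader never sees a partial line.

```python
import os


def line_buffered_stream(fd):
    """Wrap a file descriptor in a line-buffered text stream.

    Hypothetical helper for illustration; buffering=1 asks Python for
    line buffering, so data is flushed when a newline is written.
    """
    return os.fdopen(fd, "w", buffering=1)


def pending_bytes(fd):
    """Return whatever can be read from fd right now without blocking."""
    os.set_blocking(fd, False)
    try:
        return os.read(fd, 4096)
    except BlockingIOError:
        # Nothing has reached the descriptor yet.
        return b""


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    stream = line_buffered_stream(write_end)
    stream.write("INFO ")            # no newline yet: stays in the buffer
    print(pending_bytes(read_end))
    stream.write("message\n")        # newline: the whole line is flushed
    print(pending_bytes(read_end))
```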
What did help was changing log4cplus to first format the message into a
string and then output the whole string in one go. This may or may not be
a good idea anyway (it costs an extra stack allocation and an extra copy
of the data, but needs fewer system calls).
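The difference between the two approaches can be sketched as follows.
This is a Python illustration of the principle, not the log4cplus patch
itself; the function names, the record layout and the sample values are
assumptions made for the example. With one write per format part, another
process can get its output in between the calls; with a single write of
the finished line, that window is much smaller.

```python
import os


def format_record(timestamp, severity, logger_name, message):
    """Build the complete log line, newline included, as bytes."""
    line = "{} {} [{}] {}\n".format(timestamp, severity, logger_name, message)
    return line.encode("utf-8")


def emit_piecewise(fd, timestamp, severity, logger_name, message):
    """One write() per format part; returns the number of write calls."""
    parts = [timestamp, " ", severity, " [", logger_name, "] ", message, "\n"]
    for part in parts:
        os.write(fd, part.encode("utf-8"))
    return len(parts)


def emit_single_write(fd, timestamp, severity, logger_name, message):
    """Format first, then pass the whole line to a single write()."""
    os.write(fd, format_record(timestamp, severity, logger_name, message))
    return 1


if __name__ == "__main__":
    # Hypothetical sample values; fd 2 is stderr.
    emit_single_write(2, "2013-03-11 11:59:22", "INFO", "example", "started")
```

Both functions produce the same bytes; only the number of system calls
differs, which is the trade-off mentioned above.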
But this also suggests that the interprocess file locking may not be
working completely: one of the 'bad' lines in my output had data from two
different processes. So far we have not been able to figure out why this
happens.
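As a point of reference, this is roughly what an interprocess lock around
a log write is expected to do. It is a generic Python sketch using an
advisory flock() lock on Unix, not the locking code actually in use here;
`locked_write` and the file names are hypothetical. The lock only helps
if every writer takes it and the complete message is written while the
lock is held.

```python
import fcntl
import os


def locked_write(lock_fd, out_fd, record):
    """Write one complete record while holding an exclusive advisory lock.

    Hypothetical helper for illustration. Each process should open the
    lock file itself: flock() locks belong to the open file, so processes
    sharing one inherited descriptor do not exclude each other.
    """
    fcntl.flock(lock_fd, fcntl.LOCK_EX)
    try:
        return os.write(out_fd, record)
    finally:
        fcntl.flock(lock_fd, fcntl.LOCK_UN)


if __name__ == "__main__":
    # Hypothetical lock file name; fd 2 is stderr.
    lock_fd = os.open("example.lock", os.O_CREAT | os.O_RDWR, 0o600)
    locked_write(lock_fd, 2, b"INFO message from one process\n")
    os.close(lock_fd)
```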
So there are two workarounds for now:
- change the output (in the tests, by default, or both) to use the
line-buffered stdout instead of stderr
- update log4cplus to do one write per log message (it's a very small
patch)
The second will, however, take some time to get into a release, into
distributions, etc.
And of course we need to figure out why the file locking doesn't appear to
work in this situation (every attempt to debug it so far has made the
symptom disappear).
--
Ticket URL: <http://bind10.isc.org/ticket/2790#comment:6>
BIND 10 Development <http://bind10.isc.org>