100% CPU / wedge with 9.8.3-P4 & RPZ?
vjs at rhyolite.com
Sat Mar 16 14:21:54 UTC 2013
> From: Phil Mayers <p.mayers at imperial.ac.uk>
> >> In the last 12 hours, we've had repeated instances of named getting
> >> wedged. The symptoms are:
> >> * named consuming nearly 100% CPU, all in user-time
> >> * lots of queries apparently not processed, and based on query
> >> logging, a sharp drop in the rate of queries that are
> >> * a very sharp drop (almost a complete halt, in fact) in the rate of
> >> RPZ "hits" in the logs at the exact time this happens
> >> * no other interesting logs, as far as I can see
How can the rate of RPZ hits not drop along with a sharp drop in the
rate of queries?
> >> I can't see anything in the release notes for 9.8.4/9.8.5 - any ideas?
There have been no recent RPZ hangs, but the release notes for 9.8.5b2
mention a DNSSEC hang and I noticed that imperial.ac.uk has RRSIGs.
There is also a hang mentioned in the 9.8.4-P1 release notes.
> Examination of the journal suggests they deleted and re-added more or
> less every record in the zone (presumably an error at their side).
Wouldn't deleting more or less every record in the response policy zone
tend to reduce the rate of RPZ hits?
> Does anyone else slave the Spamhaus RPZ and saw this? It seems like
> there might be a bind bug here with large updates to RPZ.
Not to defend RPZ, but what is the evidence that links RPZ to the
problem, even if it is related to large updates of a zone
instead of, for example, NSEC chains?
How many times did named get wedged? According to the theory that
the problem is related to large updates of policy zones, there
should have been at most 3 instances of wedged named processes per
computer and they should have happened during or soon after the end
of large rpz.spamhaus.org transfers.
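That test can be sketched in a few lines of Python. This is only an illustration of the reasoning, not anything taken from named: the function name, the 30-minute window, and the wedge times are all hypothetical, and the two transfer times are borrowed from the list further down.

```python
from datetime import datetime, timedelta

# Timestamp style used in the log excerpts in this message.
# %b expects English month abbreviations (C/English locale assumed).
FMT = "%d-%b-%Y %H:%M:%S.%f"

def parse(ts):
    return datetime.strptime(ts, FMT)

def unexplained_wedges(wedges, transfers, window=timedelta(minutes=30)):
    """Return wedge times that do NOT start within `window` after any transfer.

    The 30-minute default is an arbitrary assumption for "soon after".
    """
    return [
        w for w in wedges
        if not any(timedelta(0) <= w - t <= window for t in transfers)
    ]

# Two of the transfer times listed later in this message.
transfers = [parse("15-Mar-2013 22:52:02.969"),
             parse("16-Mar-2013 00:04:14.447")]

# Hypothetical wedge times, made up purely for the example.
wedges = [parse("15-Mar-2013 23:01:00.000"),
          parse("16-Mar-2013 03:30:00.000")]

for w in unexplained_wedges(wedges, transfers):
    print("wedge not explained by a large transfer:", w)
```

Any wedge that this prints, or more wedges than large transfers, would count against the large-update theory.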
My logs have these instances of transfers of rpz.spamhaus.org involving
at least 100 messages each during March (NTP-disciplined UTC timestamps):
02-Mar-2013 21:45:42.511 07-Mar-2013 22:47:56.423 08-Mar-2013 03:19:46.419
08-Mar-2013 03:26:50.262 08-Mar-2013 07:27:13.176 08-Mar-2013 07:33:29.203
08-Mar-2013 10:07:05.829 08-Mar-2013 11:18:09.837 15-Mar-2013 22:52:02.969
16-Mar-2013 00:04:14.447 16-Mar-2013 07:21:07.576 16-Mar-2013 11:06:46.515
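For anyone who wants to build the same kind of list from their own logs, here is a rough Python sketch. It assumes the log lines carry a leading timestamp in the style above and the usual "Transfer completed: N messages" text from incoming zone transfers. The exact line layout depends on the local logging configuration, so the pattern, the function name, and the sample lines are assumptions to adjust, not named's definitive format.

```python
import re

# Assumed line shape (timestamp first, then the transfer summary), e.g.:
# 01-Jan-2013 00:00:01.000 xfer-in: info: transfer of 'rpz.spamhaus.org/IN'
#   from 192.0.2.1#53: Transfer completed: 250 messages, ...
PATTERN = re.compile(
    r"^(?P<ts>\d{2}-\w{3}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"transfer of '(?P<zone>[^'/]+)/IN'.*"
    r"Transfer completed: (?P<msgs>\d+) messages"
)

def large_transfers(lines, zone="rpz.spamhaus.org", min_messages=100):
    """Return timestamps of completed transfers of `zone` with >= min_messages."""
    found = []
    for line in lines:
        m = PATTERN.search(line)
        if m and m.group("zone") == zone and int(m.group("msgs")) >= min_messages:
            found.append(m.group("ts"))
    return found

# Made-up sample lines (addresses, counts, and times are not real data).
sample = [
    "01-Jan-2013 00:00:01.000 xfer-in: info: transfer of 'rpz.spamhaus.org/IN' "
    "from 192.0.2.1#53: Transfer completed: 250 messages, 60000 records, "
    "1500000 bytes, 2.500 secs (600000 bytes/sec)",
    "01-Jan-2013 00:10:00.000 xfer-in: info: transfer of 'rpz.spamhaus.org/IN' "
    "from 192.0.2.1#53: Transfer completed: 1 messages, 12 records, "
    "600 bytes, 0.010 secs (60000 bytes/sec)",
    "01-Jan-2013 00:20:00.000 xfer-in: info: transfer of 'example.com/IN' "
    "from 192.0.2.1#53: Transfer completed: 500 messages, 90000 records, "
    "2000000 bytes, 3.000 secs (666666 bytes/sec)",
]

for ts in large_transfers(sample):
    print(ts)
```

In practice the lines would come from the log file, for example `large_transfers(open(path))` with `path` pointing at wherever the transfer messages are logged.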
Vernon Schryver vjs at rhyolite.com