Solved: high CPU and 'top' shows named as the culprit

Frank Bulk frnkblk at iname.com
Sat Jul 25 14:42:45 UTC 2015


For the benefit of the archives, I want to share what I found while
troubleshooting a high CPU issue on two of our servers running BIND. (We
happen to be running Debian Wheezy with a Debian-patched version of BIND
9.7.3.)

While looking through some graphs I noticed that the CPU of two of our
servers was very high, and 'top' revealed that named was taking 60 to 70% of
the CPU.

Since we had enabled DNSSEC earlier this week, that was immediately under
suspicion, but the graphs showed that the high CPU usage started on June
30.

Using my google-fu I found lots of hits on the "managed-keys-directory"
issue
(https://stackoverflow.com/questions/13059014/bind-named-service-high-cpu-load),
and while implementing those suggestions did resolve the warning I was
seeing in the logs, the CPU remained high.

The next area of focus was permissions, as I was getting this warning:
	Jul 24 16:08:41 mail2 named[3787]: logging channel 'default_log' file '/var/log/bind.log': permission denied
but logging was working correctly, and fixing the permissions on that file,
and verifying them for the others, didn't reduce the CPU.
(https://askubuntu.com/questions/469866/bind-fatal-error-cant-open-custom-log
and https://bugs.launchpad.net/ubuntu/+source/bind9/+bug/1086775 and
others).

I got side-tracked with AppArmor for a while, but our Debian installation
doesn't have that, so it's a moot point.

Then I googled more specifically for 9.7.3 and high CPU, and came across a
thread that mentioned NTP
(https://lists.isc.org/pipermail/bind-users/2012-July/088166.html).
Curious: June 30 was the day of the leap second, so let me check ... sure
enough, the CPU jumped at exactly 7 pm (I'm in U.S. Central).
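
The 7 pm timing lines up with the leap second, which was inserted at
23:59:60 UTC on June 30. A minimal sketch to check the conversion, assuming
GNU date (for "-d @epoch") and using a POSIX TZ string for U.S. Central; the
variable and function names are only illustrative:

```shell
#!/bin/sh
# Epoch 1435708800 is 2015-07-01 00:00:00 UTC, the second right after the
# leap second 23:59:60 UTC on June 30, 2015.
LEAP_EPOCH=1435708800

# POSIX TZ string for U.S. Central time, so no tzdata lookup is needed.
CENTRAL_TZ='CST6CDT,M3.2.0,M11.1.0'

# Hypothetical helper: print the given epoch as U.S. Central wall-clock time.
leap_local_time() {
    TZ="$CENTRAL_TZ" date -d "@$1" '+%Y-%m-%d %H:%M:%S %Z'
}

leap_local_time "$LEAP_EPOCH"
```

This should print 2015-06-30 19:00:00 CDT, i.e. 7 pm Central.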

Restarting NTP (and then named) did not resolve the issue, but executing 
	date -s "$(date -u)"
immediately resolved the issue, without needing to restart named.
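
For anyone who wants to try this cautiously first, here is a rough sketch
that wraps the same command with a dry-run default. The reset_clock name and
the DRY_RUN variable are made up for illustration; actually setting the
clock requires root:

```shell
#!/bin/sh
# Sketch only: reset_clock and DRY_RUN are hypothetical names, not part of
# any tool. With DRY_RUN=1 (the default) the command is printed, not run.
reset_clock() {
    # Capture the current UTC time in date's default output format.
    now_utc="$(date -u)"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf 'would run: date -s "%s"\n' "$now_utc"
    else
        # Setting the clock to its current value is the step from the post.
        date -s "$now_utc"
    fi
}

reset_clock
```

Running it as root with DRY_RUN=0 performs the actual clock set.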

More on this issue here:
https://serverfault.com/questions/405003/named-9-7-3-takes-lots-of-cpu-on-centos
http://www.paranoids.at/high-cpu-load-due-to-leap-second/
http://blog.wpkg.org/2012/07/01/java-leap-second-bug-30-june-1-july-2012-fix/

Frank



