<div dir="ltr"><div>Found the problem.<br></div><div><br></div><div>According to the strace log, named was sending its log messages to syslog, and for some reason they could not be delivered (I have not investigated why). When I changed the default logging channel to a local file, named started working properly again. Diff:</div>
<div><br></div><pre>===================================================================
--- named.conf.erb (revision 2263)
+++ named.conf.erb (working copy)
@@ -50,11 +50,16 @@
 };
 
-logging {
-        channel queries_syslog {
-                syslog daemon;
+logging{
+        channel bindlog {
+                file "/var/log/named/bind.log" versions 3 size 5m;
                 severity info;
+                print-time yes;
+                print-severity yes;
+                print-category yes;
         };
-        category queries { queries_syslog; };
+        category default{
+                bindlog;
+        };
 };
</pre><div><br></div><div>---------</div><div>It is working for me now.</div></div>
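<div><br></div><div>Why syslog stopped accepting named's messages was left uninvestigated. A minimal diagnostic sketch for that question, assuming a Linux host where the local syslog daemon listens on the Unix datagram socket <code>/dev/log</code>: it sends one test message with a short timeout, so a missing, refusing, or backed-up syslog socket shows up as <code>False</code> instead of a hang. The function name <code>syslog_accepts</code> and the test message text are hypothetical, not anything from BIND or the original setup.</div>

```python
import socket


def syslog_accepts(path="/dev/log", timeout=2.0):
    """Hypothetical helper: return True if one datagram can be sent to the
    Unix syslog socket at `path` within `timeout` seconds, else False.

    Assumes a Linux-style syslog listening on a SOCK_DGRAM Unix socket.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        # PRI <14> = facility "user" (1) * 8 + severity "info" (6)
        sock.send(b"<14>syslog-check: test message")
        return True
    except OSError:
        # Covers a missing socket, a refused connection, and socket.timeout
        # (a subclass of OSError), e.g. when the receiver's queue stays full.
        return False
    finally:
        sock.close()
```

<div>Run it on the affected host (for example, call <code>syslog_accepts()</code> from a Python 3 shell). A <code>False</code> result would be consistent with, but not proof of, syslog being unable to take named's log messages.</div>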
<div class="gmail_extra"><br><br><div class="gmail_quote">On Tue, Nov 5, 2013 at 1:31 PM, K L <span dir="ltr">&lt;<a href="mailto:kl.forwarder@gmail.com" target="_blank">kl.forwarder@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div>All,<br></div><div><br></div><div>I am hoping you can help me. I had working DNS servers, but now my internal master server has stopped working. Restarting it takes more than a minute. I have reinstalled it and rebooted the machine, but that did not help. The server has 3 (virtual) cores and is not swapping when the 'crash' happens.</div>
<div><br></div><div>What I mean by 'crash': the process is still running, but the server does not respond to queries. Even `/etc/init.d/named status` takes 28 to 60 seconds.</div><div><br></div><div>Here is an strace log captured while it happens: <a href="http://pastebin.com/raw.php?i=7i0PgALG" target="_blank">http://pastebin.com/raw.php?i=7i0PgALG</a>. An excerpt:</div>
<div>6500 recvmsg(518, {msg_name(16)={sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.101.50")}, msg_iov(1)=[{"~\223\201\200\0\1\0\1\0\5\0\6\3ns3\5cymru\3com\0\0\1\0\1\300"..., 4096}], msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 252<br>
6500 recvmsg(518, 0x7fd4b6588900, 0) = -1 EAGAIN (Resource temporarily unavailable)</div><div><br></div><div>I am not a C programmer, but from this it looks to me as if a packet is being delivered to named and the delivery fails.</div>
<div><br></div><div>What could the problem be? Is this a BIND problem, or maybe an OS/system problem?</div><div>I don't recall changing any (kernel) parameters since it last worked.</div><div><br></div><div>Regards,</div><div>kl</div></div>
</blockquote></div><br></div>