selecttest tool

JINMEI Tatuya / 神明達哉 Jinmei_Tatuya at isc.org
Mon Aug 4 22:29:17 UTC 2008


At Mon, 04 Aug 2008 09:24:51 -0500,
Walter Gould <gouldwp at auburn.edu> wrote:

> > If so, and if servers always return SERVFAILs to any query, it may
> > indicate a different type of problem than merely exhausting file
> > descriptors.  If you also serve an authoritative zone in that server,
> > you may want to check whether queries for names in the authoritative
> > zone are answered.
> When the servfail errors occur, I want to *think* that we are still able
> to resolve names for our authoritative zone. However, I will need to
> test this again to be sure. When the servfail errors happen, we
> definitely cannot resolve queries for names we are not authoritative for.
> 
> Also, when this happens, I notice that the output from "lsof | grep 
> named | wc -l" jumps from around 40 to ~1000. Do you believe this is 
> related to the errors - or just a coincidence that this number rises 
> when we are having problems resolving external names?

I think these 1000 descriptors are related to the trouble you're
seeing, but I don't know exactly how they would cause the problem.
One thing that looks strange to me is that the server seems to have
only about 1000 sockets even with the larger ISC_SOCKET_FDSETSIZE (but
this may simply be because the server needed only that many sockets).
Another is that the server reportedly keeps returning SERVFAILs while
holding a large number of sockets, even though the CPU load is not
high.
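
As a rough way to see what those ~1000 entries actually are, the lsof
output can be broken down by type. The following is only a sketch:
count_fd_types is a hypothetical helper name, and it assumes the default
lsof column layout, in which TYPE is the fifth whitespace-separated
field. Note that "lsof | grep named" also counts non-socket entries
(cwd, txt, mem) and any line that merely contains the string "named".

```shell
# Summarize lsof output (read from stdin) by the TYPE column.
# Assumes the default layout: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
count_fd_types() {
    awk 'NR > 1 { count[$5]++ }                        # skip the header line
         END { for (t in count) printf "%s %d\n", t, count[t] }' | sort
}

# Hypothetical usage; assumes a single named process and that pidof exists.
# -n keeps lsof from doing name lookups, which may hang while DNS is failing.
#   lsof -n -p "$(pidof named)" | count_fd_types
```

If most of the entries turn out to be IPv4/IPv6 sockets, that would be
consistent with the server holding many outstanding queries.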

I guess we need more information to diagnose:

- your detailed configuration (named.conf)
- the initial log messages (output of named -g before it starts
  accepting queries)
- output of 'rndc status' while the trouble is happening
- output of 'rndc recursing' while the trouble is happening
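
The two rndc items above could be captured with something like the
following sketch, run while the SERVFAILs are occurring. The function
name collect_named_diag and the default /tmp/named-diag directory are
hypothetical, and it assumes rndc is already configured to talk to the
server.

```shell
# Save 'rndc status' and run 'rndc recursing' into a timestamped directory.
# collect_named_diag and the default /tmp/named-diag path are hypothetical.
collect_named_diag() {
    outdir="${1:-/tmp/named-diag}/$(date -u +%Y%m%dT%H%M%SZ)"
    mkdir -p "$outdir" || return 1

    rndc status > "$outdir/rndc-status.txt" 2>&1

    # 'rndc recursing' does not print the query list itself; named writes
    # the dump to a file in its working directory (named.recursing by
    # default). Only rndc's own messages end up in the file below, so the
    # dump has to be copied separately.
    rndc recursing > "$outdir/rndc-recursing.txt" 2>&1

    echo "$outdir"    # print where the results were written
}

# Hypothetical usage:
#   collect_named_diag /var/tmp/named-diag
```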

> > You may also want to check whether the server
> > answers a query for "version.bind TXT CH" (after configuring the
> > server to respond to it).
> >   
> What would this tell or buy me? Currently, the version is removed from 
> our named.conf file.

It will tell you whether the problem is specific to the cache (that
is, to recursive resolution) or affects the server as a whole.  If you
manage an authoritative zone on the same server, you can do the same
test with a name in that zone.
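
A minimal sketch of that comparison, assuming the server has been
configured to answer version.bind: the helper names dig_status and
classify_failure are hypothetical, as are the 127.0.0.1 and
www.example.com placeholders in the usage comments. It reads the rcode
from dig's header line and compares a "local" probe (version.bind or a
name in your own zone) with an external name.

```shell
# Extract the rcode (e.g. NOERROR, SERVFAIL) from dig output on stdin.
# Prints nothing if there is no header line, e.g. when the query timed out.
dig_status() {
    sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Compare a local probe with an external one.
#   $1: status of version.bind (or an authoritative name)
#   $2: status of a name the server must resolve recursively
classify_failure() {
    local_status=$1
    external_status=$2
    case "$external_status" in
        SERVFAIL|"") ;;                        # recursion is failing
        *) echo "recursion-ok"; return ;;
    esac
    if [ "$local_status" = NOERROR ]; then
        echo "cache-specific"                  # server answers, recursion does not
    else
        echo "whole-server"                    # nothing is being answered
    fi
}

# Hypothetical usage while the trouble is happening:
#   local_rc=$(dig @127.0.0.1 version.bind TXT CH | dig_status)
#   ext_rc=$(dig @127.0.0.1 www.example.com A | dig_status)
#   classify_failure "$local_rc" "$ext_rc"
```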

---
JINMEI, Tatuya
Internet Systems Consortium, Inc.


More information about the bind-users mailing list