selecttest tool

Walter gouldwp at auburn.edu
Sat Aug 2 08:06:02 UTC 2008


JINMEI Tatuya / 神明達哉 wrote:
> At Thu, 31 Jul 2008 10:43:21 -0500,
> Walter Gould <gouldwp at auburn.edu> wrote:
>
>   
>> Here are the results from one of our linux servers:
>>
>> # ./selecttest
>> selecttest: nsocks = 4093, TEST_FDSETSIZE = -1, FD_SETSIZE = 1024, sizeof fd_set = 128
>> created 4093 sockets, maxfd = 4095
>> FD_CLR test...OK
>> FD_SET test...OK
>> select test...OK
>>
>> Based on the above, why do I get the "too many open file descriptors" 
>> error when I run 9.5.0-P1? Help my simple mind understand this. :) Does 
>>     
>
> There are several reasons.  One big issue is BIND9 internally limited
> the possible number of open files to FD_SETSIZE.
>
>   
>> this mean that P2 will run as expected on this machine or will it give 
>> "too many open file descriptors" error also?
>>     
>
> The former (but you'll have to build BIND with a reasonably large
> value of ISC_SOCKET_FDSETSIZE).  Note, however, this doesn't mean P2
> will solve all problems that P1 had.  For example, if P1 made named
> busy (wrt CPU load), it's pretty likely that P2 will also make it
> busy.  So, you'll have to carefully watch the server behavior.
>
> ---
> JINMEI, Tatuya
> Internet Systems Consortium, Inc.
>
>   

I downloaded P2 tonight and did what was suggested in the CHANGES file.  
Here is my configure statement:
STD_CDEFINES="-DISC_SOCKET_FDSETSIZE=4096" ./configure \
    --prefix=/usr/local/bind-9.5.0-P2 --sysconfdir=/var/named
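As a sanity check on the numbers involved, here is a rough sketch of the arithmetic behind the "sizeof fd_set = 128" line in the selecttest output quoted above. It assumes the usual Linux layout of one bit per descriptor stored in an array of C longs; the function name fd_set_bytes is made up for illustration and is not part of BIND or selecttest.

```python
import ctypes

def fd_set_bytes(nfds):
    """Bytes needed for a select() bitmask covering nfds descriptors.

    Assumption: one bit per descriptor, stored in an array of C longs.
    """
    long_bytes = ctypes.sizeof(ctypes.c_long)
    bits_per_long = long_bytes * 8
    # Round up to a whole number of longs.
    n_longs = (nfds + bits_per_long - 1) // bits_per_long
    return n_longs * long_bytes

if __name__ == "__main__":
    for n in (1024, 4096):
        print(n, "descriptors ->", fd_set_bytes(n), "bytes")
```

Under that assumption, the default FD_SETSIZE of 1024 matches the 128 bytes reported, and a set sized for 4096 descriptors would need four times as much.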

It seemed to build and run fine for about an hour.  I was running
"lsof | grep named | wc -l" every five seconds.  For the first hour,
the count stayed between 40 and 90, but then it climbed to almost
exactly 1000 and named stopped resolving names.  The daemon never
exited, but resolution failed completely.
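One caveat about the measurement: lsof also lists entries that are not descriptors (cwd, program text, mapped libraries), and "grep named" matches any line containing that string, so the figure is only an approximation of the descriptor count. A minimal sketch of a more direct count, assuming a Linux /proc filesystem and permission to read the target's fd directory, could look like the following. The helper names count_open_fds and sample_fds are hypothetical.

```python
import os
import time

def count_open_fds(pid):
    """Count the descriptors a process has open by listing /proc/<pid>/fd.

    Assumption: Linux /proc is mounted and the caller may read the
    directory (same user as the process, or root).
    """
    return len(os.listdir("/proc/%d/fd" % pid))

def sample_fds(pid, interval, samples):
    """Take several counts, sleeping `interval` seconds between them."""
    counts = []
    for i in range(samples):
        counts.append(count_open_fds(pid))
        if i + 1 < samples:
            time.sleep(interval)
    return counts

if __name__ == "__main__":
    # Demonstration on this script's own process; substitute named's PID.
    print(sample_fds(os.getpid(), 5, 3))
```

Run against the named PID, this gives one number per interval that can be compared directly with the 1024 and 4096 figures.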

Also, I never saw the "too many open files" error in /var/log/messages
as I had before when running P1.  If I configured named to use an
FDSETSIZE of 4096, then why does it fail when it reaches 1000?  When I
run "ulimit -n", it returns 4096.  There is something weird about that
1000...  FWIW, my load average never increased significantly.
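Note that "ulimit -n" reports the soft limit of the shell it is typed into. Resource limits are per process and inherited from the parent, so a daemon started from an init script may be running with a different value. A small sketch of reading both the soft and hard limits from inside a process, using Python's standard resource module (the wrapper name nofile_limits is only for illustration):

```python
import resource

def nofile_limits():
    """Return (soft, hard) limits on open files for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard

if __name__ == "__main__":
    soft, hard = nofile_limits()
    # resource.RLIM_INFINITY means "no limit".
    print("soft limit:", soft)
    print("hard limit:", hard)
```

This only describes the process that runs it. To say something about named itself, the check has to be launched the same way named is launched, for example from the same init script.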

Just so you'll know, I am running RHEL 3 (I know I need to upgrade) and 
9.5.0 runs fine.  I sure wish I could get one of these patches running 
successfully.  Any suggestions on the above?

P.S. Thanks to all the ISC staff for all of your hard work the past few 
weeks.  It is greatly appreciated.

Walter Gould
Auburn University 



