bind-9.4.0b2 throws SERVFAIL

Fri Oct 6 09:58:57 UTC 2006

JINMEI Tatuya / 神明達哉 wrote:
>>>>>> On Fri, 06 Oct 2006 09:01:45 +0200, 
>>>>>> Marco Schumann <schumann at strato-rz.de> said:
> 
>>   This behaviour seems to be fixed in 9.4 as mentioned in the Release
>> notes as we haven't seen this in the short period we used this version
>> (9.4.0b2) on that hardware. Nevertheless, we still have seen a
>> significant amount of UDP drops.
>>  Now we are running bind-9.4.0b2 with threading enabled and a
>> max-cache-size of 3072M (4G physical memory) (on Dual Core AMD
>> Opteron(tm) Processor 185). There are ~3000..4000q/s, no more UDP drops,
>> each processor core runs at 50% on average, we have 16 worker threads
>> enabled, the cleaning-interval is 15m. When the cache size hits
>> 2.6...2.8G, bind stops recursing and throws SERVFAIL instead. In the
>> resolver logs we find entries "resolver: error: could not mark server as
>> lame: out of memory". It disappears when named is restarted.
>>  We are using views, no datalimit is set. Is max-cache-size the size per
>> view or a global setting for all views? Or where does the "out of
>> memory" come from?
> 
> max-cache-size can be set on a per-view basis, but it's a global parameter
> for all views by default (and it (indirectly) helps control the amount
> of memory used for lame information).  But in any event, memory
> shortage can happen if memory is consumed faster than it's cleaned.
> 
> So, I'd be interested in how much memory the server uses in total
> (i.e., not only for the cache).  Were the 2.8G of memory just for the
> cache, or the total memory footprint?  Also, (although I suspect it's
> not the direct reason for this) does your server act as an
> authoritative nameserver for a large zone?  If so, it would
> effectively reduce the available memory for lame info and may lead to
> some weird situation like the one you saw.  In that sense it would
> help if you can show us your named.conf.
> 
> BTW, I'm afraid running 16 worker threads on a dual-core machine
> doesn't really make sense (or can even be harmful) because BIND9's
> worker threads normally perform non-blocking tasks; using more threads
> than available processors/cores would simply increase control overhead
> without benefit.

Hi,

thanks for your quick response... The config files can be seen at

http://www.hidden-primary.net/index.html

max-cache-size is defined as a global option, so I assume it is the
overall limit? The 2.8G were used by the whole process, and yes, as you
can see in the configs, some of the zones are reverse zones of /16
networks...

So you are suggesting I should reduce the number of worker threads to
the number of CPU cores? OK, I'll do that.
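
For illustration only, here is a minimal sketch of how the worker thread
count could be tied to the core count, assuming named is started from a
small wrapper script. The helper name named_args is hypothetical; the only
BIND-specific piece is named's -n option, which sets the number of worker
threads.

```python
import os


def named_args(cores=None, extra_args=()):
    """Build a named command line with one worker thread per CPU core.

    named_args is a hypothetical helper for this sketch; "-n" is the
    named option that sets the number of worker threads.
    """
    if cores is None:
        # os.cpu_count() may return None, so fall back to a single thread.
        cores = os.cpu_count() or 1
    return ["named", "-n", str(cores), *extra_args]


if __name__ == "__main__":
    # Print the command line instead of executing it.
    print(" ".join(named_args()))
```

On the dual-core box described above this would give "named -n 2" instead
of the 16 threads we run now.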

> memory shortage can happen if memory is consumed faster than it's cleaned.

Hmm, is the cleaning still too complicated? Or simply too slow?

Kind regards
-- 
_____________________________
[Marco Schumann