bind-9.4.0b2 throws SERVFAIL
schumann at strato-rz.de
Fri Oct 6 09:58:57 UTC 2006
JINMEI Tatuya / 神明達哉 wrote:
>>>>>> On Fri, 06 Oct 2006 09:01:45 +0200,
>>>>>> Marco Schumann <schumann at strato-rz.de> said:
>> This behaviour seems to be fixed in 9.4, as mentioned in the release
>> notes: we haven't seen it in the short period we have been running this
>> version (9.4.0b2) on that hardware. Nevertheless, we still have seen a
>> significant amount of UDP drops.
>> Now we are running bind-9.4.0b2 with threading enabled and a
>> max-cache-size of 3072M (4G physical memory) (on Dual Core AMD
>> Opteron(tm) Processor 185). There are ~3000..4000 q/s, no more UDP drops,
>> each processor core averages 50% utilization, we have 16 worker threads
>> enabled, and the cleaning-interval is 15m. When the cache size hits
>> 2.6...2.8G, bind stops recursing and throws SERVFAIL instead. In the
>> resolver logs we find entries "resolver: error: could not mark server as
>> lame: out of memory". The errors disappear when named is restarted.
>> We are using views, no datalimit is set. Is max-cache-size the size per
>> view or a global setting for all views? Or where does the "out of
>> memory" come from?
> max-cache-size can be applied on a per-view basis, but by default it is
> a global parameter for all views (and it (indirectly) helps control the
> amount of memory used for lame-server information). But in any event, a
> memory shortage can happen if memory is consumed faster than it is cleaned.
> So, I'd be interested in how much memory the server uses in total
> (i.e., not only for the cache). Was the 2.8G of memory just for the
> cache, or the total memory footprint? Also, (although I suspect it's
> not the direct reason for this) does your server act as an
> authoritative nameserver for a large zone? If so, it would
> effectively reduce the available memory for lame info and may lead to
> some weird situation like the one you saw. In that sense it would
> help if you can show us your named.conf.
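For reference, a minimal, hypothetical named.conf sketch of the kind of setup being discussed (view name, client range, and comments are invented for illustration; the poster's actual configuration is in the files mentioned below):

```
// Hypothetical sketch only -- not the poster's actual named.conf.
// A max-cache-size set in options acts as the default for every view;
// setting it inside a view overrides that default for that view only.
options {
    max-cache-size 3072M;     // global default cache limit
    cleaning-interval 15;     // minutes between cache-cleaning runs
};

view "internal" {             // invented example view
    match-clients { 10.0.0.0/8; };
    recursion yes;
    // max-cache-size 1024M;  // would override the global value here
};
```

Note that the number of worker threads is not set in named.conf at all: it is given at startup with named's -n option (e.g. "named -n 2" on a dual-core machine).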
> BTW, I'm afraid running 16 worker threads on a dual-core machine
> doesn't really make sense (or can even be harmful) because BIND9's
> worker threads normally perform non-blocking tasks; using more threads
> than available processors/cores would simply increase control overhead
> without benefit.
Thanks for your quick response... The config files can be seen at
max-cache-size is defined as a global option, so I assume this is the
overall limit? The 2.8G were used by the whole process, and yes, as you
can see in the configs, some of the zones are reverse zones for /16
networks.
And you're suggesting I should reduce the number of worker threads to
match the number of CPU cores? OK, I'll do that.
> memory shortage can happen if memory is consumed faster than it's cleaned.
Hmm, is the cleaning still too complicated? Or simply too slow?