Strange recursor response time pattern

Havard Eidnes he at uninett.no
Tue Sep 5 21:39:05 UTC 2017


>> some further local discussion has made me aware that us running
>> "collectd" for monitoring BIND may be contributing to the
>> problem; collectd fetches data each 10s by using the BIND-
>> configured statistics-channel, thus BIND is processing a TCP
>> connection to deliver the statistics data.
>>
>> It's still somewhat surprising and disappointing that this should
>> interfere this much with DNS query processing...
>
> There are various URLs (see the BIND 9 ARM) that provide a subset of the
> full statistics.
>
> The stats channel output relating to running tasks and memory contexts
> is very extensive.

Either way, I would not have expected use of the statistics
channel to hurt query performance.  Is the statistics channel
processed with "no-delay" (non-blocking) I/O, so that a thread
doesn't get stuck waiting for the other end to drain the data?

> If collectd doesn't need the full set, you may be able to ask for just
> the traffic-volume related subset(s).

I'm using the "bind" plugin in collectd in its default
configuration.

The code for the BIND plugin of collectd is at

  https://github.com/collectd/collectd/blob/master/src/bind.c

...and I didn't write it.  We use Grafana as the display
frontend, and the choice of which data to graph is made through
that interface.  It may therefore not be easy to restrict what
data is fetched via the stats channel.
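Even so, the cost of the full statistics dump can be compared with
that of the subset URLs by fetching each one and timing it.  The
following is a minimal Python sketch, not part of collectd or BIND.
It assumes a statistics-channel listener on 127.0.0.1:8053 (a
hypothetical address) and the "/xml/v3/..." URL layout described in
the BIND 9 ARM; both should be checked against the running
configuration and BIND version.  The helper names are made up for
illustration.

```python
"""Hypothetical helpers for comparing BIND statistics-channel fetches.

Assumes the XML v3 URL layout from the BIND 9 ARM: "/xml/v3" for the
full dump and "/xml/v3/<subset>" (e.g. server, net, tasks, mem) for parts.
"""
from __future__ import annotations

import time
import urllib.request
import xml.etree.ElementTree as ET


def stats_url(host: str, port: int, subset: str = "") -> str:
    """Build a statistics-channel URL; an empty subset means the full dump."""
    path = "/xml/v3/" + subset if subset else "/xml/v3"
    return f"http://{host}:{port}{path}"


def fetch(url: str, timeout: float = 5.0) -> tuple[bytes, float]:
    """Fetch one URL and return (body, elapsed wall-clock seconds)."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read()
    return body, time.monotonic() - start


def counters(xml_body: bytes, counter_type: str) -> dict[str, int]:
    """Collect <counter name=...> values under every <counters type=...>
    element of the given type.  If several groups share a type (e.g.
    per-view resolver stats), later groups overwrite earlier names."""
    root = ET.fromstring(xml_body)
    result: dict[str, int] = {}
    for group in root.iter("counters"):
        if group.get("type") == counter_type:
            for c in group.iter("counter"):
                result[c.get("name")] = int(c.text or 0)
    return result


def compare_subsets(host: str, port: int,
                    subsets: tuple[str, ...] = ("", "server", "net")) -> None:
    """Print size and fetch time for the full dump and each subset."""
    for subset in subsets:
        url = stats_url(host, port, subset)
        body, elapsed = fetch(url)
        print(f"{url}: {len(body)} bytes in {elapsed * 1000:.1f} ms")
```

Calling compare_subsets("127.0.0.1", 8053) against a live server
would show how much larger and slower the full dump (which includes
the task and memory-context sections) is than, say, /xml/v3/server.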

I'm running BIND on a 16-core AMD Opteron 6274 processor, with
one BIND thread per core.  The machine has 16 GB of memory, so it
is not starved in that department either.  I'm therefore still
rather surprised by the impact of using the stats channel.

The other machines in my test are similarly or better specced
(Xeon E5-2640, 16 cores, 32 GB of memory), and they are not
resource-starved either.

Regards,

- Håvard