Strange recursor response time pattern
he at uninett.no
Wed Sep 6 08:13:47 UTC 2017
>> The stats channel output relating to running tasks and memory contexts
>> is very extensive.
> Either way I would not have expected use of the statistics
> channel to negatively impact the query performance. Is the query
> channel processed with "no-delay", so that a thread doesn't get
> stuck waiting for data to drain from the other end?

I've made some further observations by running a system call trace
on named while it processes an HTTP request on the stats channel.

In the sample I looked at, BIND takes ~258ms from receiving
the HTTP request until it has finished writing the statistics data
over HTTP. Most of that time (236ms) is consumed between
receiving the request and starting to push the result out.

The thread which writes the statistics data does not process any
DNS client requests between receiving the HTTP request and
finishing sending the HTTP result. It looks like the thread
filled the TCP send window around 12 times while sending the HTTP
result, and the OS puts the thread in the "lwp_park" state for around
1-2ms each time this happens.
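
To put the figures above side by side, here is a small back-of-the-envelope
calculation. It only uses the numbers quoted in this message; the variable
names are made up for illustration, and the counts are approximate.

```python
# Figures quoted in the message above, in milliseconds (all approximate).
total_ms = 258          # request received -> statistics fully written
before_write_ms = 236   # request received -> start of writing the result
stalls = 12             # roughly how often the TCP send window filled up
park_ms = (1, 2)        # roughly how long the thread was parked each time

# Time left for actually writing the result.
write_phase_ms = total_ms - before_write_ms

# Lower and upper estimate of the time spent parked while writing.
parked_low, parked_high = (stalls * p for p in park_ms)

print(f"write phase: {write_phase_ms} ms")
print(f"time parked: {parked_low}-{parked_high} ms")
```

On these numbers the parking could account for roughly half or more of the
~22ms write phase, while the bulk of the delay (236ms) falls before any data
is written.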

Shortly after the stats data has been sent, the very same thread
proceeds to process DNS client requests. I'm guessing that DNS
client queries have continued to be dispatched to the thread while
it's processing the HTTP request.

Perhaps the HTTP / stats processing ought to be done in a
separate thread which isn't also involved in DNS client request
handling? (No, I do not really know how easy or difficult that
is to make happen...)
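
As a rough illustration of the idea only, and not of how named actually
schedules its work, here is a toy Python sketch in which a slow statistics
request runs on its own thread while a separate worker keeps answering
queries. All names (run_demo, dns_worker, stats_worker) are hypothetical.

```python
import queue
import threading


def run_demo(num_queries=5):
    """Return the ordered event log of a toy server with a separate stats thread."""
    events = []
    log_lock = threading.Lock()
    queries = queue.Queue()
    stats_started = threading.Event()
    queries_answered = threading.Event()

    def log(message):
        with log_lock:
            events.append(message)

    def dns_worker():
        # Stand-in for a thread that answers DNS client queries.
        while True:
            query = queries.get()
            if query is None:        # sentinel: no more queries
                break
            log(f"answered query {query}")
        queries_answered.set()

    def stats_worker():
        # Stand-in for building and sending the statistics response.
        log("stats start")
        stats_started.set()
        # Simulate a slow stats request: it stays busy until the DNS
        # worker has drained its queue (or a safety timeout expires).
        queries_answered.wait(timeout=5)
        log("stats end")

    stats_thread = threading.Thread(target=stats_worker)
    dns_thread = threading.Thread(target=dns_worker)

    stats_thread.start()
    stats_started.wait()             # the stats request is now in progress
    for n in range(num_queries):
        queries.put(n)
    queries.put(None)
    dns_thread.start()

    dns_thread.join()
    stats_thread.join()
    return events


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

In this sketch every query is answered while the stats request is still in
progress, because the two kinds of work never share a thread. Whether
something similar is practical inside named is exactly the open question.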