BIND 9.2.4rc8 Multithreading on Win32

Sun Sep 12 16:21:09 UTC 2004

At 10:42 AM 9/11/2004, Vinny Abello wrote:
>At 11:29 PM 9/10/2004, you wrote:
>>>OK... The unfortunate part is that any RBL I serve as a secondary for
>>>is updated in one of two ways. The first is AXFR. I don't have control over this.
>>
>>You can't get them to run BIND 9 and use IXFR?
>
>It's a free service (sbl.spamhaus.org). I never even contacted them. They
>allow anyone to do zone transfers, and the transfers are not IXFR (based on
>the information I see in the logs, anyway), so I don't believe I have any
>say in it, unfortunately. I could try to reach out to them to find out.

IXFR is requested by the client (the secondary), not by the server. If the
server is unable to handle IXFR, the client falls back to AXFR. If they're
running BIND 9 there should be no problem.
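To illustrate that the secondary, not the primary, chooses the transfer type and falls back to AXFR, here is a toy Python sketch. `Primary`, `IxfrNotSupported` and `refresh` are hypothetical names, not BIND or DNS-library APIs, and a zone is modelled as a plain set of record strings. As a simplification, the toy primary refuses both when it has no IXFR support and when it has no journal entry for the client's serial; per RFC 1995, a real server may instead answer an IXFR query with the entire zone.

```python
from dataclasses import dataclass, field


class IxfrNotSupported(Exception):
    """Hypothetical error: the primary cannot answer this IXFR request."""


@dataclass
class Primary:
    """Toy stand-in for a primary (master) server; not a real DNS API."""
    serial: int
    zone: set
    supports_ixfr: bool = True
    # Hypothetical journal: old serial -> (records removed, records added)
    journal: dict = field(default_factory=dict)

    def ixfr(self, client_serial):
        # Toy simplification: no IXFR support and no journal entry both refuse.
        if not self.supports_ixfr or client_serial not in self.journal:
            raise IxfrNotSupported()
        removed, added = self.journal[client_serial]
        return self.serial, removed, added

    def axfr(self):
        # Full zone transfer: send every record.
        return self.serial, set(self.zone)


def refresh(primary, client_serial, client_zone):
    """The secondary asks for IXFR first and falls back to AXFR on refusal."""
    try:
        serial, removed, added = primary.ixfr(client_serial)
        return serial, (client_zone - removed) | added, "IXFR"
    except IxfrNotSupported:
        serial, zone = primary.axfr()
        return serial, zone, "AXFR"


if __name__ == "__main__":
    old = {"a", "b"}
    p = Primary(serial=2, zone={"a", "c"}, journal={1: ({"b"}, {"c"})})
    print(refresh(p, 1, old)[2])  # IXFR
    p.supports_ixfr = False
    print(refresh(p, 1, old)[2])  # AXFR
```

Either way the secondary ends up with the same zone contents; only the amount of data sent differs.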

>>>The second way is rsync, after which the zone is reloaded by a script.
>>>Both these methods result in BIND not responding to queries for a
>>>period of time. I've seen that this issue goes back as far as RBL
>>>zones have existed and people have been trying to use them in BIND. It
>>>seems a lot of people use alternate programs that handle this better. I
>>>just can't understand why, after all these years, BIND is unable to
>>>both load a large file and continue to respond to queries. MSDNS
>>>handles this just fine, as does other DNS software, but I prefer BIND
>>>of course. :)
>>
>>Multithreading and multiple CPUs largely solve this. I don't know what you
>>are seeing, so it's hard to answer. I had set it up to have one more worker
>>thread than CPUs (n+1) to allow for situations like this.
>
>It doesn't seem to work like that at all unless you have more than one CPU 
>and in certain situations only.

Just to clarify, what I said about worker threads applies only to the I/O,
not to the tasks that need to be managed. I checked the code, and
multithreading is enabled on Windows, so BIND's task manager should be
using more than one thread to handle the zone transfer and the queries.
This should be okay on a multi-CPU system at least. There may be a bug in
the task code, but that's much harder to figure out.
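The n+1 sizing rule for the I/O worker threads can be sketched as follows. This is a hypothetical Python illustration, not BIND's Win32 socket code: `io_thread_count` and `demo` are made-up names, and a `threading.Event` wait stands in for a long-running I/O operation. With one CPU the pool gets two threads, so one thread stuck on a long job still leaves another free to answer short requests. In line with the clarification above, the sketch says nothing about how BIND's task manager schedules work such as loading a zone.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def io_thread_count(n_cpus=None):
    """Hypothetical n+1 rule: one more I/O worker thread than CPUs."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    return n_cpus + 1


def demo(n_cpus=1):
    """One worker is tied up by a long job; short jobs still complete."""
    release = threading.Event()
    with ThreadPoolExecutor(max_workers=io_thread_count(n_cpus)) as pool:
        # Stand-in for a long I/O operation: occupies one thread until released.
        pool.submit(release.wait, 5)
        try:
            # Short "query" jobs still find the spare thread and finish promptly.
            queries = [pool.submit(lambda i=i: f"answer {i}") for i in range(3)]
            return [q.result(timeout=1) for q in queries]
        finally:
            release.set()  # let the long job finish so the pool can shut down


if __name__ == "__main__":
    print(io_thread_count(1), demo(1))  # 2 ['answer 0', 'answer 1', 'answer 2']
```

With plain n threads on a single CPU, the one thread blocked on the long job would leave nothing free for the short jobs, which is the situation the extra thread is meant to avoid.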

>As for the rsync RBL, what I am seeing is that if I do an "rndc reload
>zonename" on the server after the rsync is done, my server stops
>responding to queries for a while and CPU usage rockets on a single
>CPU (actually it bounces around from one to another over the span of time
>this happens). This happens even on a machine with two hyperthreaded
>processors (Windows 2003).
>
>The zone is around 31MB in size. Even though BIND detects "found 4 CPUs, 
>using 4 worker threads", whenever that zone is reloaded, I can query the 
>server all I like and it does not reply even for zones it is master for. 
>As long as I see the one CPU pegged, it will not respond (even though 
>there are three other "processors" doing nothing). The same thing happens
>with BIND 9.3.0rc4 on Windows, which I have since upgraded to (I like some
>of the additional logging information and check-names, and am reading up
>on other new features).
>
>On the other machines, which have a single processor, I noticed that when
>an AXFR zone transfer occurs, they also stop responding to queries for a
>brief amount of time, despite your n+1 worker thread design based on the
>number of CPUs. That zone is a lot smaller (around 5MB), and I've never
>seen the machine with the two hyperthreaded processors have this issue
>during a zone transfer, only the ones with a single CPU, so that is kind
>of interesting.

As I said above, the n+1 applies to the worker threads that handle the I/O.
Anything else is related to the way tasks are multithreaded, and I don't
know exactly what goes on there.

>My synopsis is that when doing large AXFR zone transfers, multiple CPUs
>(or worker threads) help keep BIND responding to queries. However, if a
>reload or reconfig is done via rndc that causes BIND to load a large
>zone, this does not apply and it still stops responding to queries.
>That is basically what I have observed, again, even with multiple worker
>threads. Is there a reason for this, or is this a flaw/bug? And why does
>this happen even with zone transfers on a single-CPU server when it's
>supposed to be running n+1 worker threads?

It's possible that there is a bug, but not in those worker threads, which
don't deal with file I/O.

Danny