Request for review of performance advice

Wed Jul 8 16:39:00 UTC 2020

On 7/7/2020 5:57 PM, Victoria Risk wrote:
> A while ago we created a KB article with tips on how to improve your 
> performance with our Kea DHCP server. The tips were fairly obvious to 
> our developers and this was pretty successful. We would like to do 
> something similar for BIND, provide a dozen or so tips for how to 
> maximize your throughput with BIND. However, as usual, everything is 
> more complicated with BIND.

This is an excellent idea.

If it comes to fruition, I ask that there be some guidance on when 
such optimizations are useful. I've seen places where such a guide-sheet 
was followed even though its guidelines were suited to a business with 10X or 
100X the traffic the customer actually sees.

That is, a configuration which benefits an organization seeing 100,000 
qps may be excessively complex (or brittle) for one seeing 100 qps.

--
Do things because you should, not just because you can.

John Thurston    907-465-8591
John.Thurston at alaska.gov
Department of Administration
State of Alaska

> 
> Can those of you who care about performance, who have worked to improve 
> your performance, share some of your suggestions that have the most 
> impact?  Please also comment if you think any of these ideas below are 
> stupid or dangerous. I have combined advice for resolvers and for 
> authoritative servers, I hope it is clear which is which...
> 
> The ideas we have fall into four general categories:
> 
> System design
> 1a) Use a load balancer to specialize your resolvers and maximize your 
> cache hit ratio. A load balancer is traditionally designed to spread 
> traffic evenly among a pool of servers, but it can also be used 
> to concentrate related queries on one server to make its cache as hot as 
> possible. For example, if all queries for domains in .info are sent to 
> one server in a pool, there is a better chance that an answer will 
> already be in that server's cache.
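A minimal sketch of the query-affinity idea in 1a, in Python. It assumes a load balancer that can run custom selection logic on the query name; pick_resolver, the backend names, and the choice to key on the TLD alone are all hypothetical, and a real deployment would express this in the load balancer's own policy language.

```python
import zlib
from typing import Sequence


def pick_resolver(qname: str, resolvers: Sequence[str]) -> str:
    """Choose a backend for qname so that every name under the same TLD
    lands on the same resolver, keeping that resolver's cache hot.

    Hypothetical helper for illustration only.
    """
    if not resolvers:
        raise ValueError("no resolvers configured")
    # "info" for "www.example.info."; DNS names compare case-insensitively.
    tld = qname.rstrip(".").lower().rsplit(".", 1)[-1]
    # zlib.crc32 gives the same result in every process; Python's built-in
    # hash() on str is randomized per process, which would break affinity.
    return resolvers[zlib.crc32(tld.encode("utf-8")) % len(resolvers)]


if __name__ == "__main__":
    pool = ["resolver-a", "resolver-b", "resolver-c"]
    for name in ("www.example.info.", "mail.example.info.", "www.example.com."):
        print(f"{name:20} -> {pick_resolver(name, pool)}")
```

Keying on the TLD alone would send all of .com to one backend, so in practice you would probably hash a longer suffix (for example the registered domain) and watch for uneven load. With plain modulo hashing, adding or removing a backend also reshuffles most names; consistent hashing would limit that.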
> 
> 1b) If you have a large authoritative system with many servers, consider 
> dedicating some machines to propagate transfers. These machines, called 
> transfer servers, would not answer client queries, but just send 
> notifies and process IXFR requests.
> 
> 1c) Deploy ghost secondaries. If you store copies of authoritative 
> zones on resolvers (resolvers as undelegated secondaries), you can avoid 
> querying the authoritative servers for those zones. The most obvious uses 
> of this would be mirroring the root zone locally or mirroring your own 
> authoritative zones on your resolver.
> 
> We have other system design ideas that we suspect would help, but we are 
> not sure, so I will wait to see if anyone suggests them.
> 
> OS settings and the system environment
> 2a) Run on bare metal if possible, not on virtual machines or in the 
> cloud. (Any idea how much difference this makes? The only reference we 
> can cite is fairly out of date: 
> https://indico.dns-oarc.net/event/19/contributions/234/attachments/217/411/DNS_perf_OARC_Apr_14.pdf 
> )
> 
> 2b) Consider building with --with-tuning=large 
> (https://kb.isc.org/docs/aa-01314). This is a compile-time option, so it 
> is not something you can switch on and off in production.
> 
> 2c) Consider which read-write lock implementation you want to use: 
> https://kb.isc.org/docs/choosing-a-read-write-lock-implementation-to-use-with-named 
> For the highest tested query rates (> 100,000 queries per second), 
> pthreads read-write locks with hyper-threading /enabled/ seem to be the 
> best-performing choice by far.
> 
> 2d) Pay attention to your choice of NICs. We have found wide 
> variations in their performance. (Can anyone suggest what specifically 
> to look for?)
> 
> 2e) Make sure your socket send buffers are big enough. (Not sure whether 
> this advice is obsolete. Do we need to tell people how to tell whether 
> their buffers are causing delays?)
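One way to approach the question in 2e, sketched in Python and assuming Linux: ask the kernel for a larger UDP send buffer and see what it actually grants, and read the UDP SndbufErrors counter from /proc/net/snmp, which counts UDP sends that failed for lack of buffer space. On Linux the granted size is capped by the net.core.wmem_max sysctl, and getsockopt reports about double the requested value because the kernel counts its bookkeeping overhead. The function names are hypothetical, and this only inspects system limits; it does not show how named sizes its own sockets.

```python
import socket
from typing import Dict, Optional, Tuple


def udp_sndbuf_sizes(requested: int) -> Tuple[int, int]:
    """Return (default, granted) SO_SNDBUF values for a fresh UDP socket.

    'granted' is what the kernel reports after we ask for `requested` bytes.
    On Linux an oversized request is silently capped (net.core.wmem_max),
    and the reported value is roughly double what was set.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        default = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        try:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested)
        except OSError:
            pass  # some platforms reject oversized requests instead of capping them
        granted = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return default, granted


def udp_counters(path: str = "/proc/net/snmp") -> Optional[Dict[str, int]]:
    """Parse the 'Udp:' header/value rows of /proc/net/snmp (Linux only).

    Returns None if the file is missing or has no Udp rows.
    """
    try:
        with open(path) as f:
            # "UdpLite:" rows do not match because of the colon.
            rows = [line.split() for line in f if line.startswith("Udp:")]
    except OSError:
        return None
    if len(rows) < 2:
        return None
    names, values = rows[0][1:], rows[1][1:]
    return dict(zip(names, (int(v) for v in values)))


if __name__ == "__main__":
    default, granted = udp_sndbuf_sizes(4 * 1024 * 1024)
    print(f"default SO_SNDBUF={default}, after requesting 4 MiB: {granted}")
    counters = udp_counters()
    if counters is not None:
        # A SndbufErrors value that keeps growing under load suggests the
        # send buffers are too small for the traffic.
        print("SndbufErrors:", counters.get("SndbufErrors", "not reported"))
```

Sampling SndbufErrors before and after a busy period is more telling than a single reading, since the counter accumulates from boot.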
> 
> 2f) When the number of CPUs is very large (32 or more), the additional 
> UDP listeners that named creates by default may not improve performance 
> and might actually reduce throughput slightly, due to the overhead of the 
> additional structures and tasks. We suggest trying different values of 
> the -U option to find the optimal one for your production environment.
> 
> 
> named Features
> 3a) Minimize logging. Query logging is expensive (can cost you 20% or 
> more of your throughput) so don’t do it unless you are using the logs 
> for something. Logging with dnstap is lower impact, but still fairly 
> expensive. Don’t run in debug mode unless necessary.
> 
> 3b) Set the named.conf option "minimal-responses yes;" to reduce the 
> work named needs to do to assemble each response, as well as the 
> amount of outbound traffic.
> 
> 3c) Disable synth-from-dnssec. Although it seemed like a good idea, in 
> practice it turns out not to improve performance.
> 
> 3d) Tune your zone transfers (https://kb.isc.org/docs/aa-00726).
> 
> When tuning the behavior of the primary, there are several factors that 
> you can control:
> 
> - The rate of notifications of changes to secondary servers 
> (serial-query-rate and notify-delay)
> 
> - Limits on concurrent zone transfers (transfers-out, tcp-clients, 
> tcp-listen-queue, reserved-sockets)
> 
> - Efficiency/management options (max-transfer-time-out, 
> max-transfer-idle-out, transfer-format)
> 
> The most important options to focus on are transfers-out, 
> serial-query-rate, tcp-clients and tcp-listen-queue.
> 
> 3e) If you use RPZ, consider setting qname-wait-recurse to no. We have 
> had issues with RPZ transfers impacting query performance in resolvers. 
> In general, several smaller RPZ zones will transfer faster than a few 
> very large ones.
> 
> 3f) Consider enabling prefetch on your resolver, unless you are running 
> 9.10, which is EOL (https://kb.isc.org/docs/aa-01122).
> 
> Fix your transport network.
> Transport network issues cause BIND to keep retrying, which is a 
> performance drain.
> 4a) Disable outbound firewalls and packet filters (in some cases, remove 
> them completely to prevent ongoing interference), particularly those that 
> maintain connection state. These are a frequent cause of DNS problems 
> and can make your DNS server do a lot of extra work.
> 
> 4b) Set an appropriate MTU for your network. Ensure that your network 
> infrastructure supports EDNS and large UDP responses of up to 4096 bytes. 
> Ensure also that it allows transit and reassembly of fragmented UDP 
> packets (if you sign your zones with DNSSEC, these fragments will 
> typically carry large query responses).
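The arithmetic behind 4b, as a small Python sketch that assumes plain IPv4 and IPv6 headers with no options or extension headers (the function names are hypothetical). On a standard 1500-byte Ethernet MTU, a 4096-byte EDNS response does not fit in one packet, so it only arrives if the path passes and reassembles the fragments.

```python
import math

IPV4_HEADER = 20  # bytes, assuming no IP options
IPV6_HEADER = 40  # bytes, assuming no extension headers
UDP_HEADER = 8


def max_unfragmented_dns(mtu: int, ipv6: bool = False) -> int:
    """Largest DNS message that fits in a single UDP packet at this MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - UDP_HEADER


def ipv4_fragment_count(dns_size: int, mtu: int) -> int:
    """Number of IPv4 fragments needed to carry a DNS message of dns_size bytes."""
    ip_payload = dns_size + UDP_HEADER  # the UDP header rides in the first fragment
    # Every fragment except the last must carry a multiple of 8 bytes,
    # because fragment offsets are expressed in 8-byte units.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


if __name__ == "__main__":
    print(max_unfragmented_dns(1500))             # 1472
    print(max_unfragmented_dns(1280, ipv6=True))  # 1232
    print(ipv4_fragment_count(4096, 1500))        # 3
```

The 1232-byte figure (the IPv6 minimum MTU of 1280 minus 48 bytes of headers) is the EDNS buffer size that DNS Flag Day 2020 recommended for avoiding fragmentation altogether.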
> 
> 4c) Ensure that your network infrastructure allows DNS over TCP.
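For 4c, a minimal Python sketch of what has to get through the network: DNS over TCP sends each message prefixed with its length as a two-byte big-endian integer (RFC 1035, section 4.2.2). The helper names are hypothetical, and the sketch only builds simple queries (it rejects the root name and labels longer than 63 bytes).

```python
import socket
import struct


def build_query(qname: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query (RFC 1035): one question, class IN, RD bit set."""
    # Header: ID, flags (0x0100 = RD), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    wire = b""
    for label in qname.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"bad label {label!r} in {qname!r}")
        wire += bytes([len(raw)]) + raw
    wire += b"\x00"  # the zero-length root label terminates the name
    return header + wire + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its 2-byte length, as DNS over TCP requires."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since TCP may deliver a message in pieces."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def query_over_tcp(server: str, qname: str, timeout: float = 3.0) -> bytes:
    """Send one query over TCP port 53 and return the raw response message."""
    query = build_query(qname)
    with socket.create_connection((server, 53), timeout=timeout) as s:
        s.sendall(frame_for_tcp(query))
        (length,) = struct.unpack("!H", recv_exact(s, 2))
        response = recv_exact(s, length)
    if response[:2] != query[:2]:
        raise ValueError("response ID does not match query ID")
    return response
```

Calling query_over_tcp("192.0.2.53", "example.com") with your own server in place of the placeholder documentation address should return a response rather than time out if TCP is allowed end to end. From the command line, dig +tcp performs the same check.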
> 
> 4d) Check for and eliminate any incomplete IPv6 interface setup. (What 
> can go wrong is that BIND thinks it can use IPv6 authoritative 
> servers, but its sends actually fail silently, leaving named waiting 
> unnecessarily for responses.)
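A partial check for 4d, sketched in Python: ask the kernel which IPv6 source address it would use to reach an outside address. The default target 2001:db8::53 is a placeholder from the documentation prefix (RFC 3849), not a real server, and the function name is hypothetical. This only detects a host with no usable IPv6 route at all; the silent-failure case described in 4d, where a route exists but packets go nowhere, needs a real query, for example dig -6 against a server with a known IPv6 address.

```python
import socket
from typing import Optional


def ipv6_source_address(target: str = "2001:db8::53", port: int = 53) -> Optional[str]:
    """Return the IPv6 source address the kernel would use to reach target,
    or None if there is no IPv6 route (or no IPv6 support at all).

    connect() on a UDP socket sends no packets; it only makes the kernel
    choose a route and a source address.
    """
    if not socket.has_ipv6:
        return None
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as s:
            s.connect((target, port))
            return s.getsockname()[0]
    except OSError:
        return None


if __name__ == "__main__":
    src = ipv6_source_address()
    if src is None:
        print("no IPv6 route on this host")
    else:
        print("IPv6 route exists; source address would be", src)
```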
> 
> Any further suggestions, corrections or warnings are very welcome.
> 
> Thank you!
> Vicky
> 
> ---------
> 
> Victoria Risk
> Product Manager
> Internet Systems Consortium
> vicky at isc.org
> 
> bind-users mailing list
> bind-users at lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users
>