Request for review of performance advice
john.thurston at alaska.gov
Wed Jul 8 16:39:00 UTC 2020
On 7/7/2020 5:57 PM, Victoria Risk wrote:
> A while ago we created a KB article with tips on how to improve your
> performance with our Kea DHCP server. The tips were fairly obvious to
> our developers and this was pretty successful. We would like to do
> something similar for BIND, provide a dozen or so tips for how to
> maximize your throughput with BIND. However, as usual, everything is
> more complicated with BIND.
This is an excellent idea.
If it comes to fruition, I ask there be some guidance offered on when
such optimizations are useful. I've seen places where such a guide-sheet
is followed when the guidelines were suitable for a business with 10X or
100X the traffic the customer sees.
That is, a configuration which benefits an organization seeing 100,000
qps may be excessively complex (or brittle) for one seeing 100 qps.
Do things because you should, not just because you can.
John Thurston 907-465-8591
John.Thurston at alaska.gov
Department of Administration
State of Alaska
> Can those of you who care about performance, who have worked to improve
> your performance, share some of your suggestions that have the most
> impact? Please also comment if you think any of these ideas below are
> stupid or dangerous. I have combined advice for resolvers and for
> authoritative servers, I hope it is clear which is which...
> The ideas we have fall into four general categories:
> System design
> 1a) Use a load balancer to specialize your resolvers and maximize your
> cache hit ratio. A load balancer is traditionally designed to spread
> the traffic out evenly among a pool of servers, but it can also be used
> to concentrate related queries on one server to make its cache as hot as
> possible. For example, if all queries for domains in .info are sent to
> one server in a pool, there is a better chance that an answer will be in
> the cache there.
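The "concentrate related queries" idea in 1a comes down to a selection rule the load balancer applies. Below is a minimal sketch of such a rule, independent of any particular load-balancer product. It keys on the TLD, following the .info example. It uses rendezvous (highest-random-weight) hashing rather than hash-modulo, so taking one resolver out of the pool only re-homes the TLDs that resolver owned, and the other caches stay hot. The function names and the 192.0.2.x addresses are hypothetical placeholders.

```python
from __future__ import annotations

import hashlib


def tld_of(qname: str) -> str:
    """Return the lower-cased top-level label of a query name ('.' for the root)."""
    labels = [label for label in qname.rstrip(".").lower().split(".") if label]
    return labels[-1] if labels else "."


def pick_server(qname: str, servers: list[str]) -> str:
    """Rendezvous hashing keyed on the TLD: every query under the same TLD
    goes to the same server, and removing a server only moves the TLDs
    that were mapped to that server."""
    tld = tld_of(qname)

    def weight(server: str) -> int:
        # sha256 is stable across processes, unlike Python's built-in hash().
        digest = hashlib.sha256(f"{tld}|{server}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(servers, key=weight)


# Hypothetical resolver pool (documentation address range).
pool = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
print(pick_server("www.example.info", pool))
```

Keying on the TLD is only one choice. Keying on the registered domain spreads load more evenly, but it shares fewer cached delegations per server.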
> 1b) If you have a large authoritative system with many servers, consider
> dedicating some machines to propagate transfers. These machines, called
> transfer servers, would not answer client queries, but just send
> notifies and process IXFR requests.
> 1c) Deploy ghost secondaries. If you store copies of authoritative
> zones on resolvers (resolvers as undelegated secondaries), you can avoid
> querying those authoritative zones. The most obvious uses of this would
> be mirroring the root zone locally or mirroring your own authoritative
> zones on your resolver.
> We have other system design ideas that we suspect would help, but we are
> not sure, so I will wait to see if anyone suggests them.
> OS settings and the system environment
> 2a) Run on bare metal if possible, not on virtual machines or in the
> cloud. (Any idea how much difference this makes? The only reference we
> can cite is pretty out of date -
> 2b) Consider building with --with-tuning=large (https://kb.isc.org/docs/aa-01314).
> This is a compile-time option, so it is not something you can switch on
> and off during production.
> 2c) Consider which R/W lock choice you want to use -
> For the highest tested query rates (> 100,000 queries per second),
> pthreads read-write locks with hyper-threading /enabled/ seem to be the
> best-performing choice by far.
> 2d) Pay attention to your choice of NICs. We have found wide
> variations in their performance. (Can anyone suggest what specifically
> to look for?)
> 2e) Make sure your socket send buffers are big enough. (Not sure if this
> is obsolete advice; do we need to tell people how to tell if their
> buffers are causing delays?)
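On the question of how to tell whether buffers are the problem: on Linux, one place to look is the UDP counters in /proc/net/snmp. SndbufErrors and RcvbufErrors count datagrams dropped because a socket buffer was full. The sketch below is Linux-specific and uses hypothetical helper names. It reports the kernel's default UDP buffer sizes and those two counters. The counters are system-wide rather than per-process, so compare two readings taken under load. If they keep climbing, the buffers are a likely suspect.

```python
import socket


def default_udp_buffers() -> tuple:
    """Kernel default (send, receive) buffer sizes in bytes for a new UDP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        return (s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
                s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))


def udp_buffer_errors(snmp_lines) -> dict:
    """Parse the text of Linux /proc/net/snmp. It has two 'Udp:' lines:
    a header row of counter names followed by a row of values."""
    rows = [line.split()[1:] for line in snmp_lines if line.startswith("Udp:")]
    counters = dict(zip(rows[0], (int(v) for v in rows[1])))
    return {name: counters.get(name, 0) for name in ("SndbufErrors", "RcvbufErrors")}


if __name__ == "__main__":
    print("default SO_SNDBUF/SO_RCVBUF:", default_udp_buffers())
    try:
        with open("/proc/net/snmp") as f:
            print(udp_buffer_errors(f))
    except FileNotFoundError:
        print("/proc/net/snmp not found (not Linux?)")
```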
> 2f) When the number of CPUs is very large (32 or more), the increase in
> UDP listeners may not provide any performance improvement and might
> actually reduce throughput slightly due to the overhead of the
> additional structures and tasks. We suggest trying different values of
> -U to find the optimal one for your production environment.
> named Features
> 3a) Minimize logging. Query logging is expensive (can cost you 20% or
> more of your throughput) so don’t do it unless you are using the logs
> for something. Logging with dnstap is lower impact, but still fairly
> expensive. Don’t run in debug mode unless necessary.
> 3b) Use the named.conf option "minimal-responses yes;" to reduce the
> amount of work that named needs to do to assemble the query response,
> as well as the amount of outbound traffic.
> 3c) Disable synth-from-dnssec. While this seemed like a good idea, it
> turns out that in practice it does not improve performance.
> 3d) Tune your zone transfers. (https://kb.isc.org/docs/aa-00726)
> When tuning the behavior of the primary, there are several factors that
> you can control:
> - The rate of notifications of changes to secondary servers
> (serial-query-rate and notify-delay)
> - Limits on concurrent zone transfers (transfers-out, tcp-clients,
> tcp-listen-queue, reserved-sockets)
> - Efficiency/management options (max-transfer-time-out,
> max-transfer-idle-out, transfer-format)
> The most important options to focus on are transfers-out,
> serial-query-rate, tcp-clients and tcp-listen-queue.
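A back-of-the-envelope calculation shows why these limits matter once the zone count grows. The sketch assumes each zone needs exactly one SOA serial query, or one transfer per secondary, and that nothing else competes for the limits, so the results are lower bounds. The helper names are hypothetical. The defaults of 20 queries per second for serial-query-rate and 10 concurrent transfers for transfers-out are my understanding of BIND 9's documented values; check the ARM for your version.

```python
import math

# Assumed BIND 9 defaults; verify against the ARM for your version.
SERIAL_QUERY_RATE = 20  # SOA serial queries per second
TRANSFERS_OUT = 10      # concurrent outbound zone transfers


def refresh_sweep_seconds(zones: int, serial_query_rate: int = SERIAL_QUERY_RATE) -> int:
    """Lower bound on seconds needed to send one SOA serial query per zone."""
    return math.ceil(zones / serial_query_rate)


def transfer_rounds(changed_zones: int, secondaries: int,
                    transfers_out: int = TRANSFERS_OUT) -> int:
    """Minimum number of rounds of concurrent outbound transfers a primary
    needs when every changed zone must reach every secondary."""
    return math.ceil(changed_zones * secondaries / transfers_out)


print(refresh_sweep_seconds(10_000))        # 500
print(transfer_rounds(500, secondaries=4))  # 200
```

If each round takes even a few seconds, 200 rounds adds up quickly. That is the motivation for raising transfers-out (and tcp-clients to match) on a busy primary.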
> 3e) If you use RPZ, consider using qname-wait-recurse. We have had
> issues with RPZ transfers impacting query performance in resolvers. In
> general, several smaller RPZ zones will transfer faster than a few very
> large RPZ zones.
> 3f) Consider enabling prefetch on your resolver, unless you are running
> 9.10 (which is EOL). See https://kb.isc.org/docs/aa-01122
> Fix your transport network.
> Transport network issues cause BIND to keep retrying, which is a
> performance drain.
> 4a) Disable (in some cases, completely remove in order to prevent
> ongoing interference) outbound firewalls/packet-filters (particularly
> those that maintain state on connections). These are a frequent cause of
> problems in the DNS that can cause your DNS server to do a lot of extra
> work.
> 4b) Set an appropriate MTU for your network. Ensure that your network
> infrastructure supports EDNS and large UDP responses up to 4096 bytes.
> Ensure that your network infrastructure allows transit for and reassembly
> of fragmented UDP packets (expect these if you are DNSSEC signing, since
> signed responses are large).
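To make 4b concrete: whether a response fragments, and into how many pieces, depends on the MTU and the header overhead. The sketch assumes IPv4 without options (20-byte header), IPv6 without extension headers (40 bytes, plus the 8-byte fragment header in every fragment), and the 8-byte UDP header. The function name is hypothetical.

```python
import math

UDP_HEADER = 8            # bytes
IPV4_HEADER = 20          # assumes no IPv4 options
IPV6_HEADER = 40          # assumes no extension headers
IPV6_FRAGMENT_HEADER = 8  # added to every IPv6 fragment


def udp_packet_count(dns_payload: int, mtu: int = 1500, ipv6: bool = False) -> int:
    """Number of IP packets needed to carry a DNS-over-UDP message of
    dns_payload bytes over a link with this MTU (1 = not fragmented)."""
    datagram = UDP_HEADER + dns_payload
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    if datagram <= mtu - ip_header:
        return 1
    per_fragment = mtu - ip_header - (IPV6_FRAGMENT_HEADER if ipv6 else 0)
    # Fragment offsets are counted in 8-byte units, so every fragment except
    # the last must carry a multiple of 8 bytes of the original datagram.
    per_fragment -= per_fragment % 8
    return math.ceil(datagram / per_fragment)


for size in (512, 1232, 4096):
    print(size, udp_packet_count(size), udp_packet_count(size, mtu=1280, ipv6=True))
```

With a 1500-byte MTU, a 4096-byte response becomes three IPv4 fragments. A 1232-byte payload is the largest that crosses a 1280-byte IPv6 link unfragmented. Every extra fragment is one more packet that a filter or middlebox can drop.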
> 4c) Ensure that your network infrastructure allows DNS over TCP.
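For 4c, the difference on the wire is small but easy to get wrong in a quick test tool. Over TCP, each DNS message is preceded by a two-byte, big-endian length field (RFC 1035, section 4.2.2), and a reply may arrive in several reads. The sketch below builds a minimal query, frames it for TCP, and reads one framed response. All function names are hypothetical. It assumes ASCII (A-label) query names and does no response parsing.

```python
import socket
import struct


def build_query(qname: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Minimal DNS query: RD flag set, one question, class IN (qtype 1 = A)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    name = b""
    for label in qname.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")  # assumes A-label (ASCII) input
            if len(raw) > 63:
                raise ValueError(f"label too long: {label!r}")
            name += bytes([len(raw)]) + raw
    return header + name + b"\x00" + struct.pack("!HH", qtype, 1)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with its length as a 2-byte big-endian integer."""
    return struct.pack("!H", len(message)) + message


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; TCP may deliver a message in several pieces."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def query_over_tcp(server: str, qname: str, port: int = 53, timeout: float = 3.0) -> bytes:
    """Send one query over TCP and return the raw (unparsed) response message."""
    with socket.create_connection((server, port), timeout=timeout) as s:
        s.sendall(frame_for_tcp(build_query(qname)))
        (length,) = struct.unpack("!H", recv_exact(s, 2))
        return recv_exact(s, length)
```

Calling query_over_tcp with your server's address and a name it serves shows whether TCP gets through end to end. From a shell, "dig +tcp" performs the same check.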
> 4d) Check for, and eliminate, any incomplete IPv6 interface set-up. (What
> can go wrong here is that BIND thinks that it can use IPv6 authoritative
> servers, but the sends actually fail silently, leaving named waiting
> unnecessarily for responses.)
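A quick local check for 4d asks the kernel which source address it would use to reach a global IPv6 destination. Calling connect() on a UDP socket performs only a route lookup and sends nothing. The sketch uses hypothetical function names and a placeholder probe address from the 2001:db8::/32 documentation prefix. It catches only the local half of the problem: no route at all, or only a link-local address. A route can exist while packets are still dropped upstream, which is the silent-failure case described above, so a real test query over IPv6 is still worthwhile. If a host is not meant to use IPv6 at all, named's -4 option restricts it to IPv4.

```python
import ipaddress
import socket


def ipv6_source_address(probe: str = "2001:db8::53"):
    """Return the IPv6 source address the kernel would use to reach `probe`,
    or None if there is no route. No packet is sent."""
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as s:
            s.connect((probe, 53))
            host = s.getsockname()[0]
    except OSError:  # IPv6 disabled in the kernel, or "Network is unreachable"
        return None
    # Strip any "%interface" scope suffix before parsing.
    return ipaddress.IPv6Address(host.split("%")[0])


def ipv6_looks_usable(probe: str = "2001:db8::53") -> bool:
    """True only if a route exists and the chosen source is not link-local,
    loopback, or unspecified."""
    src = ipv6_source_address(probe)
    return src is not None and not (src.is_link_local or src.is_loopback or src.is_unspecified)


print("IPv6 looks usable:", ipv6_looks_usable())
```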
> Any further suggestions, corrections or warnings are very welcome.
> Thank you!
> Victoria Risk
> Product Manager
> Internet Systems Consortium
> vicky at isc.org
> Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list
> ISC funds the development of this software with paid support subscriptions. Contact us at https://www.isc.org/contact/ for more information.
> bind-users mailing list
> bind-users at lists.isc.org