Tuning suggestions for high-core-count Linux servers
Stuart.Browne at neustar.biz
Wed May 31 07:25:44 UTC 2017
I've been able to get my hands on some rather nice servers with 2 x 12-core Intel CPUs, and was wondering if anybody had any decent tuning tips to get BIND to respond at a faster rate.
I'm seeing that adding cores beyond a single die gives pretty much no real improvement. I understand the NUMA boundaries etc., but this hasn't been my experience on previous generations of Intel CPUs, at least not this dramatically. When I use more than a single die, CPU utilization continues to scale with the core count; however, throughput doesn't increase to match.
All the testing I've been doing so far (dnsperf from multiple sources) seems to plateau around 340k qps per BIND host.
- Primarily looking at UDP throughput here
- Intention is for high-throughput, authoritative only
- The zone files used for testing are fairly small and reside completely in-memory; no disk IO involved
- RHEL7, bind 9.10 series, iptables 'NOTRACK' firmly in place
- Current configure:
built by make with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--localstatedir=/var' '--with-libtool' '--enable-threads' '--enable-ipv6' '--with-pic' '--enable-shared' '--disable-static' '--disable-openssl-version-check' '--with-tuning=large' '--with-libxml2' '--with-libjson' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -fPIC' 'LDFLAGS=-Wl,-z,relro ' 'CPPFLAGS= -DDIG_SIGCHASE -fPIC'
- Using 'taskset' to bind to a single CPU die and limiting BIND to '-n' CPUs doesn't improve much beyond letting BIND make its own decision
- NIC interfaces are set for TOE
- rmem & wmem changes (beyond a point) seem to do little to improve performance; they mainly just make throughput more consistent
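For reference, the per-die pinning mentioned above can be scripted roughly as follows. This is only a sketch, not the exact commands I ran: it assumes the kernel exposes NUMA topology under /sys/devices/system/node (the usual location on Linux), and the helper names (parse_cpulist, numa_nodes, taskset_command) are made up for illustration. It only prints a 'taskset -c ... named -n ...' line per node; it doesn't start anything.

```python
import glob
import os


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-5,12-17' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist (e.g. memory-only node) yields no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Map NUMA node id -> list of CPU ids; returns {} if sysfs has no node entries."""
    nodes = {}
    pattern = os.path.join(sysfs_root, "node[0-9]*", "cpulist")
    for path in glob.glob(pattern):
        node_name = os.path.basename(os.path.dirname(path))  # e.g. 'node0'
        with open(path) as fh:
            nodes[int(node_name[len("node"):])] = parse_cpulist(fh.read())
    return nodes


def taskset_command(cpus):
    """Build a command line pinning named to the given CPUs, one worker per CPU."""
    cpu_arg = ",".join(str(c) for c in cpus)
    return ["taskset", "-c", cpu_arg, "named", "-n", str(len(cpus))]


if __name__ == "__main__":
    for node, cpus in sorted(numa_nodes().items()):
        if cpus:
            print("node %d: %s" % (node, " ".join(taskset_command(cpus))))
```

Note that a node's cpulist includes hyperthread siblings where those are enabled, so the '-n' value here counts logical CPUs rather than physical cores.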
I've yet to investigate switch throughput or tuning (I don't yet have access to the switch).
So, any thoughts?