Tuning suggestions for high-core-count Linux servers

Mon Jun 5 07:59:53 UTC 2017

So, a different tack today: monitoring '/proc/net/softnet_stat' to try to reduce potential errors on the interface.

End result: 517k qps.

Final changes for the day:
sysctl -w net.core.netdev_max_backlog=32768
sysctl -w net.core.netdev_budget=2700
/root/nic_balance.sh em1 0 2

netdev_max_backlog:

The need to increase this value is indicated by an increase in the 2nd column (dropped packets) of /proc/net/softnet_stat. The default value is a reasonable starting point, but even 500k qps pushes the limits of this buffer when pinning IRQs to cores. Doubled it.

netdev_budget:

The need to increase this value is indicated by an increase in the 3rd column (time_squeeze) of /proc/net/softnet_stat. The default value is quite low (300) and is easily exhausted, especially if all of the NIC IRQs are pinned to a single CPU core. Tried various values until the counter was only growing slowly (at 2700).

As the best numbers came from using 2 cores, this value can probably be lowered. It seems stable at 2700 though, so I didn't re-test at lower numbers.
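
For anyone wanting to watch the same two columns, here is a minimal sketch of how they can be read per CPU. The function name 'softnet_summary' is made up for illustration, and it assumes that row N of the file corresponds to CPU N, which holds when all CPUs are online. The values in the file are hexadecimal, so they are converted to decimal for printing.

```shell
#!/usr/bin/env bash
# softnet_summary [FILE]  (hypothetical helper, not part of any package)
# Print the first three softnet_stat columns per row in decimal:
#   column 1 = packets processed
#   column 2 = packets dropped (backlog full, see netdev_max_backlog)
#   column 3 = time_squeeze (budget exhausted, see netdev_budget)
softnet_summary() {
    local file="${1:-/proc/net/softnet_stat}"
    local cpu=0 processed dropped squeezed rest
    while read -r processed dropped squeezed rest; do
        # Assumption: row N is CPU N (true when all CPUs are online).
        # The fields are hex without a prefix; adding "0x" lets printf
        # convert them to decimal.
        printf 'cpu%d processed=%d dropped=%d time_squeeze=%d\n' \
            "$cpu" "0x$processed" "0x$dropped" "0x$squeezed"
        cpu=$((cpu + 1))
    done < "$file"
}
```

Running 'softnet_summary' with no argument reads /proc/net/softnet_stat. Running it twice a few seconds apart and comparing the output shows whether either counter is still climbing under load.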

'/root/nic_balance.sh em1 0 2':
(Custom script based on RH 20150325_network_performance_tuning.pdf)

Pin all the IRQs for the 'em1' NIC to the first 2 CPU cores of the local NUMA node.

This had the most noticeable effect. By default, the system creates numerous rx/tx queues for the NIC, each with its own interrupt, and the 'irqbalance' service spreads them across cores. When they are spread across multiple NUMA nodes, each ingress packet gets delayed as it is switched to the NUMA node where the rest of the process is living.

At low throughput, this isn't a concern. At high throughput, this becomes quite noticeable; roughly 100k qps difference.

I tried various levels of tuning (spread across 12 cores, spread across 8, 4 and pinned to a single core), finding 2 cores the best on the bare-metal node.
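
The nic_balance.sh script itself isn't reproduced here, but the general idea of pinning a NIC's IRQs can be sketched roughly as below. This is not the actual script: the function names 'nic_irqs' and 'pin_nic_irqs' are invented for illustration, and it assumes the driver names its interrupts '<iface>' or '<iface>-<queue>' in /proc/interrupts, which varies between drivers.

```shell
#!/usr/bin/env bash
# nic_irqs IFACE [FILE]  (hypothetical helper)
# List the IRQ numbers whose name in /proc/interrupts belongs to IFACE.
# Assumption: IRQ names look like "em1" or "em1-TxRx-0"; other drivers
# may use a different naming scheme.
nic_irqs() {
    local iface="$1" file="${2:-/proc/interrupts}"
    awk -v ifc="$iface" '
        $1 ~ /^[0-9]+:$/ {
            name = $NF
            # Match "em1" or "em1-<queue>", but not "em10-<queue>".
            if (name == ifc || index(name, ifc "-") == 1) {
                sub(/:$/, "", $1)
                print $1
            }
        }' "$file"
}

# pin_nic_irqs IFACE CPULIST  (hypothetical helper, needs root)
# Write CPULIST (e.g. "0-1") to smp_affinity_list for every IRQ of IFACE.
pin_nic_irqs() {
    local iface="$1" cpus="$2" irq
    for irq in $(nic_irqs "$iface"); do
        echo "$cpus" > "/proc/irq/$irq/smp_affinity_list"
    done
}
```

Something like 'pin_nic_irqs em1 0-1' would pin to CPUs 0 and 1. Which CPUs are local to the NIC can be checked via /sys/class/net/em1/device/numa_node and /sys/devices/system/node/node<N>/cpulist. Note that a running 'irqbalance' may overwrite the affinity again, so it likely needs to be stopped or told to leave these IRQs alone.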

...

Whilst 'softnet_stat' didn't show any dropped packets (2nd column), 'netstat -s -u' still shows 'packet receive errors'. I'm still uncertain how the two counters differ and how to fix the errors that netstat reports.
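
One way to dig into this might be to look at the raw UDP counters in /proc/net/snmp, which is where netstat gets its numbers. As far as I understand it, 'packet receive errors' corresponds to InErrors there, and RcvbufErrors counts datagrams dropped because a socket's receive buffer was full. That happens after the softnet backlog, which may be why softnet_stat stays clean. A minimal sketch ('udp_counters' is a made-up name):

```shell
#!/usr/bin/env bash
# udp_counters [FILE]  (hypothetical helper)
# /proc/net/snmp holds two "Udp:" lines: a header line with counter
# names followed by a line with the values. Pair them up as name=value.
udp_counters() {
    local file="${1:-/proc/net/snmp}"
    awk '
        # First "Udp:" line: remember the counter names by position.
        $1 == "Udp:" && !seen {
            for (i = 2; i <= NF; i++) name[i] = $i
            seen = 1
            next
        }
        # Second "Udp:" line: print each value next to its name.
        $1 == "Udp:" {
            for (i = 2; i <= NF; i++) print name[i] "=" $i
        }' "$file"
}
```

If InErrors and RcvbufErrors climb together, the drops are probably at the socket buffer rather than at the NIC or backlog, but I haven't verified that on this box.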

Stuart