[Kea-users] Help diagnosing (and potentially addressing) a possible performance problem?

Klaus Steden klausfiend at gmail.com
Thu Oct 5 01:03:18 UTC 2017


Hi everyone,

We've been using Kea successfully for several months now as a key part of
our provisioning process. However, it seems like the server we're running
it on (a VM running under XenServer 6.5) isn't beefy enough, but I'm not
100% confident in that diagnosis.

There are currently ~200 unique subnets defined, about two-thirds of which
are used to provide a single lease during provisioning, at which point the
host in question assigns itself a static IP. There are 77 subnets actively
in use (for IPMI), with the following lease attributes:

  "valid-lifetime": 4000,
  "renew-timer": 1000,
  "rebind-timer": 2000,

From what I'm seeing in the output of tcpdump, there are a LOT more
requests coming in than replies going out, and *netstat* seems to confirm
that:

# netstat -us
...
Udp:
    71774 packets received
    100 packets to unknown port received.
    565 packet receive errors
    4911 packets sent

If I monitor *netstat* continuously, I see the RecvQ for Kea's socket
fluctuate wildly, anywhere from 0 to nearly 500K (and sometimes higher)
from moment to moment.
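For anyone who wants to reproduce the measurement, here is roughly how I'm sampling the queue. This is only a sketch reading /proc/net/udp directly (ports there are hex, so 67 shows up as 0043, and field 5 is tx_queue:rx_queue, both hex byte counts); wrap it in `watch -n1` or a loop to see the spikes:

```shell
# One-shot sample of the kernel UDP receive queue for port 67 (the
# DHCP server port). Prints one line per matching socket.
awk '
function hex2dec(h,    i, n) {
    n = 0; h = tolower(h)
    for (i = 1; i <= length(h); i++)
        n = n * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
    return n
}
$2 ~ /:0043$/ {                    # local address ends in :0043 (port 67)
    split($5, q, ":")              # q[1] = tx_queue, q[2] = rx_queue
    print "udp/67 RecvQ:", hex2dec(q[2]), "bytes"
}' /proc/net/udp
```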

The log also reports a lot of ALLOC_ENGINE_V4_ALLOC_FAIL errors, typically
after 53 attempts (I'm not sure why 53, but that number seems to be the
usual upper limit before the failure is confirmed).

I've been experimenting over the last hour or so with tuning various kernel
parameters (net.ipv4.udp_mem, net.core.rmem_default,
net.core.netdev_max_backlog, etc.), but those don't appear to make any
difference, and the RecvQ remains high.
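For the record, the sysctl fragment I've been experimenting with looks roughly like this (the values are illustrative, not recommendations):

```
# /etc/sysctl.d/90-dhcp.conf -- example values only
net.core.rmem_default = 8388608
net.core.rmem_max = 16777216
net.core.netdev_max_backlog = 5000
# min / pressure / max, in pages
net.ipv4.udp_mem = 262144 327680 393216
```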

Is there a way to tune the daemon to handle this kind of backlog, or a
list of kernel tunables I should be looking at modifying? Is there a
clearer way to determine whether I've got a genuine performance limitation
that we're just now running into?

I've got a bare-metal machine temporarily helping carry the burden, and it
doesn't have these issues; then again, it's not carrying the full load.
I'm loath to dedicate a whole physical server just to DHCP, but if the load
is going to stay this high, maybe that's just what I have to do.

thanks,
Klaus
