<div dir="ltr"><br><div>Hi everyone,</div><div><br></div><div>We've been using Kea successfully for several months now as a key part of our provisioning process. However, it seems like the server we're running it on (a VM running under XenServer 6.5) isn't beefy enough, but I'm not 100% confident in that diagnosis.</div><div><br></div><div>There are currently ~200 unique subnets defined, about two-thirds of which are used to provide a single lease during provisioning, at which point the host in question assigns itself a static IP. There are 77 subnets that are actively in use (for IPMI), with the following lease attributes:</div><div>


<p class="gmail-p1"><span class="gmail-s1"><span class="gmail-Apple-converted-space">  </span>"valid-lifetime": 4000,<br></span><span class="gmail-Apple-converted-space">  </span>"renew-timer": 1000,<br><span class="gmail-Apple-converted-space">  </span>"rebind-timer": 2000,<br><br>From what I'm seeing in the output of tcpdump, there are a LOT more requests coming in than replies going out, and <i>netstat</i> seems to confirm that:<br><br>


</p><p class="gmail-p1"><span class="gmail-s1"># netstat -us<br></span>...<br>Udp:<br><span class="gmail-Apple-converted-space">    </span>71774 packets received<br><span class="gmail-Apple-converted-space">    </span>100 packets to unknown port received.<br><span class="gmail-Apple-converted-space">    </span>565 packet receive errors<br><span class="gmail-Apple-converted-space">    </span>4911 packets sent</p><p class="gmail-p1">If I monitor <i>netstat</i> continuously, I see the Recv-Q for Kea fluctuating wildly, anywhere between 0 and nearly 500K (and sometimes higher) from moment to moment.</p><p class="gmail-p1">The log also reports a lot of ALLOC_ENGINE_V4_ALLOC_FAIL errors, typically after 53 attempts (not sure why 53, but that number seems to be the usual upper limit before failure is confirmed).</p><p class="gmail-p1">I've been experimenting over the last hour or so with tuning various kernel parameters (net.ipv4.udp_mem, net.core.rmem_default, net.core.netdev_max_backlog, etc.), but those don't appear to make any difference, and the Recv-Q remains high.</p><p class="gmail-p1">Is there a way to tune the daemon to handle this kind of backlog, or a list of kernel tunables I should be looking at modifying? Is there a clearer way to determine whether I've got a genuine performance limitation that we're just now running into?</p><p class="gmail-p1">I've got a bare metal machine temporarily helping carry the burden and it doesn't have these issues, but then again, it's not carrying the full load. I'm loath to dedicate a whole physical server just to DHCP, but if the load is going to remain high like this, maybe that's just what I have to do.</p><p class="gmail-p1">Thanks,<br>Klaus</p></div></div>