<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head><body style='font-size: 10pt'>

<p>Hi Klaus,</p>

<p>I have seen something very similar on VMware with another application receiving a lot of UDP traffic. Unfortunately, we never found a solution and switched to bare metal as a workaround. That has irked me ever since, and I'm interested in finding a root cause for these kinds of problems.</p>

<p>As far as I understand, and according to the netstat man page, Recv-Q is the count of bytes not yet copied by the user program connected to the socket. </p>
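<p>If you want to watch that number without netstat, here is a minimal sketch, assuming Linux. As far as I know, netstat takes the UDP queue sizes from /proc/net/udp, where the queue column is a hexadecimal tx_queue:rx_queue pair and newer kernels append a decimal drops column. The function name parse_proc_net_udp is my own invention, and the field positions are my assumption about that file's layout.</p>

```python
def parse_proc_net_udp(text):
    """Parse the text of /proc/net/udp into a list of per-socket records.

    Assumed layout (whitespace separated): slot, local addr:port (hex),
    remote addr:port (hex), state, tx_queue:rx_queue (hex), ...,
    and on newer kernels a trailing decimal 'drops' counter.
    """
    sockets = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        tx_hex, rx_hex = fields[4].split(":")
        # The drops column is absent on old kernels, so treat it as optional.
        drops = int(fields[12]) if len(fields) > 12 else None
        sockets.append({
            "port": local_port,
            "tx_queue": int(tx_hex, 16),
            "rx_queue": int(rx_hex, 16),  # bytes not yet read by the program
            "drops": drops,
        })
    return sockets


def read_udp_queues(path="/proc/net/udp"):
    """Read the live table; only meaningful on a Linux host."""
    with open(path) as handle:
        return parse_proc_net_udp(handle.read())
```

<p>Filtering the result of read_udp_queues() for port 67 and polling it once a second should show whether the receive queue and the drops counter grow together, which would suggest the daemon is not draining the socket fast enough.</p>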

<p>Do you have any special rules, run any external commands, or do DNS lookups when handling DHCP requests?</p>

<p>Have you read the comments on ALLOC_ENGINE_V4_ALLOC_FAIL?</p>

<p>"% ALLOC_ENGINE_V4_ALLOC_FAIL %1: failed to allocate an IPv4 address after %2 attempt(s)<br />The DHCP allocation engine gave up trying to allocate an IPv4 address<br />after the specified number of attempts.  This probably means that the<br />address pool from which the allocation is being attempted is either<br />empty, or very nearly empty.  As a result, the client will have been<br />refused a lease. The first argument includes the client identification<br />information.<br /><br />This message may indicate that your address pool is too small for the<br />number of clients you are trying to service and should be expanded.<br />Alternatively, if the you know that the number of concurrently active<br />clients is less than the addresses you have available, you may want to<br />consider reducing the lease lifetime.  In this way, addresses allocated<br />to clients that are no longer active on the network will become available<br />sooner."</p>
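<p>To put rough numbers on the pool size versus lease lifetime trade-off in that message, here is a back-of-the-envelope sketch. It assumes new clients arrive at a steady rate, never release early, and each hold an address for the full valid-lifetime. The function names are hypothetical, and the only figure taken from your mail is the 4000 second lifetime.</p>

```python
def addresses_in_use(new_clients_per_hour, valid_lifetime_s):
    """Average number of addresses held at once under the stated assumptions.

    This is just arrival rate multiplied by holding time.
    """
    return new_clients_per_hour * valid_lifetime_s / 3600.0


def max_new_clients_per_hour(pool_size, valid_lifetime_s):
    """Highest steady arrival rate a pool can absorb before it runs dry."""
    return pool_size * 3600.0 / valid_lifetime_s


def pool_is_sufficient(pool_size, new_clients_per_hour, valid_lifetime_s):
    """True if the pool covers the average demand, with no safety margin."""
    return addresses_in_use(new_clients_per_hour, valid_lifetime_s) <= pool_size
```

<p>With a 4000 second lifetime, a pool holding a single address can hand out less than one new lease per hour under these assumptions, so a provisioning subnet that sees a second host before the first lease expires would fail in exactly the way the message describes.</p>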

<p>Br,</p>

<p>Rasmus</p>

<p>Klaus Steden wrote on 2017-10-05 03:03:</p>

<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0">

<div dir="ltr"><br />

<div>Hi everyone,</div>

<div> </div>

<div>We've been using Kea successfully for several months now as a key part of our provisioning process. However, it seems like the server we're running it on (a VM running under XenServer 6.5) isn't beefy enough, but I'm not 100% confident in that diagnosis.</div>

<div> </div>

<div>There are currently ~200 unique subnets defined, about two-thirds of which are used to provide a single lease during provisioning, at which point the host in question assigns itself a static IP. There are 77 subnets that are actively in use (for IPMI), with the following lease attributes:</div>

<div>

<p class="gmail-p1">&nbsp;&nbsp;"valid-lifetime": 4000,<br />&nbsp;&nbsp;"renew-timer": 1000,<br />&nbsp;&nbsp;"rebind-timer": 2000,</p>

<p class="gmail-p1">From what I'm seeing in the output of tcpdump, there are a LOT more requests coming in than replies going out, and <em>netstat</em> seems to confirm that:</p>

<p class="gmail-p1"># netstat -us<br />...<br />Udp:<br />&nbsp;&nbsp;&nbsp;&nbsp;71774 packets received<br />&nbsp;&nbsp;&nbsp;&nbsp;100 packets to unknown port received.<br />&nbsp;&nbsp;&nbsp;&nbsp;565 packet receive errors<br />&nbsp;&nbsp;&nbsp;&nbsp;4911 packets sent</p>

<p class="gmail-p1">If I monitor <em>netstat</em> continuously, I see spikes on the RecvQ for Kea that fluctuate wildly, anywhere between 0 and nearly 500K (and sometimes higher) moment to moment.</p>

<p class="gmail-p1">The log also reports a lot of ALLOC_ENGINE_V4_ALLOC_FAIL errors after typically 53 attempts (not sure why 53, but that number seems to be the typical upper limit before failure is confirmed).</p>

<p class="gmail-p1">I've been experimenting over the last hour or so with tuning various kernel parameters (net.ipv4.udp_mem, net.core.rmem_default, net.core.netdev_max_backlog, etc.) but those don't appear to make any kind of difference, and the RecvQ remains high.</p>

<p class="gmail-p1">Is there any way I can tune the daemon to handle this kind of backlog, or is there a list of kernel tunables I should be looking at modifying? Is there a clearer way to determine whether I've got a genuine performance limitation that we're just now running into?</p>

<p class="gmail-p1">I've got a bare metal machine temporarily helping carry the burden and it doesn't have these issues, but then again, it's not carrying the full load; I'm loath to dedicate a whole physical server just to DHCP, but if the load is going to remain high like this, maybe that's just what I have to do.</p>

<p class="gmail-p1">thanks,<br />Klaus</p>

</div>

</div>

<br />

<div class="pre" style="margin: 0; padding: 0; font-family: monospace">_______________________________________________<br /> Kea-users mailing list<br /> <a href="mailto:Kea-users@lists.isc.org">Kea-users@lists.isc.org</a><br /> <a href="https://lists.isc.org/mailman/listinfo/kea-users">https://lists.isc.org/mailman/listinfo/kea-users</a></div>

</blockquote>



</body></html>