[kea-dev] Performance problems with 1M devices

Tomek Mrugalski tomasz at isc.org
Wed Mar 29 04:41:31 UTC 2017

On 28/03/17 23:07, le trung wrote:
> I have installed the open-source kea-dhcp server to provide IP addresses
> to network devices in my company.
> Kea works very well when the total number of devices is below 800k. But
> when the total reaches 1M devices, Kea seems to have a problem:
> -          When too many requests arrive at Kea (400 rqs/s), they exceed
> what Kea can process (my load testing of this server measured 248
> rqs/s). So when Kea sends an OFFER (type 2) message to the client, the
> client sends an ACK (type 5) to the Kea server, but Kea is busy
> processing other requests (type 2) at the full 248 rqs/s, so Kea cannot
> receive the ACK (type 5) request. ==> The client receives its IP address
> only after many offers have been sent (30 minutes or more after the
> client sent its DISCOVER).
I'm sorry, but I have difficulty understanding this. Are you saying
you send ACK to Kea? That doesn't sound right. ACK is a message sent
by the server to clients. If you somehow managed to send an ACK packet
to Kea, Kea would ignore it and log an error.

What hardware are you running this on? What Kea version are you using?
What lease backend are you using? We ran performance tests on relatively
modern hardware and came up with performance results several times
higher than 248 packets/s.

> So I have an idea: how can I configure Kea to process 248 requests (offer
> messages) and drop the messages that arrive later (after the 248th
> request)? Then, once it has finished processing these 248 requests, Kea
> continues to receive and process the next 248 requests. That means that
> each time Kea processes just 248 offer messages and drops the other
> requests that arrive.
How exactly is your testing structured? Do you send a burst of DHCP
packets and then wait? Is this an artificial test or actual traffic
generated by clients?

Clients implement an exponential backoff mechanism. After the first
transmission, the client waits about 4 seconds (+/- 1 second). If no
response arrives, it retransmits and roughly doubles the waiting time
(to 7-9 seconds). This continues until the maximum wait time of 64
seconds is reached. If a packet finally comes back during that time,
the client should accept it. The protocol was designed so that if the
reply to the first transmission takes very long to reach the client for
whatever reason, the client should still accept it, even though it may
have sent additional retransmissions of that packet. Yes, the server may
then process those retransmissions, but that doesn't matter to the
client, as it will already have applied its configuration.
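
To make the timing concrete, here is a rough Python sketch of the
retransmission schedule described above. The function name and its
parameters are made up for illustration; real clients differ in details,
so treat the numbers as approximate.

```python
import random


def backoff_delays(attempts, initial=4.0, maximum=64.0, jitter=1.0, seed=None):
    """Return the wait (in seconds) before each retransmission.

    Hypothetical helper: the base wait starts at `initial`, doubles after
    every unanswered transmission, and is capped at `maximum`. Each wait
    is randomized by +/- `jitter` seconds.
    """
    rng = random.Random(seed)
    delays = []
    base = initial
    for _ in range(attempts):
        delays.append(base + rng.uniform(-jitter, jitter))
        base = min(base * 2, maximum)  # double, but never exceed the cap
    return delays


# Print one possible schedule for six unanswered transmissions.
for number, delay in enumerate(backoff_delays(6), start=1):
    print(f"wait {number}: {delay:.1f} s")
```

The base waits come out as 4, 8, 16, 32, 64 and 64 seconds, each shifted
by the random jitter.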

If you *really* want to make Kea drop packets once a certain number of
packets is queued, you can tweak your system socket buffers to make them
smaller. But I don't believe this will improve the situation much.
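
A back-of-the-envelope Python sketch of why a smaller queue is unlikely
to help much. The function and the queue limit of 248 packets are
hypothetical; the 400 and 248 rqs/s figures are the ones quoted above.
It assumes a steady overload and a simple first-in, first-out queue,
which is a simplification of what the kernel actually does.

```python
def bounded_queue_estimate(arrival_rate, service_rate, queue_limit):
    """Estimate steady-state behaviour of an overloaded, bounded queue.

    Hypothetical helper. Returns (packets dropped per second, worst-case
    seconds a queued packet waits before the server gets to it).
    """
    # Whatever arrives beyond the service rate has to be dropped sooner
    # or later, regardless of how large the queue is.
    dropped_per_sec = max(0.0, arrival_rate - service_rate)
    # A packet admitted to a full queue waits for everything ahead of it.
    worst_wait = queue_limit / service_rate
    return dropped_per_sec, worst_wait


dropped, wait = bounded_queue_estimate(400, 248, queue_limit=248)
print(f"dropped: {dropped:.0f} packets/s, worst queueing delay: {wait:.1f} s")
```

A smaller queue only shortens the worst-case delay. The excess packets
are dropped either way, and those clients will retransmit.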

What you can do is investigate why Kea is not able to serve more than
248 leases/sec. We manage to get around 1100 leases/sec for MySQL and
PostgreSQL and around 9000 leases/sec for memfile. Disclaimer: I'm on a
business trip right now and quoting those from memory. The numbers may
be off a bit.

Maybe you're logging too much? Extensive logging is a sure way to kill
performance. If you're using MySQL, make sure it's configured
appropriately to handle the load.

If none of this helps, you can consider getting professional, paid
support from ISC. See https://www.isc.org/dhcp-subscription/ for details.

