[kea-dev] Good news and bad news
Thomas Markwalder
tmark at isc.org
Fri Nov 30 11:26:39 UTC 2018
We have a fundamental flaw in our "non-queue", default receiver logic.
One that has been there forever. We never notice it because we never
test with traffic on more than one interface. Shame on us. Here's the
code at the heart of the issue:
In the regular, main-thread mode, we call IfaceMgr::receive4() which
reads DHCP socket data with this block of code:
    // Let's find out which interface/socket has the data
    BOOST_FOREACH(iface, ifaces_) {
        BOOST_FOREACH(SocketInfo s, iface->getSockets()) {
            if (FD_ISSET(s.sockfd_, &sockets)) {
                candidate.reset(new SocketInfo(s));
                break;
            }
        }
        if (candidate) {
            break;
        }
    }
The receive4() loop reads and returns the first packet on the first ready
interface. That works fine with one interface. When there is more than one
and they are all equally busy, the first ready socket we come to is the one
that gets serviced. Because we always loop through them in the same order,
if that interface is really busy it gets all the attention and the rest
starve. To demonstrate this I ran two instances of perfdhcp against
kea-dhcp4 with MySQL, without packet queuing, configured with two
subnets, 175.0.0.0/8 and 178.0.0.0/8:
First interface declared in the config:
----------------------------------------
Running: perfdhcp -4 -r 500 -R 500000 -p 5 175.16.1.10
***Rate statistics***
Rate: 113.388 4-way exchanges/second, expected rate: 500
***Statistics for: DISCOVER-OFFER***
sent packets: 2161
received packets: 658
drops: 1503
Second interface declared in the config:
----------------------------------------
Running: perfdhcp -4 -r 500 -R 500000 -p 5 178.16.1.10
***Rate statistics***
Rate: 0.199951 4-way exchanges/second, expected rate: 500 <------- STARVED!!!!
***Statistics for: DISCOVER-OFFER***
sent packets: 2211
received packets: 1
drops: 2210
(I used a simple shell script and nohup to start the perfdhcp instances at
the same time). What this means is that sites running Kea now with
multiple sockets (interfaces or subnets) are probably having issues
during high traffic conditions.
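The original script was not posted; a minimal sketch of the idea (the helper name and output paths are mine) is a function that launches two commands in the background at nearly the same time and waits for both:

```shell
#!/bin/sh
# run_pair CMD1 CMD2 -- start both commands concurrently, wait for both.
run_pair() {
    nohup sh -c "$1" </dev/null >/tmp/pair1.out 2>&1 &
    p1=$!
    nohup sh -c "$2" </dev/null >/tmp/pair2.out 2>&1 &
    p2=$!
    wait "$p1" "$p2"
}

# Usage against the two test subnets (assumes perfdhcp is on PATH):
# run_pair "perfdhcp -4 -r 500 -R 500000 -p 5 175.16.1.10" \
#          "perfdhcp -4 -r 500 -R 500000 -p 5 178.16.1.10"
```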
The good news is that the packet-queue receive logic is structured a bit
differently. Looking at IfaceMgr::receiveDHCP4Packets(), which is used
by the receiver thread:
    // Let's find out which interface/socket has data.
    BOOST_FOREACH(iface, ifaces_) {
        BOOST_FOREACH(SocketInfo s, iface->getSockets()) {
            if (FD_ISSET(s.sockfd_, &sockets)) {
                receiveDHCP4Packet(*iface, s);
                // Can take time so check one more time the watch socket.
                if (dhcp_receiver_->shouldTerminate()) {
                    return;
                }
            }
        }
    }
The function receiveDHCP4Packet() pushes the packet onto the queue, but
rather than breaking on the first ready socket, the loop continues reading
from ALL ready interfaces. Running the same dual-perfdhcp test shows this:
First interface declared in the config:
----------------------------------------
Running: perfdhcp -4 -r 500 -R 500000 -p 5 175.16.1.10
***Rate statistics***
Rate: 51.3835 4-way exchanges/second, expected rate: 500
***Statistics for: DISCOVER-OFFER***
sent packets: 2172
received packets: 876
drops: 1296
Second interface declared in the config:
----------------------------------------
Running: perfdhcp -4 -r 500 -R 500000 -p 5 178.16.1.10
***Rate statistics***
Rate: 54.7949 4-way exchanges/second, expected rate: 500
***Statistics for: DISCOVER-OFFER***
sent packets: 2214
received packets: 838
drops: 1376
Notice that the combined rate is approximately 100 leases per second (LPS),
which matches the single-thread performance for the one serviced interface.
In other words, my test setup can serve about 100 LPS regardless of how many
interfaces are involved. With queuing we at least service all of them.
One of the things we really need to add to our testing is multiple
interface/socket scenarios.
Thomas