[Kea-users] kea-dhcp4 1.4.0-P1 HA features

Marcin Siodelski marcin at isc.org
Thu Sep 13 13:30:56 UTC 2018


Ivan,

Thanks for providing the log and config snippets. I have several
comments to share, although none of them may definitively solve your issue.

Your heartbeat-delay and max-ack-delay are set to very low values. Note
that they are specified in milliseconds. This means that each server will
be sending heartbeats to its partner almost constantly, and when the
partner doesn't respond to a heartbeat, the server won't wait long enough
for it to generate a DHCP response before assuming that it is down.

I realize that you might be doing it to simulate a failure scenario in
which the surviving server takes over the partner's traffic quickly, so
that the whole test doesn't get stuck waiting for such a transition.
However, you may consider re-running this test with significantly higher
values of heartbeat-delay to make sure that the server in the
"partner-down" state isn't hammered by the heartbeats it needs to
generate. It shouldn't be, but one never knows.
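For example, a re-run could use the HA parameters below. This is only a sketch: the numbers are illustrative rather than tuned recommendations, all delays are in milliseconds, and max-unacked-clients is a number of clients. The peers list is copied unchanged from your configuration.

```json
"high-availability": [ {
    "this-server-name": "dhcp-11",
    "mode": "load-balancing",
    // Send a heartbeat every 10 seconds rather than every 10 ms.
    "heartbeat-delay": 10000,
    // Illustrative value; the Kea documentation advises keeping it
    // greater than heartbeat-delay.
    "max-response-delay": 60000,
    // Illustrative: a client counts as unacked after 10 seconds.
    "max-ack-delay": 10000,
    "max-unacked-clients": 10,
    "peers": [
        {
            "name": "dhcp-11",
            "url": "http://10.58.0.11:8080/",
            "role": "primary",
            "auto-failover": true
        },
        {
            "name": "dhcp-12",
            "url": "http://10.58.0.12:8080/",
            "role": "secondary",
            "auto-failover": true
        }
    ]
} ]
```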

Secondly, your subnet contains two pools which are not assigned to any
of the HA servers (they lack the "client-class" specification). Without
this specification both servers are able to use both pools. As a result,
during normal operation (load balancing) they may end up offering the
same address to two distinct clients, which is a race condition.
Admittedly, this should not be the reason for the behavior you're
seeing, but I thought I'd make it clear.
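A minimal sketch of assigning each pool to one server, assuming the HA hooks library's classification in load-balancing mode, which places each query in a class named "HA_" followed by the name of the server responsible for it (here "HA_dhcp-11" and "HA_dhcp-12"). Which pool goes to which server is an arbitrary choice made for illustration.

```json
"subnet4": [ {
    "subnet": "10.187.0.0/24",
    "pools": [
        {
            // Only queries classified as belonging to dhcp-11 use this pool.
            "pool": "10.187.0.10 - 10.187.0.127",
            "client-class": "HA_dhcp-11"
        },
        {
            // Only queries classified as belonging to dhcp-12 use this pool.
            "pool": "10.187.0.128 - 10.187.0.250",
            "client-class": "HA_dhcp-12"
        }
    ],
    "option-data": [
        {
            "name": "routers",
            "data": "10.187.0.1"
        }
    ]
} ]
```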

Thirdly, the subnet configuration provided doesn't contain any subnet
selector. Such a selector is typically an "interface" or "relay" parameter
specified at the subnet level. Let's take "interface" as an example.
If you specify "interface": "eth0" in the subnet configuration, the
server will select that subnet for DHCP traffic received on its
"eth0" interface.
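As an illustration, using the interface from your "interfaces-config", a subnet with an interface selector could look like the fragment below. This is a sketch that only fits clients on the same link as "ens192"; for relayed clients a "relay" selector would be needed instead, and its exact syntax should be checked against the documentation for your Kea version.

```json
"subnet4": [ {
    "subnet": "10.187.0.0/24",
    // Select this subnet for DHCP traffic received on ens192.
    "interface": "ens192",
    "pools": [
        {
            "pool": "10.187.0.10 - 10.187.0.127"
        }
    ]
} ]
```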

If no subnet selector is provided, the server will try to match "some"
address in the client's packet with the available subnets. This can be
the ciaddr, the giaddr, the source IP address, etc. However, if the
client is booting, none of those may be available, and the server is
unable to select a subnet for the client. As a result, it will drop the
query.

That said, you say that the servers are responding to the clients before
you simulate a failure on one of them. This would mean that the subnet is
selected correctly. Perhaps, though, the fact that both servers are
online is masking the issue that one of them is actually not responding?
Just a thought.

Did you try simulating a failure of the other server in the pair? I am
wondering whether this is specific to one of the Kea instances.

Can you re-run the test with DEBUG logging enabled? We'd see if the
surviving server receives any packets and why it drops them.
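A sketch of such a logging configuration in the Kea 1.4 style, where the loggers live in a top-level "Logging" map next to "Dhcp4". Sending output to stdout and using debug level 99 (the most verbose) are choices made for illustration; adjust the output to wherever your logs normally go.

```json
"Logging": {
    "loggers": [ {
        // Child loggers such as kea-dhcp4.ha-hooks inherit this setting
        // unless they are configured separately.
        "name": "kea-dhcp4",
        "output_options": [
            {
                "output": "stdout"
            }
        ],
        "severity": "DEBUG",
        // Debug levels range from 0 to 99; 99 is the most verbose.
        "debuglevel": 99
    } ]
}
```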

Marcin

On 13.09.2018 14:09, Ivan Stenda wrote:
> Hello Marcin,
> 
> what I see on the working host is:
> 
> 2018-09-13 13:52:53.079 WARN  [kea-dhcp4.ha-hooks/2558]
> HA_LEASE_UPDATE_COMMUNICATIONS_FAILED [hwtype=1 08:3e:5d:10:53:54],
> cid=[no info], tid=0x70b576d2: failed to communicate with dhcp-12
> (http://10.58.0.12:8080/): Connection refused
> 2018-09-13 13:52:53.789 WARN  [kea-dhcp4.ha-hooks/2558]
> HA_HEARTBEAT_COMMUNICATIONS_FAILED failed to send heartbeat to dhcp-12
> (http://10.58.0.12:8080/): Connection refused
> 2018-09-13 13:52:54.820 WARN  [kea-dhcp4.ha-hooks/2558]
> HA_HEARTBEAT_COMMUNICATIONS_FAILED failed to send heartbeat to dhcp-12
> (http://10.58.0.12:8080/): Connection refused
> 2018-09-13 13:52:55.957 WARN  [kea-dhcp4.ha-hooks/2558]
> HA_HEARTBEAT_COMMUNICATIONS_FAILED failed to send heartbeat to dhcp-12
> (http://10.58.0.12:8080/): Connection refused
> 2018-09-13 13:52:55.957 INFO  [kea-dhcp4.ha-hooks/2558]
> HA_STATE_TRANSITION server transitions from LOAD-BALANCING to
> PARTNER-DOWN state, partner state is UNDEFINED
> 2018-09-13 13:52:55.957 INFO  [kea-dhcp4.ha-hooks/2558]
> HA_LEASE_UPDATES_DISABLED lease updates will not be sent to the partner
> while in PARTNER-DOWN state
> 
> and I can confirm that OFFERs are not sent out from the working host.
> 
> 
> Configuration snippets here:
> {
>     "interfaces-config": {
>         "interfaces": [ "ens192" ],
>         "dhcp-socket-type": "udp"
>     },
> 
> {
>             "subnet": "10.187.0.0/24",
>             "pools": [
>                 {
>                     "pool": "10.187.0.10 - 10.187.0.127"
>                 },
>                 {
>                     "pool": "10.187.0.128 - 10.187.0.250"
>                 }
>             ],
> 
>             "option-data": [
>                 {
>                     "name": "routers",
>                     "data": "10.187.0.1"
>                 }
>             ]
> 
>         },
> 
>   "hooks-libraries": [
> {
>             "library": "/opt/kea/usr/lib/hooks/libdhcp_lease_cmds.so",
>             "parameters": { }
>         },
> {
>             "library": "/opt/kea/usr/lib/hooks/libdhcp_ha.so",
>             "parameters": {
>                 "high-availability": [ {
>                     "this-server-name": "dhcp-11",
>                     "mode": "load-balancing",
>                     "heartbeat-delay": 10,
>                     //"max-response-delay": 10000,
>                     "max-ack-delay": 5,
>                     "max-unacked-clients": 5,
>                     "peers": [
>                         {
>                             "name": "dhcp-11",
>                             "url": "http://10.58.0.11:8080/",
>                             "role": "primary",
>                             "auto-failover": true
>                         },
>                         {
>                             "name": "dhcp-12",
>                             "url": "http://10.58.0.12:8080/",
>                             "role": "secondary",
>                             "auto-failover": true
>                         }
>                     ]
>                 } ]
>             }
>         }
> 
>   ]
> 
> },
> 
> 
> regards
> i
> 
> On Thu, 13 Sep 2018 at 13:23 Marcin Siodelski <marcin at isc.org>
> wrote:
> 
>     On 13.09.2018 09:03, Ivan Stenda wrote:
>     > Hello guys,
>     >
>     > I am trying to set up HA on $subj with no luck. I managed to get the
>     > peers talking to each other via kea-ctrl-agent, and lease updates
>     > are sent from host to host and vice versa, but on a simulated
>     > failure clients are not served seamlessly. They go through the
>     > whole DORA because the working host does not send an OFFER from
>     > the failed peer's pool ...
>     >
>     > Maybe I am wrong about the networking around Kea. Could someone
>     > guide me on the networking setup in the case of a UDP socket and
>     > relay hosts?
>     >
>     > regards
>     > i
>     >
>     >
>     > _______________________________________________
>     > Kea-users mailing list
>     > Kea-users at lists.isc.org <mailto:Kea-users at lists.isc.org>
>     > https://lists.isc.org/mailman/listinfo/kea-users
>     >
> 
>     Hello Ivan,
> 
>     It is hard to say without looking at the configurations of both
>     of your HA peers.
> 
>     I guess the first question is whether the server that takes over the
>     traffic from the failed partner sends an OFFER (according to your logs)
>     and this OFFER doesn't go through the network, or the OFFER is not
>     generated by the server at all. Also, when you're expecting those
>     offers, do you observe the server which should send them being in
>     the "partner-down" state (according to the logs)?
> 
>     Marcin Siodelski
>     ISC DHCP Engineering
> 
> 
