Failure to recover from communications-interrupted

Simon Detheridge simon at widgit.com
Mon Feb 9 14:36:47 UTC 2009


Hi,

I have a reproducible problem where ISC DHCP v3.1.1 fails to return to 'normal' after entering 'communications-interrupted'. The symptoms look exactly like a problem discussed on this list last November:
https://lists.isc.org/pipermail/dhcp-users/2008-November/007433.html

As far as I can see, no solution was ever found. I think it looks like a bug.

My setup is slightly different from the description above but the behaviour looks the same... I'm trying to run DHCP over ethernet tunnels, inside Amazon's EC2 cloud. Here's how the problem manifests itself:

Two boxes are set up. Both have dhcp bound only to veth1, a virtual ethernet device (created with "ip link add type veth"). The other end of the virtual ethernet device (veth0) is connected to a bridge (br0), which is in turn connected to multiple vtun-based tap tunnels. One tunnel goes to the other DHCP server, the rest to the clients (although this problem can be reproduced without any clients).

When up and running, the system works fine. However, if the tunnel between primary & secondary drops for any period of time, they both move to communications-interrupted. That's what I expect, but the problem is that when the tunnel is restored, they do not move back to 'normal' despite the fact that they can ping each other.

By using tcpdump -i veth1, I can see that no traffic is sent between primary and secondary, after they've been in communications-interrupted for a period of time. I would have expected them to periodically query to see if the other host is alive, but this does not happen.
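A successful ping only shows that ICMP gets through the tunnel, not that the failover TCP connection could be re-established. As a minimal sketch of a more direct check, the following Python function tries a plain TCP connect to the peer. The function name is hypothetical, and the address and port you pass are assumptions: they should be whatever the "peer address" and "peer port" lines in your own dhcpd.conf say.

```python
import socket


def failover_port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only tests reachability of the port; it does not speak the
    DHCP failover protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, failover_port_reachable("10.0.0.2", 647) would test a hypothetical peer at 10.0.0.2 on port 647. If this returns True while the servers still sit in communications-interrupted, the tunnel itself is unlikely to be the cause.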

Restarting DHCP on either box restores the connection back to 'normal'.

The biggest problem with this is that when one of the boxes starts up, DHCP starts in 'communications-interrupted' mode. This is because it takes a short while to establish the tunnel between the boxes. By the time the tunnel is established, the server has apparently decided it's not going to query for its partner any more, and it never moves to 'normal'. The DHCP server has to be manually restarted.
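Until the underlying cause is found, the manual restart could in principle be automated. The sketch below is a workaround idea, not a fix, and rests on an assumption: that dhcpd.leases records the failover state in blocks of the form failover peer "name" state { my state ... at ...; }. The peer name and both function names are hypothetical. The restart command itself is left out, since init script names vary between distributions.

```python
import re

# Assumed layout of a failover state block in dhcpd.leases:
#   failover peer "name" state {
#     my state communications-interrupted at 1 2009/02/09 14:30:00;
#     ...
#   }
_STATE_BLOCK = re.compile(
    r'failover peer "(?P<peer>[^"]+)" state \{\s*'
    r'my state (?P<mine>[\w-]+) at [^;]*;'
)


def last_failover_state(leases_text, peer_name):
    """Return the most recently recorded 'my state' for peer_name, or None."""
    state = None
    for match in _STATE_BLOCK.finditer(leases_text):
        if match.group("peer") == peer_name:
            # Later blocks in the file supersede earlier ones.
            state = match.group("mine")
    return state


def should_restart(leases_text, peer_name, peer_reachable):
    """Suggest a restart only if the peer is reachable but we are still stuck."""
    stuck = last_failover_state(leases_text, peer_name) == "communications-interrupted"
    return bool(peer_reachable) and stuck
```

A cron job could read the leases file, pass in the result of a reachability test for the peer, and restart dhcpd only when should_restart returns True. Requiring the peer to be reachable avoids restart loops while the tunnel is genuinely down.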

Is there any way (perhaps an undocumented configuration option?) to make the DHCP server periodically query its partner when in communications-interrupted mode? I'd have thought it would do this automatically.

Thanks,
Simon

-- 
Simon Detheridge - CTO, Widgit Software
26 Queen Street, Cubbington, CV32 7NA - Tel: +44 (0)1926 333680


