Failover peer separation revisited

David Pick D.M.Pick at
Sat Nov 15 20:45:29 UTC 2008

sthaug at wrote:
>> I wrote to the list a few weeks ago about a random problem we're seeing.
>> We're running ISC DHCPD 3.0.7 on FreeBSD 6.3-64bit via the FreeBSD  
>> ports.
>> Everything appears to be working normally, except at random times our  
>> DHCP servers appear to just disconnect from each other and they NEVER  
>> reconnect.
> We have seen the "disconnect" problem in connection with heavy disk
> traffic as mentioned in previous messages to this list (probably
> starvation of the messages from the failover partner).
> We have also seen the problem of the servers not reconnecting, and I
> believe I have a reproducible bug here - just need to test it on
> Linux also to be sure that it's not a FreeBSD specific bug.

For what it's worth, I've been hunting a problem between pairs
of (FreeBSD 6) machines on a backbone LAN, though it has nothing to do
with DHCP traffic. So far, I've found that under some yet-to-be-
defined circumstances one machine gets into a state where it
issues an ARP request, receives a reply (according to "tcpdump"),
but does not put the MAC address in that received packet into the
ARP tables. At more or less the same time, using the "arp" user-
level program to try to delete an entry takes 15-20 seconds to
complete, while consuming only the usual very small amount of
processor time. I'm starting
to suspect some sort of lock problem in the kernel, but can't pin
it down yet. The problem eventually clears itself (for a while)...

I'd be interested in hearing anything you find to either confirm
or refute the possibility that it's the same problem.

	David Pick

