Failover peer separation revisited

David Pick D.M.Pick at
Sat Nov 15 21:40:04 UTC 2008

sthaug at wrote:
>>> We have also seen the problem of the servers not reconnecting, and I
>>> believe I have a reproducible bug here - just need to test it on
>>> Linux also to be sure that it's not a FreeBSD specific bug.
>> For what it's worth, I've been hunting a problem between pairs
>> of (FreeBSD 6) machines on a backbone LAN, but nothing to do
>> with DHCP traffic. So far, I've found that under some yet-to-be-
>> defined circumstances one machine gets into a state where it
>> issues an ARP request, receives a reply (according to "tcpdump"),
>> but does not put the MAC address in that received packet into the
>> ARP tables. At the same time (more-or-less) using the "arp" user-
>> level program to try and delete an entry takes 15-20 seconds to
>> complete, but with the normal, very small processor time. I'm starting
>> to suspect some sort of lock problem in the kernel, but can't pin
>> it down yet. The problem eventually clears itself (for a while)...
>> I'd be interested in hearing anything you find to either confirm
>> or refute the possibility that it's the same problem.
> The DHCP failover pair that I'm using also runs other network-based
> services, and an ARP problem as described above would probably be
> rather quickly visible. Also, I can *trigger* "my" problem using a
> specific sequence of blocking/opening the connection between the
> servers in the DHCP failover pair.
> Thus I believe I'm seeing a different problem.

Thanks for that; it does look different.

	David Pick

