Failover peer separation revisited

sthaug at sthaug at
Sat Nov 15 21:30:09 UTC 2008

> > We have also seen the problem of the servers not reconnecting, and I
> > believe I have a reproducible bug here - just need to test it on
> > Linux also to be sure that it's not a FreeBSD specific bug.
> For what it's worth, I've been hunting a problem between pairs
> of (FreeBSD 6) machines on a backbone LAN, but nothing to do
> with DHCP traffic. So far, I've found that under some yet-to-be-
> defined circumstances one machine gets into a state where it
> issues an ARP request, receives a reply (according to "tcpdump"),
> but does not put the MAC address in that received packet into the
> ARP tables. At the same time (more-or-less) using the "arp" user-
> level program to try and delete an entry takes 15-20 seconds to
> complete, but with normal very small processor time. I'm starting
> to suspect some sort of lock problem in the kernel, but can't pin
> it down yet. The problem eventually clears itself (for a while)...
> I'd be interested in hearing anything you find to either confirm
> or refute the possibility that it's the same problem.

The DHCP failover pair that I'm using also runs other network-based
services, and an ARP problem as described above would probably become
visible rather quickly. Also, I can *trigger* "my" problem using a
specific sequence of blocking/opening the connection between the
servers in the DHCP failover pair.

Thus I believe I'm seeing a different problem.

Steinar Haug, Nethelp consulting, sthaug at

