Failover Partners mysteriously disconnect and never reconnect

sthaug at nethelp.no
Sun Oct 26 12:52:26 UTC 2008


> I've been tracking a problem with our DHCP failover setup for some
> time now and it seems we've tracked it down to the fact that the
> partners get into a "communications-interrupted" state and never try to
> connect to each other again.
> 
> Both servers are stable, running FreeBSD 6.3-RELEASE.
> 
> They are both connected via Gig-E to a Cisco 6509.
> 
> There are no errors on the ports, no errors on the NICs, and all other
> applications on the servers run fine.  The ports are 1000M/full and I
> can transfer data between them with no problem.  There are about 200
> devices connected to the same switch, all with no connectivity
> issues.  We've tried moving ports and NICs, but we still see this
> issue randomly pop up.

We have seen this too, using 3.1.1 on FreeBSD 7.0. In our case it seems
to be related to very heavy disk activity on one of the servers - heavy
enough that the disk subsystem "falls behind". Our speculation is that
this leads to starvation of the messages from the failover partner. See

     http://marc.info/?l=dhcp-users&m=121966409312188&w=2

In our case also, the servers do *not* get back to normal by themselves,
but restarting one of them fixes the problem.
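
Since the servers do not recover on their own, it may help to at least
detect the stuck state from the logs. Below is a minimal sketch in Python,
not something we run in production. It assumes dhcpd logs state changes as
"failover peer NAME: I move from OLD to NEW" (check this against your own
logs), and the function names stuck_peers and last_states are made up for
illustration.

```python
import re

# ASSUMPTION: dhcpd logs failover transitions in this shape, e.g.
#   "... dhcpd: failover peer NAME: I move from normal to communications-interrupted"
# Adjust the pattern if your log lines differ.
TRANSITION = re.compile(
    r"failover peer (?P<peer>\S+): (?P<who>I move|peer moves) "
    r"from (?P<old>[\w-]+) to (?P<new>[\w-]+)"
)


def last_states(lines):
    """Return {(peer, side): most recent state} from an iterable of log lines."""
    states = {}
    for line in lines:
        m = TRANSITION.search(line)
        if m:
            side = "local" if m.group("who") == "I move" else "partner"
            states[(m.group("peer"), side)] = m.group("new")
    return states


def stuck_peers(lines, bad_state="communications-interrupted"):
    """Names of failover peers whose latest local state is bad_state."""
    return sorted(
        peer
        for (peer, side), state in last_states(lines).items()
        if side == "local" and state == bad_state
    )
```

Feeding it the lines of the dhcpd log (for example from a periodic job)
gives the peers that are currently stuck, which could then trigger an
alert or the restart mentioned above.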

Steinar Haug, Nethelp consulting, sthaug at nethelp.no
