DHCP failover not balancing

Wed Feb 11 22:25:22 UTC 2009

On Sep 4, 2008, at 4:57 PM, David W. Hankins wrote:

> On Tue, Sep 02, 2008 at 12:44:53PM +0800, li jun wrote:
>> The same problem has also troubled me for a long time.
>> Would anyone help me?
>>
>>> Primary server:
>>> Aug 20 09:17:04 primary dhcpd: balancing pool 8123240 1.2.3/22  total 949  free 206  backup 382  lts -88  max-own (+/-)59
>>> Aug 20 09:17:04 primary dhcpd: balanced pool 8123240 1.2.3/22  total 949  free 206  backup 382  lts -88  max-misbal 88
>>>
>>> Secondary server:
>>> Aug 20 09:17:04 secondary dhcpd: balanced pool 8123178 1.2.3/22  total 949  free 588  backup 0  lts -294  max-misbal 88
>>> Aug 20 09:17:04 secondary dhcpd: balancing pool 814e9f0 1.2.3/22  total 30  free 19  backup 0  lts -9  max-own (+/-)2 (requesting peer rebalance!)
>
> This sort of problem is a straight database inconsistency ("backup 0"
> is the key).  Additionally, I'm concerned that for 1.2.3/22, it seems
> the primary and secondary are not consistently configured (the 'total'
> count is way off).
>
> To get your servers back up and running, you want to "fault" the lease
> database; in this case the secondary's.
>
>  kill dhcpd
>  mv dhcpd.leases dhcpd.leases.save
>  touch dhcpd.leases
>  dhcpd [flags/options]
>
> The servers should cycle through recovery and get back to operational.
>
> I'm tracking a 3.0.x->3.1.x migration problem with lease database
> consistency, and we'll publish a fix in the next maintenance release,
> but I'm not aware of any recurring/persistent problems here (yet?).
>
> To fix the configuration inconsistency, the BCP on failover ops right
> now is to run a server-specific dhcpd.conf on each server, plus a
> shared dhcpd.include.conf for the subnet/etc configs that each
> server-specific dhcpd.conf includes.  Just scp dhcpd.include.conf
> between the servers when you reconfigure.  This way you can be
> confident they are both loading the same config.
>
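
One way to confirm that both servers really are loading the same
shared file is to compare the two copies byte for byte. This is only a
minimal sketch: same_config is a hypothetical helper name, and nothing
in it is specific to dhcpd.

```shell
# Hypothetical helper, not part of ISC DHCP.
# Usage: same_config LOCAL_COPY OTHER_COPY
# Prints "identical" and returns 0 if the two files match byte for byte;
# otherwise prints "DIFFERENT" and returns 1 (also when a file is missing).
same_config() {
    if cmp -s "$1" "$2"; then
        echo "identical"
    else
        echo "DIFFERENT"
        return 1
    fi
}
```

A possible use, assuming the peer is reachable as "peer" and the file
lives under /etc/dhcp3 (both are assumptions, adjust to your install):

  scp peer:/etc/dhcp3/dhcpd.include.conf /tmp/peer.include.conf
  same_config /etc/dhcp3/dhcpd.include.conf /tmp/peer.include.conf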

This problem is back. When I got this reply I was able to fix it by
"fault"ing the lease file on the secondary server as directed. The
configuration inconsistency was never really there; it was an artifact
of my attempt to anonymize the IPs that I sent. We have two small
server-specific dhcpd.conf files and a single large file with the
pools that is replicated between the servers. The problem I have now
is that I don't know which server to fault, since at least one pool on
each server is showing 0 leases and requesting a rebalance.

As a refresher, these servers are currently running 3.1.1b1 on Debian
Etch. (I don't see anything in the 3.1.2 changelog that appears to fix
this, but let me know if you think it will.)
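
To see at a glance which pools on which server report a zero count,
the balancing lines can be filtered out of syslog. The sketch below
assumes the log format quoted earlier in this thread, one record per
line, with the hostname in the fourth field. zero_pools is a
hypothetical helper name. It only lists candidates; it does not decide
which lease database should be faulted.

```shell
# Hypothetical helper: read dhcpd syslog lines on stdin and print the
# pools whose "free" or "backup" count is exactly 0.
# Assumes "Mon DD HH:MM:SS host dhcpd: balanc(ing|ed) pool ID SUBNET ..."
zero_pools() {
    awk '
    /balanc(ing|ed) pool / {
        id = ""; free = ""; backup = ""
        for (i = 1; i < NF; i++) {
            # each keyword is followed by its value in the next field
            if ($i == "pool")   id = $(i + 1) " " $(i + 2)
            if ($i == "free")   free = $(i + 1)
            if ($i == "backup") backup = $(i + 1)
        }
        if (free == "0" || backup == "0")
            print $4, id, "free=" free, "backup=" backup
    }'
}
```

For example, run on each server (the log path is an assumption):

  grep dhcpd /var/log/syslog | zero_pools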

Thanks,
Mike Robbert