ctrace.c error / failover "peer holds all free leases"

Thu Apr 9 23:15:31 UTC 2009

There is no timer.

This has been discussed previously, and the consensus, as I remember
it, was that "determining if the other system is down" is a pretty
site-specific thing. For example, is it down when the other host is
unable to ping it, when it is unable to run cron, when intermediate
router(s) are down, when you've logged in and checked it, or on some
other site-specific condition?

A very flexible workaround exists using a shell script and omshell, so
this was deemed not to be functionality to put into the dhcpd core.

Depending on your renewal rate, any period from a few minutes to a few
hours is a reasonable time to wait before moving to partner-down.
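
For what it's worth, the workaround could look roughly like the sketch
below. It is only an illustration under assumptions, not a tested
recipe: the failover peer name and the OMAPI port are placeholders for
whatever your dhcpd.conf declares, and the numeric code for partner-down
is deliberately passed in as a parameter because you should look it up
in dhcpd(8) for the release you run. How the script learns when
communications-interrupted began is left out, since that is exactly the
site-specific part.

```shell
#!/bin/sh
# Sketch only: peer name, port and state code are supplied by the caller
# and are assumptions about the local setup, not values from dhcpd itself.

# Succeeds (exit 0) when at least $3 seconds lie between epoch times $1 and $2.
#   $1 = time communications-interrupted began (epoch seconds)
#   $2 = current time (epoch seconds)
#   $3 = grace period in seconds
grace_period_expired() {
    [ $(( $2 - $1 )) -ge "$3" ]
}

# Prints the omshell commands that set local-state on a failover-state object.
#   $1 = failover peer name as declared in dhcpd.conf
#   $2 = numeric code for partner-down (check dhcpd(8) for your release)
#   $3 = OMAPI port (must match omapi-port in dhcpd.conf)
# If OMAPI is protected with a key, a "key <name> <secret>" line has to be
# added before "connect".
partner_down_commands() {
    printf 'server 127.0.0.1\n'
    printf 'port %s\n' "$3"
    printf 'connect\n'
    printf 'new failover-state\n'
    printf 'set name = "%s"\n' "$1"
    printf 'open\n'
    printf 'set local-state = %s\n' "$2"
    printf 'update\n'
}
```

From cron it might then be called along these lines, with
INTERRUPTED_SINCE, PEER_NAME and PARTNER_DOWN_CODE being hypothetical
variables that your own check fills in:

    if grace_period_expired "$INTERRUPTED_SINCE" "$(date +%s)" 600; then
        partner_down_commands "$PEER_NAME" "$PARTNER_DOWN_CODE" 7911 | omshell
    fi

Only do this when you are sure the partner really is down; if it is
merely unreachable and still serving, you risk duplicate addresses.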

If I can find a link to the thread in the archives I'll post it later.

regards,
-glenn

>Date: Thu, 9 Apr 2009 21:08:38 +0100
>Subject: Re: ctrace.c error / failover "peer holds all free leases"
>From: Matt Causey <matt.causey at gmail.com>
>
>There is no timer that I can find in the source, or from operational
>experience.
>
>Our sites have 2 dhcp servers in the same rack - sometimes 1 fails.
>If the failure lasts long enough eventually 1/2 the clients start
>dying because they cannot get leases.
>
>The design concept is that server A may be unable to talk to server B
>while server B is still handing out leases somewhere.  In this
>scenario, if server A starts scavenging leases, it could lead to
>duplicate IP addresses on the network.
>
>Personally, I'd rather assume that this scenario won't happen on my
>topology - so I've got a cron job which checks the status, and if a
>server is alive enough to run cron jobs, and has been in
>communications-interrupted for longer than a few minutes, then it
>automatically places the server in partner-down.
>
>Is there any way that such a timer could be added as a configurable
>option?  Conceptually - would a patch with this functionality be
>accepted by ISC?
>
>Cheers,
>
>--
>Matt
>
>2009/3/26 Foggi, Nicola <NFOGGI at depaul.edu>:
>>
>> is there a timer that takes it from communications-interrupted to
>> partner-down state?  it also appears that in communications-interrupted the
>> reserved leases may not be honored.  Of course i didn't have debugging enabled,
>> so can't tell exactly what happened, but the primary server (which was up)
>> leased a random new ip to the client vs the reserved ip... back to the source
>> code, ughh...
>>
>> maybe reserved leases and failover together aren't ready for production yet :(
>>
>> Nicola
>>
>> -----Original Message-----
>> From: dhcp-users-bounces at lists.isc.org on behalf of Foggi, Nicola
>> Sent: Thu 3/26/2009 11:36 AM
>> To: dhcp-users at lists.isc.org
>> Subject: ctrace.c error / failover "peer holds all free leases"
>>
>>
>> running 3.1.2b1
>>
>> received this message:
>>
>> ctrace.c(168): trace_write_packet: short write (407:596)
>>
>> on one of the servers in a failover pair, caused the server to stop running,
>> any ideas what it means?  The other problem is then the primary server started
>> giving "peer holds all free leases" messages.  Our failover has:
>>
>>    mclt 900;
>>    split 255;
>>    load balance max seconds 3;
>>
>> so the "primary" server should have all leases to hand out if i read the
>> documentation correctly, but turning on "DEBUG_FIND_LEASE" returned:
>>
>> dhcpd: Not returning a lease.
>> dhcpd: DHCPDISCOVER from 00:1c:b3:62:e7:97 via 10.99.24.1: peer holds all free leases
>>
>> this is after more than 24 hours of the secondary server being down...
>> anyone else see this?
>>
>> Nicola
>>
>>
>> _______________________________________________
>> dhcp-users mailing list
>> dhcp-users at lists.isc.org
>> https://lists.isc.org/mailman/listinfo/dhcp-users
>>