Failback causes lost lease

Thu Jun 25 20:39:59 UTC 2015

Gregory,

Thanks for your reply.

On 06/25/2015 12:47 PM, Gregory Sloop wrote:
SM> In testing my dhcp failover, I pulled the ethernet cable on the primary
SM> server.

SM> The secondary server acknowledged renewal requests as expected.

SM> Then I plugged the cable back in. After both the primary and secondary
SM> had moved from communications-interrupted to normal, the secondary logs
SM> multiple dhcp requests from a client whose lease is owned by the primary
SM> server. The primary server does not log any of these but the last 
SM> request, reporting that "lease in transition state expired".

SM> Then the secondary server logs a DHCPDISCOVER from that client and 
SM> records it load balancing to the primary server.

SM> The primary server also sees the DHCPDISCOVER and offers a new lease 
SM> that is not the same number as the previous lease. This despite the old
SM> number not having been reassigned.

SM> The end result is that failback causes my clients to change their ip 
SM> address.

SM> Why does this happen and how can I prevent it?


1) Logs would be good.
2) I think something with your config is broken. If I were to [wildly] guess, it's a physical/network layer issue.
3) I have a very small setup with 100+ clients, and it certainly doesn't work this way for me. 

There are some issues when a single server is up and in "communications interrupted" mode, you've got a tight IP pool, and the leases were fairly evenly balanced across both servers. [I've posted, in the past, about an event that was kinda ugly for this client while running a 4.1 version, IIRC.] *However*, those problems should be far less severe with 4.2+ - and you're not having an issue with communications interrupted anyway.
I am having an issue with communications interrupted. When I pull the ethernet cable, both the primary and secondary servers move from normal to communications-interrupted.
But in your initial post on this thread you said: 

> "After both the primary and secondary
> had moved from communications-interrupted to normal"

It can't be both ways. Either they are in CI, or they are in a Normal state.
Like I said, logs would probably be helpful. [Unless someone else has a lightning bolt moment and can tell you exactly what's wrong without them - but I doubt that.]

As far as "tight IP pool" goes, it's the only ip in use in a /16 pool.

Yes, I expected as much. Further, the symptoms you're having don't match, at all, what I'm describing. [No free leases is the result of my situation.]

IIRC, you had a problem where the two servers wouldn't recover from CI to Normal like they should too. How did you solve that problem? Is it possible this is related? [I'm too lazy to go check old threads, but I _think_ it was you...my apologies if I'm wrong.]
That was a stupid networking mistake where the failover traffic wasn't making it between peers. That problem was solved when I quit being so stupid. In this case, the peers are communicating failover data correctly when not in the "communications-interrupted" state.

So, I'd ask for logs that demonstrate:
1) What real state both failover peers are actually in. [CI/Normal/recovery/something else]
2) Logs and/or packet caps that show the [primary] peer that initially leased the IP is actually getting the renew requests.
3) Is this a really simple config setup? If not - and it appears to be a test environment - strip the config down to bare minimum. Then build up. It's quite easy to make a mistake in a config that borks everything up, whilst trying to do everything in one go. [Though I can't envision a mistake in a config that would produce these results/symptoms...but I'm far from a total guru on dhcpd.]
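
For points 1 and 2, a rough way to pull the answers out of a syslog capture is sketched below. This is only a sketch under assumptions: the log line shapes ("failover peer NAME: I move from A to B", "DHCPREQUEST for IP from MAC") are what I'd expect dhcpd to write, not something taken from this thread, and the peer name, MAC and IP in the sample are made up. Adjust the patterns to whatever your logs really contain.

```python
import re

# Assumed shape of a failover state-change line (check against your own logs):
#   "failover peer NAME: I move from OLD to NEW"
#   "failover peer NAME: peer moves from OLD to NEW"
STATE_RE = re.compile(
    r"failover peer (?P<name>\S+): (?P<who>I move|peer moves) "
    r"from (?P<old>\S+) to (?P<new>\S+)"
)

# Assumed shape of a renewal line: "DHCPREQUEST for IP [(SERVER)] from MAC ..."
REQUEST_RE = re.compile(
    r"DHCPREQUEST for (?P<ip>\S+)(?: \(\S+\))? from (?P<mac>[0-9a-fA-F:]{17})"
)


def last_states(lines):
    """Return the most recent failover state logged for this server and its peer."""
    states = {"local": None, "peer": None}
    for line in lines:
        match = STATE_RE.search(line)
        if match:
            key = "local" if match.group("who") == "I move" else "peer"
            states[key] = match.group("new")
    return states


def count_requests(lines, mac):
    """Count DHCPREQUEST lines this server logged for the given client MAC."""
    wanted = mac.lower()
    total = 0
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group("mac").lower() == wanted:
            total += 1
    return total


# Hypothetical sample lines, for illustration only.
SAMPLE_LOG = [
    "dhcpd: failover peer dhcp-failover: I move from normal to communications-interrupted",
    "dhcpd: DHCPREQUEST for 10.0.0.5 from 00:11:22:33:44:55 via eth0",
    "dhcpd: failover peer dhcp-failover: peer moves from communications-interrupted to normal",
    "dhcpd: failover peer dhcp-failover: I move from communications-interrupted to normal",
]

if __name__ == "__main__":
    print(last_states(SAMPLE_LOG))
    print(count_requests(SAMPLE_LOG, "00:11:22:33:44:55"))
```

Run it against the log from each peer separately. If the primary counts zero requests for the client's MAC while the secondary counts several, that would point at the renewals never reaching the primary.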

My gut feeling is unchanged, in that there's some physical/data/network/transport layer issue that's preventing all the relevant traffic from getting from clients to both peers, and perhaps even between the peers themselves.

-Greg