Multi-subnet/vlan and failover

Mon May 13 23:25:36 UTC 2013

Top posting... [Sorry for anyone it offends. :) ]

So, essentially if I understand you correctly, just trust that
balancing the pools should work fine with the rational defaults in the
system, and don't worry about it too much.

As for MCLT, there is this text in the manpage:
"The longer you set this, the longer it will take for the running
server to recover IP addresses after moving into PARTNER-DOWN state"

In further reading, I think this is an inelegant way of saying that
the number of seconds set as MCLT must pass after the master and peer
re-establish connection, before the master starts acting as a master
again.

---
So, if this was a *really* long time - say 3 weeks, it would take
three weeks for a master to come back on-line and if the peer also
failed in that three weeks, you'd have no dhcp servers at all.

So, set it long enough that the remaining server can keep up with the
expected load during a failure, but short enough not to incur
excessive risk for both the peer and master failing even though the
master [or peer] is back up, but hasn't yet been up longer than the
MCLT.
---

Is that correct?

And thanks so much for the discussion. While I don't have any more
control than before - I at least think I understand the moving parts a
bit better.

-Greg

SC> So the servers will rebalance the pools on their own, it's in the
SC> code, it's not user configurable. MCLT has nothing whatsoever to do
SC> with the rebalancing process.

SC> MCLT is the maximum client lead time that will be used by the server
SC> in a failover situation. e.g. if the failover is enacted and the
SC> secondary has to respond to a lease request on behalf of the primary
SC> (which is down) then the lease time will be MCLT. Additionally the
SC> first time any lease is issued by either server to a new client it
SC> will be issued as MCLT, this is to allow the background updates to
SC> take place between the failover association. When the failover is
SC> restored the server which was down will wait MCLT before starting to
SC> issue new leases, this is to allow the servers time to resync.

SC> Beware putting the system into partner down when the partner isn't
SC> actually down, I've seen this halt both servers from issuing leases
SC> for MCLT, panic mode then ensues as you can't get on the network at
SC> all.

SC> There is no situation that I'm aware of that the system wouldn't
SC> automatically rebalance (though I don't know how it would handle only
SC> having 1 remaining lease, I would assume one system would get it and
SC> the other would then have no free leases).
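
For anyone finding this thread via search: below is a rough sketch of where
these knobs live in dhcpd.conf, as I read the dhcpd.conf(5) man page for ISC
DHCP 4.x. The four balance statements are the ones Greg lists further down,
and the values shown are what I believe the documented defaults to be, so
writing them out should change nothing. The peer name, addresses, and the
mclt/split values are placeholders taken from the config quoted later in this
thread, not recommendations. Check your own version's man page before relying
on any of it.

```
failover peer "dhcp-failover" {
  primary;
  address 10.1.1.1;          # placeholder: this server
  port 647;
  peer address 10.1.1.2;     # placeholder: the failover peer
  peer port 647;
  max-response-delay 60;
  max-unacked-updates 10;
  load balance max seconds 3;

  # mclt and split are set on the primary only.
  mclt 3600;                 # seconds; see the MCLT discussion above
  split 128;                 # roughly half of the clients answered by each server

  # Pool-balancing tunables (assumed defaults; percentages, then seconds).
  max-lease-misbalance 15;   # how lopsided the free pools may get before a rebalance
  max-lease-ownership 10;    # share a server may keep for clients that hash to it
  min-balance 60;            # earliest the next rebalance is scheduled
  max-balance 3600;          # latest the next rebalance is scheduled
}
```

None of these statements involve mclt, which fits the point above that MCLT
and rebalancing are separate mechanisms.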

SC> On 11 May 2013 01:05, Gregory Sloop <gregs at sloop.net> wrote:
>> So, yes, I did have a VLAN leak. [eek - not enough sleep, too little
>> thinking!]
>>
>> But that's resolved now - thanks for the tip.
>> So, now I have failover working, as well as VLAN/Multi-segment. [Very
>> nice.]
>>
>> I must say "Thanks!!" for all those who do the work on this product.
>> It's a core piece of virtually every network and like most IT work,
>> you never get credit when it works and does so unobtrusively without a
>> bunch of babying etc. But you can always guarantee when it doesn't
>> work, they haven't forgotten where to whine to either!
>>
>> ---
>> But the discussion about the split values and lease-balancing is one
>> I'd like to discuss...
>>
>> I'm happy to start a new thread, but since we started discussing here,
>> I thought it might make sense to continue. Google should find it in a
>> search in any case...
>>
>> ---
>> So the relevant params for address recovery etc seem to be:
>> mclt - which is only _somewhat_ comprehensible to me.
>> [I see it's the maximum lease time for any lease when in partner-down
>> state - but I don't understand what it has to do with recovery of
>> leases in PDS.]
>>
>> But if I thought that was bad, I really don't grok:
>> max-lease-misbalance
>> max-lease-ownership
>> min-balance
>> max-balance
>>
>> At least not really.
>>
>> ---
>> Is there some layman, dumb-oaf version of what happens when one of the
>> partner servers runs out of leases? [Like Thag just stumbled into
>> your data center and was looking for a job configuring DHCP servers!?
>> :) ]
>>
>> I've read the section several times, and really get fairly lost.
>>
>> Here's how I understand it.
>> In short, as the master/peer hand out addresses, they split the
>> addresses 50/50. [with a few exceptions]
>> They then hand out addresses and try to balance the free address pool
>> on master/peer so they remain equivalent to each other.
>>
>> When the system detects that it may run out of addresses on either the
>> master or the peer [over X time-frame], it tries to re-balance the
>> free leases again to meet a 50/50 split [again with some exceptions
>> too complicated to finish explaining in the next few hours or so.]
>>
>> Does this generally sound right?
>> ---
>>
>> But does mclt have anything to do with lease re-balancing? [The
>> description seems to indicate it does, but after reading it multiple
>> times, I don't really think it does.]
>>
>> ---
>> So, as a final thought. What kinds of situations would put you at risk
>> of having a wildly mis-balanced pool and running out of addresses on a
>> master/peer - where the system wouldn't "automagically" re-balance to
>> save itself?
>>
>> What settings would help in this regard, and what values might one
>> pick?
>>
>> I'd guess this discussion has occurred before, so I'm more than glad to
>> be pointed at a thread somewhere and do the slog to read it and see if
>> that helps.
>>
>> Sorry for the long post and thanks in advance for your help!
>>
>> -Greg
>>
>>
>>
>>
>> SC> No, regardless of the split the leases will still be shared 50/50 with
>> SC> both servers, so you could still run into an issue where the secondary
>> SC> runs out of addresses. When both servers are online and one is running
>> SC> low on leases they will rebalance the lease pool and share the
>> SC> remaining leases 50/50. (This bit really needs to be documented better
>> SC> as lots of people fall into that trap)
>>
>> SC> 255 would make the primary respond to all requests when both systems
>> SC> are online. When the primary goes offline you will have a limited
>> SC> amount of time before the leases will be depleted, at which point you
>> SC> will need to tell the secondary that its partner is down and the
>> SC> secondary will then assume control of the full lease pools.
>>
>> SC> My general advice to anyone using DHCP failover is if either of the
>> SC> systems is going to be out for longer than the period of your smallest
>> SC> lease time then set the partner to be down as once that minimum lease
>> SC> time is up you will already have started eating into additional
>> SC> leases.
>>
>>
>>
>> SC> On 10 May 2013 08:58, Gregory Sloop <gregs at sloop.net> wrote:
>>>> It might be, it is a test environment - but I didn't think I had
>>>> anything that whacked.
>>>>
>>>> I'll do some more testing the next chance I get. Any other ideas are
>>>> more than welcome.
>>>>
>>>> ---
>>>> As for split - I generally intend for all requests to be handled by
>>>> the primary and only fail to the peer. [Fail-over only, no
>>>> load-balance]
>>>>
>>>> I'm not sure if that's the best idea - but it seems more
>>>> straightforward. (Essentially my worry is if the blocks are split and
>>>> a peer goes down, could we run out of addresses in the block for the
>>>> "up" server before reclaiming them from the "down" server. I suspect
>>>> this worry is mostly because I don't fully grasp how it is handling
>>>> things, despite reading the docs - but not as carefully as I probably
>>>> need to do.)
>>>>
>>>> [So, I assume a split of 255 would then make it do what I want, having
>>>> all requests served by the primary - instead of load-balance, right?]
>>>>
>>>>
>>>> -Greg
>>>>
>>>>
>>>> SC> Sounds like you have a leak in your network and broadcast packets are
>>>> SC> leaking from one VLAN into another.
>>>>
>>>> SC> One other thing, is there a reason you are using "split 0;"? This
>>>> SC> would mean the secondary peer will answer all lease requests. For a
>>>> SC> balanced approach you should use 128 which will allow both DHCP
>>>> SC> servers to respond to lease requests.
>>>>
>>>> SC> On 10 May 2013 08:19, Gregory Sloop <gregs at sloop.net> wrote:
>>>>>> As a follow-up, because it may well impact the answer to my duplicate
>>>>>> DHCPOFFER issue, let me describe how the DHCP servers are connected in
>>>>>> relation to VLANS etc.
>>>>>>
>>>>>> The DHCP Servers are on VLAN1, say 10.1.1.11/10.1.1.12 [master/peer]
>>>>>>
>>>>>> The L3 switch is configured to forward dhcp sessions to 10.1.1.11 and
>>>>>> 10.1.1.12
>>>>>>
>>>>>> ---
>>>>>> The duplicate messages are seen on DHCP negotiations from VLAN3 [and, I assume VLAN2]
>>>>>>
>>>>>> But I have not tested VLAN1 or VLAN2 attached clients to see what
>>>>>> happens on those VLANs.
>>>>>>
>>>>>> TIA for any assistance!
>>>>>>
>>>>>> -Greg
>>>>>>
>>>>>> GS> @Kyle
>>>>>> GS> Yes, that's it exactly. Thanks!
>>>>>>
>>>>>> GS> ---
>>>>>> GS> I did find a post about putting it in a pool block after posting
>>>>>> GS> my query, just about the time you posted your response - but
>>>>>> GS> hadn't had a chance to test it - so that's great. It now works.
>>>>>>
>>>>>> GS> BUT...
>>>>>> GS> When I run it, I see odd stuff [running dhcpd in -d -f
>>>>>> GS> debug/foreground mode]...
>>>>>>
>>>>>> GS> ---
>>>>>> GS> I see a pair of DHCPDISCOVERs
>>>>>>
>>>>>> GS> One from ETH0 and the other from the IP/DHCP helper on the L3 switch.
>>>>>> GS> i.e.
>>>>>> GS> DHCPDISCOVER from so:me:ma:ca:dd:rs on eth0
>>>>>> GS> DHCPDISCOVER from so:me:ma:ca:dd:rs on 10.1.2.1
>>>>>> GS> [This second one is the layer 3 switch, which is forwarding the DHCP session to the DHCP server]
>>>>>>
>>>>>> GS> Then dhcpd makes two offers - one on 10.1.1.X and one on 10.1.2.X
>>>>>> GS> Since the station isn't on the 10.1.1.X VLAN and *is* on the 10.1.2.X
>>>>>> GS> VLAN it "accepts" the 10.1.2.X address and it "works."
>>>>>>
>>>>>> GS> But I'm sure it's not supposed to be this way.
>>>>>> GS> [And I'm pretty sure I'm doing something obvious and perhaps
>>>>>> GS> stupid, but I just don't know where to look or what to try.]
>>>>>>
>>>>>> GS> How do I go about making it only see the forwarded DHCP session
>>>>>> GS> and not the one on eth0 [or some other option I'm simply not aware of...]
>>>>>>
>>>>>> GS> ---
>>>>>>
>>>>>> GS> -Greg
>>>>>>
>>>>>>
>>>>>> GS> Are you looking for something like this?
>>>>>>
>>>>>> GS> subnet 172.21.27.0 netmask 255.255.255.0 {
>>>>>> GS>   option subnet-mask 255.255.255.0;
>>>>>> GS>   option broadcast-address 172.21.27.255;
>>>>>> GS>   option routers 172.21.27.1;
>>>>>> GS>   ddns-domainname "example.com.";
>>>>>> GS>   option domain-search "example.com";
>>>>>> GS>   pool {
>>>>>> GS>     failover peer "dhcp-failover";
>>>>>> GS>     range 172.21.27.5 172.21.27.254;
>>>>>> GS>   }
>>>>>> GS> }
>>>>>>
>>>>>>
>>>>>> GS> On Thu, May 9, 2013 at 8:08 PM, Gregory Sloop <gregs at sloop.net> wrote:
>>>>>> GS> So, I've done a fair bit of reading and searching - and this general
>>>>>> GS> template is what I thought would work, but it doesn't.
>>>>>>
>>>>>> GS> Let me post the dhcp.conf file and then discuss what's wrong and ask
>>>>>> GS> for pointers.
>>>>>>
>>>>>> GS> ---
>>>>>> GS> authoritative;
>>>>>> GS> #ddns-update-style interim;
>>>>>> GS> ignore client-updates;
>>>>>> GS> #option host-name = config-option server.ddns-hostname;
>>>>>>
>>>>>> GS> #include "/etc/rndc.key";
>>>>>>
>>>>>> GS> option domain-name              "somedom.local";
>>>>>> GS> option domain-name-servers      10.1.1.190,10.1.2.1,10.1.1.17;
>>>>>> GS> option time-offset              -28800; # Pacific Standard Time (UTC-8)
>>>>>> GS> option ntp-servers              10.1.1.14;
>>>>>> GS> one-lease-per-client off;
>>>>>>
>>>>>> GS> #4 hour lease
>>>>>> GS> default-lease-time 14400;
>>>>>> GS> max-lease-time 14400;
>>>>>> GS> option ip-forwarding off;
>>>>>>
>>>>>> GS> failover peer "dhcp-failover" {
>>>>>> GS>   primary; # declare this to be the primary server
>>>>>> GS>   # Address of THIS dhcp server, or what address to listen ON
>>>>>> GS>   address 10.1.1.1;
>>>>>> GS>   port 647;
>>>>>> GS>   # Address of the DHCP fail-over peer.
>>>>>> GS>   peer address 10.1.1.2;
>>>>>> GS>   peer port 647;
>>>>>> GS>   max-response-delay 60;
>>>>>> GS>   max-unacked-updates 10;
>>>>>> GS>   #load balance max seconds 3;
>>>>>> GS>   mclt 3600;
>>>>>> GS>   split 0;
>>>>>> GS> }
>>>>>>
>>>>>> GS>     subnet 10.1.1.0 netmask 255.255.255.0 {
>>>>>> GS>         range 10.1.1.1 10.1.1.254;
>>>>>> GS>         option routers                  10.1.1.1;
>>>>>> GS>         option subnet-mask              255.255.255.0;
>>>>>> GS>         failover peer "dhcp-failover";
>>>>>> GS>     }
>>>>>>
>>>>>> GS>     subnet 10.1.2.0 netmask 255.255.255.0 {
>>>>>> GS>         range 10.1.2.1 10.1.2.254;
>>>>>> GS>         option routers                  10.1.2.1;
>>>>>> GS>         option subnet-mask              255.255.255.0;
>>>>>> GS>         failover peer "dhcp-failover";
>>>>>> GS>     }
>>>>>>
>>>>>> GS>     subnet 10.1.3.0 netmask 255.255.255.0 {
>>>>>> GS>         range 10.1.3.1 10.1.3.254;
>>>>>> GS>         option routers                  10.1.3.1;
>>>>>> GS>         option subnet-mask              255.255.255.0;
>>>>>> GS>         failover peer "dhcp-failover";
>>>>>> GS>     }
>>>>>>
>>>>>> GS> ---
>>>>>> GS> Now, I've disabled DDNS updates for simplicity's sake. Once I get the
>>>>>> GS> multi-subnet/VLAN setup and failover working I'll add that back.
>>>>>>
>>>>>> GS> Perhaps that impacts things somehow, so if you'll keep that in mind,
>>>>>> GS> I'd appreciate it.
>>>>>>
>>>>>> GS> So, when I try this config I get an error saying that a failover needs
>>>>>> GS> to be inside a shared network block.
>>>>>>
>>>>>> GS> But if I do that, I've been told [read] that the DHCP server won't
>>>>>> GS> know how to assign the different subnets. [This would apply to a
>>>>>> GS> network where I wanted to share all the 10.1.1.1-10.1.3.254 as a
>>>>>> GS> single pool/block and assign any station any IP in the whole block.]
>>>>>>
>>>>>> GS> But I have a L3 switch and I want these assigned to each VLAN.
>>>>>>
>>>>>> GS> ---
>>>>>> GS> So, I set up the conf file without a shared-network and it works fine
>>>>>> GS> with the L3 DHCP helper/proxy. Clients on VLAN1 get 10.1.1.0 blocks
>>>>>> GS> and VLAN2 get 10.1.2.0 blocks etc.
>>>>>>
>>>>>> GS> So, with the "failover" block commented out, it works charmingly! Very
>>>>>> GS> cool!
>>>>>>
>>>>>> GS> ---
>>>>>> GS> But I *also* want to use failover.
>>>>>>
>>>>>> GS> And when I put in a fail-over outside a shared-network, it complains
>>>>>> GS> that it must be inside a shared network.
>>>>>>
>>>>>> GS> So, how do I use fail-over AND maintain the subnet grouping above?
>>>>>>
>>>>>> GS> ---
>>>>>> GS> I'll keep reading, but I've tinkered with this quite a bit and for the
>>>>>> GS> life of me, I can't see how one would go about it.
>>>>>>
>>>>>> GS> -Greg
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Gregory Sloop, Principal: Sloop Network & Computer Consulting
>>>>>> Voice: 503.251.0452 x82
>>>>>> EMail: gregs at sloop.net
>>>>>> http://www.sloop.net
>>>>>> ---
>>>>>>
>>>>>> _______________________________________________
>>>>>> dhcp-users mailing list
>>>>>> dhcp-users at lists.isc.org
>>>>>> https://lists.isc.org/mailman/listinfo/dhcp-users

-- 
Gregory Sloop, Principal: Sloop Network & Computer Consulting
Voice: 503.251.0452 x82
EMail: gregs at sloop.net
http://www.sloop.net
---