multi interfaces(vlans) configuration

Thu Oct 8 19:51:58 UTC 2020

ahiya <ahiya at younity.io> wrote:

> I'm new to isc/kea. I have multi-sites with around 2000-5000 devices per
> site. The real issue is that they are spread across 500 different VLANs. I
> wanted to know if isc/kea is the right solution for that task, and what is
> the right way to implement it?

There isn't really a right or wrong way to do it - just different trade-offs.

> Raspberry PI4 with 8G mem will be enough?

I've not kept up, but does the Pi4 now have proper Ethernet and a means of attaching a disk ? I know earlier models had a USB-Ethernet bridge, which is far from ideal.

> should I use .conf files or should I go for the backend server?

As another has said, you would be well advised to automate with that size of setup. Manually configuring that size of config (without errors !) will be a nightmare - so put the config in some sort of config management and script generation of the config files for the servers.
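
As a rough illustration of scripting the config, here is a minimal Python sketch that generates one subnet entry per VLAN. It assumes Kea's JSON layout (Dhcp4 / subnet4 / pools); the address plan (a /24 per VLAN carved out of 10.0.0.0/15, pools starting at .10) and the function name are made up for the example, so substitute whatever your own inventory or config management holds.

```python
import ipaddress
import json


def build_subnets(base="10.0.0.0/15", count=500, pool_size=50):
    """Return a list of Kea-style subnet4 entries, one /24 per VLAN.

    The base network, count and pool size are hypothetical defaults.
    """
    network = ipaddress.ip_network(base)
    subnets = []
    for subnet_id, net in enumerate(network.subnets(new_prefix=24), start=1):
        if subnet_id > count:
            break
        # Hypothetical plan: leave .1-.9 free for routers and static hosts.
        first = net.network_address + 10
        last = first + pool_size - 1
        subnets.append({
            "id": subnet_id,
            "subnet": str(net),
            "pools": [{"pool": f"{first} - {last}"}],
        })
    return subnets


config = {"Dhcp4": {"subnet4": build_subnets()}}
text = json.dumps(config, indent=2)  # write this out to the server's config file

# Show the first generated entry as a sanity check.
print(json.dumps(config["Dhcp4"]["subnet4"][0]))
```

The point is only that the 500 entries come from one loop over one source of truth, so a typo is fixed in one place rather than hunted through a hand-edited file.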

Ahiya Zadok <ahiya at younity.io> wrote:

> Regarding the number of sites- I plan to have a server per site.
> Each site will have around 500 subnets with around 10-15 devices per
> subnet.
> Does the number of IPs per subnet affect memory even when they are not
> assigned?

Yes, memory requirements scale (I believe) roughly linearly with number of IP addresses available in your config - even those that have never been assigned. So the size of each pool in each of your 500 subnets will make a massive difference to memory requirements.
Note that once an IP has ever been leased to a client, it will remain in the leases file "forever". The server will never delete it unless you remove the IP address from the config (remove/change a range). Eventually, when all addresses have been used once, the server will start to re-use old leases in a least recently used manner.

Ahiya Zadok <ahiya at younity.io> wrote:

> Do you think that the numbers of subnets and the number of interfaces
> (vlans) that DHCP is listening to have much effect on resource
> utilization?

No, the number of subnets makes a fairly small impact on memory requirements - it's the number of IP addresses that makes the big difference. So 500 subnets with (say) 50 IPs each (25,000 in all) will take a lot less memory than the same 500 subnets with (say) 250 IPs each (125,000 in all).
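
The totals for the two example pool sizes are easy to check with a few lines of Python. This only counts configured addresses; it makes no claim about bytes of memory per address, which I don't have a figure for.

```python
def total_addresses(subnets, pool_size):
    """Number of addresses the server has to track across all pools."""
    return subnets * pool_size


for pool_size in (50, 250):
    total = total_addresses(500, pool_size)
    print(f"500 subnets x {pool_size} addresses each = {total:,} addresses")
```

With 10-15 devices per subnet, a pool sized modestly above that keeps the total well below either figure.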

Ahiya Zadok <ahiya at younity.io> wrote:

> The network gear in my sites is the bottleneck
> It supports up to 256 DHCP servers/relay agents.

Are you sure that's a limitation ? In principle you only need one or two (if you use failover) per network device - remember you only have one or two servers to forward requests to. I would be surprised if a device didn't support at least one ip-helper per interface, or a small number globally.

> Do you think that raspberry pi could handle 500 VLAN interfaces?

I've never gone above 30-something interfaces in Linux. In principle I would have thought it could handle it - we hear of people running hundreds of virtual machines on a host, and each of those gets one (or more) virtual network interfaces. But it's probably easier to just add one (or two) ip-helper addresses to the routers.

A few more thoughts in no particular order ...

What are you planning to do regarding fault tolerance ? With that many devices, I imagine loss of the DHCP service would quickly start to cause problems - and the corresponding enquiries from customers. You could run two servers per site - either in failover, or with non-overlapping ranges but the same subnets. The latter would mean clients changing address if the server they got their address from fails, and the DNS would not get updated - but they would continue to work.

One idea from a long time ago was to run small servers out in the network, and a central massive server. Each small server would have a failover relationship with the central server - so the central server, where it's relatively easy to provide fault tolerance (RAID, UPS, etc) and backup, would hold a copy of all the leases.
At the edge, the servers would run diskless, storing the lease database on ramdisk - and after a restart would load the leases database from the central server via failover. Whether this would work with that many clients per site would be interesting to know.
You don't have to run diskless at the edge - that was mainly a suggestion to avoid all the issues that come with having storage dotted around remote sites where it's hard to manage and involves an engineer visit if anything goes wrong.
During normal operations, clients will use the local server because it will normally respond first due to being closer (in terms of network links and latency). If the local server is down, they will be able to use the central server.

When sizing the system, you need to consider more than just the steady state. Even a modest server can manage many clients if the leases are long - but what happens if there's a mass event such as a power cut that causes many clients to re-connect in a short space of time ?
In such an event, your server will experience a significantly higher load - which will be higher if all the devices auto-startup when power is restored, but lower if it's (e.g.) desktop systems that need the user to power them on. It's not as simple as "a queue will form".
If the server can't cope, clients will send a request, and eventually time out waiting for a reply - they'll then send another request, and another, and ... with many of the requests getting dropped. That in itself might not be too bad - clients would be happy when they hit the jackpot and their packet is one of the ones that didn't get dropped. But it's not that simple - many devices (and every device without both a real time clock and persistent storage) will first send a Discover, then after getting an Offer will send a Request (the DORA cycle, Discover-Offer-Request-Ack). If the client doesn't get an Ack to its Request, it will eventually go back to sending Discovers.
In the extreme, it could be a very long time before clients get addresses and the load dies down. And disk usage (for the leases file) will also temporarily increase - each transaction results in a new record being appended to the end of the file.
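
To get a feel for why this is worse than a simple queue, here is a toy Python simulation of the DORA cycle under overload. Everything about the model is an assumption made for illustration: each waiting client sends one packet per tick, the server answers a random `capacity` of them and drops the rest, and a client whose Request goes unanswered `request_tries` times falls back to Discover. Real clients back off between retries, so treat the numbers as qualitative only.

```python
import random


def ticks_until_all_leased(clients, capacity, request_tries=3, seed=1,
                           max_ticks=100_000):
    """Count ticks until every client has completed Discover and Request."""
    rng = random.Random(seed)
    waiting_discover = set(range(clients))  # clients still needing an Offer
    waiting_request = {}                    # client -> Request attempts left
    ticks = 0
    while (waiting_discover or waiting_request) and ticks < max_ticks:
        ticks += 1
        senders = list(waiting_discover) + list(waiting_request)
        # The server answers at most `capacity` packets; the rest are dropped.
        served = set(rng.sample(senders, min(capacity, len(senders))))
        for client in senders:
            if client in waiting_discover:
                if client in served:
                    # Offer received: move on to sending Requests.
                    waiting_discover.remove(client)
                    waiting_request[client] = request_tries
            elif client in served:
                # Ack received: this client is done.
                del waiting_request[client]
            else:
                waiting_request[client] -= 1
                if waiting_request[client] == 0:
                    # No Ack after several tries: back to Discover.
                    del waiting_request[client]
                    waiting_discover.add(client)
    return ticks


for capacity in (5000, 500, 50):
    ticks = ticks_until_all_leased(clients=5000, capacity=capacity)
    print(f"capacity {capacity} packets/tick: {ticks} ticks")
```

A pure queue would need at least 2 x clients / capacity ticks, since each client needs two answered packets. Any extra ticks in the simulation come from Offers that were served but then wasted when the follow-up Request was dropped and the client started over.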

Simon