tuning for maximum dhcp performance

Sat Apr 26 14:29:56 UTC 2008

After considerable engineering, I have decided to do the following to
improve the robustness of our systems:

We have a pair of SunFire 280Rs doing both DHCP and DNS, using the dhcp
failover protocol.

1. send all logs over the network (through a dedicated NIC) to a remote
   syslog server (partially to eliminate disk-write competition between
   named/syslog and dhcpd, partially to consolidate the multiple logs,
   and partially to move log processing off of the dns/dhcp boxes).

2. introduce a third server to act as a hidden master and take on all
   dynamic dns traffic (and associated log messages, also sent to the
   remote syslog server).

3. upgrade to dhcp 3.1.1 as soon as it is released.  This is mainly
   to take advantage of the improvements in the failover protocol,
   since all of our past problems were related to using the failover
   protocol under heavy load conditions.

And I am also looking for a battery-backed ramdisk (haven't found one
yet) to store nothing but the dhcp leases file.
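
The reason for singling out the leases file: as Dan's message quoted
below notes, dhcpd does an fsync per lease, and his throughput went from
about 80 to over 400 clients/second once that disk write was taken out of
the picture.  A rough way to see what a given disk (or ramdisk) can
sustain is to time small fsync'd appends.  This is only a sketch; the
function name, record contents and counts are made up for illustration
and it does not reproduce what dhcpd actually writes.

```python
import os
import tempfile
import time


def appends_per_second(directory, count=100, sync=True):
    """Time `count` small appends to a scratch file in `directory`.

    With sync=True each append is followed by fsync, roughly imitating a
    per-lease commit.  The record below is a placeholder, not the real
    dhcpd.leases format.
    """
    record = b"lease 192.0.2.1 { placeholder; }\n"
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, record)
            if sync:
                os.fsync(fd)  # force the data to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    target = tempfile.gettempdir()  # point this at the filesystem under test
    print("with fsync:    %.0f appends/s" % appends_per_second(target, sync=True))
    print("without fsync: %.0f appends/s" % appends_per_second(target, sync=False))
```

Run it once against the directory on the disk array and once against a
ramdisk mount to compare.  The numbers are only indicative, since dhcpd
does more per lease than a single write.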

(comments?)

Our environment:

Consider a surge of dhcp requests in a medium sized corporate HQ where
95% of all requests are handled okay, but 5% of the users need to
manually do an "ipconfig" or reboot to try again.  That means 200 users
are calling the helpdesk all at the same time -- exceeding the capacity
of the helpdesk.

This is a career-threatening event for someone on the I.T. staff --
whoever is stuck with the hot potato.

In this environment, the required benchmark is that 100% of all dhcp
requests are always processed, and no client ever times out.  Or else!
In our environment, upper management expects that all systems will
continue to function in their full capacity at all times, or else a lot
of middle management is subjected to intense scrutiny.  And you know
which direction it rolls....

Every day, 85% or more of the staff boots their computers up within a
10 minute window.  The system supports roughly 6000 dhcp clients
(including the remote sites) without problem most of the time.  But the
4 times in 3 years that the systems were overrun, causing dozens
or hundreds of helpdesk calls, are unacceptable.
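
To put rough numbers on that surge, here is a back-of-the-envelope
calculation using the figures above (6000 clients, 85% booting inside
10 minutes, 5% needing a retry).  It assumes arrivals are spread evenly
across the window, which is my simplification; real arrivals are
burstier, so the peak rate will be higher than this average.

```python
def surge_estimate(clients, boot_fraction, window_minutes, retry_fraction):
    """Return (clients booting, average clients/second, clients retrying).

    Assumes a uniform arrival rate over the window, which understates
    the true peak.
    """
    booting = clients * boot_fraction
    avg_rate = booting / (window_minutes * 60)
    retrying = booting * retry_fraction
    return booting, avg_rate, retrying


if __name__ == "__main__":
    booting, avg_rate, retrying = surge_estimate(6000, 0.85, 10, 0.05)
    print("clients booting in the window: %.0f" % booting)
    print("average clients per second:    %.1f" % avg_rate)
    print("clients needing a retry at 5%%: %.0f" % retrying)
```

That works out to about 5100 clients at an average of 8.5 per second,
with around 255 of them needing a retry, which is the same order as the
200 helpdesk callers mentioned above.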

So, some engineering was mandatory.  It's not like a bunch of cable
users whose expectations are lower and whose only recourse might be
to cancel service - but they rarely ever do from what I've seen.
Not a big deal in comparison.  And with cable users, it is not an
everyday event that the bulk of them are all seeking addresses at the
same time.

Every environment is different.

--
Gordon A. Lang

----- Original Message ----- 
From: "Frank Bulk - iNAME" <frnkblk at iname.com>
To: <dhcp-users at isc.org>
Sent: Friday, April 25, 2008 9:49 PM
Subject: RE: tuning for maximum dhcp performance

>I serve up 10,000 leases ranging from 3 to 14 days.  I haven't spent a
> second optimizing it.  It just works and has worked no matter what the
> client outage conditions have been.
>
> Unless you're serving up a campus where there is a real possibility that
> thousands of like clients (i.e. VoIP phone) may power up and come back
> online, there's no need to spend time over-engineering.  If there were 20k
> computers on a campus that lost power and power came back on simultaneously,
> many of the PCs would stay off (configured in the BIOS), and those
> configured to power on after power failure would reach the DHCP request
> phase at different spots.  At 80/second, it would take just a bit over 4
> minutes to serve them all (if the requests were linear).  Would it really
> matter if in the worst of all cases it took 10 minutes for every client to
> be back online?
>
> It's those networks that serve hundreds of thousands of clients that need to
> spend time engineering a solution that serves up IPs in a timely fashion.
>
> Frank
>
> -----Original Message-----
> From: dhcp-users-bounce at isc.org [mailto:dhcp-users-bounce at isc.org]
> On Behalf Of Dan
> Sent: Friday, April 25, 2008 1:01 PM
> To: dhcp-users at isc.org
> Subject: tuning for maximum dhcp performance
>
>
> I'm currently constructing a replacement for an old Cisco Network
> Registrar setup serving about 20,000 nodes (10,000 with 24hr leases,
> 10,000 with 7day leases).
>
> I'm running Linux 2.6.22 using ISC DHCPd 3.0.5 with dhcp-3.0.5-ldap-patch
> and dhcp-3.0.5-next-file.patch.  I hope to use failover between the 2
> servers, but haven't worked on that yet.
>
> As stated time and again, the software will not be the bottleneck. Using
> dhcpref's discovery benchmark, I'm seeing about 80 clients/second right
> now with my new hardware (ping-check off).  When I disable the per-lease
> fsync or move the dhcpd.leases file to ramdisk, it jumps to well over 400
> clients/second limited by CPU.
>
> My hardware is 2 servers with the following spec:
>   Dell PowerEdge 2970
>   Dual-core 2GHz 64bit AMD
>   4G RAM
>   10k RAID1 System Drives
>   15k RAID10 Storage Drives (just for dhcpd.leases file)
>
>
> Does anyone have any pointers on running a system like this and achieving
> maximum dhcp performance?
>
> Some factors that come to mind are:
>   -Other patches I should/could be using?
>   -Raid stripe element size, read-ahead, and write-back?
>      (currently 64Kb, no, and yes)
>   -Filesystem choice for dhcpd.leases file?
>      (ext3, reiserfs, xfs, jfs -- currently reiserfs)
>   -Filesystem parameters to tune?
>   -Kernel parameters to tune?
>
>
> Having a better understanding about how DHCPd works with the dhcpd.leases
> file might give me some of the answers to these questions also.
>
> Any information or shared experiences would be greatly appreciated.
>
> Thanks,
>
> Dan
>
>
>