tuning for maximum dhcp performance

Glenn Satchell Glenn.Satchell at uniq.com.au
Sun Apr 27 14:32:46 UTC 2008


How long is your lease time? If you make it longer than 24 hours, then
those clients that can't get a response from the dhcp server when
everyone logs on at 9am (say) will still have a valid lease and can
just keep on working. This is a bit simplistic, of course.
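
To make that concrete, here is a rough sketch in Python. The dates and the
helper name are made up for illustration. It assumes the client got its last
ACK at 9am, was switched off before the renewal timer at half the lease time
fired, and is booted again just after 9am the next day:

```python
from datetime import datetime, timedelta

def still_valid(last_ack: datetime, lease: timedelta, next_boot: datetime) -> bool:
    """True if the lease granted at last_ack has not expired by next_boot."""
    return next_boot < last_ack + lease

# Hypothetical timeline: last ACK Monday 9:00, next boot Tuesday 9:05.
last_ack = datetime(2008, 4, 28, 9, 0)
next_boot = datetime(2008, 4, 29, 9, 5)

for hours in (24, 48, 72):
    ok = still_valid(last_ack, timedelta(hours=hours), next_boot)
    print(f"{hours}h lease still valid at next boot: {ok}")
```

Under these assumptions a 24 hour lease has already expired by the next
logon, while 48 hours or more leaves an unexpired lease. A Friday to Monday
gap is 72 hours, so covering weekends needs more than that. Whether a client
really keeps using an unexpired lease when the server is silent depends on
the client.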

One option I have used where there were remote sites was to use a
'spoke' design. One central dhcp server and a peer server in each
remote site. Each remote server only handles the local IP ranges for
that site and peers with the central server. The central server has
different failover peers for each of the remote sites. This also has
the benefit that dhcp is available in remote sites if the network is
isolated from the main office for any reason.
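
As a toy model of that layout in Python (the site names, address ranges and
server names are invented, and this only models who can be reached, not the
failover protocol itself):

```python
# Hypothetical remote sites and their local ranges, for illustration only.
SITES = {
    "site-a": "10.1.0.0/24",
    "site-b": "10.2.0.0/24",
}

def failover_pairs(sites):
    """One failover relationship per remote site: central <-> that site's server."""
    return {site: ("central", f"dhcp-{site}") for site in sites}

def servers_available(site, wan_up):
    """Servers that clients at `site` can still reach."""
    central, local = failover_pairs(SITES)[site]
    return [central, local] if wan_up else [local]

for site in SITES:
    print(site, "WAN up:", servers_available(site, True))
    print(site, "WAN down:", servers_available(site, False))
```

The point of the model is only that each site always has its own server for
its own ranges, and the central server carries a separate peer relationship
per site rather than one big one.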

The other pain you could be feeling here is routers that drop the UDP
dhcp packets when they get overloaded. Is there some configuration that
can be done on the routers so that they prefer not to drop dhcp packets?

I feel your pain: dhcp is meant to be one of those services that is
"just there" all the time, but engineering it to be so is a difficult
task. The SunFire 280R is pretty old now; is management prepared to pay
for new hardware to provide the always-there dhcp service?
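
For a rough sense of scale, the figures quoted below can be put side by
side. This is only back-of-the-envelope Python; the function names are mine,
and Dan's 80 clients/second was measured on different hardware from the
280Rs, so treat the comparison as order-of-magnitude at best:

```python
def average_rate(clients, fraction, window_seconds):
    """Average requests per second if `fraction` of clients boot within the window."""
    return clients * fraction / window_seconds

def drain_seconds(clients, rate_per_second):
    """Time to serve every client once at a steady rate."""
    return clients / rate_per_second

# Gordon: roughly 6000 clients, 85% booting inside a 10 minute window.
morning = average_rate(6000, 0.85, 10 * 60)
# Frank: 20k clients served at Dan's benchmarked 80 clients/second.
campus = drain_seconds(20000, 80)

print(f"morning surge: {morning:.1f} clients/second on average")
print(f"20k clients at 80/s: {campus / 60:.1f} minutes")
```

The morning surge averages about 8.5 clients/second, and the 20k case takes
a bit over 4 minutes. An average hides bursts inside the window, so this
says nothing about peak load, but it does suggest looking at failover and
disk contention as well as raw throughput.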

regards,
-glenn

>From: "Gordon A. Lang" <glang at goalex.com>
>Date: Sat, 26 Apr 2008 10:29:56 -0400
>
>After considerable engineering, I have decided to do the following to
>improve the robustness of our systems:
>
>We have a pair of SunFire 280R's doing both DHCP and DNS....using dhcp
>failover protocol.
>
>1. send all logs over the network (through a dedicated NIC) to a remote
>   syslog server (partially to eliminate disk-write competition between
>   named/syslog and dhcpd, partially to consolidate the multiple logs,
>   and partially to move log processing off the dns/dhcp boxes).
>
>2. introduce a third server to act as a hidden master and take on all
>   dynamic dns traffic (and associated log messages, also sent to the
>   remote syslog server).
>
>3. upgrade to dhcp 3.1.1 as soon as it is released.  This is mainly
>   to take advantage of the improvements in the failover protocol
>   since all of our past problems were related to using failover
>   protocol under heavy load conditions.
>
>
>And I am also looking for a battery-backed ramdisk (haven't found one
>yet) to store nothing but the dhcp leases file.
>
>(comments?)
>
>
>Our environment:
>
>Consider a surge of dhcp requests in a medium sized corporate HQ where
>95% of all requests are handled okay, but 5% of the users need to
>manually do an "ipconfig" or reboot to try again.  That means 200 users
>are calling the helpdesk all at the same time -- exceeding the capacity
>of the helpdesk.
>
>This is a career-threatening event for someone on the I.T. staff --
>whoever is stuck with the hot potato.
>
>In this environment, the required benchmark is that 100% of all dhcp
>requests are always processed, and no client ever times out.  Or else!
>In our environment, upper management expects that all systems will
>continue to function in their full capacity at all times, or else a lot
>of middle management is subjected to intense scrutiny.  And you know
>which direction it rolls....
>
>Every day, 85% or more of the staff boots their computers up within a
>10 minute window.  The system supports roughly 6000 dhcp clients
>(including the remote sites) without problem most of the time.  But the
>4 times in 3 years that the systems were overrun, causing dozens
>or hundreds of helpdesk calls, are completely unacceptable.
>
>So, some engineering was mandatory.  It's not like a bunch of cable
>users whose expectations are lower and whose only recourse might be
>to cancel service - but they rarely ever do from what I've seen.
>Not a big deal in comparison.  And with cable users, it is not an
>everyday event that the bulk of them are all seeking addresses at the
>same time.
>
>Every environment is different.
>
>--
>Gordon A. Lang
>
>
>
>----- Original Message ----- 
>From: "Frank Bulk - iNAME" <frnkblk at iname.com>
>To: <dhcp-users at isc.org>
>Sent: Friday, April 25, 2008 9:49 PM
>Subject: RE: tuning for maximum dhcp performance
>
>
>>I serve up 10,000 leases ranging from 3 to 14 days.  I haven't spent a
>> second optimizing it.  It just works and has worked no matter what the
>> client outage conditions have been.
>>
>> Unless you're serving up a campus where there is a real possibility that
>> thousands of like clients (e.g. VoIP phones) may power up and come back
>> online, there's no need to spend time over-engineering.  If there were 20k
>> computers on a campus that lost power and power came back on
>> simultaneously, many of the PCs would stay off (configured in the BIOS),
>> and those configured to power on after power failure would reach the DHCP
>> request phase at different times.  At 80/second, it would take just a bit
>> over 4 minutes to serve them all (if the requests were linear).  Would it
>> really matter if in the worst of all cases it took 10 minutes for every
>> client to be back online?
>>
>> It's those networks that serve hundreds of thousands of clients that need
>> to spend time engineering a solution that serves up IPs in a timely fashion.
>>
>> Frank
>>
>> -----Original Message-----
>> From: dhcp-users-bounce at isc.org [mailto:dhcp-users-bounce at isc.org] On
>> Behalf Of Dan
>> Sent: Friday, April 25, 2008 1:01 PM
>> To: dhcp-users at isc.org
>> Subject: tuning for maximum dhcp performance
>>
>>
>> I'm currently constructing a replacement for an old Cisco Network
>> Registrar setup serving about 20,000 nodes (10,000 with 24hr leases,
>> 10,000 with 7day leases).
>>
>> I'm running Linux 2.6.22 using ISC DHCPd 3.0.5 with dhcp-3.0.5-ldap-patch
>> and dhcp-3.0.5-next-file.patch.  I hope to use failover between the 2
>> servers, but haven't worked on that yet.
>>
>> As stated time and again, the software will not be the bottleneck. Using
>> dhcperf's discovery benchmark, I'm seeing about 80 clients/second right
>> now with my new hardware (ping-check off).  When I disable the per-lease
>> fsync or move the dhcpd.leases file to ramdisk, it jumps to well over 400
>> clients/second limited by CPU.
>>
>> My hardware is 2 servers with the following spec:
>>   Dell PowerEdge 2970
>>   Dual-core 2GHz 64-bit AMD
>>   4GB RAM
>>   10k RAID1 System Drives
>>   15k RAID10 Storage Drives (just for dhcpd.leases file)
>>
>>
>> Does anyone have any pointers on running a system like this and achieving
>> maximum dhcp performance?
>>
>> Some factors that come to mind are:
>>   -Other patches I should/could be using?
>>   -RAID stripe element size, read-ahead, and write-back?
>>      (currently 64KB, no, and yes)
>>   -Filesystem choice for dhcpd.leases file?
>>      (ext3, reiserfs, xfs, jfs -- currently reiserfs)
>>   -Filesystem parameters to tune?
>>   -Kernel parameters to tune?
>>
>>
>> Having a better understanding about how DHCPd works with the dhcpd.leases
>> file might give me some of the answers to these questions also.
>>
>> Any information or shared experiences would be greatly appreciated.
>>
>> Thanks,
>>
>> Dan