Watching performance on a DHCP Server

Mon Feb 11 18:55:29 UTC 2008

The known clients are mainly wireless or cable modems. I agree that 
management of each CPE device by MAC can be tedious, though generally we 
use a web-based front end, which makes the process much simpler and 
more scalable as far as management is concerned.

If I understood the 3.0.x code correctly, a write (fflush) as well as an 
fsync occurs any time a lease is written (offer, ack), and the entire 
leases database is rewritten on restart.
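
As a rough illustration of why one fsync per lease write ties throughput to the disk's sync rate, and of what queuing several ACKs behind a single fsync (the 4.1.0a1 feature David describes below, default 28) changes, here is a minimal Python sketch. It is not the dhcpd code: the function name, the file names and the placeholder record strings are all made up for the example, and it only counts sync calls rather than holding back real ACK packets.

```python
import os
import tempfile


def write_leases(path, records, batch_size=1):
    """Append records to path, syncing to disk once per batch.

    Returns the number of fsync() calls issued. batch_size=1 models one
    sync per lease write; a larger value models queuing several
    acknowledgements behind a single sync. (Hypothetical helper, not
    part of ISC DHCP.)
    """
    syncs = 0
    pending = 0
    with open(path, "a") as f:
        for record in records:
            f.write(record + "\n")
            pending += 1
            if pending == batch_size:
                f.flush()              # push Python's buffer to the OS
                os.fsync(f.fileno())   # ask the OS to commit to disk
                syncs += 1
                pending = 0
        if pending:                    # sync any partial final batch
            f.flush()
            os.fsync(f.fileno())
            syncs += 1
    return syncs


if __name__ == "__main__":
    # Placeholder strings, not the real dhcpd.leases record format.
    records = [f"placeholder-lease-{i}" for i in range(56)]
    with tempfile.TemporaryDirectory() as d:
        per_write = write_leases(os.path.join(d, "per_write.leases"), records)
        batched = write_leases(os.path.join(d, "batched.leases"), records, 28)
    print(per_write, batched)
```

For 56 records this issues 56 syncs in the per-write case and 2 with a batch of 28, which is where the "multiplicative benefit" would come from if fsync is the bottleneck.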

I hadn't even considered the fluctuation as people turn PCs on and off 
with the work day, though our leases have generally remained stable 
in the past with a 1 day lease timer. The same goes for PPP sessions, 
though those are typically managed in the modem.
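
For a back-of-the-envelope feel for how lease time translates into steady-state server load, the sketch below assumes clients renew at T1, which RFC 2131 defaults to half the lease time, and that renewals are spread evenly over the day. The even spread is an idealization (the work-day fluctuation mentioned above breaks it), and the function name is hypothetical.

```python
def renewals_per_second(clients, lease_seconds, t1_fraction=0.5):
    """Average renewal requests per second, assuming renewals are evenly
    spread and every client renews at t1_fraction of its lease time."""
    return clients / (lease_seconds * t1_fraction)


HOUR = 3600
DAY = 24 * HOUR

if __name__ == "__main__":
    # 10,000 clients is the figure asked about earlier in the thread.
    for label, lease in [("1 hour", HOUR), ("1 day", DAY), ("7 days", 7 * DAY)]:
        rate = renewals_per_second(10_000, lease)
        print(f"{label:>7} lease: {rate:.3f} renewals/s")
```

Under these assumptions, 10,000 clients on 1 day leases average well under one renewal per second, so it is the bursts (power restoration, start of shift) that matter for sizing, not the steady state.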

-Blake

-------- Original Message  --------
Subject: Re: Watching performance on a DHCP Server
From: Barr Hibbs <rbhibbs at pacbell.net>
To: dhcp-users at isc.org
Date: Monday, February 11, 2008 11:07:43 AM
> in the case I reported, the clients were entirely within a single 
> enterprise, and while it was certainly possible for NICs to be 
> replaced from time to time, the client population was remarkably 
> stable.  For us, a 7 day or even 31 day lease would have been 
> appropriate.  Our users were instructed to shut down the clients every 
> day at close of business, then to restart them the following business 
> day, so we effectively had 100% of the clients doing an INIT-REBOOT at 
> least once each business day, with well over 90% rebooting at 8:00 AM 
> -- talk about a spike in network traffic!  Over time, with changing 
> workload requirements, expansion of working shifts, and the 
> realization that considerable time could be saved at the beginning of 
> each shift (not just mornings any longer) by utilizing sleep mode for 
> power saving, the in-rush of init-reboot requests dropped significantly.
>  
> There is one last point I forgot to mention in my previous 
> response...  our modification of the ISC server updated the leases 
> file for each and every message processed that modified the lease.  
> Our server was based on version 2, so there were no DNS updates as 
> part of the lease assignment and renewal process.
>  
> Basically, the more volatile your client population, the shorter the 
> lease time should be, though that is not an absolute.  Consider 
> operational hours, predictions of network traffic, number of 
> routers/relay agents and their placement, and typical use patterns of 
> the clients before deciding.
>  
> I've never been a fan of permitting only known MAC addresses, as the 
> daily maintenance of the server configuration in very large 
> environments is a major pain, and what of NIC replacement without 
> prior notice?  Just a few of my biases based on experience with 
> programmable NICs, frequent moves, adds, and changes, and cheaply made 
> NICs with high failure rates.
>  
> --Barr Hibbs
>  
>  
>  -----Original Message-----
> *From:* dhcp-users-bounce at isc.org
> [mailto:dhcp-users-bounce at isc.org] *On Behalf Of* Blake Hudson
> *Sent:* Monday, February 11, 2008 07:51
> *To:* dhcp-users at isc.org
> *Subject:* Re: Watching performance on a DHCP Server
>
>     Thanks Barr, it is always interesting to hear relative practical
>     experiences. This is exactly the kind of problem I would like to
>     prepare/plan for. I've read that Microsoft defaults to an 8 day
>     lease time. ISC uses a default lease time of 10 minutes, with a
>     max of 2 hours in their sample config included with 4.1.x.
>
>     We have successfully used 1 day leases in the past. Though I know
>     some larger ISPs use 5 day, 7 day or even longer lease times.
>
>     I'm assuming that the main advantage of a short lease time is that
>     hosts that join and leave a network give their leases up more
>     rapidly (keeping IP pool usage as low as possible). The main
>     advantage of longer lease times is reduced load on the DHCP server.
>     If I have a relatively stable network (only known MACs are allowed)
>     then it seems like a longer lease time (say 7-14 days) is more
>     appropriate. And on a relatively stable cable or DSL network
>     anything between 5-7 days seems acceptable? Volatile networks
>     (wifi hotspots?) would probably benefit from a 1 hour or shorter
>     lease time.
>
>     Does it sound like I am in the right ballpark with these figures?
>
>     -Blake
>
>
>     -------- Original Message  --------
>     Subject: Re: Watching performance on a DHCP Server
>     From: Barr Hibbs <rbhibbs at pacbell.net>
>     To: dhcp-users at isc.org
>     Date: Sunday, February 10, 2008 4:35:37 PM
>>     this experience is with a derivative of version 2 of the
>>     server, but as the basic functionality has not changed
>>     significantly for IPv4, it may be instructive....
>>
>>     at the time, our environment had about 12,000 clients split
>>     roughly 55/45 between two servers...  each server was
>>     connected by two links to each of approximately 120 remote
>>     subnets, each link diversely routed to minimize disruption
>>     due to network problems, but also delivering 2 copies of
>>     every client message to the servers
>>
>>     we suffered a massive regional power failure that lasted
>>     2-1/2 days before complete restoration...  our clients
>>     received 7-day leases, largely grouped with their renewal
>>     times between 8 am and 6 pm, so in a 2-1/2 day outage, we
>>     could expect renewal requests to come from about half of our
>>     clients, and certainly init-reboot requests to come from
>>     all...  so, that is roughly 18,000 requests to be serviced
>>     as power is restored....
>>
>>     of course, the power restoral didn't occur all at once, but
>>     was somewhat randomly distributed over a period of roughly
>>     32 hours
>>
>>     entirely by coincidence, we had instrumented the server to
>>     capture detailed message arrival rates and response times,
>>     expecting a normal, boring weekend...  but then the power
>>     failed, and...  we got lots more data than we expected!
>>
>>     the real-time clock on our computers was capable of only 1
>>     millisecond resolution, so I must extrapolate....  our
>>     servers survived a nearly CONTINUOUS load of more than 1,000
>>     requests per second for 32 hours...
>>
>>     of course, your mileage may vary, but by choosing an
>>     appropriate lease lifetime, you will probably see similar or
>>     better performance.
>>
>>     --Barr Hibbs
>>
>>
>>       
>>>     -----Original Message-----
>>>     From: dhcp-users-bounce at isc.org
>>>     [mailto:dhcp-users-bounce at isc.org] On Behalf Of David W. Hankins
>>>     Sent: Friday, February 08, 2008 08:55
>>>     To: dhcp-users at isc.org
>>>     Subject: Re: Watching performance on a DHCP Server
>>>
>>>     On Thu, Feb 07, 2008 at 06:07:51PM -0600, Blake Hudson wrote:
>>>>     By default in my distribution the leases file is stored in
>>>>     /var/lib/dhcpd/dhcpd.leases. This happens to be on a RAID1 array
>>>>     with 15k scsi disks and iostat shows the array as being maxed out
>>>>     once it reaches ~ 300 I/O's per second. DHCP logging is done
>>>>     asynchronously to the same array (which normally experiences ~ 50
>>>>     I/O ops). With CPU and memory barely breaking a sweat, this leads
>>>>     me to believe that the limitation is with the disks (lots of tiny
>>>>     writes).
>>>>
>>>>     I could move the leases file to a different array, or to tmpfs,
>>>>     but before I do I just want to know if these results are typical
>>>>     and that I have interpreted the test data correctly and made the
>>>>     correct determination as to the bottleneck.
>>>
>>>     those results are typical for that kind of hardware, and you have
>>>     interpreted the test data correctly: fsync() is the biggest
>>>     bottleneck.
>>>
>>>     in 4.1.0a1, you will find a feature, however, which was provided to
>>>     us in a patch by Christof Chen.  it permits the server to queue
>>>     multiple ACKs behind a single fsync(); default 28 (576 byte DHCP
>>>     packets filling default socket buffer send sizes).  the burst of
>>>     acks are sent presently if the sockets go dry, and shortly will be
>>>     backed up with a sub-second timeout.
>>>
>>>     it has some bugs we're working on, particularly with failover, but
>>>     we'll address those in alpha.
>>>
>>>     you may find that it provides some form of multiplicative benefit
>>>     to your performance stats, since fsync() is the bottleneck, and now
>>>     there are 28 acks per fsync max.
>>>
>>>     so if you are only pushing 50 requests/s currently, you may live
>>>     comfortably in a 250 request/s buffer for some months until the
>>>     4.1.x code is stable?
>>>
>>>>     Also, I would appreciate any anecdotal evidence with regards to
>>>>     how many requests are typical in a large network under normal (or
>>>>     abnormal) conditions. If 10,000 users all of a sudden came online,
>>>>     how many requests would they really generate per second?
>>>
>>>     there have been a few folks who suffered mass power outages, i
>>>     don't know what search query to use, but you can find them on the
>>>     old dhcp-server mailing list.  they did not report problems, rather
>>>     the surprise at the lack of problem.
>>>
>>>     --
>>>     Ash bugud-gul durbatuluk agh burzum-ishi krimpatul.
>>>     Why settle for the lesser evil?  https://secure.isc.org/store/t-shirt/
>>>     --
>>>     David W. Hankins                   "If you don't do it right the first time,
>>>     Software Engineer                   you'll just have to do it again."
>>>     Internet Systems Consortium, Inc.           -- Jack T. Hankins
>>
>
