silently neglected requests fixed by restarting dhcpd
dhcp at thehobsons.co.uk
Thu May 18 09:24:21 UTC 2006
Gordon A. Lang wrote:
>After making the following three changes, my dhcp server has worked
>flawlessly with the full complement of 45 VLANs for a full week now:
> (1) turn off dynamic dns updates
> (2) reduce ping-timeout from 3 sec. to 1 sec
> (3) shut down the secondary server, and eliminate the failover protocol
>from the architecture
>But I need to turn dynamic dns updates back on, and I will eventually need
>to reactivate the failover protocol. Before I can do so, I need to gain
>some understanding why there were so many failures under load.
>The best clue I have so far is that the client experiencing repeated
>failures was receiving offers, but the client continued to send discovers
>as if it didn't notice the offers.
>Is three seconds too long of a ping-timeout for Microsoft clients?
Very probably. It means that each request will be held for 3
seconds before it is answered (or something like that). Some clients
just have no patience!
>One other clue I have is that virtually all of the dynamic dns update
>attempts were failing because the DNS server isn't completely set up yet.
>Could the huge volume of ddns failures contribute to the server causing a
>large number of failures?
Yes. AIUI, the dhcp server is single-threaded, and while it's
processing one request it can't handle the next. The delay caused by
the dns update failures could be significant under heavy load.
It's conceivable that the combination of a heavy client load, the dns
failures, and the long ping timeout could result in long delays
responding to clients. The server eventually gets around to each one
but by the time it does so the client has stopped listening for the
reply. The fact that the clients try several times would increase the
load and make the situation worse.
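To make the backlog argument concrete, here is a rough sketch that takes the single-threaded description above at face value. It is not how dhcpd is actually implemented. The function name, the 4 second client timeout and the burst of 10 clients are all made-up figures for illustration, and retries are not modelled (they would only make things worse).

```python
def simulate(arrivals, service_time, client_timeout):
    """Model a single-threaded FIFO server (hypothetical, not dhcpd itself).

    arrivals       -- times (seconds) at which requests reach the server
    service_time   -- seconds the server is busy per request
                      (e.g. ping-timeout plus any dns update delay)
    client_timeout -- seconds a client listens for a reply (assumed value)

    Returns a list of (arrival, reply_delay, answered_in_time) tuples.
    """
    free_at = 0.0  # time at which the server finishes its current request
    results = []
    for t in sorted(arrivals):
        start = max(t, free_at)       # wait until the server is idle
        done = start + service_time   # reply goes out when processing ends
        free_at = done
        delay = done - t
        results.append((t, delay, delay <= client_timeout))
    return results


if __name__ == "__main__":
    burst = [0.0] * 10  # ten clients booting at the same moment
    for per_request in (3.0, 1.0):
        ok = sum(1 for _, _, in_time in simulate(burst, per_request, 4.0))
        print(f"{per_request:.0f} s per request: {ok} of {len(burst)} answered in time")
```

With these assumed numbers, cutting the per-request time from 3 seconds to 1 second lets several more clients in the same burst get a reply before they give up, which fits the improvement seen after reducing ping-timeout.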
Under normal conditions, dns would be updated only on issue of a new
lease, or on a lease expiring. Renewals in between have no dns
activity, and so with a working ddns setup there is minimal extra
load once clients have their dns entries. If your dns is broken, then
you have this near constant delay in processing all traffic.
I would suggest that you fix the dns problem and then try turning
on dns updates again. Since nearly all your clients already have
leases, it should cause fewer problems now than it did when
you were trying to issue new leases as well as do dns updates.
Perhaps do this one subnet at a time to avoid a huge volume of dns
updates the morning after you do it!
In future, you can minimise problems by setting the leases
sufficiently long that they do not normally expire - perhaps a few
weeks. If you have very short leases (say less than 12 hours), then
every morning you will have a flurry of dhcp & dns updates as clients
renew their leases which expired overnight.
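As a back-of-envelope check on lease length, the sketch below asks whether a lease would run out while a machine is switched off. It assumes the client renews at half the lease time (the default T1 in RFC 2131), so at shutdown it has between half and all of the lease left. The function name and the 15 hour overnight figure are only illustrative.

```python
def lease_expires_while_off(lease_hours, off_hours, renew_fraction=0.5):
    """Return (worst_case, best_case): does the lease expire while powered off?

    Assumes the client renews at renew_fraction of the lease (T1), so the
    time remaining at shutdown lies between lease * (1 - renew_fraction)
    (it was just about to renew) and the full lease (it had just renewed).
    """
    worst_remaining = lease_hours * (1 - renew_fraction)
    best_remaining = lease_hours
    return off_hours > worst_remaining, off_hours > best_remaining


if __name__ == "__main__":
    overnight = 15  # assumed hours between evening shutdown and morning boot
    for lease in (12, 24, 21 * 24):
        worst, best = lease_expires_while_off(lease, overnight)
        print(f"lease {lease:>3} h: expires worst case={worst}, best case={best}")
```

Under these assumptions a 12 hour lease always expires overnight, a 24 hour lease sometimes does, and a three week lease does not, so only the short leases produce the morning burst of new leases and dns updates.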
>Is it possible that the clients experiencing failures are rejecting or
>ignoring offers because they are malformed? If so, then why would the
>failures be randomly scattered across all clients while the majority of
>clients actually succeed?
My guess, as suggested above, is that at peak times, a backlog builds
up of dhcp requests, and the clients simply stop waiting for a reply
before it arrives. They then make matters worse by sending further
requests which also time out.
>To clarify this failure behavior I should explain that a "failure" to me
>means that the ipconfig /renew or a fresh system boot results in an all
>zeros address or a 169.* address. So some of the clients that took over 90
>seconds to get an address are still considered successes in my book.
The fact that some clients did get an address eventually would tend
to support the theory - eventually a response was received within the
client's timeout window. I suspect that had you left things long
enough, AND used long enough leases, then the problem would have
disappeared eventually or at least abated. You would still have had
problems since the failing dns updates would be attempted at every