silently neglected requests fixed by restarting dhcpd

Thu May 18 02:59:28 UTC 2006

After making the following three changes, my dhcp server has worked 
flawlessly with the full complement of 45 VLANs for a full week now:
   (1) turn off dynamic dns updates
   (2) reduce ping-timeout from 3 sec. to 1 sec.
   (3) shut down the secondary server, and eliminate the failover protocol 
       from the architecture
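
For anyone comparing configurations, the three changes correspond roughly 
to the dhcpd.conf fragment below.  This is only a sketch and not my actual 
file: the failover peer name "dhcp-failover" is a hypothetical placeholder, 
and the body of the failover declaration is left out.

```
# dhcpd.conf (ISC DHCP 3.0.x) -- sketch of the three changes

# (1) Turn off dynamic DNS updates.
ddns-update-style none;

# (2) Wait 1 second instead of 3 for an ICMP echo reply before
#     offering an address.
ping-timeout 1;

# (3) Failover removed: the "failover peer" declaration and every
#     per-pool reference to it are commented out.  The peer name
#     below is hypothetical.
# failover peer "dhcp-failover" { ... }
```

Change (3) also means stopping dhcpd on the secondary server, which is not 
visible in the configuration file.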

But I need to turn dynamic dns updates back on, and I will eventually need 
to reactivate the failover protocol.  Before I can do so, I need to 
understand why there were so many failures under load.

The best clue I have so far is that the client experiencing repeated 
failures was receiving offers, but it continued to send DISCOVERs as if 
it hadn't noticed the offers.

Is three seconds too long a ping-timeout for Microsoft clients?

One other clue I have is that virtually all of the dynamic dns update 
attempts were failing because the DNS server isn't completely set up yet. 
Could the huge volume of ddns failures be contributing to the large number 
of client failures?

Is it possible that the clients experiencing failures are rejecting or 
ignoring offers because they are malformed?  If so, then why would the 
failures be randomly scattered across all clients while the majority of 
clients actually succeed?

To clarify this failure behavior I should explain that a "failure" to me 
means that the ipconfig /renew or a fresh system boot results in an all 
zeros address or a 169.* address.  So some of the clients that took over 90 
seconds to get an address are still considered successes in my book.

--
Gordon A. Lang

----- Original Message ----- 
From: "Gordon A. Lang" <glang at goalex.com>
To: <dhcp-users at isc.org>
Sent: Wednesday, May 10, 2006 10:06 AM
Subject: Re: silently neglected requests fixed by restarting dhcpd

> My problem is not that the dhcpd fails completely -- it has been working
> fine for weeks with the USE_SOCKETS etc.
>
> It was working fine until I added all the additional VLANs, which is
> causing problems only in the morning hours when the demand is high.
>
> I found this morning that the problem doesn't completely go away after a
> reboot -- it only goes away after the demand is reduced.
>
> I also found that the server is receiving requests, and issuing leases
> (according to the dhcpd.leases file), but the clients are not actually
> getting the responses -- or maybe the clients are getting the responses,
> but they aren't liking them.  If people do a release and renew enough
> times, it eventually works, but the wait time is averaging 3 to 4 minutes
> for those who weren't fortunate enough to start work early today.
>
> I am setting up some network sniffers to see what's going on.  Meanwhile,
> I'm fishing for information from anybody else that might have experienced
> similar problems.
>
> Our network infrastructure has not changed except to change the "ip
> helper-address" statements on the routers to point the dhcp traffic to the
> unix servers instead of the MS server.  I really hate to say it, and I'll
> clobber the first person here who says it to me, but we didn't have this
> problem with the MS server.
>
> --
> Gordon A. Lang
>
>
> ----- Original Message ----- 
> From: "David W. Hankins" <David_Hankins at isc.org>
> To: <dhcp-users at isc.org>
> Sent: Tuesday, May 09, 2006 4:31 PM
> Subject: Re: silently neglected requests fixed by restarting dhcpd
>
>
>> On Tue, May 09, 2006 at 02:44:29PM -0400, Gordon A. Lang wrote:
>>> Solaris 10 running dhcpd version 3.0.3 in a "whole root zone," which is
>>> Sun's terminology for their virtual-machine-like technology that is
>>> standard with Solaris 10.
>>> host address 192.168.104.11 (with 32 bit netmask) bound to a loopback
>>> interface on Solaris box.
>>> static route on routers direct 192.168.104.11/32 packets to the server's
>>> native address, which is 10.104.0.11.
>>> two local routers running HSRP on all layer-3 interfaces, plus one
>>> router at each remote site.
>>> all routers have "ip helper-address 192.168.104.11" configurations on
>>> all VLANs.
>>
>> Mm.
>>
>> === Quoting the 3.0.4 readme:
>>
>>                               SOLARIS
>>
>> One problem which has been observed and is not fixed in this
>> patchlevel has to do with using DLPI on Solaris machines.  The symptom
>> of this problem is that the DHCP server never receives any requests.
>> This has been observed with Solaris 2.6 and Solaris 7 on Intel x86
>> systems, although it may occur with other systems as well.  If you
>> encounter this symptom, and you are running the DHCP server on a
>> machine with a single broadcast network interface, you may wish to
>> edit the includes/site.h file and uncomment the #define USE_SOCKETS
>> line.  Then type ``make clean; make''.  As an alternative workaround,
>> it has been reported that running 'snoop' will cause the dhcp server
>> to start receiving packets.  So the practice reported to us is to run
>> snoop at dhcpd startup time, with arguments to cause it to receive one
>> packet and exit.
>>
>>        snoop -c 1 udp port 67 > /dev/null &
>>
>> === end quote
>>
>>
>> I take it to mean that some event causes the DLPI device to stop
>> being able to receive packets...probably a bug in our DLPI
>> implementation.  I think I have patches queued somewhere that I
>> thought were feature enhancements.  I'll have to have a look to
>> make sure I queued those right.
>>
>> Back to your problem...
>>
>> I suggest in the network you describe (loopback aliased globally
>> routable ip address is the only way the dhcp server is reached),
>> you might want to recompile the dhcpd with USE_SOCKETS defined,
>> and then in dhcpd.conf set the 'local-address' statement to the
>> loopback alias (192.168.104.11).
>>
>> So instead of using DLPI at all, the server binds a normal UDP
>> socket to that address only.  So long as it only ever gets to talk
>> to clients either via unicast UDP or via relay agents, it should
>> all just work.
>>
>> -- 
>> David W. Hankins                    "If you don't do it right the first time,
>> Software Engineer                    you'll just have to do it again."
>> Internet Systems Consortium, Inc.        -- Jack T. Hankins
>>
>>
>> -- 
>> This message has been scanned for viruses and
>> dangerous content by MailScanner, and is
>> believed to be clean.
>>
>
>
>