dhcp fails with big dhcpd.leases

Simon Hobson dhcp1 at thehobsons.co.uk
Tue Aug 31 19:10:22 UTC 2010


dorian wrote:

>Here is another, slightly longer log snippet:
>Aug 31 13:51:47 [dhcpd] DHCPDISCOVER from 7c:c5:37:21:d9:7c via br0
>Aug 31 13:51:47 [dhcpd] DHCPOFFER on 172.18.93.227 to 7c:c5:37:21:d9:7c
>via br0
>Aug 31 13:51:49 [dhcpd] DHCPDISCOVER from 00:23:14:c0:61:28 (BLU060) via br0
>Aug 31 13:51:49 [dhcpd] DHCPOFFER on 172.18.90.186 to 00:23:14:c0:61:28
>(BLU060) via br0
>Aug 31 13:51:50 [dhcpd] DHCPDISCOVER from 00:25:d3:d8:71:1c
>(Malgos-Komputer) via br0
>Aug 31 13:51:50 [dhcpd] DHCPOFFER on 172.18.93.236 to 00:25:d3:d8:71:1c
>(Malgos-Komputer) via br0
>Aug 31 13:51:51 [dhcpd] DHCPDISCOVER from 00:18:41:c7:6c:01
>(Touch_Diamond) via br0
>Aug 31 13:51:51 [dhcpd] DHCPOFFER on 172.19.228.144 to 00:18:41:c7:6c:01
>(Touch_Diamond) via br0
>Aug 31 13:51:55 [dhcpd] DHCPREQUEST for 172.27.140.7 from
>00:18:51:ce:b3:69 via br0: unknown lease 172.27.140.7.
>Aug 31 13:51:56 [dhcpd] DHCPDISCOVER from 7c:c5:37:21:d9:7c via br0
>Aug 31 13:51:56 [dhcpd] DHCPOFFER on 172.18.93.227 to 7c:c5:37:21:d9:7c
>via br0
>Aug 31 13:51:59 [dhcpd] DHCPDISCOVER from 00:25:d3:d8:71:1c
>(Malgos-Komputer) via br0
>Aug 31 13:51:59 [dhcpd] DHCPOFFER on 172.18.93.236 to 00:25:d3:d8:71:1c
>(Malgos-Komputer) via br0
>Aug 31 13:52:01 [dhcpd] DHCPDISCOVER from 00:25:bc:0e:09:83
>(iPhone-SZAST) via br0
>Aug 31 13:52:01 [dhcpd] DHCPOFFER on 172.16.215.73 to 00:25:bc:0e:09:83
>(iPhone-SZAST) via br0
>Aug 31 13:52:04 [dhcpd] DHCPDISCOVER from 7c:c5:37:21:d9:7c via br0
>Aug 31 13:52:04 [dhcpd] DHCPOFFER on 172.18.93.227 to 7c:c5:37:21:d9:7c
>via br0
>Aug 31 13:52:05 [dhcpd] DHCPDISCOVER from 00:23:14:c0:61:28 (BLU060) via br0
>Aug 31 13:52:05 [dhcpd] DHCPOFFER on 172.18.90.186 to 00:23:14:c0:61:28
>(BLU060) via br0
>Aug 31 13:52:08 [dhcpd] DHCPDISCOVER from 00:18:41:c7:6c:01
>(Touch_Diamond) via br0
>Aug 31 13:52:08 [dhcpd] DHCPOFFER on 172.19.228.144 to 00:18:41:c7:6c:01
>(Touch_Diamond) via br0
>Aug 31 13:52:09 [dhcpd] DHCPDISCOVER from 00:22:43:95:d1:1e via br0
>Aug 31 13:52:10 [dhcpd] DHCPOFFER on 172.18.93.237 to 00:22:43:95:d1:1e
>(TWOJA-6VJZP1GTV) via br0
>Aug 31 13:52:10 [dhcpd] DHCPDISCOVER from 00:25:bc:0e:09:83
>(iPhone-SZAST) via br0
>
>If you wish I can post the whole log file, which is rather long, but I
>don't think there is any point in doing that.
>There is nothing interesting inside (a bunch of lines with DHCPDISCOVER
>& DHCPOFFER messages without DHCPACK between them) - no warnings or errors.
>
>Looking at the above snippet:   host with MAC 7c:c5:37:21:d9:7c asked
>several times for dhcp data.
>The first logs concerning this MAC which can be found are:
>Aug 31 12:54:03 [dhcpd] DHCPDISCOVER from 7c:c5:37:21:d9:7c via br0
>Aug 31 12:54:04 [dhcpd] DHCPOFFER on 172.18.93.227 to 7c:c5:37:21:d9:7c
>via br0
>
>It means the host hasn't got an IP address.

But note also, it does NOT request the address. If there is no 
Request, then the server has nothing to Ack. The ONLY request in that 
snippet is where 00:18:51:ce:b3:69 requests 172.27.140.7 but it is 
not a known lease. There isn't another instance of that MAC address 
in the log you posted.

Now, why is it unknown? Probably because you have broken your DHCP 
server by deleting the leases file. This is something you really, 
really should not be doing as it breaks things badly. It means the 
server has no knowledge whatsoever of the "promises" it has previously 
made to clients, and so it will tend to make offers for addresses 
that are already in use.

>> The leases file is a log file - the server only ever appends to it,
>> and during operations it never reads from it. It is only ever read
>> during startup when it reads each lease in turn and populates its
>> internal tables. Even then, it does not (I assume) read the file into
>> memory - it just has to parse each lease as it munches through the file.
>>
>Well. With a big dhcpd.leases file (close to the size mentioned above)
>I've found the server does have to read dhcpd.leases, since startup
>takes ~10 minutes (that is not a typo - 10 minutes!)

Which is what I wrote - it reads the file **during startup** in order 
to populate the internal data structures with the leases that have 
been previously given out. It is never read at any other time.
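
As a rough illustration of that startup replay (a sketch only, not the
server's actual code): each "lease <address> { ... }" block in the file
is read in order, and a later block for the same address simply
replaces the earlier one in the in-memory table. The function name and
the sample entries below are made up for the example; only the general
shape of a lease block is taken from the dhcpd.leases format.

```python
import re

# Matches the opening line of a lease block, e.g. "lease 172.18.93.227 {".
LEASE_START = re.compile(r"^lease\s+(\S+)\s*\{")


def replay_leases(lines):
    """Replay an append-only lease log; the last block per address wins."""
    table = {}        # address -> statements from the most recent block
    entries = 0       # every block seen, including superseded ones
    current_ip = None
    current_body = []
    for raw in lines:
        line = raw.strip()
        match = LEASE_START.match(line)
        if match:
            current_ip = match.group(1)
            current_body = []
            entries += 1
        elif line == "}" and current_ip is not None:
            table[current_ip] = current_body
            current_ip = None
        elif current_ip is not None and line:
            current_body.append(line)
    return table, entries


# Made-up sample entries, for illustration only.
SAMPLE = """\
lease 172.18.93.227 {
  binding state active;
  hardware ethernet 7c:c5:37:21:d9:7c;
}
lease 172.18.93.227 {
  binding state free;
  hardware ethernet 7c:c5:37:21:d9:7c;
}
lease 172.18.90.186 {
  binding state active;
  hardware ethernet 00:23:14:c0:61:28;
}
"""

table, entries = replay_leases(SAMPLE.splitlines())
print(f"{entries} entries in the log, {len(table)} distinct addresses")
```

Run against a real leases file, the gap between the two numbers gives
a feel for how much of the file is superseded history rather than
current state.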

>In my experience, removing dhcpd.leases and restarting fixes the
>server's malfunction immediately, whereas restarting the server with
>a big dhcpd.leases changes nothing (apart from the restart being
>extremely long)

But deleting the leases file DOES fundamentally break your server config.

>> To avoid the file growing ever larger, the server will periodically
>> clean up. It does this by writing out its current in-memory tables to
>> a new leases file, and swapping it into place by renaming the original
>> file and then renaming the new file into place.
>>
>How long is the "period"?
>I've never seen the dhcpd.leases file become smaller...

The period is a (compiled in) default of 1 hour. If you look, you 
should see something like "dhcpd.leases" and "dhcpd.leases~". The 
second of these is the previous version.
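
The write-then-rename swap can be sketched as follows. This is an
illustration of the general technique under stated assumptions, not
the server's own code: the function name, the ".new" temporary name
and the shape of the table are all hypothetical. Only the
"dhcpd.leases" / "dhcpd.leases~" pair comes from the behaviour
described above.

```python
import os


def rewrite_leases(path, table):
    """Write the current table to a new file, then swap it into place."""
    new_path = path + ".new"    # hypothetical temporary name
    backup_path = path + "~"    # previous version, e.g. dhcpd.leases~

    # 1. Write the in-memory state to a fresh file and flush it to disk.
    with open(new_path, "w") as out:
        for address in sorted(table):
            out.write(f"lease {address} {{\n")
            for statement in table[address]:
                out.write(f"  {statement}\n")
            out.write("}\n")
        out.flush()
        os.fsync(out.fileno())

    # 2. Rename the original out of the way, keeping it as the backup.
    if os.path.exists(path):
        os.replace(path, backup_path)

    # 3. Rename the new file into place.
    os.replace(new_path, path)
```

Because the old file is only ever renamed, never truncated, a complete
copy of the lease state exists on disk at every step.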

You should see the new version is slightly smaller than the old one 
immediately after the cleanup. It will never be 'small' on a server 
with that configuration because it will have to keep track of up to 
about 260,000 addresses. Even when a lease has expired, the last 
state of it is kept indefinitely in case the client should return to 
the network - and it is only replaced when the server runs out of 
"never used before" addresses and starts reusing expired leases in a 
"least recently used" manner.

I'm not trying to say you don't have a problem, but so far the log 
snippets don't show it. Have you tried picking a client MAC and 
'grep'ing for that in the log ?
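
Beyond a plain grep, a small script can tally which message types each
MAC has produced, which makes "offered but never requested" clients
stand out. This is a minimal sketch: it assumes each syslog entry is on
one line (the snippets above were wrapped by the mail client), and the
function names are made up for the example. The sample lines are taken
from the log posted above, unwrapped.

```python
import re
from collections import defaultdict

MSG = re.compile(r"\b(DHCP(?:DISCOVER|OFFER|REQUEST|ACK|NAK))\b")
MAC = re.compile(r"\b(?:[0-9a-f]{2}:){5}[0-9a-f]{2}\b", re.IGNORECASE)


def tally_by_mac(lines):
    """Count DHCP message types per client MAC address."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        msg = MSG.search(line)
        mac = MAC.search(line)
        if msg and mac:
            counts[mac.group(0).lower()][msg.group(1)] += 1
    return counts


def offered_but_never_requested(counts):
    """MACs that were sent an offer but never logged a request."""
    return sorted(
        mac for mac, seen in counts.items()
        if seen.get("DHCPOFFER") and not seen.get("DHCPREQUEST")
    )


LOG = [
    "Aug 31 13:51:47 [dhcpd] DHCPDISCOVER from 7c:c5:37:21:d9:7c via br0",
    "Aug 31 13:51:47 [dhcpd] DHCPOFFER on 172.18.93.227 to 7c:c5:37:21:d9:7c via br0",
    "Aug 31 13:51:55 [dhcpd] DHCPREQUEST for 172.27.140.7 from 00:18:51:ce:b3:69 via br0: unknown lease 172.27.140.7.",
    "Aug 31 13:51:56 [dhcpd] DHCPDISCOVER from 7c:c5:37:21:d9:7c via br0",
]

counts = tally_by_mac(LOG)
for mac, seen in sorted(counts.items()):
    print(mac, dict(seen))
print("offered, never requested:", offered_but_never_requested(counts))
```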



>Sorry. I do not understand.
>What is illegal or unusual with it?
>
>172.16.8.0 belongs to 172.16.0.0/14
>and 172.16.0.0/14 is a part of 172.16.0.0/12 private class
>
>So what does 'behave "funny"' mean?

There is nothing illegal or funny, but it is known that a small 
number of badly programmed clients cannot cope with the last octet 
being 0 or 255 since everyone "knows" that 0 is the network address 
and 255 is the broadcast address. Complete rubbish, but there are 
people who have never used anything but a /24 subnet and just cannot 
comprehend anything else - and that includes some supposedly 
professional IT people I've worked with !

For that reason alone, it's suggested to avoid them by splitting your 
ranges thus :

range 172.16.8.1 172.16.8.254;
range 172.16.9.1 172.16.9.254;
range 172.16.10.1 172.16.10.254;
...
Something of a pain for the number of addresses you have !
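
Rather than typing those out by hand, the statements can be generated.
A minimal sketch using Python's standard ipaddress module; the function
name is made up, and the 172.16.8.0/22 block in the usage line is only
an example, so substitute whatever your pool actually covers.

```python
import ipaddress


def split_ranges(network):
    """Yield one 'range' statement per /24, skipping .0 and .255."""
    net = ipaddress.ip_network(network)
    for block in net.subnets(new_prefix=24):
        base = block.network_address
        yield f"range {base + 1} {base + 254};"


# Example block only; replace with the real pool.
for statement in split_ranges("172.16.8.0/22"):
    print(statement)
```

For a whole /14 this produces 1024 statements, so it is probably best
written to a separate file and pulled into dhcpd.conf with an include
statement.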



>  > That range is over a quarter of a million addresses. Does the server
>>  still have issues with very large ranges ?
>Yes it is.
>And even if not - in my opinion this doesn't concern the point of the
>problem...

Well the point is that it's a large number of addresses, and from 
memory of threads I didn't pay much attention to (as I only run small 
servers), there are aspects (hash tables IIRC) that don't scale too 
well for very large address spaces. Even for addresses that aren't 
used, the server must build them into an internal list.
There are few people running such large spaces, but from memory I 
don't think yours is the biggest that's been mentioned on this list.

>  > I vaguely recall there used to be issues with memory usage and startup
>>  times.
>The host is equipped with 16GB RAM so...
>>
>>  It does sound a rather excessive number of addresses - even for a
>>  public access point.
>>
>As above: this is not a point of the problem - or maybe is it? But if so
>please say it clearly.
>Are there any limits on served IP ranges or classes?

There are no specific limits, other than memory and I/O bandwidth. As 
mentioned above, there are some elements of the design that don't 
scale well - or didn't in earlier versions. On that point - what 
version are you using ?

>I need such big IP range since in fact I have a network of hotspots
>working in bridge and centrally controlled from one host.

How many clients do you normally have on the network in any 2 hour 
period? Looking at your original log snippet, you seem to have fewer 
than one request per second. For 250,000 clients and a 2 hour lease, 
you should be seeing not less than about 34 request-ack or 
discover-offer-request-ack exchanges per second.
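
The arithmetic behind that figure, as a quick check. It assumes every
client completes exactly one exchange per lease period, which is the
minimum: clients normally try to renew about halfway through the
lease, so the real rate would be higher. The function name is made up
for the example.

```python
def min_exchanges_per_second(clients, lease_seconds):
    """Lower bound: one exchange per client per lease period."""
    return clients / lease_seconds


rate = min_exchanges_per_second(250_000, 2 * 60 * 60)
print(f"{rate:.1f} exchanges per second")  # 34.7 exchanges per second
```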

I'd suggest it's worth cutting back on the address space and seeing 
whether it makes a difference.


Also, almost as an aside, I notice that you have timeouts for DNS 
updates. This suggests that your DDNS isn't set up correctly - it 
might be worth turning it off while you are trying to troubleshoot 
this problem.

-- 
Simon Hobson