dhcp fails with big dhcpd.leases
dorian33 at o2.pl
Tue Aug 31 21:05:11 UTC 2010
Simon Hobson wrote:
> dorian wrote:
>> Here is another, slightly longer log snippet:
>> Aug 31 13:51:47 [dhcpd] DHCPDISCOVER from 7c:c5:37:21:d9:7c via br0
>> Aug 31 13:51:47 [dhcpd] DHCPOFFER on 172.18.93.227 to 7c:c5:37:21:d9:7c via br0
>> Aug 31 13:51:49 [dhcpd] DHCPDISCOVER from 00:23:14:c0:61:28 (BLU060) via br0
>> Aug 31 13:51:49 [dhcpd] DHCPOFFER on 172.18.90.186 to 00:23:14:c0:61:28 (BLU060) via br0
>> Aug 31 13:51:50 [dhcpd] DHCPDISCOVER from 00:25:d3:d8:71:1c (Malgos-Komputer) via br0
>> Aug 31 13:52:10 [dhcpd] DHCPOFFER on 172.18.93.237 to 00:22:43:95:d1:1e (TWOJA-6VJZP1GTV) via br0
>> Aug 31 13:52:10 [dhcpd] DHCPDISCOVER from 00:25:bc:0e:09:83 (iPhone-SZAST) via br0
>> If you wish, I can post the whole log file, which is rather long, but I don't
>> think there is any point in doing that.
>> There is nothing interesting in it (a bunch of lines with DHCPDISCOVER
>> and DHCPOFFER messages without any DHCPACK between them), no warnings or errors.
>> Looking at the snippet above: the host with MAC 7c:c5:37:21:d9:7c asked
>> for DHCP data several times.
>> The earliest log entries for this MAC are:
>> Aug 31 12:54:03 [dhcpd] DHCPDISCOVER from 7c:c5:37:21:d9:7c via br0
>> Aug 31 12:54:04 [dhcpd] DHCPOFFER on 172.18.93.227 to 7c:c5:37:21:d9:7c via br0
>> This means the host has not obtained an IP address.
> But note also, it does NOT request the address.
It does. I just omitted the request:
Aug 31 13:27:52 [dhcpd] DHCPREQUEST for 10.0.1.8 from 7c:c5:37:21:d9:7c via br0: wrong network.
Aug 31 13:27:52 [dhcpd] DHCPNAK on 10.0.1.8 to 7c:c5:37:21:d9:7c via br0
I do not remember the whole DHCP protocol, so I don't know exactly what is
exchanged between client and server.
But in my (perhaps naive) view, the host should be able to ask for a
completely new IP without first asking to be assigned the "old" one,
especially when it connects to a totally new network: there is no address
to ask about.
> If there is no Request, then the server has nothing to Ack.
OK, I understand: the server sends a DHCPACK only when the host requests
the IP address and the server confirms it.
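The exchange discussed here (DISCOVER, OFFER, REQUEST, then ACK or NAK) can be illustrated with a toy model. This is a simplified sketch loosely following RFC 2131, not ISC dhcpd's code: the class name, the 172.18.0.0/16 network and the "stay silent" handling of unknown requests are assumptions for illustration. It shows why a client that first asks for an address remembered from another network (like 10.0.1.8 above) gets a NAK and has to start over with DISCOVER, and why there is nothing to ACK until a REQUEST arrives.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class ToyDhcpServer:
    """Toy model of DISCOVER/OFFER/REQUEST/ACK-or-NAK.

    Assumption: a simplified reading of RFC 2131, not ISC dhcpd internals.
    """
    network: ipaddress.IPv4Network
    free: list = field(default_factory=list)    # unused addresses (strings)
    offers: dict = field(default_factory=dict)  # MAC -> offered IP
    leases: dict = field(default_factory=dict)  # MAC -> acknowledged IP

    def handle(self, msg: str, mac: str,
               requested: Optional[str] = None) -> Tuple[str, Optional[str]]:
        if msg == "DHCPDISCOVER":
            # Prefer the client's existing lease or pending offer, else a free address.
            ip = self.leases.get(mac) or self.offers.get(mac) or self.free.pop(0)
            self.offers[mac] = ip
            return ("DHCPOFFER", ip)
        if msg == "DHCPREQUEST" and requested is not None:
            if ipaddress.ip_address(requested) not in self.network:
                # Address remembered from another network (INIT-REBOOT):
                # NAK it, so the client falls back to DISCOVER.
                return ("DHCPNAK", requested)
            if requested in (self.offers.get(mac), self.leases.get(mac)):
                self.offers.pop(mac, None)
                self.leases[mac] = requested
                return ("DHCPACK", requested)
        # Anything else: in this sketch the server does not answer.
        return ("SILENT", None)
```

Replaying the 7c:c5:37:21:d9:7c sequence from the log: the REQUEST for 10.0.1.8 gets DHCPNAK, a DISCOVER gets DHCPOFFER, and only a REQUEST for the offered address produces DHCPACK.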
> The ONLY request in that snippet is where 00:18:51:ce:b3:69 requests
> 172.27.140.7 but it is not a known lease. There isn't another instance
> of that MAC address in the log you posted.
> Now, why is it unknown ? Probably because you have broken your DHCP
> server by deleting the leases file.
First of all, the core of the problem is:
a) when dhcpd.leases becomes "big", the server stops serving DHCP data
(or the clients don't receive it);
b) stopping the server, removing dhcpd.leases and starting the server again
fixes the problem immediately.
This problem is the _main subject_ of my mails.
My remarks about the message exchange were only suspicions arising from
my limited knowledge of the protocol.
BTW: I never wrote that I just delete the lease file.
> This is something you really, really should not be doing as it breaks
> stuff badly. It means the server has no knowledge whatsoever of
> "promises" it has previously made to clients, and so it will tend to
> make offers for addresses that are already in use.
>>> The leases file is a log file - the server only ever appends to it,
>>> and during operations it never reads from it. It is only ever read
>>> during startup when it reads each lease in turn and populates its
>>> internal tables. Even then, it does not (I assume) read the file into
>>> memory - it just has to parse each lease as it munches through the
>> Well. With a big dhcpd.leases file (close to the size mentioned above)
>> I've found that the server does have to read dhcpd.leases, since startup
>> takes ~10 minutes (that is not a typo: 10 minutes!)
> Which is what I wrote - it reads the file **during startup** in order
> to populate the internal data structures with the leases that have
> been previously given out. It is never read at any other time.
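The "append during operation, read only at startup" behaviour amounts to replaying a log into an in-memory table, where a later record for the same address supersedes an earlier one. A minimal sketch, assuming the `lease <ip> { ... }` block syntax of dhcpd.leases and reading only the `hardware ethernet` and `binding state` statements; real files contain more statements, and this regex-based parser does not handle every edge case.

```python
import re

# One "lease <ip> { ... }" block; lease blocks do not nest.
LEASE_RE = re.compile(r"lease\s+(\S+)\s*\{(.*?)\}", re.S)
MAC_RE = re.compile(r"hardware ethernet\s+([0-9A-Fa-f:]+);")
# Anchored at line start so "next binding state ..." is not picked up.
STATE_RE = re.compile(r"^\s*binding state\s+(\w+);", re.M)


def replay_leases(text):
    """Rebuild an in-memory lease table from an append-only leases log.

    Later blocks for the same address overwrite earlier ones, which is what
    lets the server append new records instead of editing old ones.
    """
    table = {}
    for ip, body in LEASE_RE.findall(text):
        entry = {}
        mac = MAC_RE.search(body)
        if mac:
            entry["mac"] = mac.group(1).lower()
        state = STATE_RE.search(body)
        if state:
            entry["state"] = state.group(1)
        table[ip] = entry  # last record for this address wins
    return table
```

Startup cost grows with the number of records in the file, not with the number of distinct leases, which fits a long startup on a large file.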
>> In my experience, removing dhcpd.leases and restarting fixes the
>> malfunction of the server immediately, whereas restarting the server
>> with a big dhcpd.leases changes nothing (apart from the restart being
>> extremely long).
> But deleting the leases file DOES fundamentally break your server config.
Well. So, was Sten Carlsen wrong when he wrote "The leases file is a log
file - the server only ever appends to it, and during operations it never
reads from it."?
Because if he was right, deleting the lease file (while the server is
running) should not break the server; otherwise there is a bug in the
software, since writing to an open file (i.e. through a file handle) that
has been removed should be detected by the software.
I know that nobody expects such a stupid user action, but for services
running 24/7 anything can happen: the file system can crash (or the whole
HDD holding the partition with this file can become corrupted), so such a
service should "behave correctly" and report the error in some other way.
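On Linux (and POSIX systems generally), writes through a descriptor whose file has been deleted still succeed: the data goes to an inode that no longer has a name and is lost once the descriptor is closed. A program could notice this by checking the link count of the open file. The sketch below illustrates that idea with a hypothetical helper and a stand-in file in a temporary directory; it is not a claim about what dhcpd itself checks.

```python
import os
import tempfile


def file_was_unlinked(fd):
    """True if the file behind an open descriptor has no directory entry left.

    On POSIX systems st_nlink drops to 0 once the last name is removed,
    while writes through the descriptor keep succeeding.
    """
    return os.fstat(fd).st_nlink == 0


def demo():
    """Append to a stand-in leases file, delete it, and keep appending."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "dhcpd.leases")  # hypothetical stand-in path
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        try:
            before = file_was_unlinked(fd)
            os.unlink(path)  # what an "rm" during runtime does
            written = os.write(fd, b"lease 172.18.93.227 { }\n")  # still succeeds
            after = file_was_unlinked(fd)
        finally:
            os.close(fd)
    return before, written > 0, after
```

`demo()` returns `(False, True, True)` on Linux: the write reports success even though the file is already gone, which is why such a condition has to be checked for explicitly.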
I never wrote that I delete the file without stopping the server.
And removing the dhcpd.leases file is "legal" when the server is not
running, isn't it?
>>> To avoid the file growing ever larger, the server will periodically
>>> clean up. It does this by writing out its current in-memory tables to
>>> a new leases file, and swapping it into place by renaming the original
>>> file and then renaming the new file into place.
>> How long is the "period"?
>> I've never noticed dhcpd.leases getting smaller...
> The period is a (compiled in) default of 1 hour. If you look, you
> should see something like "dhcpd.leases" and "dhcpd.leases~". The
> second of these is the previous version.
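The cleanup described above (write the current table to a new file, keep the old one as dhcpd.leases~, rename the new one into place) can be sketched as follows. The `.new` temporary name and the reduced record format are assumptions; only the rename order follows the description in this thread.

```python
import os
import tempfile


def rewrite_leases(path, table):
    """Compact an append-only leases log into one record per address.

    The old file is kept as '<path>~' and the new file is renamed into place.
    """
    tmp = path + ".new"  # hypothetical temporary name
    with open(tmp, "w") as f:
        for ip, entry in sorted(table.items()):
            f.write(f"lease {ip} {{\n")
            f.write(f"  binding state {entry['state']};\n")
            f.write(f"  hardware ethernet {entry['mac']};\n")
            f.write("}\n")
        f.flush()
        os.fsync(f.fileno())  # data on disk before the swap
    if os.path.exists(path):
        os.replace(path, path + "~")  # previous version becomes dhcpd.leases~
    os.replace(tmp, path)  # compacted version takes its place


def demo():
    """Compact a stand-in file holding three records for the same address."""
    record = ("lease 172.18.93.227 {\n  binding state free;\n"
              "  hardware ethernet 7c:c5:37:21:d9:7c;\n}\n")
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "dhcpd.leases")
        with open(path, "w") as f:
            f.write(record * 3)
        old_size = os.path.getsize(path)
        rewrite_leases(path, {"172.18.93.227":
                              {"state": "active", "mac": "7c:c5:37:21:d9:7c"}})
        return old_size, os.path.getsize(path), os.path.getsize(path + "~")
```

The new file only drops superseded records, so on a server that tracks every address in a large range it shrinks a little but never becomes small.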
OK. Thanks for the info.
> You should see the new version is slightly smaller than the old one
> immediately after the cleanup.
Ok. Yes, it is smaller.
> It will never be 'small' on a server with that configuration because
> it will have to keep track of up to about 260,000 addresses. Even when
> a lease has expired, the last state of it is kept indefinitely in case
> the client should return to the network - and it is only replaced when
> the server runs out of "never used before" addresses and starts
> reusing expired leases in a "least recently used" manner.
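The allocation order Simon describes can be modelled with a toy pool: a returning client gets its previous address back if it is still free, never-used addresses are handed out next, and only when they run out are expired leases reused, least recently used first. This is an illustrative sketch, not dhcpd's internal data structures; the class and method names are hypothetical.

```python
from collections import OrderedDict, deque


class ToyPool:
    """Toy allocator: returning client first, then never-used, then LRU expired."""

    def __init__(self, addresses):
        self.never_used = deque(addresses)  # handed out first, in order
        self.expired = OrderedDict()        # ip -> MAC, earliest expiry first
        self.active = {}                    # ip -> MAC
        self.last_ip = {}                   # MAC -> address it last held

    def allocate(self, mac):
        ip = self.last_ip.get(mac)
        if ip is not None and self.active.get(ip) == mac:
            return ip  # lease still active: same address
        if ip is not None and self.expired.get(ip) == mac:
            del self.expired[ip]  # returning client gets its old address back
        elif self.never_used:
            ip = self.never_used.popleft()
        elif self.expired:
            ip, old_mac = self.expired.popitem(last=False)  # least recently used
            if self.last_ip.get(old_mac) == ip:
                del self.last_ip[old_mac]
        else:
            raise RuntimeError("no free addresses")
        self.active[ip] = mac
        self.last_ip[mac] = ip
        return ip

    def expire(self, ip):
        self.expired[ip] = self.active.pop(ip)  # newest expiry goes to the end
```

With a very large pool the "never used" queue rarely empties, which is also why a returning client tends to find its old address still free.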
Could you kindly clarify Sten Carlsen's statement quoted above: if he is
right and there are no restarts, the lease state should be kept in RAM
rather than on the HDD.
> I'm not trying to say you don't have a problem, but so far the log
> snippets don't show it. Have you tried picking a client MAC and
> 'grep'ing for that in the log ?
You are right: so far the log snippets don't show the problem.
And looking through the logs (I am keeping ALL of them; nothing is
deleted or rotated) I cannot find the problem either.
The server process looks as if it is working, but in fact it is not.
If you agree, I can send you all the log files (even for the whole last
month if you wish).
But I am not sure it makes sense: grep'ing them for 'error' gives
nothing.
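Simon's suggestion of following a single client MAC through the log can be automated a little: group the dhcpd lines by MAC and list the clients that received an OFFER but never an ACK, which is the symptom described in this thread. The regular expression assumes the "[dhcpd] DHCPxxx ... MAC" line layout seen in the snippets above; other syslog setups may format lines differently. Feed it the lines of the log file (the path depends on your syslog configuration).

```python
import re
from collections import defaultdict

# Matches lines laid out like the snippets in this thread, e.g.
# Aug 31 13:51:47 [dhcpd] DHCPOFFER on 172.18.93.227 to 7c:c5:37:21:d9:7c via br0
LINE_RE = re.compile(
    r"\[dhcpd\] (DHCP[A-Z]+) .*?\b((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\b", re.I)


def exchanges_by_mac(lines):
    """Group dhcpd log lines by client MAC, keeping message order."""
    seen = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            seen[m.group(2).lower()].append(m.group(1).upper())
    return seen


def offered_but_never_acked(lines):
    """MACs that received at least one DHCPOFFER but never a DHCPACK."""
    return sorted(mac for mac, msgs in exchanges_by_mac(lines).items()
                  if "DHCPOFFER" in msgs and "DHCPACK" not in msgs)
```

Looking at the per-MAC sequence (for example DISCOVER, OFFER, REQUEST for a foreign address, NAK) often says more than searching for the word "error", which dhcpd does not necessarily log in this situation.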
>> Sorry. I do not understand.
>> What is illegal or unusual about it?
>> 172.16.8.0 belongs to 172.16.0.0/14
>> and 172.16.0.0/14 is part of the 172.16.0.0/12 private range.
>> So what does 'behave "funny"' mean?
> There is nothing illegal or funny, but it is known that a small number
> of badly programmed clients cannot cope with the last octet being 0 or
> 255 since everyone "knows" that 0 is the network address and 255 is
> the broadcast address.
Do you know which ones? Windows? MacOS? Mobile OSes?
This is quite new to me! Detecting the network from the IP alone?
I had always assumed that to get the network & broadcast addresses I need
both the IP and the mask.
Well, it's very interesting...
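As the question suggests, the network and broadcast addresses follow from the address together with its mask, not from the last octet alone. Python's standard `ipaddress` module shows this for the address discussed here; the helper name is just for illustration.

```python
import ipaddress


def network_and_broadcast(addr_with_prefix):
    """Return (network, broadcast, usable_as_host) for an address/prefix."""
    iface = ipaddress.ip_interface(addr_with_prefix)
    net = iface.network
    usable = iface.ip not in (net.network_address, net.broadcast_address)
    return str(net.network_address), str(net.broadcast_address), usable


print(network_and_broadcast("172.16.8.0/14"))  # ('172.16.0.0', '172.19.255.255', True)
print(network_and_broadcast("172.16.8.0/24"))  # ('172.16.8.0', '172.16.8.255', False)
```

With a /14 mask, 172.16.8.0 is an ordinary host address; only with a /24 mask would it be the network address, which is exactly the assumption badly written clients make.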
> Complete rubbish, but there are people who have never used anything
> but a /24 subnet and just cannot comprehend anything else - and that
> includes some supposedly professional IT people I've worked with !
> For that reason alone, it's suggested to avoid them by splitting your
> ranges thus :
> range 172.16.8.1 172.16.8.254;
> range 172.16.9.1 172.16.9.254;
> range 172.16.10.1 172.16.10.254;
> Something of a pain for the number of addresses you have !
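Writing out hundreds of such lines by hand is indeed a pain, so they can be generated. A minimal sketch, assuming the goal is one `range` statement per /24 block that skips the .0 and .255 addresses; the function name is hypothetical.

```python
import ipaddress


def split_ranges(first, last):
    """One dhcpd 'range' statement per /24 block from first to last,
    leaving out any address whose last octet is 0 or 255."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    lines = []
    block = start & ~0xFF  # start of the /24 containing 'first'
    while block <= end:
        lo = max(start, block + 1)    # skip .0
        hi = min(end, block + 254)    # skip .255
        if lo <= hi:
            lines.append(f"range {ipaddress.IPv4Address(lo)} "
                         f"{ipaddress.IPv4Address(hi)};")
        block += 256
    return lines


for line in split_ranges("172.16.8.0", "172.16.10.255"):
    print(line)
```

For 172.16.8.0 to 172.16.10.255 this prints exactly the three statements above. Applied to the whole 172.16.0.0/14 it yields 1024 lines covering 260,096 addresses (1024 × 254), i.e. only 2,048 fewer than the full range.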
Good idea, but I am afraid it will not solve the problem.
My Linux box could not obtain an IP when the server became
dysfunctional, and I assume this OS will not 'behave "funny"'.
What is more interesting: the problem is not a total failure; one PC
received an IP, another one did not.
But in general, IPs are not being assigned to the clients.
>>> That range is over a quarter of a million addresses. Does the server
>>> still have issues with very large ranges ?
>> Yes it is.
>> And even if not - in my opinion this doesn't concern the point of the
> Well the point is that it's a large number of addresses, and from
> memory of threads I didn't pay much attention to as I only run small
> servers, there are aspects (hash table IIRC) that don't scale too well
> for very large address spaces. Even for addresses that aren't used,
> the server must build them into an internal list.
> There are few people running such large spaces, but from memory I
> don't think yours is the biggest that's been mentioned on this list.
>>> I vaguely recall there used to be issues with memory usage and
>> The host is equipped with 16GB RAM so...
>>> It does sound a rather excessive number of addresses - even for a
>>> public access point.
>> As above: this is not the point of the problem, or is it? If so,
>> please say so clearly.
>> Are there any limits on served IP ranges or classes?
> There are no specific limits, other than memory and I/O bandwidth. As
> mentioned above, there are some elements of the design that don't
> scale well - or didn't in earlier versions. On that point - what
> version are you using ?
Version of which package?
I am using Gentoo Linux and the dhcp version is 3.1.2_p1.
Everything is compiled for a 64-bit platform.
>> I need such a big IP range because I actually have a network of hotspots
>> working in bridge mode and centrally controlled from one host.
> How many clients do you normally have on the network in any 2 hour
> period ?
Daily I have about 60 clients per point, and the number is growing.
Currently I have 10 points; the plan is to have up to 1000 points.
> Looking at your original log snippet, you seem to have less than one
> request per second. For 250,000 clients and a 2 hours lease, you
> should be seeing not less than about 34 request-ack or
> discover-offer-request-ack exchanges per second.
> I'd suggest it's worth cutting back on the address space and see if it
> makes a difference.
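Simon's estimate is simply the number of addresses divided by the lease time. The sketch below makes the arithmetic explicit; the `renewals_per_lease` parameter is an addition for illustration, since clients normally start renewing at T1, which RFC 2131 sets to half the lease by default, so real traffic for a full pool would be roughly double the lower bound.

```python
def exchanges_per_second(active_clients, lease_seconds, renewals_per_lease=1.0):
    """Estimated DHCP exchanges per second for a fully used pool.

    renewals_per_lease=1 gives the lower bound used in the thread; clients
    renewing at T1 (half the lease by default) roughly double it.
    """
    return active_clients * renewals_per_lease / lease_seconds


for lease in (7200, 72000):
    print(lease, round(exchanges_per_second(250_000, lease), 1))
```

It prints 34.7 for a 7200 s lease and 3.5 for 72000 s; with the roughly 600 daily clients mentioned above (60 per point, 10 points), either figure drops far below one exchange per second.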
The lease time is 72000, not 7200, which gives 20 hours (in practice =
The wide IP range lets me assume that the same client (= same MAC) will
keep the same IP for a day.
What is more, with high probability it will get the same IP at another
hotspot on the following day(s).
And distinguishing between clients is very important for the business.
> Also, almost as an aside, I notice that you have timeouts for DNS
> updates. This suggests that your DDNS isn't set up correctly - it
> might be worth turning it off while you are trying to troubleshoot
> this problem.
Could you be more precise?
I am not an expert in DHCP settings.
In the small (<100 hosts) networks I have been involved with until now,
the settings were fine, so I would be grateful for any advice/directions
on what else I need to learn to handle this case...