dhcpd process hitting data size limit

Tue Mar 4 13:31:06 UTC 2008

I've been running ISC dhcpd with failover for more than five years.
I'm currently running V3.0.4rc1, including a patch for DHCPLEASEQUERY (by
Julien at tayon.net), on Fedora Core 3, with about 100,000 active leases
and a five-hour lease time.

On my servers the dhcpd process is about 20 MB at startup. After about
a week it has grown to about 1 GB. If dhcpd has not been restarted for
config changes in the meantime, I usually have to restart it once a week
because of the memory use. The process size has been growing like this
for as long as I can remember.
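
For anyone who wants to compare numbers, here is a rough sketch of the arithmetic behind those figures. It assumes the growth is linear, which I have not verified, and the function names are made up for illustration. With 20 MB at startup and about 1 GB (taken as 1024 MB) after seven days, the average comes to roughly 140 MB per day. The 512 MB figure is the data size limit mentioned in the quoted message below, used here only as an example threshold.

```python
# Rough linear extrapolation of dhcpd process growth.
# The sizes are the approximate figures from this message;
# constant (linear) growth is an assumption, not a measurement.


def growth_per_day(start_mb: float, end_mb: float, days: float) -> float:
    """Average growth in MB per day between two size observations."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (end_mb - start_mb) / days


def days_until_limit(current_mb: float, limit_mb: float,
                     rate_mb_per_day: float) -> float:
    """Days until the process reaches limit_mb at a constant growth rate."""
    if rate_mb_per_day <= 0:
        return float("inf")  # not growing, so the limit is never reached
    return max(0.0, (limit_mb - current_mb) / rate_mb_per_day)


if __name__ == "__main__":
    rate = growth_per_day(start_mb=20, end_mb=1024, days=7)
    print(f"average growth: {rate:.0f} MB/day")
    # 512 MB is the limit from the quoted message, used as an example.
    print(f"days from startup to 512 MB: {days_until_limit(20, 512, rate):.1f}")
```

Feeding in two real observations of the process size (for example from ps) instead of my round numbers would give a better idea of when a restart is due.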

Leif Arne Neset
leifa at alfanett.no

> If those log messages appeared *after* the 512MB limit was reached,
> then I'd say they were to be expected, as every attempt to malloc()
> more memory would fail. The fact that the process didn't just exit or
> crash says a lot for the quality of the error handling within the
> code.
> 
> I suspect just restarting the primary would have been sufficient, but
> no harm in restarting both.
> 
> Perhaps a rogue device or devices were making lots of requests to the one 
> server? Anything earlier in the logs that points to abnormal behaviour?
> 
> No idea on process size; perhaps others with large numbers of leases
> might list their process sizes and number of leases? Anyone like to
> submit numbers for >50k leases?
> 
> regards,
> -glenn
> 
>> Date: Mon, 03 Mar 2008 15:26:12 +0100 (CET)
>> To: dhcp-users at isc.org
>> Subject: dhcpd process hitting data size limit
>> From: sthaug at nethelp.no
>>
>> I have a 3.1.0 server running as primary in a failover configuration,
>> with around 100k leases; normal process size is around 90-100 MB. Today the
>> dhcpd process on this server ballooned to over 500MB, and then hit the
>> default data size limit of 512 MB. In the logs I found the following:
>>
>> Mar  3 14:23:21 dhcp2 dhcpd: dhcp_failover_put_message: something went wrong.
>> Mar  3 14:23:21 dhcp2 dhcpd: peer dhcp1-dhcp2: disconnected
>> Mar  3 14:23:21 dhcp2 dhcpd: failover peer dhcp1-dhcp2: I move from normal to communications-interrupted
>> Mar  3 14:23:22 dhcp2 dhcpd: uid lease 193.71.113.38 for client 00:00:e2:94:6a:61 is duplicate on 193.71.112/21
>> Mar  3 14:23:23 dhcp2 dhcpd: uid lease 81.191.9.183 for client 00:08:da:53:b9:df is duplicate on 81.191.0/20
>> Mar  3 14:23:26 dhcp2 dhcpd: dhcp_failover_put_message: something went wrong.
>> Mar  3 14:23:26 dhcp2 dhcpd: peer dhcp1-dhcp2: disconnected
>> Mar  3 14:23:26 dhcp2 dhcpd: failover: connect: no matching state.
>> Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer.
>> Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer.
>> Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer.
>> Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer.
>> Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer.
>> Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer.
>> (repeat ad nauseam)
>>
>> On the failover peer, where the dhcpd process stayed at its normal size,
>> I found the following:
>>
>> Mar  3 14:23:21 slam2 dhcpd: peer dhcp1-dhcp2: disconnected
>> Mar  3 14:23:21 slam2 dhcpd: failover peer dhcp1-dhcp2: I move from normal to communications-interrupted
>> Mar  3 14:23:22 slam2 dhcpd: uid lease 195.0.206.75 for client 00:17:3f:96:d8:06 is duplicate on 195.0.200/21
>> Mar  3 14:23:24 slam2 dhcpd: uid lease 81.191.61.134 for client 00:0b:82:0d:06:0a is duplicate on 81.191.48/20
>> Mar  3 14:23:26 slam2 dhcpd: peer dhcp1-dhcp2: disconnected
>> Mar  3 14:23:28 slam2 dhcpd: uid lease 81.191.126.180 for client 00:a0:c5:c0:35:ea is duplicate on 81.191.112/20
>> Mar  3 14:23:31 slam2 dhcpd: uid lease 193.90.168.171 for client 00:a0:c5:db:5a:97 is duplicate on 193.90.160/20
>> Mar  3 14:23:35 slam2 dhcpd: uid lease 81.191.199.70 for client 00:a0:c5:80:84:37 is duplicate on 81.191.192/20
>> Mar  3 14:23:40 slam2 dhcpd: uid lease 193.91.143.135 for client 00:17:3f:5c:28:64 is duplicate on 193.91.128/20
>> Mar  3 14:23:41 slam2 dhcpd: failover: link startup timeout
>> Mar  3 14:23:42 slam2 dhcpd: uid lease 81.191.182.196 for client 00:13:49:4a:c3:b0 is duplicate on 81.191.176/20
>> Mar  3 14:23:44 slam2 dhcpd: uid lease 81.191.2.218 for client 00:a0:c5:56:a5:cc is duplicate on 81.191.0/20
>> Mar  3 14:23:46 slam2 dhcpd: failover: link startup timeout
>> Mar  3 14:23:46 slam2 dhcpd: failover: link startup timeout
>>
>> I ended up restarting the dhcpd process on both servers, and everything
>> seems to be back to normal now. Both servers are running FreeBSD 6.3.
>>
>> So, my questions are:
>>
>> - Any idea what might have happened here? As far as we know there's been
>> communication between the failover peers at all times.
>> - Any rules of thumb for how big the dhcpd process is expected to grow,
>> presumably based on number of leases?
>>
>> Steinar Haug, Nethelp consulting, sthaug at nethelp.no
>>
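
On the question of whether anything earlier in the logs points to abnormal behaviour: one way to look is to tally the dhcpd syslog messages per host and find when a given message first shows up. The following is only a sketch. It assumes the syslog layout seen in the excerpts quoted above (month, day, time, host, then "dhcpd:"), and the function names are hypothetical. The sample lines are copied from those excerpts.

```python
import re
from collections import Counter

# Matches lines shaped like the quoted excerpts, e.g.
#   "Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer."
# An optional "[pid]" after "dhcpd" is tolerated (an assumption; the
# excerpts do not show one).
LINE_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"dhcpd(?:\[\d+\])?:\s+(?P<msg>.*)$"
)


def tally_messages(lines):
    """Count dhcpd messages per (host, message); skip lines that do not match."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            counts[(m.group("host"), m.group("msg"))] += 1
    return counts


def first_seen(lines, needle):
    """Timestamp of the first message containing needle, or None."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and needle in m.group("msg"):
            return f'{m.group("month")} {m.group("day")} {m.group("time")}'
    return None


SAMPLE = [
    "Mar  3 14:23:21 dhcp2 dhcpd: dhcp_failover_put_message: something went wrong.",
    "Mar  3 14:23:21 dhcp2 dhcpd: peer dhcp1-dhcp2: disconnected",
    "Mar  3 14:23:26 dhcp2 dhcpd: failover: connect: no matching state.",
    "Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer.",
    "Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer.",
    "Mar  3 14:23:26 dhcp2 dhcpd: no memory for option buffer.",
    "Mar  3 14:23:21 slam2 dhcpd: peer dhcp1-dhcp2: disconnected",
]

if __name__ == "__main__":
    for (host, msg), n in tally_messages(SAMPLE).most_common():
        print(f"{n:4d}  {host}  {msg}")
    print("first 'no memory':", first_seen(SAMPLE, "no memory"))
```

Run against the full log file rather than the sample, a sudden jump in one message type shortly before the first "no memory" line would be a place to start looking.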