Failover peer separation revisted

Robert Blayzor rblayzor.bulk at inoc.net
Fri Nov 14 14:49:27 UTC 2008


I wrote to the list a few weeks ago about a random problem we're seeing.

We're running ISC DHCPD 3.0.7 on FreeBSD 6.3-64bit via the FreeBSD  
ports.

Everything appears to be working normally, except at random times our  
DHCP servers appear to just disconnect from each other and they NEVER  
reconnect.  It can take just a few days or it can take a few weeks for  
this problem to creep up.  At first we thought maybe this was a  
problem on the network, but everything has been verified.  Ports are  
good, good NIC's, cabling, no errors on interfaces, no other network  
problems reported by any other application or the switch itself.  The  
systems are directly connected to a Cisco 6509 via 1000BaseT.

So the two questions are, why are they disconnecting?  The primary  
server sees it as a timeout, and secondly why are they not attempting  
to reconnect?

Here is what we see in our DHCP log when this problem happens, and the  
TCP dumps of each server follows:

...x.y.z
Nov 13 14:30:14 dhcp1 dhcpd: DHCPDISCOVER from 00:13:20:4e:71:38  
(D2Z2DJ61) via x.y.z.1
Nov 13 14:30:14 dhcp1 dhcpd: DHCPOFFER on x.y.z.169 to 00:13:20:4e: 
71:38 (D2Z2DJ61) via x.y.z.1
Nov 13 14:30:15 dhcp0 dhcpd: DHCPOFFER on x.y.z.202 to 00:13:20:4e: 
71:38 (D2Z2DJ61) via x.y.z.1
Nov 13 14:30:15 dhcp0 dhcpd: DHCPREQUEST for a.b.c.250 from  
00:12:17:d9:75:ce via a.b.c.193
Nov 13 14:30:15 dhcp0 dhcpd: DHCPACK on a.b.c.250 to 00:12:17:d9:75:ce  
via a.b.c.193
Nov 13 14:30:15 dhcp0 dhcpd: LEASE for x.y.z.202 on x.y.z.1 port  
16/12/161 (AID: 1.1.0.0.40.16.24.1.10.c.0.a1)
Nov 13 14:30:15 dhcp1 dhcpd: DHCPREQUEST for a.b.c.250 from  
00:12:17:d9:75:ce via a.b.c.193
Nov 13 14:30:15 dhcp1 dhcpd: DHCPACK on a.b.c.250 to 00:12:17:d9:75:ce  
via a.b.c.193
Nov 13 14:30:15 dhcp1 dhcpd: uid lease x.y.z.169 for client  
00:13:20:4e:71:38 is duplicate on x.y.z/24
Nov 13 14:30:15 dhcp1 dhcpd: DHCPREQUEST for x.y.z.202 (dhcp0) from  
00:13:20:4e:71:38 via x.y.z.1: lease owned by peer
Nov 13 14:30:15 dhcp0 dhcpd: DHCPREQUEST for x.y.z.202 (dhcp0) from  
00:13:20:4e:71:38 (D2Z2DJ61) via x.y.z.1
Nov 13 14:30:15 dhcp0 dhcpd: DHCPACK on x.y.z.202 to 00:13:20:4e:71:38  
(D2Z2DJ61) via x.y.z.1
Nov 13 14:30:39 dhcp0 dhcpd: timeout waiting for failover peer dhcp- 
failover
Nov 13 14:30:39 dhcp0 dhcpd: peer dhcp-failover: disconnected
Nov 13 14:30:39 dhcp0 dhcpd: failover peer dhcp-failover: I move from  
normal to communications-interrupted
Nov 13 14:30:44 dhcp1 dhcpd: DHCPREQUEST for a.b.c.245 from  
00:13:10:36:88:e9 via a.b.c.193
Nov 13 14:30:44 dhcp1 dhcpd: DHCPACK on a.b.c.245 to 00:13:10:36:88:e9  
via a.b.c.193
Nov 13 14:30:44 dhcp1 dhcpd: timeout waiting for failover peer dhcp- 
failover
Nov 13 14:30:44 dhcp1 dhcpd: peer dhcp-failover: disconnected
Nov 13 14:30:44 dhcp1 dhcpd: failover peer dhcp-failover: I move from  
normal to communications-interrupted
Nov 13 14:30:44 dhcp0 dhcpd: DHCPREQUEST for a.b.c.245 from  
00:13:10:36:88:e9 via a.b.c.193
Nov 13 14:30:44 dhcp0 dhcpd: DHCPACK on a.b.c.245 to 00:13:10:36:88:e9  
via a.b.c.193
Nov 13 14:30:48 dhcp0 dhcpd: LEASE for x.y.z.67 on x.y.z.1 port  
16/11/134 (AID: 1.1.0.0.40.16.24.1.10.b.0.86)
Nov 13 14:30:48 dhcp1 dhcpd: LEASE for x.y.z.67 on x.y.z.1 port  
16/11/134 (AID: 1.1.0.0.40.16.24.1.10.b.0.86)
Nov 13 14:30:48 dhcp0 dhcpd: DHCPREQUEST for x.y.z.67 from  
00:13:d4:d4:c7:e4 (your-27e1513d96) via x.y.z.1
Nov 13 14:30:48 dhcp0 dhcpd: DHCPACK on x.y.z.67 to 00:13:d4:d4:c7:e4  
(your-27e1513d96) via x.y.z.1
Nov 13 14:30:48 dhcp1 dhcpd: DHCPREQUEST for x.y.z.67 from  
00:13:d4:d4:c7:e4 (your-27e1513d96) via x.y.z.1
Nov 13 14:30:48 dhcp1 dhcpd: DHCPACK on x.y.z.67 to 00:13:d4:d4:c7:e4  
(your-27e1513d96) via x.y.z.1
Nov 13 14:30:51 dhcp1 dhcpd: DHCPDISCOVER from 00:16:76:85:21:b1 via  
x.y.z.1
Nov 13 14:30:51 dhcp0 dhcpd: DHCPDISCOVER from 00:16:76:85:21:b1 via  
x.y.z.1
Nov 13 14:30:51 dhcp0 dhcpd: DHCPDISCOVER from 00:18:3a:59:78:17 via  
x.y.z.1
...

No other failover info is ever logged... this went on over a day.  The  
only way to "fix it" is to restart each server process.

Last 25 lines from the tcpdump on dhcp0:

14:30:05.006630 IP dhcp1.61146 > dhcp0.520: P 32178607:32178686(79)  
ack 32206878 win 33304 <nop,nop,timestamp 1951122143 1950785953>
14:30:05.006655 IP dhcp0.520 > dhcp1.61146: . ack 32178686 win 33264  
<nop,nop,timestamp 1950787716 1951122143>
14:30:05.017992 IP dhcp0.520 > dhcp1.61146: P 32206878:32206957(79)  
ack 32178686 win 33304 <nop,nop,timestamp 1950787728 1951122143>
14:30:05.018249 IP dhcp1.61146 > dhcp0.520: . ack 32206957 win 33264  
<nop,nop,timestamp 1951122155 1950787728>
14:30:07.564790 IP dhcp1.61146 > dhcp0.520: P 32178686:32178765(79)  
ack 32206957 win 33304 <nop,nop,timestamp 1951124701 1950787728>
14:30:07.564813 IP dhcp0.520 > dhcp1.61146: . ack 32178765 win 33264  
<nop,nop,timestamp 1950790274 1951124701>
14:30:07.566151 IP dhcp0.520 > dhcp1.61146: P 32206957:32207036(79)  
ack 32178765 win 33304 <nop,nop,timestamp 1950790276 1951124701>
14:30:07.566414 IP dhcp1.61146 > dhcp0.520: . ack 32207036 win 33264  
<nop,nop,timestamp 1951124703 1950790276>
14:30:09.007713 IP dhcp1.61146 > dhcp0.520: P 32178765:32178844(79)  
ack 32207036 win 33304 <nop,nop,timestamp 1951126144 1950790276>
14:30:09.007737 IP dhcp0.520 > dhcp1.61146: . ack 32178844 win 33264  
<nop,nop,timestamp 1950791717 1951126144>
14:30:09.020247 IP dhcp0.520 > dhcp1.61146: P 32207036:32207115(79)  
ack 32178844 win 33304 <nop,nop,timestamp 1950791730 1951126144>
14:30:09.020456 IP dhcp1.61146 > dhcp0.520: . ack 32207115 win 33264  
<nop,nop,timestamp 1951126157 1950791730>
14:30:15.089882 IP dhcp0.520 > dhcp1.61146: P 32207115:32207273(158)  
ack 32178844 win 33304 <nop,nop,timestamp 1950797799 1951126157>
14:30:15.090230 IP dhcp1.61146 > dhcp0.520: . ack 32207273 win 33225  
<nop,nop,timestamp 1951132226 1950797799>
14:30:25.000892 IP dhcp0.520 > dhcp1.61146: P 32207273:32207285(12)  
ack 32178844 win 33304 <nop,nop,timestamp 1950807710 1951132226>
14:30:25.001054 IP dhcp1.61146 > dhcp0.520: . ack 32207285 win 33219  
<nop,nop,timestamp 1951142137 1950807710>
14:30:35.001352 IP dhcp0.520 > dhcp1.61146: P 32207285:32207297(12)  
ack 32178844 win 33304 <nop,nop,timestamp 1950817710 1951142137>
14:30:35.001576 IP dhcp1.61146 > dhcp0.520: . ack 32207297 win 33213  
<nop,nop,timestamp 1951152137 1950817710>
14:30:39.001560 IP dhcp0.520 > dhcp1.61146: F 32207297:32207297(0) ack  
32178844 win 33304 <nop,nop,timestamp 1950821710 1951152137>
14:30:39.001785 IP dhcp1.61146 > dhcp0.520: . ack 32207298 win 33213  
<nop,nop,timestamp 1951156137 1950821710>
14:30:44.132971 IP dhcp1.61146 > dhcp0.520: P 32178844:32179002(158)  
ack 32207298 win 33213 <nop,nop,timestamp 1951161268 1950821710>
14:30:44.132974 IP dhcp1.61146 > dhcp0.520: F 32179002:32179002(0) ack  
32207298 win 33304 <nop,nop,timestamp 1951161268 1950821710>
14:30:44.133001 IP dhcp0.520 > dhcp1.61146: R 1530420179:1530420179(0)  
win 0
14:30:44.133007 IP dhcp0.520 > dhcp1.61146: R 1530420179:1530420179(0)  
win 0


Last 25 lines from tcpdump on dhcp1:

14:30:05.006431 IP dhcp1.61146 > dhcp0.520: P 32177603:32177682(79)  
ack 32205872 win 33304 <nop,nop,timestamp 1951122143 1950785953>
14:30:05.006691 IP dhcp0.520 > dhcp1.61146: . ack 32177682 win 33264  
<nop,nop,timestamp 1950787716 1951122143>
14:30:05.018059 IP dhcp0.520 > dhcp1.61146: P 32205872:32205951(79)  
ack 32177682 win 33304 <nop,nop,timestamp 1950787728 1951122143>
14:30:05.018074 IP dhcp1.61146 > dhcp0.520: . ack 32205951 win 33264  
<nop,nop,timestamp 1951122155 1950787728>
14:30:07.564546 IP dhcp1.61146 > dhcp0.520: P 32177682:32177761(79)  
ack 32205951 win 33304 <nop,nop,timestamp 1951124701 1950787728>
14:30:07.564839 IP dhcp0.520 > dhcp1.61146: . ack 32177761 win 33264  
<nop,nop,timestamp 1950790274 1951124701>
14:30:07.566212 IP dhcp0.520 > dhcp1.61146: P 32205951:32206030(79)  
ack 32177761 win 33304 <nop,nop,timestamp 1950790276 1951124701>
14:30:07.566226 IP dhcp1.61146 > dhcp0.520: . ack 32206030 win 33264  
<nop,nop,timestamp 1951124703 1950790276>
14:30:09.007477 IP dhcp1.61146 > dhcp0.520: P 32177761:32177840(79)  
ack 32206030 win 33304 <nop,nop,timestamp 1951126144 1950790276>
14:30:09.007755 IP dhcp0.520 > dhcp1.61146: . ack 32177840 win 33264  
<nop,nop,timestamp 1950791717 1951126144>
14:30:09.020248 IP dhcp0.520 > dhcp1.61146: P 32206030:32206109(79)  
ack 32177840 win 33304 <nop,nop,timestamp 1950791730 1951126144>
14:30:09.020264 IP dhcp1.61146 > dhcp0.520: . ack 32206109 win 33264  
<nop,nop,timestamp 1951126157 1950791730>
14:30:15.089993 IP dhcp0.520 > dhcp1.61146: P 32206109:32206267(158)  
ack 32177840 win 33304 <nop,nop,timestamp 1950797799 1951126157>
14:30:15.090008 IP dhcp1.61146 > dhcp0.520: . ack 32206267 win 33225  
<nop,nop,timestamp 1951132226 1950797799>
14:30:25.000895 IP dhcp0.520 > dhcp1.61146: P 32206267:32206279(12)  
ack 32177840 win 33304 <nop,nop,timestamp 1950807710 1951132226>
14:30:25.000916 IP dhcp1.61146 > dhcp0.520: . ack 32206279 win 33219  
<nop,nop,timestamp 1951142137 1950807710>
14:30:35.001372 IP dhcp0.520 > dhcp1.61146: P 32206279:32206291(12)  
ack 32177840 win 33304 <nop,nop,timestamp 1950817710 1951142137>
14:30:35.001403 IP dhcp1.61146 > dhcp0.520: . ack 32206291 win 33213  
<nop,nop,timestamp 1951152137 1950817710>
14:30:39.001563 IP dhcp0.520 > dhcp1.61146: F 32206291:32206291(0) ack  
32177840 win 33304 <nop,nop,timestamp 1950821710 1951152137>
14:30:39.001584 IP dhcp1.61146 > dhcp0.520: . ack 32206292 win 33213  
<nop,nop,timestamp 1951156137 1950821710>
14:30:44.132759 IP dhcp1.61146 > dhcp0.520: P 32177840:32177998(158)  
ack 32206292 win 33213 <nop,nop,timestamp 1951161268 1950821710>
14:30:44.132787 IP dhcp1.61146 > dhcp0.520: F 32177998:32177998(0) ack  
32206292 win 33304 <nop,nop,timestamp 1951161268 1950821710>
14:30:44.132975 IP dhcp0.520 > dhcp1.61146: R 1530420179:1530420179(0)  
win 0
14:30:44.132978 IP dhcp0.520 > dhcp1.61146: R 1530420179:1530420179(0)  
win 0




-- 
Robert Blayzor, BOFH
INOC, LLC
rblayzor at inoc.net
http://www.inoc.net/~rblayzor/





More information about the dhcp-users mailing list