Dear Folks,

On 07/05/07 11:14 +0800, jgomez@infoweapons.com wrote:
>> Dear Folks,
>> We are experiencing occasional problems in re-establishing normal
>> state after the failover pair enter communications-interrupted
>> state. We are using ISC DHCP server 3.0.4.
>>
>> The failover pair enter communications interrupted when one of the
>> pair begins to re-write its dhcpd.leases file.
>>
>> I was woken at midnight last night again to return them to normal,
>> which I did by restarting the primary member of the pair.
>>
>> 1. Have others experienced this problem?
>> 2. Is there any known patch or fix?
>> 3. Any suggestions on how to go about fixing it?
>> 4. Can I use omapi to help restore communication without restarting
>>    the dhcp server?
>
>hello,
>
>i suggest that you use the latest version of ISC DHCP which is
>dhcp-3.0.5 because some of the older versions won't support DHCP
>Failover.

Yes, I agree that this may help; there is one change that *may* have
some impact. The changes from 3.0.4 -> 3.0.5 are not large, and I am
arguing for this with some members of my team. But this is a
production system that looks after more than half a million customers,
so changes are not to be made lightly. And I was shouted at very
abusively for suggesting this. I continue to argue for this.

>yes, u can use omapi protocol to connect to the ISC DHCP server and change
>its state without stopping it.

But how can I use it to change from "Communications-Interrupted" to
"Normal"? The failover pair should enter the "Normal" state from the
"Communications-Interrupted" state automatically. The problem is that
from time to time, there is a failure to return to normal
communication, resulting in a night-time call to the on-call phone
under my pillow.

>When the DHCP server is configured to use OMAPI, you can connect to
>it by using an OMAPI client and issue commands to the server. There
>is an interactive OMAPI client called omshell that is ideal for
>simple server changes.

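As far as I understand it, an omshell session of that kind would at
least let me look at the failover state object before deciding what to
do. Here is a rough sketch in Python of how such a session might be
scripted; it is not something I have tested against our servers. The
peer name "dhcp-failover", the address 127.0.0.1 and port 7911 are
placeholders of mine (the port has to match whatever omapi-port is set
in dhcpd.conf), the function names are hypothetical, and if an
omapi-key is configured a "key" line would also be needed before
"connect". It only opens the object for reading and does not try to
set local-state.

```python
import subprocess


def build_omshell_script(peer_name, server="127.0.0.1", port=7911):
    """Return omshell input that opens the failover-state object.

    peer_name, server and port are placeholders; they must match the
    failover peer name and omapi-port in the real dhcpd.conf.
    """
    lines = [
        f"server {server}",
        f"port {port}",
        "connect",
        "new failover-state",
        f'set name = "{peer_name}"',
        "open",  # omshell prints the object's attributes after this
    ]
    return "\n".join(lines) + "\n"


def run_omshell(script, omshell_path="omshell"):
    """Feed the script to omshell on stdin and return what it prints.

    Assumes omshell is installed and on PATH; raises if it is not.
    """
    result = subprocess.run(
        [omshell_path],
        input=script,
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling run_omshell(build_omshell_script("dhcp-failover")) on the
server itself should print the object, including local-state and
partner-state, if the connection succeeds. How the numeric state
values map to state names needs checking against the dhcpd man page
for the version in use.
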
Yes, well, I suppose I need to investigate this by writing a tool in
Perl and then trying it out the next time I'm called out, unless I
finish my hacks to the DHCP server to avoid its prolonged stop for a
250MB lease file rewrite, described in another post to this list.

Does anyone have any ideas about/experience with the communication
failure problem that I attempted to describe in the original post?
-- 
Nick Urbanik   http://nicku.org   x-71011   nick.urbanik@optusnet.com.au
GPG: 7FFA CDC7 5A77 0558 DC7A 790A 16DF EC5B BB9D 2C24   ID: BB9D2C24