Automatically reconnect to failover peer?
David W. Hankins
David_Hankins at isc.org
Wed May 31 15:32:24 UTC 2006
On Thu, Jun 01, 2006 at 12:47:32AM +1000, Glenn Satchell wrote:
> We ran a disaster recovery test the other day. This involved
> disconnecting the network between the two sites that the failover peers
> are in. The disconnect was noticed and they moved to
> communications-interrupted, but upon reconnecting the networks about 3
> hours later the two did not automatically detect each other and return
> to normal mode. We're running 3.0.3, but I am sure this was something
> that was fixed as it did work when we did this test about a year ago
> (3.0.2 perhaps?).
It's hard to say if this is the old bug Infoblox sent me a patch for
or a new one...
It could just be that you got lucky and exercised a different code path.
> Is this an old bug that has come back, or a different problem
> altogether? We did wait about 40 minutes or so to see if they
> reconnected. During this time we were snooping for traffic, but there
> was nothing on the failover ports.
The retry interval should be more like 90 seconds.
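To compare the 90-second figure with the 40-minute wait, here is a minimal sketch. It assumes, hypothetically, that a peer in communications-interrupted retries its connection at fixed 90-second intervals counted from the moment of interruption. `next_reconnect` is an illustrative name, not a dhcpd function, and the model ignores any other timers dhcpd may use.

```python
def next_reconnect(interrupted_at: int, restored_at: int, interval: int = 90) -> int:
    """Return the time (in seconds) of the first retry at or after restored_at.

    Hypothetical model: retries fire at interrupted_at + k * interval, k >= 1.
    """
    elapsed = restored_at - interrupted_at
    # Ceiling division; at least one full interval passes before the first retry.
    k = max(1, -(-elapsed // interval))
    return interrupted_at + k * interval


# Link cut at t=0, restored about 3 hours later (one second after a retry fired).
restored = 3 * 3600 + 1
print(next_reconnect(0, restored) - restored)  # prints 89
```

Under this model, the wait after the link comes back never reaches a full retry interval. Forty minutes with no traffic on the failover ports therefore suggests the retry was never rescheduled, rather than that the timer was merely slow.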
If you have the time, try defining "DEBUG_FAILOVER_TIMING", which will
print out a message prior to every add_timeout() call. Then look
at the syslogs.
The last failover-related log lines that appear before the failover
timing debug messages are hopefully where the problem lies.
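One literal reading of that advice can be scripted. The following is a sketch under stated assumptions. The exact text that DEBUG_FAILOVER_TIMING prints is not given here, so the caller must supply a `debug_marker` substring that identifies those lines. "Failover-related" is approximated by a case-insensitive match on the word "failover". The function name is hypothetical.

```python
from typing import Iterable, Optional


def last_failover_line_before_debug(
    lines: Iterable[str], debug_marker: str, keyword: str = "failover"
) -> Optional[str]:
    """Return the last failover-related line seen before the first timing-debug line.

    debug_marker is an assumption: a substring identifying the debug output.
    Returns None if no timing-debug line is found, or if none precedes it.
    """
    last: Optional[str] = None
    for line in lines:
        # Check for the debug marker first, so debug lines that also mention
        # "failover" are not counted as ordinary failover messages.
        if debug_marker in line:
            return last
        if keyword.lower() in line.lower():
            last = line
    # No timing-debug output at all; the debug build may not be running.
    return None
```

To use it, feed it the lines of the syslog file (for example, `open(path)` iterated line by line) together with whatever marker the debug build actually emits.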
David W. Hankins "If you don't do it right the first time,
Software Engineer you'll just have to do it again."
Internet Systems Consortium, Inc. -- Jack T. Hankins