More strange issues after upgrading from bind8 to 9

Tue Aug 28 10:43:28 UTC 2001

Dear All,

A couple of weeks back I mailed regarding crashes we were getting after
upgrading from 8.2.3 to 9.1.3. We are running on Solaris 8.

This was happening on all three of our systems. Last week I ended up taking
one of the systems back down to 8.2.4, after seeing yet another strange
problem. The system that crashes is currently with ISC, but the other two
were showing major timeouts.

I know there have been loads of posts regarding the timeout messages,
which basically indicate either a zone or a network issue. However, what we
are seeing doesn't seem to fall into either of these. The two systems, which
are a master and slave, are located in two different buildings. We have our
own network, and have been assured that no work was being carried out at
the point when the problems started to occur. There are no firewalls, or
anything else we can think of, that may be in the way causing an occurrence
which is turning out to be almost as regular as clockwork.

On the slave we start to see the refresh/timeout messages, and they refer to
the addresses of our own master and of the other masters we are also slave
to. Initially they have little effect, but after a little while the messages
become almost constant. The message log on our master seems to stop logging
almost all messages, and the server appears to go to sleep. Customers then
start noticing timeouts in resolving, and end up calling us. At this point
we have to restart the process on both systems. They both appear to be
caught up in whatever is happening here.

Since downgrading the master back to 8.2.4 on Thursday last week, this
machine is no longer crashing, or showing the timeouts mentioned above. The
slave, however, although no longer timing out to our master, is still
showing the same symptoms.

If this happened only the once, I could possibly understand it, but as I
said, it's as regular as clockwork: it happens again roughly 4 1/2 hrs
later, and so on. Our NOC are still having to kill the process.

I know a lot of people have seen the same error messages, but has anyone
noticed a degradation in service at the same time, or any regular pattern
to these messages being logged?

Any help would be much appreciated.

Thanks,
Richard
--------------------------------------------------------------------------
Richard Whelan
Senior Systems Administrator
HighwayOne Ltd
Email: rwhelan at highwayone.net
Mobile: +44 (0) 771 333 1904
Phone: +44 (0) 1904 431895