Bind 8.2.2 P5 hanging up...

Michael Milligan milli at acmebw.com
Wed Feb 9 22:58:15 UTC 2000


>
> > Jim Reid wrote:
> > >
> > >     >> Feb 3 15:48:08 (none) named[43]: Cleaned cache of 41 RRsets.
> > >     >> Feb 3 16:30:18 (none) named[43]: ns_req: sendto([172.23.9.2].137): Connection refused
> > >
> > > I certainly missed this first time around.
> > >
> > > The "connection refused" report is interesting.
> > ...
> > >
> > > Now the detail in the log message indicates that the name server got
> > > this error when it sent an answer to port 137 of IP address
> > > 172.23.9.2. I.e., something at 172.23.9.2 sent a query with the source
> > > port set to 137, but by the time the name server sent a reply back
> > > there was nothing using that port number.
> > ...
> > >
> > > So there probably isn't a problem with the name server at all.
> >
> > There very likely is a problem with the name server.  This will
> > happen if the name server hangs and the remote end gives up (times out)
> > waiting for a response and/or moves on to another server and gets an
> > answer.  That's what is likely happening with this and other similar
> > reports.  Getting to the bottom of why the name server is hanging is
> > proving tough, but it is only happening on Linux platforms, AFAIK.  I
> > have not been able to recreate this on my servers, but some of my
> > customers are experiencing this.
> >
> > Mark [Andrews], any new specifics about syslog blocking/hanging on Linux
> > flavors?  Kernel and/or syslogd "bug"?  You had mentioned in an earlier
> > post that that might be the case.
>
> Below is the RH security advisory w.r.t. syslog{d}.  Other
> Linux vendors have issued similar advisories.
> http://www.redhat.com/support/errata/RHSA1999055-01.6.0.html
> We have seen system call traces where named is frozen sending
> a message to syslog.

Thanks Mark.

My customer (using Red Hat 6.0) upgraded sysklogd but was still having the
problem.  They have fixed it by logging directly to files instead of using
syslog.  The lockups were occurring at a specific time on the weekend (4am
Sunday) which corresponds to the point where all the various log files are
rolled and syslog is restarted several times in quick succession.
Obviously, this is Red Hat specific.  (I'm not seeing either problem on SuSE
6.[013]).  I'll of course upgrade sysklogd as appropriate for the DoS issue.

>
> "connection refused"s usually just mean that the client
> has timed out, that you are still running an old version of
> BIND that does not have the SO_LINGER call disabled (the
> current version is #ifdef DO_SO_LINGER, not #ifdef SO_LINGER,
> see ns_main.c), that you are being used as an amplifier
> (middle man) in a DoS attack [AL-1999.004], or that someone
> has found a new way to stall the server.

Hmm, that might just be it.

>
> A small number of "connection refused"s are *normal* as
> the nameserver has longer timeouts than the clients,
> servers/links may be down, etc.

Right.  In this case, a stop in log activity for a few minutes followed by
a whole rash of "connection refused"s indicates to me that BIND is getting
stalled somewhere.
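
For anyone who wants to check their own logs for the same pattern, here is
a rough sketch in Python.  It is not part of BIND.  The function name
find_stalls, the three-minute gap, and the five-message burst threshold are
all my own assumptions, and the line format is modeled only on the log
excerpt quoted earlier in this thread.  Syslog timestamps carry no year, so
the year is passed in as a parameter.

```python
import re
from datetime import datetime, timedelta

# Matches classic syslog lines from named, for example:
#   Feb 3 16:30:18 (none) named[43]: ns_req: sendto(...): Connection refused
LINE_RE = re.compile(
    r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+\S+\s+named\[\d+\]:\s+(.*)$"
)


def find_stalls(lines, year=2000, min_gap=timedelta(minutes=3), min_refused=5):
    """Return (gap_start, gap_end, refused_count) for each suspected stall.

    A suspected stall is a silence of at least min_gap between two named
    log lines, immediately followed by at least min_refused consecutive
    "Connection refused" messages.  Both thresholds are assumptions.
    """
    stalls = []
    prev_time = None
    current = None  # candidate being counted: [gap_start, gap_end, count]

    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a named line in the expected format
        when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        refused = "Connection refused" in m.group(2)

        if prev_time is not None and when - prev_time >= min_gap:
            # A new silence begins a new candidate; keep the old one if it
            # already collected enough refusals.
            if current and current[2] >= min_refused:
                stalls.append(tuple(current))
            current = [prev_time, when, 0]

        if current is not None:
            if refused:
                current[2] += 1
            else:
                # Any other message ends the burst after the gap.
                if current[2] >= min_refused:
                    stalls.append(tuple(current))
                current = None

        prev_time = when

    if current and current[2] >= min_refused:
        stalls.append(tuple(current))
    return stalls
```

Something like find_stalls(open("/var/log/messages"), year=2000) would list
the candidate windows (the path is only an example).  As Mark notes, a small
number of refusals is normal, so the thresholds will need tuning per site.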

Thanks again.

Regards,
Mike

--
Michael Milligan - Acme Byte & Wire LLC - milli at acmebw.com




