Bind 9.2.0 and 9.2.1 stop resolving external IPs after a bit. GOT IT FIXED

Sun Oct 3 23:21:05 UTC 2004

At 02:27 AM 9/30/2004, Eddie wrote:
>Let me say thank you for everyone's help.
>
>Yesterday, I was sitting in the server room, and noticed the failed
>request count going up. Here we go again. This is the first time it's done
>this when I am on site so I can debug. About time.
>I dumped the packets and saw many requests to the root servers going out,
>but nothing returning.  So this time, I did a tcpdump on the external
>side of the NAT/firewall (Linux) box. Strange, I saw tons of DNS requests
>to the root servers, all coming from the wrong server on the network.
>That's the backup server, it does not run bind. All it does is samba and
>ntpd. So I did a tcpdump on it and watched a bunch of stuff with the root
>servers sending data to it. On port 123 no less.
>I went into "I have been HACKED" panic mode and checked services and shut
>down programs. Still data.  So I killed the network interface. Still
>data.

Port 123 is the ntpd port. If there is still an issue related to ntp, please
post it in comp.protocols.time.ntp, though it sounds like bad memory.

Danny

>Anyway, after spending a good hour watching packets, I figured out that my
>NAT/firewall box has bad memory or some bug that, once a week, blows up
>the masq table and changes the "from" address of the DNS server to the
>backup server. So any DNS requests sent from the DNS server are rewritten
>to come from the backup server. This is the strangest thing I have ever
>seen.
>I rebooted the firewall and now all is happy, but I am changing out that
>computer with a nice 486 with no floating point bug. :)
>
>Thanks for all the help. Not a Bind bug after all. This is sure going on my
>wall-o-weird.
>
>Eddie
>
>
>
>
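The check that exposed the problem, looking at which host the DNS queries appear to come from, can be sketched in Python. This assumes text output in the usual one-line `tcpdump -n` form (`time IP src.port > dst.port: ...`). The function name, the addresses, and the sample lines are hypothetical and are not taken from the capture described above.

```python
import re

# Matches the start of a `tcpdump -n` IPv4 line:
#   "<time> IP a.b.c.d.sport > e.f.g.h.dport:"
LINE_RE = re.compile(
    r"^\S+ IP (\d+\.\d+\.\d+\.\d+)\.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+):"
)


def unexpected_dns_sources(lines, dns_server_ip):
    """Return source IPs sending to port 53 that are not the known DNS server."""
    suspects = set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not an IPv4 line in the expected format
        src_ip, _src_port, _dst_ip, dst_port = m.groups()
        if dst_port == "53" and src_ip != dns_server_ip:
            suspects.add(src_ip)
    return suspects


# Made-up sample lines: .10 is the DNS server, .20 is some other host.
sample = [
    "23:21:05.000001 IP 192.168.1.10.32768 > 192.0.2.53.53: 4660 A? example.com. (29)",
    "23:21:05.000002 IP 192.168.1.20.32768 > 192.0.2.53.53: 4661 A? example.org. (29)",
    "23:21:05.000003 IP 192.168.1.20.123 > 192.0.2.7.123: NTPv4, Client, length 48",
]
print(unexpected_dns_sources(sample, "192.168.1.10"))
```

Any address this reports is sending DNS queries that should only ever come from the DNS server, which is the symptom seen in the capture.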
>On Sun, 01 Dec 2002 22:45:20 -0800, Mark_Andrews wrote:
>
>
>
> >> My primary DNS server is up to date on the latest RH patches. It runs
> >> Bind 9.2.1. The backup DNS server has not been updated yet and runs
> >> 9.2.0. It suffers the same problem, but since it's not under load, the
> >> problem does not show itself until the primary DNS fails for a bit.
> >>
> >> As for making the root name servers mad, I did a packet capture when
> >> Bind is running correctly. Looking at it in ethereal, I see an A query
> >> to a.gtld-servers.net. a.gtld-servers.net responds back with
> >> "Standard Query Response, Format Error"
> >>
> >> The request is made again, and this time it works. I see a lot of
> >> "Format" errors in my packet capture and this is when Bind is working.
> >
> >       FORMERRs are responses to EDNS probes.  Named retries without EDNS.
> >
> >       Everything sounds normal.
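To make the EDNS point concrete, here is a minimal Python sketch of the two query forms: a plain DNS query, and the same query carrying an EDNS0 OPT pseudo-record in the additional section. A server that does not understand the OPT record may answer FORMERR (RCODE 1), after which the resolver repeats the query in the plain form. The helper names `build_query` and `rcode`, the query ID, and the 4096-byte payload size are illustrative assumptions, not taken from named's source.

```python
import struct

FORMERR = 1  # DNS RCODE 1: format error


def build_query(name, qtype=1, query_id=0x1234, edns=False, payload_size=4096):
    """Build a DNS query in wire format (hypothetical helper, stdlib only).

    With edns=True an OPT pseudo-record (type 41) is appended to the
    additional section; that record is what makes the query an EDNS probe.
    """
    arcount = 1 if edns else 0
    # Header: ID, flags (all zero), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, arcount)
    # QNAME: length-prefixed labels terminated by the zero-length root label
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    packet = header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN
    if edns:
        # OPT RR: root name, TYPE=41, CLASS=UDP payload size,
        # TTL=0 (extended RCODE, version, flags), RDLENGTH=0
        packet += b"\x00" + struct.pack("!HHIH", 41, payload_size, 0, 0)
    return packet


def rcode(response):
    """Extract the 4-bit RCODE from the second flags byte of a DNS message."""
    return response[3] & 0x0F


plain = build_query("a.gtld-servers.net")
probe = build_query("a.gtld-servers.net", edns=True)
print(len(probe) - len(plain))  # size of the OPT record in bytes
```

The only differences between the two packets are the ARCOUNT field in the header and the trailing OPT record, so falling back after a FORMERR amounts to resending the same question without that record.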
> >
> >> When Bind quit working last, I did a quick tcpdump and noticed that it
> >> was sending requests out, but nothing was coming back. I did not get a
> >> chance to do a packet capture or a little sniffing on the external side
> >> of the firewall, but the backup DNS server was running fine at the time
> >> so I don't think it's firewall or network related. It was just like the
> >> root name servers stopped talking to it. Restarting Bind fixed the
> >> problem. Next time it goes out, I will be ready.
> >
> >       I was on a doubly NAT'd net the other day and observed the same behaviour.
> >       As this was in a hotel conference room it wasn't worth expending time
> >       and effort to chase the problem down.  Note however the first NAT box
> >       was Linux based.
> >
> >       Restarting named causes named to use a different source port which
> >       would allow the NAT to clear state.
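One way to picture why a new source port helps: a NAT keeps per-flow state keyed on, among other things, the internal source address and port. The toy table below only illustrates that idea and is not how Linux masquerading is implemented. The class `ToyNat`, the addresses, and the port numbers are all made up.

```python
class ToyNat:
    """Toy source-NAT table keyed by (internal IP, internal port)."""

    def __init__(self, external_ip, first_port=61000):
        self.external_ip = external_ip
        self.next_port = first_port
        self.table = {}  # (internal ip, internal port) -> external port

    def translate(self, src_ip, src_port):
        """Return the (external ip, external port) used for this internal flow."""
        key = (src_ip, src_port)
        if key not in self.table:
            # First packet of a new flow: allocate a fresh mapping.
            self.table[key] = self.next_port
            self.next_port += 1
        return self.external_ip, self.table[key]


nat = ToyNat("203.0.113.1")

# named sends its queries from one source port, so they share one entry.
before = nat.translate("192.168.1.10", 32768)
same = nat.translate("192.168.1.10", 32768)

# After a restart named uses a different source port, the lookup misses the
# old (possibly bad) entry, and the NAT builds a new mapping.
after = nat.translate("192.168.1.10", 32769)

print(before, same, after)
```

If the old entry had gone bad, every query from the old port would keep hitting it, while queries from the new port would not.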
> >
> >       I would be taking packet traces from the outside of the firewall next
> >       time it fails.
> >
> >> Thanks for the tip on the source rpms. When it dies again, I will try
> >> that.
> >>
> >> Thanks
> >> Ed
> >>
> >>
> >       Mark
> > --
> > Mark Andrews, Internet Software Consortium 1 Seymour St., Dundas Valley,
> > NSW 2117, Australia PHONE: +61 2 9871 4742                 INTERNET:
> > Mark.Andrews at isc.org