8.2.2-p5 keeps dying on linux box

Cheng-Jih Chen postmaster at cjc.org
Tue Jun 27 16:09:32 UTC 2000


Hi.  Since Monday, named on our Linux box has been dying repeatedly.
The versions are:

RedHat linux, running on 2.2.13pre15 kernel.
bind 8.2.2-P5-9 from RedHat

Over the weekend, we upgraded the kernel to 2.2.16, but rolled back to
2.2.13pre15 because of some problems.  bind itself was not obviously
affected by the installation of the new kernel, and had been running
fine for months.  8.2.2 was upgraded from Patch 3 to Patch 5 about 4
weeks ago, without any issue.  Around that time, we also changed named
to run as a non-root user.  Until this week there had been no real
problems, certainly nothing like named falling over several times a
day, sometimes a few times an hour.

I should note that we're running internal and external name daemons on
this box.  Both daemons have gone down, but the internal one, which gets
more traffic, goes down more frequently.

I see nothing obvious in the logs.  This is the syslog output from
around the time the internal named went down, between 11:49 and 11:55:

Jun 27 11:35:59 gate named[6372]: unapproved update from [10.0.0.239].1338 for randomwalk.com
Jun 27 11:38:29 gate named[6372]: Lame server on 'nightengale.org' (in 'nightengale.org'?): [207.183.249.3].53 'QUIKNET3.QUIKNET.COM'
Jun 27 11:45:50 gate named[6372]: unapproved update from [10.0.0.93].3034 for randomwalk.com
Jun 27 11:54:15 gate named[6372]: "judycasey.com IN NS" points to a CNAME (ns2.psnyc.com)
Jun 27 11:55:01 gate named[6372]: named shutting down

The line at 11:55:01 was me shutting down the external named before
bringing both of them back up.
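
With two named daemons logging to the same syslog, it can help to split
the entries by PID so the last message from each daemon is easy to see.
The following is only a rough sketch: it assumes the classic syslog line
layout shown above, and the helper names (group_by_pid, last_message)
are made up for illustration.

```python
import re
from collections import defaultdict

# Matches classic syslog lines such as:
#   Jun 27 11:55:01 gate named[6372]: named shutting down
SYSLOG_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+"
    r"named\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def group_by_pid(lines):
    """Return {pid: [(timestamp, message), ...]} for named's syslog lines."""
    groups = defaultdict(list)
    for line in lines:
        m = SYSLOG_RE.match(line)
        if m:  # silently skip lines from other programs
            groups[int(m.group("pid"))].append((m.group("stamp"), m.group("msg")))
    return dict(groups)


def last_message(groups):
    """Return {pid: (timestamp, message)} with the final line logged by each PID."""
    return {pid: entries[-1] for pid, entries in groups.items()}


# Two lines copied from the excerpt above, used as sample input.
sample = [
    'Jun 27 11:54:15 gate named[6372]: "judycasey.com IN NS" points to a CNAME (ns2.psnyc.com)',
    "Jun 27 11:55:01 gate named[6372]: named shutting down",
]

for pid, (stamp, msg) in last_message(group_by_pid(sample)).items():
    print(pid, stamp, msg)
```

A daemon whose last entry is not "named shutting down" presumably went
away without a clean shutdown, which narrows down when it died.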

There is no core file.

I turned on more extensive logging for the internal named, and have the
following at the end of the named.run file.  Since the output carries no
time stamps, I'm just taking the last few dozen lines:

qremove(0x40125c98)
unsched(0x40125c98, 12)
timers after evClearTimer:
  func 0x8060844, uap (nil), due 962121478.195202000, inter 3600.000000000
  func 0x8071d90, uap (nil), due 962121478.195220000, inter 3600.000000000
  func 0x80608b0, uap (nil), due 962121478.195237000, inter 3600.000000000
  func 0x806d9b4, uap (nil), due 962121478.195228000, inter 3600.000000000
evSetTimer(ctx 0x80d6828, func 0x805b528, uap 0, due 962120993.000000000, inter 0.000000000)
timers after evSetTimer:
  func 0x805b528, uap (nil), due 962120993.000000000, inter 0.000000000
  func 0x8060844, uap (nil), due 962121478.195202000, inter 3600.000000000
  func 0x80608b0, uap (nil), due 962121478.195237000, inter 3600.000000000
  func 0x806d9b4, uap (nil), due 962121478.195228000, inter 3600.000000000
  func 0x8071d90, uap (nil), due 962121478.195220000, inter 3600.000000000
ns_freeqry(0x40125c98)
ns_freeqry: ns NS1.OASIS.NET rcnt 0 (freed)
ns_freeqry: nsdata 207.49.135.3 rcnt 1 (busy)
ns_freeqry: ns NS3.OASISTECH.COM rcnt 1 (busy)
ns_freeqry: nsdata 207.49.137.11 rcnt 2 (busy)
ns_freeqry: ns NS2.OASIS.NET rcnt 0 (freed)
ns_freeqry: nsdata 207.49.135.4 rcnt 1 (busy)
evGetNext: fdCount 0
pselect(22, 0x300020, 0x0, 0x0, 2.206572000)
select() returns 0 (err: none)
pselect(22, 0x300020, 0x0, 0x0, 0.004989000)
select() returns 0 (err: none)
Dispatch.Timer: func 0x805b528, uap 0
retry(0x40143650) id=12
resend(addr=0 n=1) -> [207.49.137.11].53 ds=5 nsid=57287 id=12 12ms
unsched(0x40143650, 12)
retrytime: nstime0ms t4 nretry1 u8 : v8
schedretry(0x40143650, 8 sec)
evSetTimer(ctx 0x80d6828, func 0x805b528, uap 0, due 962121001.000000000, inter 0.000000000)
timers after evSetTimer:
  func 0x805b528, uap (nil), due 962120993.000000000, inter 0.000000000
  func 0x8060844, uap (nil), due 962121478.195202000, inter 3600.000000000
  func 0x805b528, uap (nil), due 962121001.000000000, inter 0.000000000
  func 0x806d9b4, uap (nil), due 962121478.195228000, inter 3600.000000000
  func 0x8071d90, uap (nil), due 962121478.195220000, inter 3600.000000000
  func 0x80608b0, uap (nil), due 962121478.195237000, inter 3600.000000000
timers after evClearTimer:
  func 0x805b528, uap (nil), due 962121001.000000000, inter 0.000000000
  func 0x8060844, uap (nil), due 962121478.195202000, inter 3600.000000000
  func 0x80608b0, uap (nil), due 962121478.195237000, inter 3600.000000000
  func 0x806d9b4, uap (nil), due 962121478.195228000, inter 3600.000000000
  func 0x8071d90, uap (nil), due 962121478.195220000, inter 3600.000000000
evGetNext: fdCount 0
pselect(22, 0x300020, 0x0, 0x0, 7.992822000)
select() returns 1 (err: none)
Dispatch.File: fd 20, mask 0x1, func 0x805de9c, uap 0x4011dcd4
datagram from [10.0.0.253].36524, fd 20, len 40
ns_req(from [10.0.0.253].36524)
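
Although named.run has no wall-clock time stamps, the "due" values in
the timer lines look like seconds since the Unix epoch, so they can be
converted to get a rough idea of when this output was written.  A
minimal sketch, assuming that interpretation is right; the helper name
due_times is made up for illustration.

```python
import re
from datetime import datetime, timezone

# Matches the "due <seconds>.<nanoseconds>" field printed in the timer lines.
DUE_RE = re.compile(r"due (\d+)\.(\d{9})")


def due_times(lines):
    """Yield each 'due' value found in the debug lines as a UTC datetime.

    Assumes the value is seconds since the Unix epoch; the fractional
    (nanosecond) part is ignored.
    """
    for line in lines:
        m = DUE_RE.search(line)
        if m:
            yield datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)


# Two evSetTimer lines copied from the output above.
sample = [
    "evSetTimer(ctx 0x80d6828, func 0x805b528, uap 0, due 962120993.000000000, inter 0.000000000)",
    "evSetTimer(ctx 0x80d6828, func 0x805b528, uap 0, due 962121001.000000000, inter 0.000000000)",
]

for t in due_times(sample):
    print(t.strftime("%Y-%m-%d %H:%M:%S UTC"))
```

For the two retry timers above this gives 2000-06-27 15:49:53 and
15:50:01 UTC.  If the syslog times are in a zone four hours behind UTC
(an assumption), that lines up with the 11:49 to 11:55 window.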


Can anyone help?  Thanks.




