inn 2.2 and 2.2.1 dying spontaneously

Fri Nov 19 18:21:13 UTC 1999

hello,

sorry if this is the wrong list for asking questions like
these.

i've been running inn-2.2 on an i386 server (Linux kernel 2.2.10)
for 160 days without any problems. we have a full feed and
store articles via CNFS. everything was fine until yesterday;
since then innd has been dying spontaneously without leaving a
trace in the logfiles. it dies often and with no discernible
pattern, i.e. sometimes it runs for 30 minutes, other times it
"quits" after 2 minutes.

i checked the usual suspects such as file descriptor limits,
but none of them seems to be the problem: innd is allowed
4096 FDs and does not exceed this number.
what bothers me is that i really have no clue what
might be going on, as there is no log message; it simply
stops working:

Nov 19 18:03:02 (none) innd: newsfeed01.univie.ac.at:66 checkpoint seconds 117 accepted 156 refused 90 rejected 44
Nov 19 18:03:03 (none) innd: newsfeed01.univie.ac.at flush
Nov 19 18:03:04 (none) innd: newsfeed01.univie.ac.at:63 NCmode "mode stream" received
Nov 19 18:05:00 (none) innd: SERVER descriptors 4096
Nov 19 18:05:00 (none) innd: SERVER outgoing 4081
Nov 19 18:05:01 (none) innd: SERVER ccsetup control:13
Nov 19 18:05:01 (none) innd: SERVER lcsetup localconn:15
Nov 19 18:05:01 (none) innd: SERVER rcsetup remconn:4
Nov 19 18:05:02 (none) innd: localposts opened localposts:17:file
Nov 19 18:05:03 (none) innd: overview! spawned overview!:19:proc:23029
Nov 19 18:05:04 (none) innd: innfeed! spawned innfeed!:20:proc:23030
Nov 19 18:05:04 (none) innd: controlchan! spawned controlchan!:21:proc:23031
Nov 19 18:05:04 (none) innd: INFLOW opened INFLOW:18:file
Nov 19 18:05:04 (none) innd: SERVER perl filtering enabled
Nov 19 18:05:04 (none) innd: SERVER starting
Nov 19 18:05:04 (none) innd: readme.inode.at connected 62 streaming allowed

the last entry before the gap is at 18:03:04; innd died around
then, and innwatch restarted it at 18:05.
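
the gap can also be read off the log mechanically. below is a
minimal sketch in python, assuming (from the excerpt above only)
that "SERVER descriptors" is the first line innd logs on startup.
find_restarts is a made-up helper name, not part of inn.

```python
import re

# syslog prefix as in the excerpt: "Nov 19 18:03:04 (none) innd: ..."
STAMP = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+innd:\s+(.*)$")


def find_restarts(lines):
    """Return (last_stamp_before_restart, startup_stamp) pairs."""
    restarts = []
    previous = None  # timestamp of the most recent innd line seen
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # not an innd syslog line
        stamp, message = match.groups()
        # assumption: "SERVER descriptors" marks a freshly started innd
        if message.startswith("SERVER descriptors") and previous is not None:
            restarts.append((previous, stamp))
        previous = stamp
    return restarts
```

fed the lines of the syslog file, each pair it returns brackets
one outage: the time of the last message from the old innd and
the time of the first message from the new one.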

when i run innd under strace -ff the crash does not occur, but
innd does not do anything useful either: it accepts
connections and nothing else. attaching gdb to it
did not help either; there does not seem to be one particular
spot where it fails.

another symptom is that occasionally the number of "newgroup"
processes skyrockets, pushing the system load to
about 100, even though i run controlchan.

i would be very grateful if someone could point me
to where i could investigate further. the quick-and-dirty
workaround i chose is a cron job that
checks whether innd is alive and runs 'inndstart' if it is not.
i have not checked whether innwatch can be configured to check
more often than every 10 minutes, but this is not a
satisfactory solution anyway. as you may see, i'm not very
experienced in administering innd.
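
a minimal sketch of such a cron check in python, for illustration
only. it assumes innd writes its pid to a file; is_alive and
ensure_running are hypothetical names, and the pid file path and
the location of inndstart depend on how inn was installed.

```python
import os
import subprocess


def is_alive(pid_path):
    """True if the pid file names a process that currently exists."""
    try:
        with open(pid_path) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False  # no pid file, or garbage in it
    if pid <= 0:
        return False  # 0 and negative values address process groups
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def ensure_running(pid_path, start_cmd):
    """Run start_cmd if the daemon is gone; True if a start was attempted."""
    if is_alive(pid_path):
        return False
    subprocess.run(start_cmd, check=False)
    return True
```

from cron this would be called as ensure_running(pid_file,
[path_to_inndstart]) with the local paths filled in. note that a
stale pid file whose number has been reused by another process
would make the check report innd as alive.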

any help is appreciated,
-- 
Toni Andjelkovic
toni at telecom.at