write(...) to stats file: -1 EFBIG (File too large)

Fri Sep 16 15:35:47 UTC 2005

Hello Inn workers ...

I've encountered a problem that I had never seen before, and although I
believe I've successfully worked around it, I don't doubt that this
could happen again, so I'm hoping for some advice.

The news server is Inn-2.4.2 on Linux-2.4.26 (Slackware-9.1, plus
patches).  It mostly runs beautifully, except for the fact that innd very
occasionally and apparently randomly dies.

That isn't the problem I'm writing about (though perhaps I should report
it separately), as I have been able to simply restart it when it dies and
life goes on until the next time.  Restarting Inn became such a routine
matter that I modified the Innwatch script to restart innd before sending
its alert that Inn is not running.  That has been in use and working fine
for approximately the past two months.

Last night at approximately 21:08 local time, though, I ran into a
problem where innd had died, and no amount of restarting would keep it
running properly.  I didn't look very closely into the problem, though,
until I got to work this morning.

At first glance, the service would start up fine, but as soon as one of
my peers attempted to go into streaming mode, the process would die.
After checking some of the simpler things, none of which turned up
anything useful, I decided to trace the innd process to get a clear idea
of what it did just before it died.  I'm glad I did, because that made
resolving the problem, at least in the short term, very simple.

The last lines in my trace output are as follows:

write(17, "@05000000054300048E330000000000000000@ 3230 feed.peer.edu \n",
    63) = -1 EFBIG (File too large)
--- SIGXFSZ (File size limit exceeded) @ 0 (0) ---

Another quick check showed that file descriptor 17 was the /news/log/stats
file, and sure enough, the file had grown to be 2GB in size!  (In fact,
its time stamp is how I know when this problem started.)
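
For illustration, a check of this kind could be scripted on Linux roughly
as follows.  This is only a sketch, not necessarily how the check above
was done: it assumes /proc is mounted, and the names fd_target,
at_32bit_limit and MAX_32BIT_OFFSET are hypothetical helpers made up for
this example.  The size constant is the largest offset a signed 32-bit
file offset can hold, which is the "2GB" boundary seen in the trace.

```python
import os

# Largest offset a signed 32-bit file offset can represent: 2**31 - 1 bytes.
MAX_32BIT_OFFSET = 2**31 - 1


def fd_target(pid, fd):
    """Return the path that descriptor `fd` of process `pid` refers to.

    Assumes a Linux-style /proc, where /proc/<pid>/fd/<fd> is a symlink
    to the open file.
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")


def at_32bit_limit(path):
    """Return True if the file has reached the 32-bit offset limit."""
    return os.path.getsize(path) >= MAX_32BIT_OFFSET
```

With the pid of the running innd and descriptor 17 from the trace,
fd_target(pid, 17) would be expected to name the stats file, and
at_32bit_limit() on that path would confirm that it had hit the boundary.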

As a work-around, I renamed that file, restarted Inn, and it has been
working fine since.  I imagine that the stats in my next news.daily
report will be skewed by the missing data, but I can live with that if
it means that the news service is running properly again.
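
The rename step of that work-around could be automated along these lines.
This is a minimal sketch under stated assumptions: rotate_if_large, LIMIT
and the half-of-limit default threshold are hypothetical choices made for
this example, not anything Inn provides.  It only renames the file; a
process that already has the file open keeps writing to the renamed copy
until it is restarted, which is why the restart was still needed above.

```python
import os
import time

LIMIT = 2**31 - 1  # 2GB boundary for a signed 32-bit file offset


def rotate_if_large(path, threshold=LIMIT // 2):
    """Rename `path` aside once it has reached `threshold` bytes.

    Returns the new name, or None if the file is missing or still small.
    The default threshold (half the limit) is an arbitrary safety margin.
    """
    try:
        size = os.path.getsize(path)
    except FileNotFoundError:
        return None
    if size < threshold:
        return None
    # Append a timestamp so repeated rotations do not overwrite each other.
    rotated = f"{path}.{time.strftime('%Y%m%d%H%M%S')}"
    os.rename(path, rotated)
    return rotated
```

Run periodically (from cron, say) against /news/log/stats, something like
this would move the file out of the way well before it reaches the size
at which the write() call failed.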

My question, though, is what the proper, long-term solution to this
should be.  Is there a way I can have the kernel accept files larger than
2GB?  (The file system is ext3, and as far as I know, 2GB is a hard
limit.)  Would running news.daily more frequently reduce the chance that
this can happen again?  Are there better approaches to resolving this
than the one I took, or those I'm considering?

Thanks for any advice.

-- 
----------------------------------------------------------------------
Sylvain Robitaille                              syl at alcor.concordia.ca

Systems analyst / Newsmaster                      Concordia University
Instructional & Information Technology        Montreal, Quebec, Canada
----------------------------------------------------------------------