OVDB corruption while running expireover
Ben Rosengart
br+inn at panix.com
Thu Mar 28 17:04:19 UTC 2002
We've had a problem where expireover takes a long time: longer than a
day, in fact, so we end up with several instances running at once.
While trying to debug this, I disabled news.daily in cron and ran
it from the command line:
/news/bin/news.daily expireover expireoverflags='-N' delayrm
In inn.conf, groupbaseexpiry is set to "true", and expire.ctl looks like this:
/remember/:.1
*:A:1:never:never
Shortly after 2:30 AM, something went drastically wrong, as can be seen
in these entries from news.notice:
Mar 28 02:32:56 reader2 expireover[3319]: OVDB: Unable to allocate 8247 bytes from mpool shared region: Cannot allocate memory
Mar 28 02:32:56 reader2 expireover[3319]: OVDB: unable to create/retrieve page 6 8080
Mar 28 02:32:56 reader2 expireover[3319]: OVDB: PANIC: Input/output error
Mar 28 02:32:56 reader2 expireover[3319]: OVDB: expiregroup: c_get: Cannot allocate memory
Mar 28 02:32:56 reader2 expireover[3319]: OVDB: groupnum: get: DB_RUNRECOVERY: Fatal error, run database recovery
Mar 28 02:32:56 reader2 last message repeated 9 times
Mar 28 02:32:56 reader2 expireover[3319]: OVDB: delete_old_stuff: groupstats->cursor: DB_RUNRECOVERY: Fatal error, run database recovery
One weird thing is that it looks like expireover ran to completion.
Here's expire.log:
expireover start Wed Mar 27 21:14:41 EST 2002: (-N -z/news/log/expire.rm)
Article lines processed 16200254
Articles dropped 629
Overview index dropped 629
expireover end Thu Mar 28 02:32:57 EST 2002
expirerm start Thu Mar 28 02:32:57 EST 2002
expirerm end Thu Mar 28 02:32:57 EST 2002
expire begin Thu Mar 28 02:33:27 EST 2002: (-v1)
Can't reserve server
Article lines processed 0
Articles retained 0
Entries expired 0
Old entries dropped 0
Old entries retained 0
expire end Thu Mar 28 02:33:27 EST 2002
all done Thu Mar 28 02:33:27 EST 2002
expireover start Thu Mar 28 11:02:51 EST 2002: (-N -z/news/log/expire.rm)
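For reference, the length of the expireover run that ended at 02:32:57 can be read off its start and end lines in the log above. A minimal Python sketch, assuming the timestamp layout shown in expire.log and ignoring the zone label (both stamps are EST, so it cancels out); the helper name parse_stamp is only illustrative:

```python
from datetime import datetime

# Timestamps copied from the expire.log excerpt above.
START = "Wed Mar 27 21:14:41 EST 2002"
END = "Thu Mar 28 02:32:57 EST 2002"

def parse_stamp(stamp: str) -> datetime:
    """Parse a stamp such as 'Wed Mar 27 21:14:41 EST 2002'."""
    # Drop the zone token (fifth field) so parsing does not depend on
    # platform time zone tables; both stamps share the same zone.
    parts = stamp.split()
    without_zone = " ".join(parts[:4] + parts[5:])
    return datetime.strptime(without_zone, "%a %b %d %H:%M:%S %Y")

elapsed = parse_stamp(END) - parse_stamp(START)
print(elapsed)  # prints 5:18:16
```

So that run lasted a little over five hours before the end line was written.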
The first error in news.notice looks like an out-of-memory problem.
But we have the limits set fairly high:
cpu time (seconds)          unlimited
file size (blocks)          unlimited
data seg size (kbytes)      819200
stack size (kbytes)         2048
core file size (blocks)     unlimited
resident set size (kbytes)  1022316
locked-in-memory size (kb)  340772
processes                   1000
file descriptors            1024
If it is running up against the data seg limit, then we have a
problem: we can't set it much higher, because we only have 1 gigabyte
of physical RAM.
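To set the memory limits listed above next to the 1 gigabyte of physical RAM, the kbyte figures can be converted to MiB. A small sketch, assuming the shell reports kbytes in units of 1024 bytes; the names LIMITS_KB and kb_to_mib are only illustrative:

```python
# Memory-related limits copied from the listing above, in kbytes.
LIMITS_KB = {
    "data seg size": 819200,
    "stack size": 2048,
    "resident set size": 1022316,
    "locked-in-memory size": 340772,
}

def kb_to_mib(kb: int) -> float:
    """Convert kbytes (assumed to be 1024-byte units) to MiB."""
    return kb / 1024

for name, kb in LIMITS_KB.items():
    print(f"{name:<24}{kb_to_mib(kb):8.1f} MiB")
```

Under that assumption the data seg limit works out to 800 MiB, which is already most of the machine's physical memory.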
What do you think is going on here?
--
Ben Rosengart (212) 741-4400 x215
1. A robot may not injure entertainment industry profits, or, through inaction,
allow entertainment industry profits to come to harm. --Matt McLeod