OVDB corruption while running expireover

Ben Rosengart br+inn at panix.com
Thu Mar 28 17:04:19 UTC 2002


We've had a problem where expireover takes a long time, longer than a
day in fact, so we end up with several instances running at once.

While trying to debug this, I disabled news.daily in cron and ran
it from the command line:

  /news/bin/news.daily expireover expireoverflags='-N' delayrm

In inn.conf, groupbaseexpiry is set to "true".  Our expire.ctl looks like this:

  /remember/:.1
  *:A:1:never:never
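
For readers unfamiliar with the file, here is a minimal Python sketch that
splits those two lines into named fields. It assumes the layout described in
the expire.ctl man page (pattern:flag:min:default:max, plus a /remember/ line
giving days). The function and key names are made up for illustration and are
not part of INN.

```python
def parse_days(field):
    """Return a number of days, or None for the keyword 'never'."""
    return None if field == "never" else float(field)


def parse_expire_ctl(text):
    """Split expire.ctl text into (remember_days, list of rule dicts)."""
    remember = None
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if fields[0] == "/remember/":
            remember = float(fields[1])
        elif len(fields) == 5:
            pattern, flag, minimum, default, maximum = fields
            rules.append({
                "pattern": pattern,
                "flag": flag,
                "min": parse_days(minimum),
                "default": parse_days(default),
                "max": parse_days(maximum),
            })
    return remember, rules


remember, rules = parse_expire_ctl("/remember/:.1\n*:A:1:never:never\n")
print(remember, rules)
```

Read this way, the single rule applies to all groups, with a minimum of 1 day
and both the default and maximum set to "never".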

At about 2:33 AM, something went drastically wrong, as can be seen in
these entries from news.notice:

Mar 28 02:32:56 reader2 expireover[3319]: OVDB: Unable to allocate 8247 bytes from mpool shared region: Cannot allocate memory
Mar 28 02:32:56 reader2 expireover[3319]: OVDB: unable to create/retrieve page 6 8080
Mar 28 02:32:56 reader2 expireover[3319]: OVDB: PANIC: Input/output error
Mar 28 02:32:56 reader2 expireover[3319]: OVDB: expiregroup: c_get: Cannot allocate memory
Mar 28 02:32:56 reader2 expireover[3319]: OVDB: groupnum: get: DB_RUNRECOVERY: Fatal error, run database recovery
Mar 28 02:32:56 reader2 last message repeated 9 times
Mar 28 02:32:56 reader2 expireover[3319]: OVDB: delete_old_stuff: groupstats->cursor: DB_RUNRECOVERY: Fatal error, run database recovery
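
To see which message came first and how often each one occurred, the entries
can be tallied with a short script. The following is only a sketch: the
regular expression assumes the classic syslog line layout seen in this log,
the function name is hypothetical, and it treats "last message repeated N
times" as N additional copies of the preceding message.

```python
import re
from collections import Counter

LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<prog>[\w.-]+)\[(?P<pid>\d+)\]:\s(?P<msg>.*)$"
)
REPEAT_RE = re.compile(r"last message repeated (\d+) times")

SAMPLE = [
    "Mar 28 02:32:56 reader2 expireover[3319]: OVDB: Unable to allocate 8247 bytes from mpool shared region: Cannot allocate memory",
    "Mar 28 02:32:56 reader2 expireover[3319]: OVDB: unable to create/retrieve page 6 8080",
    "Mar 28 02:32:56 reader2 expireover[3319]: OVDB: PANIC: Input/output error",
    "Mar 28 02:32:56 reader2 expireover[3319]: OVDB: expiregroup: c_get: Cannot allocate memory",
    "Mar 28 02:32:56 reader2 expireover[3319]: OVDB: groupnum: get: DB_RUNRECOVERY: Fatal error, run database recovery",
    "Mar 28 02:32:56 reader2 last message repeated 9 times",
    "Mar 28 02:32:56 reader2 expireover[3319]: OVDB: delete_old_stuff: groupstats->cursor: DB_RUNRECOVERY: Fatal error, run database recovery",
]


def summarize(lines):
    """Return (first expireover message, Counter of message -> occurrences)."""
    counts = Counter()
    first = None
    previous = None
    for line in lines:
        repeat = REPEAT_RE.search(line)
        if repeat and previous is not None:
            # Credit the repeats to the message logged just before.
            counts[previous] += int(repeat.group(1))
            continue
        match = LINE_RE.match(line)
        if not match or match.group("prog") != "expireover":
            continue
        previous = match.group("msg")
        counts[previous] += 1
        if first is None:
            first = previous
    return first, counts


first, counts = summarize(SAMPLE)
print("first:", first)
for message, count in counts.most_common():
    print(count, message)
```

On the entries above, the first message is the mpool allocation failure, and
everything after it is logged within the same second.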

One weird thing is that it looks like expireover ran to completion.
Here's expire.log:

expireover start Wed Mar 27 21:14:41 EST 2002: (-N -z/news/log/expire.rm)
    Article lines processed 16200254
    Articles dropped             629
    Overview index dropped       629
expireover end Thu Mar 28 02:32:57 EST 2002
        expirerm start Thu Mar 28 02:32:57 EST 2002
        expirerm end Thu Mar 28 02:32:57 EST 2002
expire begin Thu Mar 28 02:33:27 EST 2002: (-v1)
    Can't reserve server
    Article lines processed        0
    Articles retained              0
    Entries expired                0
    Old entries dropped            0
    Old entries retained           0
expire end Thu Mar 28 02:33:27 EST 2002
        all done Thu Mar 28 02:33:27 EST 2002
expireover start Thu Mar 28 11:02:51 EST 2002: (-N -z/news/log/expire.rm)
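
As a rough check on the timing, the start and end stamps of the first
expireover run can be subtracted. This is a small Python sketch using only
the figures from expire.log; it assumes both stamps are in the same time zone
(both say EST) and simply drops the zone name before parsing.

```python
from datetime import datetime

STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_stamp(text):
    """Parse 'Wed Mar 27 21:14:41 EST 2002', ignoring the zone name."""
    parts = text.split()
    del parts[4]  # drop "EST"; both stamps share the same zone
    return datetime.strptime(" ".join(parts), STAMP_FORMAT)


start = parse_stamp("Wed Mar 27 21:14:41 EST 2002")
end = parse_stamp("Thu Mar 28 02:32:57 EST 2002")
elapsed_seconds = (end - start).total_seconds()

lines_processed = 16200254  # "Article lines processed" from expire.log
rate = lines_processed / elapsed_seconds

print(f"elapsed: {end - start}")
print(f"rate: {rate:.0f} article lines per second")
```

That works out to about 5 hours 18 minutes, or roughly 848 article lines per
second, and the end stamp falls one second after the errors in news.notice.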

The first error in news.notice looks like an out-of-memory problem.
But we have the limits set fairly high:

cpu time (seconds)         unlimited
file size (blocks)         unlimited
data seg size (kbytes)     819200
stack size (kbytes)        2048
core file size (blocks)    unlimited
resident set size (kbytes) 1022316
locked-in-memory size (kb) 340772
processes                  1000
file descriptors           1024

If it is running up against the data seg limit, then we have a
problem: we can't set it much higher, because we only have 1 gigabyte
of physical RAM.
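
To put those numbers side by side, here is a short Python sketch of the
arithmetic. It assumes the "kbytes" figures are units of 1024 bytes and that
"1 gigabyte" means 1024 MiB. It only compares sizes. Berkeley DB sizes its
mpool (cache) region separately from the process data segment, so the
comparison alone does not settle which limit was hit.

```python
KIB = 1024

# Limits as listed above, in kbytes.
limits_kib = {
    "data seg size": 819200,
    "stack size": 2048,
    "resident set size": 1022316,
    "locked-in-memory size": 340772,
}


def kib_to_mib(kib):
    """Convert a size in KiB to MiB."""
    return kib / 1024


failed_request_bytes = 8247      # from the first OVDB error
physical_ram_kib = 1024 * 1024   # assumed: 1 gigabyte = 1024 MiB

data_seg_bytes = limits_kib["data seg size"] * KIB
share_of_ram = limits_kib["data seg size"] / physical_ram_kib

for name, kib in limits_kib.items():
    print(f"{name}: {kib_to_mib(kib):.1f} MiB")
print(f"data seg limit is {share_of_ram:.1%} of physical RAM")
print(f"failed request is {failed_request_bytes / data_seg_bytes:.6%} of the data seg limit")
```

The data seg limit comes to 800 MiB, a little over three quarters of physical
RAM, while the allocation that failed was only about 8 KiB.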

What do you think is going on here?

-- 
Ben Rosengart     (212) 741-4400 x215

1. A robot may not injure entertainment industry profits, or, through inaction,
   allow entertainment industry profits to come to harm.     --Matt McLeod

