buffindexed expireover

Mon Oct 11 22:39:13 UTC 1999

Is anyone else having a problem with expireover taking a long time
to run with buffindexed?  On my server, with 24G of cnfs spool, 4G
of buffindexed, and about 4.5 million articles, expireover is
taking 10-12 hours to run.  Plus, during that time, innd will often
stop responding for long periods of time (up to an hour).

I did some investigation of the code, and found that for each group,
buffindexed:
  1) locks the group
  2) iterates over the overview records for the group, and re-writes
     them all, omitting the records for articles that are no longer
     in the spool.
  3) unlocks the group

There are two major problems here.  The first is that this process seems
inefficient.  Could we make it so that instead of re-writing the records,
it makes modifications in-place, so that expired article records are
removed, but the remaining records are left where they are?
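
To make the difference concrete, here is a toy Python sketch.  It is
not buffindexed's actual code or on-disk layout; the function names and
the "list of article numbers" model are made up for illustration.  It
just counts how many records each approach has to write.

```python
def expire_by_rewrite(records, still_in_spool):
    """Model of the current behaviour: copy every surviving record.

    records is a list of article numbers (hypothetical stand-in for
    overview records); still_in_spool(artnum) says whether to keep one.
    Returns (new_records, number_of_records_written).
    """
    survivors = [a for a in records if still_in_spool(a)]
    return survivors, len(survivors)


def expire_in_place(records, still_in_spool):
    """Model of the proposal: only touch the records that expired.

    Expired slots are set to None (a free-slot marker assumed for this
    sketch); surviving records keep their position.
    Returns the number of records written.
    """
    writes = 0
    for i, artnum in enumerate(records):
        if artnum is not None and not still_in_spool(artnum):
            records[i] = None
            writes += 1
    return writes
```

With the rewrite approach the cost grows with the number of records
that survive; with the in-place approach it grows with the number that
expire, which is presumably the smaller number on most runs.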

The second problem is that because the group remains locked during
its expire, innd will block when it goes to store an overview record
(when the buffindexed code goes to get a lock).  While it is blocked, it
does not respond to connections or ctlinnd, and basically appears to
be hung.  Since it takes expireover about an hour just to do control.cancel,
innd will patiently sit there for that hour, waiting to write a cancel
message, and do nothing else.  So what we need is for the
expiregroup function to checkpoint periodically: unlock the group
for a bit, then relock it and continue.
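
Roughly what I have in mind for the checkpointing, again as a toy
Python sketch rather than real buffindexed code.  The function name,
the batch_size parameter, and the use of a threading.Lock in place of
the group lock are all assumptions for illustration.

```python
import threading


def expire_with_checkpoints(records, still_in_spool, lock, batch_size=1000):
    """Expire a group in batches, releasing the lock between batches.

    records is a list of article numbers, with None marking a free slot
    (hypothetical model).  The lock is held only while one batch is
    processed, so a writer waits for at most one batch, not the whole
    group.  Returns the number of times the lock was released.
    """
    checkpoints = 0
    total = len(records)  # records appended later are left for the next run
    for start in range(0, total, batch_size):
        with lock:  # relock for this batch
            for i in range(start, min(start + batch_size, total)):
                artnum = records[i]
                if artnum is not None and not still_in_spool(artnum):
                    records[i] = None  # in-place removal keeps positions stable
        checkpoints += 1  # lock is released here; a waiting writer can run
    return checkpoints
```

This sketch relies on records staying in place, so that positions are
still valid after the lock has been dropped and retaken.  With the
current re-write approach, a checkpoint would also have to cope with
records stored while the group was unlocked.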

If no one else has similar problems, then I might just be running into
limitations of my platform.  Otherwise, I'm willing to try to implement
solutions to either or both of the above problems.

--Heath