Optimization for the expireover procedure.

Russ Allbery rra at stanford.edu
Wed Oct 17 00:17:10 UTC 2007

Kirill Berezin <kyb at online.ru> writes:

> As far as I can see, the current expiration procedure looks up every
> article in every group: we walk the list of groups and check each
> group's articles starting from the lowest number. It is a good approach,
> because it removes all bad articles, but it is extremely slow.

> I think it is possible to speed up overview expiration by walking
> articles in order of arrival time rather than group by group. For
> example, we can keep a list of all incoming articles, much like history
> (time of arrival, group id, and article number), sorted by time of
> arrival, then walk this list from the oldest article to the newest and
> apply group-based expiry to each article. This seems to require only a
> fairly simple extension to the current storage modules.

> Any suggestions, complaints?

The tradindexed overview method needs to be rebased from time to time, and
if you don't do that during nightly expire, you need to figure out when
you're going to do that and how.

That method doesn't cope with cancels, which may or may not be a problem
for you.

Also, expireover isn't actually the slow part.  The slow part is the
history rebuild.

expireover start Tue Oct 16 07:11:08 ESTEDT 2007:
    Article lines processed  2787588
    Articles dropped          113463
    Overview index dropped    118785
expireover end Tue Oct 16 07:56:46 ESTEDT 2007
lowmarkrenumber begin Tue Oct 16 07:56:46 ESTEDT 2007:
lowmarkrenumber end Tue Oct 16 07:56:46 ESTEDT 2007
        expirerm start Tue Oct 16 07:56:49 ESTEDT 2007
        expirerm end Tue Oct 16 08:14:58 ESTEDT 2007
expire begin Tue Oct 16 08:15:28 ESTEDT 2007: (-v1)
    Article lines processed  2999159
    Articles retained        2870176
    Entries expired           128983
expire end Tue Oct 16 09:46:13 ESTEDT 2007
        all done Tue Oct 16 09:46:13 ESTEDT 2007

expireover takes 45 minutes.
Deleting the articles takes 18 minutes.
Rebuilding history takes 91 minutes.
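
For anyone who wants to check the arithmetic, here is a minimal Python
sketch that recomputes those three figures from the start and end
timestamps in the log above. The names PHASES, FMT, and phase_minutes are
made up for this illustration and are not part of INN. It assumes the
time zone token can be dropped (both ends of each phase are in the same
zone) and that seconds are ignored, which is what reproduces the 45, 18,
and 91 minute figures.

```python
from datetime import datetime

# Start/end timestamps copied from the log above, with the time zone
# token removed. PHASES is a hypothetical name used only in this sketch.
PHASES = {
    "expireover": ("Tue Oct 16 07:11:08 2007", "Tue Oct 16 07:56:46 2007"),
    "expirerm": ("Tue Oct 16 07:56:49 2007", "Tue Oct 16 08:14:58 2007"),
    "expire": ("Tue Oct 16 08:15:28 2007", "Tue Oct 16 09:46:13 2007"),
}

# Assumes the C/English locale for the day and month abbreviations.
FMT = "%a %b %d %H:%M:%S %Y"


def phase_minutes(phases):
    """Return {phase name: elapsed whole minutes}, ignoring seconds."""
    result = {}
    for name, (start, end) in phases.items():
        # Drop the seconds so the difference is taken on HH:MM only.
        t0 = datetime.strptime(start, FMT).replace(second=0)
        t1 = datetime.strptime(end, FMT).replace(second=0)
        result[name] = int((t1 - t0).total_seconds() // 60)
    return result


if __name__ == "__main__":
    for name, minutes in phase_minutes(PHASES).items():
        print(f"{name}: {minutes} min")
```

Counting the seconds as well would change each figure by less than a
minute, so the ordering of the three phases stays the same.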

So while optimizing expireover helps some, I think you may be optimizing
the wrong thing.  History is the major pain in nightly expire, and it's
not fixable without completely redesigning how history is managed (at
least as far as I can tell).

Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>

    Please send questions to the list rather than mailing me directly.
     <http://www.eyrie.org/~eagle/faqs/questions.html> explains why.
