Optimization for the expireover procedure.
rra at stanford.edu
Wed Oct 17 00:17:10 UTC 2007
Kirill Berezin <kyb at online.ru> writes:
> As far as I can see the current expiration procedure is based on lookup
> of all articles of a group, that is we look through the list of groups
> and check all articles for a group starting from lowest number. It is a
> good approach, because we remove all ill articles, but it is extremely
> I think that it is possible to speed up expiration of the overview by
> walking the articles in order of arrival time rather than group by
> group. For example, we can keep a list of all incoming articles, just
> like history: time of arrival, group id and article number, sorted by
> the time of arrival, look through this list from the oldest article to
> the newest one, and apply group-based expiry to each article. It seems
> this requires only a rather simple extension to current storage modules.
> Any suggestions, complaints?
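To make the quoted proposal concrete, here is a minimal sketch in Python of an arrival-ordered expiry pass. Everything in it is an assumption for illustration: the Arrival record, the retention table keyed by group name, and the function name are hypothetical, and none of it corresponds to INN code or to an existing storage API.

```python
from collections import namedtuple

# Hypothetical record: one entry per incoming article, appended in arrival order.
Arrival = namedtuple("Arrival", "arrived group artnum")


def expire_by_arrival(log, retention, now):
    """Apply per-group retention to a log sorted by arrival time (oldest first).

    retention maps a group name to the number of seconds to keep its articles.
    Returns (expired, kept) as two lists of Arrival entries.
    """
    shortest = min(retention.values())
    expired, kept = [], []
    for i, entry in enumerate(log):
        age = now - entry.arrived
        if age < shortest:
            # Every later entry is younger still, so nothing else can expire.
            kept.extend(log[i:])
            break
        if age >= retention[entry.group]:
            expired.append(entry)
        else:
            kept.append(entry)
    return expired, kept
```

The point of the sketch is the early exit: because the log is sorted by arrival, the scan can stop at the first entry younger than the shortest retention period instead of visiting every article in every group.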
The tradindexed overview method needs to be rebased from time to time, and
if you don't do that during nightly expire, you need to figure out when
you're going to do that and how.
That method doesn't cope with cancels, which may or may not be a problem
Also, expireover isn't actually the slow part. The slow part is the
    expireover start Tue Oct 16 07:11:08 ESTEDT 2007:
        Article lines processed  2787588
        Articles dropped          113463
        Overview index dropped    118785
    expireover end Tue Oct 16 07:56:46 ESTEDT 2007
    lowmarkrenumber begin Tue Oct 16 07:56:46 ESTEDT 2007:
    lowmarkrenumber end Tue Oct 16 07:56:46 ESTEDT 2007
    expirerm start Tue Oct 16 07:56:49 ESTEDT 2007
    expirerm end Tue Oct 16 08:14:58 ESTEDT 2007
    expire begin Tue Oct 16 08:15:28 ESTEDT 2007: (-v1)
        Article lines processed  2999159
        Articles retained        2870176
        Entries expired           128983
    expire end Tue Oct 16 09:46:13 ESTEDT 2007
    all done Tue Oct 16 09:46:13 ESTEDT 2007
expireover takes 45 minutes.
Deleting the articles takes 18 minutes.
Rebuilding history takes 91 minutes.
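Those figures come straight from the start and end timestamps in the log. As a rough check, the arithmetic can be reproduced with a few lines of Python. The phase table below is typed in by hand from the log, and the helper name is made up for this example.

```python
from datetime import datetime


def elapsed_minutes(start, end):
    """Minutes between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


# Start and end times copied from the log above.
phases = {
    "expireover": ("07:11:08", "07:56:46"),
    "expirerm": ("07:56:49", "08:14:58"),
    "expire": ("08:15:28", "09:46:13"),
}

durations = {name: elapsed_minutes(*times) for name, times in phases.items()}
total = sum(durations.values())

for name, minutes in durations.items():
    share = 100 * minutes / total
    print(f"{name:11s} {minutes:6.1f} min  {share:5.1f}%")
```

On these numbers the history pass (expire) accounts for a bit under 60% of the time spent in the three phases, and expireover for a bit under 30%.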
So while optimizing expireover helps some, I think you may be optimizing
the wrong thing. History is the major pain in nightly expire, and it's
not fixable without completely redesigning how history is managed (at
least as far as I can tell).
Russ Allbery (rra at stanford.edu) <http://www.eyrie.org/~eagle/>
Please send questions to the list rather than mailing me directly.
<http://www.eyrie.org/~eagle/faqs/questions.html> explains why.