Optimization for the expireover procedure.
kyb at online.ru
Wed Oct 17 08:17:41 UTC 2007
Russ Allbery wrote:
> Kirill Berezin <kyb at online.ru> writes:
>> As far as I can see the current expiration procedure is based on lookup
>> of all articles of a group, that is we look through the list of groups
>> and check all articles for a group starting from lowest number. It is a
>> good approach, because we remove all ill articles, but it is extremely
>> slow.
>> I think that it is possible to speed up expiration of the overview by
>> walking the articles in order of arrival time rather than group by
>> group. For example, we can keep a list of all incoming articles, just
>> like history: time of arrival, group id and article number, sorted by
>> the time of arrival, then walk this list from the oldest
>> article to the newest one and apply group-based expiry to each article. It
>> seems this requires a rather simple extension to the current storage modules.
>> Any suggestions, complaints?
> The tradindexed overview method needs to be rebased from time to time, and
> if you don't do that during nightly expire, you need to figure out when
> you're going to do that and how.
> That method doesn't cope with cancels, which may or may not be a problem
> for you.
> Also, expireover isn't actually the slow part. The slow part is the
> history rebuild.
> expireover start Tue Oct 16 07:11:08 ESTEDT 2007:
> Article lines processed 2787588
> Articles dropped 113463
> Overview index dropped 118785
> expireover end Tue Oct 16 07:56:46 ESTEDT 2007
> lowmarkrenumber begin Tue Oct 16 07:56:46 ESTEDT 2007:
> lowmarkrenumber end Tue Oct 16 07:56:46 ESTEDT 2007
> expirerm start Tue Oct 16 07:56:49 ESTEDT 2007
> expirerm end Tue Oct 16 08:14:58 ESTEDT 2007
> expire begin Tue Oct 16 08:15:28 ESTEDT 2007: (-v1)
> Article lines processed 2999159
> Articles retained 2870176
> Entries expired 128983
> expire end Tue Oct 16 09:46:13 ESTEDT 2007
> all done Tue Oct 16 09:46:13 ESTEDT 2007
> expireover is 45 minutes.
> Deleting the articles takes 18 minutes.
> Rebuilding history takes 91 minutes.
> So while optimizing expireover helps some, I think you may be optimizing
> the wrong thing. History is the major pain in nightly expire, and it's
> not fixable without completely redesigning how history is managed (at
> least as far as I can tell).
Just check this one.
expireover start Wed Oct 17 03:03:01 MSD 2007: ( -z/news/log/expire.rm -Z/news/log/expire.lowmark)
expireover end Wed Oct 17 11:45:37 MSD 2007
lowmarkrenumber begin Wed Oct 17 11:45:37 MSD 2007: (/news/log/expire.lowmark)
lowmarkrenumber end Wed Oct 17 11:45:38 MSD 2007
expire begin Wed Oct 17 11:46:11 MSD 2007: (-v1)
Article lines processed 14841845
Articles retained 12560464
Entries expired 2281381
expire end Wed Oct 17 11:53:27 MSD 2007
all done Wed Oct 17 11:53:28 MSD 2007
Expireover took almost 9 hours (03:03 to 11:45), and a lot of articles are missing because of feeder disk space limits. We are using ovdb. We tried buffindexed, but it did a lot of mmaps over a large area, and as a result it ran even slower.
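For reference, the phase durations can be read straight off the begin/end stamps in the log above. This is only a small sketch: the helper names are mine, and it drops the time zone token on the assumption that both ends of a phase are logged in the same zone.

```python
from datetime import datetime

def parse_stamp(stamp):
    # "Wed Oct 17 03:03:01 MSD 2007" -> naive datetime.
    # The zone name is ignored (assumption: start and end share one zone).
    _dow, mon, day, clock, _zone, year = stamp.split()
    return datetime.strptime(f"{mon} {day} {clock} {year}", "%b %d %H:%M:%S %Y")

def phase_minutes(start, end):
    # Elapsed wall-clock time of one phase, in minutes.
    return (parse_stamp(end) - parse_stamp(start)).total_seconds() / 60

expireover = phase_minutes("Wed Oct 17 03:03:01 MSD 2007",
                           "Wed Oct 17 11:45:37 MSD 2007")
expire = phase_minutes("Wed Oct 17 11:46:11 MSD 2007",
                       "Wed Oct 17 11:53:27 MSD 2007")
print(f"expireover: {expireover:.1f} min, expire: {expire:.1f} min")
```

On this box the balance is the opposite of Russ's numbers: expireover dominates and the history pass is only a few minutes.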
I believe that an extra index can greatly improve performance. And maybe history will be our next obstacle.
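To make the idea concrete, here is a rough sketch of the arrival-ordered walk. It is not INN code: the Entry record, the retention table and the function name are all hypothetical, and stopping the scan at the first entry younger than the shortest retention period is my assumption about where the saving would come from. It also does nothing about cancels, as Russ pointed out.

```python
from collections import namedtuple

# Hypothetical record in the extra index: one per stored article.
Entry = namedtuple("Entry", "arrived group artnum")

def expire_by_arrival(index, retention, default_keep, now):
    """Walk an arrival-sorted index (oldest first) and apply per-group expiry.

    retention maps group -> seconds to keep; default_keep covers other groups.
    Returns (expired, kept), both still in arrival order.
    """
    shortest = min([default_keep, *retention.values()])
    expired, kept = [], []
    for pos, entry in enumerate(index):
        age = now - entry.arrived
        if age < shortest:
            # Everything from here on is too young for any group: stop early.
            kept.extend(index[pos:])
            break
        if age >= retention.get(entry.group, default_keep):
            expired.append(entry)
        else:
            kept.append(entry)
    return expired, kept

index = [Entry(100, "a", 1), Entry(200, "b", 1),
         Entry(300, "a", 2), Entry(900, "b", 2)]
expired, kept = expire_by_arrival(index, {"a": 500, "b": 850}, 1000, now=1000)
```

With this layout the cost of a run depends on how many entries are older than the shortest retention period, not on the total number of articles in every group.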