Optimization for the expireover procedure.

Kirill Berezin kyb at online.ru
Wed Oct 17 08:17:41 UTC 2007


Russ Allbery wrote:
> Kirill Berezin <kyb at online.ru> writes:
>
>   
>> As far as I can see, the current expiration procedure is based on a lookup
>> of all articles in a group: we go through the list of groups and check
>> every article in each group, starting from the lowest number. It is a
>> good approach, because it removes all bad articles, but it is extremely
>> slow.
>>     
>
>   
>> I think it is possible to speed up overview expiration by walking
>> articles in order of arrival time rather than group by group. For
>> example, we can keep a list of all incoming articles, just like
>> history (time of arrival, group id and article number), sorted by
>> time of arrival, then walk this list from the oldest article to the
>> newest one and apply group-based expiry to each article. It seems this
>> requires only a rather simple extension to the current storage modules.
>>     
>
>   
>> Any suggestions, complaints?
>>     
>
> The tradindexed overview method needs to be rebased from time to time, and
> if you don't do that during nightly expire, you need to figure out when
> you're going to do that and how.
>
> That method doesn't cope with cancels, which may or may not be a problem
> for you.
>
> Also, expireover isn't actually the slow part.  The slow part is the
> history rebuild.
>
> expireover start Tue Oct 16 07:11:08 ESTEDT 2007:
>     Article lines processed  2787588
>     Articles dropped          113463
>     Overview index dropped    118785
> expireover end Tue Oct 16 07:56:46 ESTEDT 2007
> lowmarkrenumber begin Tue Oct 16 07:56:46 ESTEDT 2007:
> lowmarkrenumber end Tue Oct 16 07:56:46 ESTEDT 2007
>         expirerm start Tue Oct 16 07:56:49 ESTEDT 2007
>         expirerm end Tue Oct 16 08:14:58 ESTEDT 2007
> expire begin Tue Oct 16 08:15:28 ESTEDT 2007: (-v1)
>     Article lines processed  2999159
>     Articles retained        2870176
>     Entries expired           128983
> expire end Tue Oct 16 09:46:13 ESTEDT 2007
>         all done Tue Oct 16 09:46:13 ESTEDT 2007
>
> expireover is 45 minutes.
> Deleting the articles takes 18 minutes.
> Rebuilding history takes 91 minutes.
>
> So while optimizing expireover helps some, I think you may be optimizing
> the wrong thing.  History is the major pain in nightly expire, and it's
> not fixable without completely redesigning how history is managed (at
> least as far as I can tell).
>
>   
Just take a look at this one:

expireover start Wed Oct 17 03:03:01 MSD 2007: ( -z/news/log/expire.rm -Z/news/log/expire.lowmark)
expireover end Wed Oct 17 11:45:37 MSD 2007
lowmarkrenumber begin Wed Oct 17 11:45:37 MSD 2007: (/news/log/expire.lowmark)
lowmarkrenumber end Wed Oct 17 11:45:38 MSD 2007
expire begin Wed Oct 17 11:46:11 MSD 2007: (-v1)
    Article lines processed 14841845
    Articles retained       12560464
    Entries expired          2281381
expire end Wed Oct 17 11:53:27 MSD 2007
	all done Wed Oct 17 11:53:28 MSD 2007

That is more than 8 hours 40 minutes for expireover (against about 7 minutes for expire), plus a lot of missing articles due to the feeder's disk space limits. We are using ovdb. We tried buffindexed, but it did a lot of mmaps over a large area, and as a result it ran even slower.
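
To make the comparison explicit, the phase durations can be worked out from the start/end stamps in the log above. This is only a small illustrative sketch: the helper names (parse_stamp, phase_minutes) are made up for this example, and the zone name is ignored on the assumption that both stamps of a phase are in the same zone.

```python
from datetime import datetime

MONTHS = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()


def parse_stamp(stamp):
    """Parse a stamp such as 'Wed Oct 17 03:03:01 MSD 2007'.

    The weekday and the zone name are dropped; that is safe here only
    because we subtract two stamps taken in the same zone.
    """
    _weekday, month, day, clock, _zone, year = stamp.split()
    hour, minute, second = (int(x) for x in clock.split(":"))
    return datetime(int(year), MONTHS.index(month) + 1, int(day),
                    hour, minute, second)


def phase_minutes(start, end):
    """Elapsed minutes between the start and end stamps of one phase."""
    return (parse_stamp(end) - parse_stamp(start)).total_seconds() / 60


# Start/end stamps copied from the log above.
phases = {
    "expireover": ("Wed Oct 17 03:03:01 MSD 2007",
                   "Wed Oct 17 11:45:37 MSD 2007"),
    "expire": ("Wed Oct 17 11:46:11 MSD 2007",
               "Wed Oct 17 11:53:27 MSD 2007"),
}

for name, (start, end) in phases.items():
    print(f"{name}: {phase_minutes(start, end):.1f} min")
```

On this server the overview pass, not the history rebuild, accounts for nearly all of the nightly run.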

I believe that an extra index could greatly improve performance. Maybe history will then be our next obstacle.
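
A rough sketch of the kind of extra index I mean, in Python only for brevity. Everything here is hypothetical (the ArrivalIndex class, the retention table keyed by group name, the integer time units); none of it is existing INN code or an INN API. Entries are appended in arrival order, so the list is already sorted by time and expiry only has to look at the oldest end.

```python
from collections import deque


class ArrivalIndex:
    """Hypothetical arrival-ordered list of (arrival, group, artnum)."""

    def __init__(self):
        self._entries = deque()  # oldest entry first

    def __len__(self):
        return len(self._entries)

    def add(self, arrival, group, artnum):
        # Assumes articles are recorded in non-decreasing arrival order.
        self._entries.append((arrival, group, artnum))

    def expire(self, now, retention, default):
        """Return (group, artnum) pairs whose per-group retention has passed.

        Only entries older than the shortest retention period are visited;
        anything newer cannot have expired in any group, so the walk stops.
        """
        shortest = min(list(retention.values()) + [default])
        expired = []
        survivors = deque()
        while self._entries and now - self._entries[0][0] > shortest:
            arrival, group, artnum = self._entries.popleft()
            if now - arrival > retention.get(group, default):
                expired.append((group, artnum))
            else:
                # Old, but its group keeps articles longer: keep it.
                survivors.append((arrival, group, artnum))
        # Survivors are all older than what is left, so order is preserved.
        survivors.extend(self._entries)
        self._entries = survivors
        return expired


# Example with made-up groups and times (arbitrary integer units).
index = ArrivalIndex()
index.add(0, "a.short", 1)
index.add(1, "b.long", 1)
index.add(5, "a.short", 2)
index.add(9, "a.short", 3)
gone = index.expire(now=10, retention={"a.short": 3, "b.long": 10}, default=10)
```

The cost of a run then depends on how many entries are older than the shortest retention period rather than on the total number of articles in all groups. As Russ noted, this does nothing about cancels, and tradindexed would still need its periodic rebase.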

More information about the inn-workers mailing list