Optimization for the expireover procedure.

Wed Oct 17 00:20:05 UTC 2007

Kirill Berezin <kyb at online.ru> writes:

> At the moment I am sure not. The current implementation of the overview
> database is optimized for user access only; that is, the key for any
> search operation is a group number and an article number. But for expiry
> the main keys are the arrival time and the group number. In the current
> implementation we have to retrieve every overview record and use the
> arrival time stored in it. For example, there are about 15 million
> overview records on our server and about 2 million expired articles, so
> we have to examine an extra 13 million records before removal. This is
> not good for me.

The overview database is heavily optimized for reads.  Looking at lots of
articles is fast.
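
To make the scan under discussion concrete, here is a minimal Python sketch.
It assumes a made-up in-memory record with a group, an article number, and an
arrival time; the names OverviewRecord and find_expired are hypothetical and
do not correspond to INN's actual structures.

```python
from dataclasses import dataclass


@dataclass
class OverviewRecord:
    group: str
    artnum: int
    arrived: int  # arrival time, seconds since the epoch (toy values below)


def find_expired(records, cutoff):
    """Full scan: look at every record and keep those that arrived before cutoff."""
    return [(r.group, r.artnum) for r in records if r.arrived < cutoff]


records = [
    OverviewRecord("comp.lang.c", 1, 100),
    OverviewRecord("comp.lang.c", 2, 250),
    OverviewRecord("news.software.nntp", 1, 90),
    OverviewRecord("news.software.nntp", 2, 400),
]
expired = find_expired(records, cutoff=200)
```

Every record is visited once, so the work grows with the total number of
records rather than with the number that actually expire.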

What makes it slow is rebasing the tradindexed overview files, and thereby
rewriting the .DAT files, not scanning all the articles.  This is an
inherent performance drawback of the very simple data structure that
tradindexed uses.  I'm fairly sure that there are better data structures
that would be faster, but tradindexed is also very robust, and I wouldn't
want to lose that.
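
As a rough illustration of why rewriting a flat data file dominates the cost,
here is a small Python sketch.  It assumes a toy layout in which an index maps
each article number to an (offset, length) pair in one byte string; this is
not the real tradindexed on-disk format, and the compact function is
hypothetical.

```python
def compact(index, data, expired):
    """Return (new_index, new_data) with the expired article numbers dropped.

    index maps article number -> (offset, length) into data (bytes).
    Every surviving record is copied and given a new offset, so the cost
    grows with the amount of data kept, not with the amount removed.
    """
    new_index = {}
    chunks = []
    pos = 0
    for artnum in sorted(index):
        if artnum in expired:
            continue
        offset, length = index[artnum]
        chunks.append(data[offset:offset + length])
        new_index[artnum] = (pos, length)
        pos += length
    return new_index, b"".join(chunks)


data = b"first\nsecond\nthird\n"
index = {1: (0, 6), 2: (6, 7), 3: (13, 6)}
new_index, new_data = compact(index, data, expired={1})
```

In this toy model, removing a single record from the front still forces
every later record to be copied and every later offset to be recomputed.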

> My proposal is to have a separate storage for the expiry procedure. The
> structure of this storage must be optimized to perform expiration as
> fast as possible. For example, we can use a list of pointers to overview
> records sorted according to arrival date or even expiration date (this
> is a little bit tricky). I believe this will be much faster.

This won't make it any faster if you still have to rebase the tradindexed
data structures.
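
For reference, the lookup side of the proposal quoted above might look like
the following Python sketch.  It assumes a hypothetical secondary index of
(arrival time, group, article number) tuples kept sorted by arrival time;
nothing like it exists in the source being discussed.

```python
import bisect

# Hypothetical secondary index, sorted by arrival time (toy values).
by_arrival = [
    (90, "news.software.nntp", 1),
    (100, "comp.lang.c", 1),
    (250, "comp.lang.c", 2),
    (400, "news.software.nntp", 2),
]


def expired_before(sorted_index, cutoff):
    """Return the entries whose arrival time is strictly before cutoff."""
    # The 1-tuple (cutoff,) sorts before any (cutoff, group, artnum) entry,
    # so bisect_left returns the position of the first entry whose arrival
    # time is >= cutoff.
    end = bisect.bisect_left(sorted_index, (cutoff,))
    return sorted_index[:end]


victims = expired_before(by_arrival, 200)
```

A binary search finds the expired entries without touching the rest, but it
only answers which records to drop; the overview data that holds them still
has to be rewritten afterwards.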

If you're using some other overview method, I can't really comment, since
I don't know much about their internals.

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>

    Please send questions to the list rather than mailing me directly.
     <http://www.eyrie.org/~eagle/faqs/questions.html> explains why.