Speeding tradspool expiry

Russ Allbery rra at stanford.edu
Thu Dec 28 23:05:27 UTC 2000


Fabien Tassin <fta at sofaraway.org> writes:
> According to Russ Allbery:
 
>> The approach that I've come up with is to modify OVgroupbasedexpire to
>> not destroy the Xref information while it does its thing, have it pass
>> the Xref information to OVEXPremove, and then add a new method to each
>> storage method that, given a token, its Xref information, and a FILE *,
>> writes out the file names corresponding to that token to that FILE or
>> just the textual representation of the token if no file applies.  (The
>> trash method would just not print out anything.)  This method would
>> only be called if EXPunlinkfile is non-NULL; if we're not using a
>> separate file of articles to be removed, we'll continue to use the
>> existing SMcancel mechanism.

> What will happen if an article is requested in the meantime?  I mean,
> if only the Xref information is available.  I assume that nnrpd would
> have to be taught about this new OV style.

The overview information would be expired immediately, like always, and
just the deletion of the articles themselves would be deferred.  Since
nnrpd now takes such things as the bounds of newsgroups and the list of
available articles from overview, that should mean that the article would
disappear from the perspective of readers right after expire finished and
would just be removed from the disk some time later.

This change wouldn't actually affect the steps that are currently done;
all of the same things would be done as are being done now.  The only
difference would be that instead of giving fastrm tokens and making it
convert the tokens to file names (a tedious affair for tradspool tokens),
the part of the process that already has access to the necessary
information will do that conversion before writing out the list of things
to delete.
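
To make the conversion concrete, here is a rough sketch of what the
tradspool side of it might look like.  It is only an illustration under
some assumptions, not the actual INN code: the function names
xref_entry_to_path and write_xref_paths are made up, the Xref data is
assumed to arrive as space-separated "group:number" entries with the
leading host name already stripped, and the output paths are relative to
the spool root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of INN): convert one "group:number" Xref
 * entry of the given length into a spool-relative path, for example
 * "comp.lang.c:1234" -> "comp/lang/c/1234".  Returns 0 on success and -1
 * if the entry is malformed or the buffer is too small.
 */
int
xref_entry_to_path(const char *entry, size_t len, char *buf, size_t buflen)
{
    const char *colon = memchr(entry, ':', len);
    size_t i;

    /* Require a non-empty group name and a non-empty article number. */
    if (colon == NULL || colon == entry || colon == entry + len - 1)
        return -1;
    if (len + 1 > buflen)
        return -1;

    /* Dots in the group name and the group/number colon become slashes. */
    for (i = 0; i < len; i++)
        buf[i] = (entry[i] == '.' || entry[i] == ':') ? '/' : entry[i];
    buf[len] = '\0';
    return 0;
}

/*
 * Hypothetical helper (not part of INN): write one path per line to out
 * for every well-formed entry in xref.  Malformed entries are skipped.
 * Returns the number of paths written.
 */
int
write_xref_paths(const char *xref, FILE *out)
{
    char path[1024];
    int written = 0;

    assert(xref != NULL && out != NULL);
    while (*xref != '\0') {
        size_t len;

        xref += strspn(xref, " ");
        len = strcspn(xref, " ");
        if (len == 0)
            break;
        if (xref_entry_to_path(xref, len, path, sizeof(path)) == 0) {
            fprintf(out, "%s\n", path);
            written++;
        }
        xref += len;
    }
    return written;
}
```

The point is that this is pure string manipulation on data expireover
already has in hand, so nothing needs to open the article on disk to find
out where its links are.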

>> Does this sound okay to people, and does anyone have any other ideas?
>> I'll be keeping the current expiration logic for the time being (so all
>> of the links for a tradspool article will remain until it expires out
>> of all of its newsgroups, modulo flags passed to expireover, and then
>> they'll all be removed at once).

> Do you already have an idea of the gain (in time) that this will give?
> How long does a complete expire/expireover take per million articles?

I haven't done measurements myself, but from the accounts of the people
who have converted to tradspool in 2.3 from the earlier traditional spool
code, the difference in expire times is something like an order of
magnitude.  Given the code, I can readily believe that; not only are we
not using the optimized fastrm procedure now and instead doing something
that's much closer to xargs rm (which is what fastrm was designed to
replace and speed up), but we're opening and parsing every tradspool
article before removing it.  That takes a *huge* amount of time.
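
As a rough illustration of why a plain list of file names helps: as I
understand the fastrm documentation, much of its advantage over xargs rm
comes from changing into a directory once and then unlinking simple
names, which works best on sorted input.  The sketch below does not
remove anything; it only counts how often such a remover would have to
switch directories while walking a list in order.  The names dir_length
and count_dir_switches are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Length of the directory part of path (everything before the last '/'),
 * or 0 if the path contains no '/'.
 */
size_t
dir_length(const char *path)
{
    const char *slash = strrchr(path, '/');

    return (slash == NULL) ? 0 : (size_t) (slash - path);
}

/*
 * Count how many times a remover that works one directory at a time
 * would have to switch directories while processing paths in order.
 * The first path always counts as one switch.
 */
size_t
count_dir_switches(const char *const *paths, size_t n)
{
    size_t switches = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        size_t len = dir_length(paths[i]);

        if (i == 0 || len != dir_length(paths[i - 1])
            || strncmp(paths[i], paths[i - 1], len) != 0)
            switches++;
    }
    return switches;
}
```

With sorted tradspool paths, all of the articles in one newsgroup
directory end up adjacent, so the count stays close to the number of
distinct directories rather than the number of files.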

This change should pretty much put the speed of the fastrm step back to
where it was with the old traditional spool code, at the cost of
marginally more CPU time during expireover.

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>


