Speeding tradspool expiry
rra at stanford.edu
Thu Dec 28 12:37:01 UTC 2000
I want to use a large tradspool spool for my new reader server, so I'm
trying to tackle the slow expiration speed problem that was introduced
when we went to the storage API.
The root problem is that we lost the ability to print out a list of file
names to remove that could be passed to fastrm and instead started to
remove articles with individual SMcancel calls. To add insult to injury,
the tradspool implementation of SMcancel has to open the article and find
the Xref header to figure out what additional paths were linked to the
base article so that they can be removed as well.
We should be able to do better when using group-based expiration, but I'm
still trying to puzzle out the best place to put the hooks.
In my local tree, I've already resurrected the old fastrm code to allow it
to remove files as well as tokens. I've cleaned it up and converted it to
use warn and die and so forth, and have tested it lightly. So that part
is done.
The remaining part that needs to be done is to convince OVgroupbasedexpire
to write out file names to be removed instead of storage tokens where
appropriate. This affects tradspool primarily, but timehash could also
benefit from fastrm to cut down on directory searches. CNFS and timecaf
should continue to be handled solely via tokens.
One of the things that makes this more complicated is that tradspool needs
the Xref information in order to generate a complete list of files to be
deleted (timehash doesn't). OVgroupbasedexpire grabs that Xref
information from the overview, which will be much faster than tradspool's
habit of retrieving it from the actual article, but it munges it (dropping
the article numbers) in the process of figuring out whether the article
can be expired.
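
To make the Xref dependency concrete, here is a minimal sketch of turning
intact Xref data into a list of tradspool file names. It assumes the usual
tradspool layout (dots in the group name become slashes, with the article
number as the file name, relative to the spool root) and that the "Xref:"
header name has already been stripped. The function xref_to_paths is
hypothetical and is not an existing INN interface.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of INN.  Given Xref data of the form
   "host group:num group:num ...", write one spool-relative tradspool
   path per group:num pair to out.  Returns the number of paths written. */
static int
xref_to_paths(const char *xref, FILE *out)
{
    const char *p = xref;
    int count = 0;

    /* Skip the leading host name field. */
    p += strcspn(p, " \t");
    while (*p != '\0') {
        const char *colon;
        size_t len, i;

        p += strspn(p, " \t");
        if (*p == '\0')
            break;
        len = strcspn(p, " \t");
        colon = memchr(p, ':', len);

        /* Only handle entries with a non-empty group and number. */
        if (colon != NULL && colon > p && colon + 1 < p + len) {
            /* Group name with dots turned into slashes. */
            for (i = 0; i < (size_t) (colon - p); i++)
                fputc(p[i] == '.' ? '/' : p[i], out);
            fputc('/', out);
            /* Article number, which is exactly what gets lost if the
               Xref data is munged before this point. */
            fwrite(colon + 1, 1, len - (size_t) (colon + 1 - p), out);
            fputc('\n', out);
            count++;
        }
        p += len;
    }
    return count;
}
```

Without the article numbers, the loop above has nothing to write after the
final slash, which is why the overview data has to be passed along intact.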
The approach that I've come up with is to modify OVgroupbasedexpire to not
destroy the Xref information while it does its thing, have it pass the
Xref information to OVEXPremove, and then add a new method to each storage
method that, given a token, its Xref information, and a FILE *, writes out
the file names corresponding to that token to that FILE or just the
textual representation of the token if no file applies. (The trash method
would just not print out anything.) This method would only be called if
EXPunlinkfile is non-NULL; if we're not using a separate file of articles
to be removed, we'll continue to use the existing SMcancel mechanism.
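
The shape of that proposal might look roughly like the following sketch.
All of the names here (the enum, printfiles_func, expire_remove, and the
per-method functions) are hypothetical and chosen only for illustration;
they are not the existing storage API. The tradspool function is
deliberately simplified and assumes well-formed, single-space-separated
Xref data with the header name already stripped.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical method identifiers for this sketch only. */
enum method { M_TRADSPOOL, M_CNFS, M_TRASH, M_COUNT };

/* Hypothetical per-method hook: write what fastrm should see for one
   article, given its textual token and its Xref data. */
typedef void (*printfiles_func)(const char *token, const char *xref,
                                FILE *out);

/* tradspool: one path per group:num entry (simplified parsing). */
static void
tradspool_print(const char *token, const char *xref, FILE *out)
{
    const char *p = strchr(xref, ' ');  /* skip the host name */

    (void) token;
    if (p == NULL || p[1] == '\0')
        return;
    for (p++; *p != '\0'; p++) {
        if (*p == ' ')
            fputc('\n', out);
        else if (*p == '.' || *p == ':')
            fputc('/', out);
        else
            fputc(*p, out);
    }
    fputc('\n', out);
}

/* Methods with no per-article file: write the token text itself. */
static void
token_print(const char *token, const char *xref, FILE *out)
{
    (void) xref;
    fprintf(out, "%s\n", token);
}

/* trash: nothing is stored, so nothing is written. */
static void
trash_print(const char *token, const char *xref, FILE *out)
{
    (void) token;
    (void) xref;
    (void) out;
}

static const printfiles_func hooks[M_COUNT] = {
    [M_TRADSPOOL] = tradspool_print,
    [M_CNFS] = token_print,
    [M_TRASH] = trash_print,
};

/* Returns 1 if the article was written to the unlink file, or 0 if there
   is no unlink file and the caller should cancel the article directly. */
static int
expire_remove(enum method m, const char *token, const char *xref,
              FILE *unlinkfile)
{
    if (unlinkfile == NULL)
        return 0;
    hooks[m](token, xref, unlinkfile);
    return 1;
}
```

The return value keeps the fallback decision in the caller, so the
existing SMcancel path stays untouched when no unlink file is in use.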
That doesn't do much in the way of general cleanup (I think at some point
we should revisit the entire expiration process, but I think it may be
best to wait until after we have a better history mechanism since the
ability to move articles from one token to another would make a lot of
things much simpler), but I think it will solve the immediate problem and
not make anything worse.
Does this sound okay to people, and does anyone have any other ideas?
I'll be keeping the current expiration logic for the time being (so all of
the links for a tradspool article will remain until it expires out of all
of its newsgroups, modulo flags passed to expireover, and then they'll all
be removed at once).
Russ Allbery (rra at stanford.edu) <http://www.eyrie.org/~eagle/>