INN -BETA expire vs. traditional spool

Thu Jun 29 01:44:38 UTC 2000

Charles Fluffie <FLUFFIE at ILOVEYOU.int.tele.dk> writes:

> Is it just me, or does the performance of recent INN BETA snapshots
> absolutely reek when it comes to expire times, where many articles are
> written to traditional one-article-per-file spool?

It's not just you.  Cc'd to inn-workers at isc.org, since I did some
investigation recently.

First off, the current expire stuff has grown and shrunk organically
enough that it's currently extremely difficult to trace through.  At some
point, we should rethink the interface somewhat so that it's easier to
track down all the pieces; right now, there are calls that go into and out
of about five different source files in three different directories to
figure out what's going on.

If you don't use group-based expire with traditional spool, you're pretty
much going to hurt.  The class-based expiration is extremely unfriendly to
traditional spool since it's based on storage API tokens, and without
overview there's no good, clean way of going from a storage API token to
the list of file names present on disk short of opening up the article and
grovelling through it for the Xref line.  This is, unsurprisingly,
exceedingly slow.
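
To make the cost concrete, here is a rough sketch (in Python rather than
INN's C, and not INN's actual code) of what recovering the file names from
an article amounts to.  It assumes the usual Xref layout of a host name
followed by group:number pairs and the usual tradspool layout where dots
in the group name become directory separators.  The function names and the
spool root are made up for illustration, and folded headers are ignored.

```python
import os

def xref_to_paths(xref_value, spool_root="/var/spool/news/articles"):
    """Turn the value of an Xref header into tradspool file names.

    spool_root is an assumed location, not something INN mandates.
    """
    fields = xref_value.split()
    paths = []
    # The first field is the host name; the rest are group:number pairs.
    for field in fields[1:]:
        group, _, number = field.rpartition(":")
        if not group or not number.isdigit():
            continue  # skip anything malformed
        paths.append(os.path.join(spool_root, group.replace(".", "/"), number))
    return paths

def find_xref(article_path):
    """Open one article and scan its headers for Xref (the slow part)."""
    with open(article_path, "rb") as f:
        for raw in f:
            line = raw.decode("latin-1").rstrip("\r\n")
            if line == "":
                break  # blank line ends the headers
            if line.lower().startswith("xref:"):
                return line[5:].strip()
    return None
```

The point is that find_xref has to open and read every single article
being expired, which is one extra open and read per article on top of the
unlinks themselves.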

If you're using group-based expire, the problem is that expire figures out
whether a given article should be removed and then writes it out to the
batch file for fastrm to deal with, but what it writes out is a storage
API token.  And fastrm, rather than just unlinking files, does a cancel
operation for each article, which for tradspool involves grovelling
through the article to find all of its links and removing them all.  This
is going to take eons for a tradspool of any non-trivial size.

I want to largely use traditional spool for my new reader server, so I
need to fix this for my own purposes anyway.  So expect this to be fixed
by the end of the summer at the latest, but probably only in 2.4 (2.3
works fine with CNFS and the like, and is functional for tradspool
provided you're not putting a large feed into it, and we really need to
get it released so that people can start using it for the things it's good
at).

What I think needs to happen instead is that more of the expire process
needs to be dependent on the underlying storage implementation.  For
tradspool, what you want to do is exactly what older versions of INN did;
write out the path names for each article you're expiring and then have
fastrm do what it did before, namely just unlink them.  Looking at
tradspool_cancel, the storage backend will deal with that just fine.
fastrm should look at each line and decide from the first character if
it's a path or a storage API token and do the appropriate thing.
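
A minimal sketch of that dispatch, again in Python rather than C and not
fastrm's real code.  It assumes that storage API tokens in their text form
start with '@', so anything else can be treated as a path.  process_batch
and the cancel_token callback are hypothetical names; the callback stands
in for whatever storage API cancel call would actually be used.

```python
import os

def process_batch(lines, cancel_token):
    """Remove each entry in an expire batch.

    Lines starting with '@' are assumed to be storage API tokens and are
    handed to cancel_token (a hypothetical callback); everything else is
    treated as a path and simply unlinked.
    Returns (files unlinked, tokens cancelled).
    """
    unlinked = cancelled = 0
    for line in lines:
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        if entry.startswith("@"):
            cancel_token(entry)
            cancelled += 1
        else:
            try:
                os.unlink(entry)
                unlinked += 1
            except FileNotFoundError:
                pass  # already gone, nothing to do
    return unlinked, cancelled
```

With this shape the tradspool case never has to open an article; it is
one unlink per line of the batch file.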

Now, the other thing that I noticed in poking around in this is that the
current tradspool expiration mechanism breaks traditional semantics for
crossposted articles (in order to make it look more like the way
crossposted articles are handled in the other storage methods).  Obviously
with CNFS, timehash, or the like, there's only one copy of the article on
the spool, so the expiration for that article has to be the longest of the
expiration times of all the groups that it's in.  For tradspool, though,
you really want to unlink each link of the article as it expires
independently out of each group it's crossposted to; otherwise, you get
articles sticking around for a long time in groups with a short expire
time because they're crossposted.

So what I think should happen here is that the guts of OVgroupbasedexpire
should become dependent on the storage method, and that the tradspool code
should look significantly different than the code for the other storage
methods (namely expiring the article independently out of each of the
newsgroups to which it was posted, based on the overview information, and
generating file names for the fastrm file rather than storage API tokens).
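
To illustrate the difference in semantics, here is a small sketch in
Python (not the OVgroupbasedexpire code; all of the names here are made up
for the example).  It assumes a flat number of days to keep per group and
compares the per-link rule I'm proposing for tradspool with the
longest-retention rule that single-copy storage methods need.

```python
DAY = 86400  # seconds

def expired_links(links, arrived, now, keep_days):
    """Tradspool rule: each (group, number) link expires on its own
    group's schedule, independent of the other groups."""
    age = now - arrived
    return [(g, n) for g, n in links if age > keep_days[g] * DAY]

def single_copy_expired(links, arrived, now, keep_days):
    """Single-copy rule (CNFS, timehash, ...): the one stored article
    goes only once the longest retention of any of its groups passes."""
    age = now - arrived
    longest = max(keep_days[g] for g, _ in links)
    return age > longest * DAY

def link_path(group, number):
    """Relative tradspool file name for one link, for the fastrm file."""
    return group.replace(".", "/") + "/" + str(number)

# An article crossposted to a two-day group and a fourteen-day group,
# looked at five days after it arrived.
links = [("misc.test", 10), ("comp.lang.c", 99)]
keep = {"misc.test": 2, "comp.lang.c": 14}
to_remove = [link_path(g, n) for g, n in expired_links(links, 0, 5 * DAY, keep)]
```

At five days, to_remove holds only the misc.test link, while
single_copy_expired says the article as a whole has to stay.  That is the
crossposting behavior people expect from tradspool.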

Someone else check me on this?  Does this sound like a good approach?

> And when expire chewed through the history text file at the start of the
> expire run, that took a few hours, which was rather more than I
> anticipated.  Plus at the end of the run of ``fast''rm, it seemed to run
> expire over the text file yet again, which I wasn't expecting at all.

Looking at news.daily, I'm pretty confused about the order in which things
are done.  I think I need to stare at that a few more times to figure it
out.

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>