Natterings about history files
Forrest J. Cavalier III
mibsoft at epix.net
Wed Feb 28 22:21:00 UTC 2001
(Second send, from the subscribed account this time. Oops!)
While people are throwing out ideas, I'll throw in
mine. (I'd love to see someone fund me doing it. Hint.
Hint. This is the kind of contract work I do.)
-----------------------------------------------------------
Rolling history files are very nice, except the downside
pointed out by Russ. But I think that can be solved
this way:
1. Keep an N-hour cache of articles offered/accepted. (N==4
should be about right, I think, but it can be larger.)
2. When an article is offered, check ONLY the cache, not the
history file, for that message ID. If not found, tell the
peer to send it. (HISlookup times will be VERY short.)
You say: "Wait! That causes duplicates!"
Hold on there, I'm not finished....
3. When you finally do get the article, read its Date header.
If it is within the last N hours, you presume the lookup from
the cache was correct. Store it.
If the Date header shows the article is older than N hours,
do the full lookup. Check the history file corresponding to
that date. (This can be done by a downstream process,
actually, since the number of articles you get this way
is presumably small.)
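Steps 1-3 can be sketched roughly like this in Python. This is
only an illustration of the idea, not INN code: RecentCache,
on_offer, on_article, and the dict of per-day sets standing in
for the per-date history files are all made-up names, and
timestamps are plain seconds since the epoch.

```python
N_HOURS = 4                  # cache window suggested above; could be larger
WINDOW = N_HOURS * 3600      # the same window in seconds


class RecentCache:
    """Message IDs offered or accepted in the last WINDOW seconds (hypothetical)."""

    def __init__(self):
        self.seen = {}       # message ID -> time it was first seen

    def add(self, msgid, now):
        self.seen.setdefault(msgid, now)

    def __contains__(self, msgid):
        return msgid in self.seen

    def prune(self, now):
        # Drop everything older than the window.
        cutoff = now - WINDOW
        self.seen = {m: t for m, t in self.seen.items() if t >= cutoff}


def on_offer(cache, msgid, now):
    """Step 2: consult only the cache. True means 'send it to me'."""
    if msgid in cache:
        return False
    cache.add(msgid, now)
    return True


def on_article(cache, history_by_day, msgid, date_hdr, now):
    """Step 3: trust the cache for fresh articles, do the full lookup for old ones."""
    day = int(date_hdr // 86400)             # which per-date history "file"
    if now - date_hdr > WINDOW:
        # Older than N hours: the cache could not have known about it.
        if msgid in history_by_day.get(day, set()):
            return "duplicate"
    history_by_day.setdefault(day, set()).add(msgid)
    cache.add(msgid, now)
    return "stored"
```

The full lookup only happens on the rare old article, which is
why it could also be pushed to a downstream process.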
nnrpd lookup by message ID would change to search the
overview file first, and then every per-date history
file.
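A rough Python sketch of that lookup order, with heavy
assumptions: the overview and each per-date history file are
modeled as plain dicts keyed by message ID, lookup_by_msgid is
a made-up name, and searching newest day first is my guess at a
sensible order, not something decided above.

```python
def lookup_by_msgid(msgid, overview, history_by_day):
    """Return the stored entry for msgid, or None if it is unknown."""
    # First the overview data (assumed here to be indexable by message ID).
    if msgid in overview:
        return overview[msgid]
    # Then every per-date history file, newest day first.
    for day in sorted(history_by_day, reverse=True):
        entry = history_by_day[day].get(msgid)
        if entry is not None:
            return entry
    return None
```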
----------------------------------------------------------
I have done enough on-paper planning to see that the
rest of the pieces fall into place. But I won't explain
them in detail here. Just an outline:
To cancel/expire an article, you just write a hole, leaving
the file offsets of all other articles in the history
intact. This means you can do on-the-fly expire, instead of
one big grind-to-a-crawl nightly process.
Eventually, once the /remember/ interval has passed, you just
delete a per-date history file, as long as it is entirely empty.
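Here is a small Python sketch of the hole idea, under stated
assumptions: a per-date history file is a text file with one
record per line, a record's byte offset is its address, and a
hole is the same record overwritten with spaces. The record
layout and the function names are invented for illustration;
this is not INN's actual history format.

```python
import os


def append_record(path, record):
    """Append one line and return its byte offset (its stable address)."""
    with open(path, "ab") as f:
        offset = f.seek(0, os.SEEK_END)
        f.write(record.encode() + b"\n")
    return offset


def punch_hole(path, offset):
    """Blank the record at offset in place; no other offset moves."""
    with open(path, "r+b") as f:
        f.seek(offset)
        body = f.readline().rstrip(b"\n")
        f.seek(offset)
        f.write(b" " * len(body))        # same length, newline left alone


def read_record(path, offset):
    """Return the record at offset, or None if it has become a hole."""
    with open(path, "rb") as f:
        f.seek(offset)
        line = f.readline().rstrip(b"\n")
    return line.decode().strip() or None


def remove_if_empty(path):
    """Delete the per-date file once every record in it is a hole."""
    with open(path, "rb") as f:
        if f.read().strip():             # some live record remains
            return False
    os.remove(path)
    return True
```

Because punch_hole touches only one record, it can run whenever
an article is cancelled or expires, and remove_if_empty is the
only step left for the end of the file's life.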
If I recall, even the INN 2.2 style overview gets WAY nicer,
because you have history files that are "addressable" and
don't change. But we don't use that overview any more anyway.
The downside of having holes in the history file is
that it requires more storage. But INN already needs 2X
the history file for temp storage during expire, so on
balance it's a big win.
For long-lived archives, it still would be possible to
rewrite history files and reclaim space. IMPORTANT: dbz
is not going to work for this. Deleting a key from that
kind of hashed index (linear probing on collision insert!
blech!) is not acceptable: it breaks a data structure
invariant. A lookup stops probing at the first empty slot,
so emptying a slot can hide keys that were inserted past it.
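To make the deletion problem concrete, here is a toy
open-addressing table with linear probing in Python. It is not
dbz: the table size, the deliberately weak hash, and the
function names are all made up so that a collision is easy to
produce.

```python
SIZE = 8


def slot_of(key):
    # Toy hash (byte sum), chosen so anagrams collide; not dbz's hash.
    return sum(key.encode()) % SIZE


def insert(table, key):
    i = slot_of(key)
    for _ in range(SIZE):
        if table[i] is None or table[i] == key:
            table[i] = key
            return
        i = (i + 1) % SIZE               # collision: try the next slot
    raise RuntimeError("table full")


def lookup(table, key):
    i = slot_of(key)
    for _ in range(SIZE):
        if table[i] is None:             # empty slot ends the probe
            return False
        if table[i] == key:
            return True
        i = (i + 1) % SIZE
    return False


def naive_delete(table, key):
    """Empty the key's slot outright, which is what breaks the invariant."""
    i = slot_of(key)
    for _ in range(SIZE):
        if table[i] is None:
            return
        if table[i] == key:
            table[i] = None
            return
        i = (i + 1) % SIZE


table = [None] * SIZE
insert(table, "<ab@x>")
insert(table, "<ba@x>")                  # same slot, lands one further on
naive_delete(table, "<ab@x>")
# "<ba@x>" is still stored, but lookup() now stops at the emptied slot.
```

The usual workarounds are tombstone markers or re-inserting the
entries that follow the deleted one, and either way the index
has to be designed for it.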
------------------------------------------------------
Forrest
More information about the inn-workers
mailing list