Natterings about history files
rra at stanford.edu
Sat Mar 3 21:23:47 UTC 2001
Forrest J Cavalier <mibsoft at epix.net> writes:
> Great observation. You can't assume any correlation between article Date
> header and arrival time.
Well, you can assume *some*. The number of articles dated into the future
is pretty low and I bet that people who wanted maximum history performance
could easily reject any article dated more than about two hours into the
future and not lose anything they care about.
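As a rough sketch of that check (not anything INN does today as far as this message says), here it is in Python. The two-hour cutoff is the figure suggested above; the function name `dated_too_far_ahead` and the constant `MAX_FUTURE_SKEW` are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumed cutoff, taken from the "about two hours" figure above.
MAX_FUTURE_SKEW = timedelta(hours=2)


def dated_too_far_ahead(date_header, now=None):
    """Return True if the Date header is more than MAX_FUTURE_SKEW past now."""
    if now is None:
        now = datetime.now(timezone.utc)
    article_time = parsedate_to_datetime(date_header)
    if article_time.tzinfo is None:
        # A "-0000" zone parses as a naive datetime; treat it as UTC.
        article_time = article_time.replace(tzinfo=timezone.utc)
    return article_time - now > MAX_FUTURE_SKEW
```

A server wanting maximum history performance would presumably run something like this before the history lookup and refuse the article if it returns True.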
> I thought of that, but forgot to mention something....
> You have to make sure that when you are sweeping entries out of the
> in-memory cache, they go out to the appropriate date history file.
Ah... save it by Date header. Hm. That's interesting... but yeah, that
would work fine, now that I think about it.
> In other words, you can't write the entry until you receive the article
> and look at the date header. There are a couple of ways to do this, but
> one way of looking at it is that there is not one history file you are writing
> to. You probably need 2 or 3 open always, because it is common to get
> articles delayed by a while. Most will go into the "current" day, but
> some will go in the -1 day.
> And, rarely, you may have to open files from even earlier.
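To make the several-open-files idea concrete, here is a minimal Python sketch. It assumes one file per UTC day named `history.YYYYMMDD` and at most three open handles at a time; the class name `DailyHistoryWriter`, the file naming, and the one-ID-per-line format are all hypothetical and are not INN's actual history format.

```python
import os
from collections import OrderedDict
from datetime import timezone
from email.utils import parsedate_to_datetime


class DailyHistoryWriter:
    """Append history entries to one file per article date (UTC)."""

    def __init__(self, directory, max_open=3):
        self.directory = directory
        self.max_open = max_open
        self.open_files = OrderedDict()  # day string -> file, least recent first

    def _file_for(self, day):
        if day in self.open_files:
            self.open_files.move_to_end(day)
        else:
            if len(self.open_files) >= self.max_open:
                # Close the least recently used day to make room.
                _, oldest = self.open_files.popitem(last=False)
                oldest.close()
            path = os.path.join(self.directory, "history." + day)
            self.open_files[day] = open(path, "a")
        return self.open_files[day]

    def write(self, message_id, date_header):
        """Write the entry to the file for the article's Date; return the day."""
        when = parsedate_to_datetime(date_header)
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        day = when.astimezone(timezone.utc).strftime("%Y%m%d")
        self._file_for(day).write(message_id + "\n")
        return day

    def close(self):
        for handle in self.open_files.values():
            handle.close()
        self.open_files.clear()
```

Most writes hit the handle for the current day; a delayed article reopens an older file on demand, at the cost of closing whichever day was used least recently.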
You also need a different cache model than we have currently if you want
to make sure that you have *every* article you've seen in the past N
hours. Right now, if there happens to be a hash collision in history, it
just kicks the message ID with which it collides out of the cache.
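For contrast, a cache that never loses an entry inside the window could look roughly like the Python sketch below. This is an illustration of the requirement only, not the current cache code; `RecentIdCache` and its methods are invented names, and times are plain seconds supplied by the caller.

```python
from collections import deque


class RecentIdCache:
    """Remember every message ID seen within the last window_seconds."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.seen = {}        # message ID -> arrival time
        self.order = deque()  # (arrival time, message ID), oldest first

    def _expire(self, now):
        # Drop only entries that have aged out; collisions never evict.
        while self.order and now - self.order[0][0] > self.window:
            _, old_id = self.order.popleft()
            del self.seen[old_id]

    def add(self, message_id, now):
        """Record the ID; return False if it was already in the window."""
        self._expire(now)
        if message_id in self.seen:
            return False
        self.seen[message_id] = now
        self.order.append((now, message_id))
        return True

    def __contains__(self, message_id):
        return message_id in self.seen
```

The trade-off is memory: the size is bounded by the arrival rate times the window rather than by a fixed table, which is the price of guaranteeing that nothing seen in the past N hours gets kicked out.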
Russ Allbery (rra at stanford.edu) <http://www.eyrie.org/~eagle/>