Natterings about history files

Russ Allbery rra at stanford.edu
Sat Mar 3 21:23:47 UTC 2001


Forrest J Cavalier <mibsoft at epix.net> writes:

> Great observation. You can't assume any correlation between article Date
> header and arrival time.

Well, you can assume *some*.  The number of articles dated into the future
is pretty low and I bet that people who wanted maximum history performance
could easily reject any article dated more than about two hours into the
future and not lose anything they care about.
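As a rough illustration of that cutoff, here is a minimal Python sketch. The function name, the exact two-hour constant, and the choice to reject unparseable dates are all assumptions made for the example, not anything INN does today.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Assumed cutoff, taken from the "about two hours" figure above.
MAX_FUTURE_SKEW = timedelta(hours=2)


def too_far_in_future(date_header, now=None):
    """Return True if the Date header is more than MAX_FUTURE_SKEW ahead of now."""
    if now is None:
        now = datetime.now(timezone.utc)
    try:
        article_time = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        # Unparseable Date header: this sketch simply treats it as rejectable.
        return True
    if article_time.tzinfo is None:
        # A "-0000" zone parses to a naive datetime; assume UTC here.
        article_time = article_time.replace(tzinfo=timezone.utc)
    return article_time - now > MAX_FUTURE_SKEW
```

Articles dated in the past, however old, pass this check; it only bounds how far ahead of the clock the history files would ever need to reach.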

> I thought of that, but forgot to mention something....

> You have to make sure that when you are sweeping entries out of the
> in-memory cache, they go out to the appropriate date history file.

Ah... save it by Date header.  Hm.  That's interesting... but yeah, that
would work fine, now that I think about it.

> In other words, you can't write the entry until you receive the article
> and look at the date header.  There are a couple of ways to do this, but
> one way of looking at it is there is not one history file you are writing
> to. You probably need 2 or 3 open always, because it is common to get
> articles delayed by a while.  Most will go into the "current" day, but
> some will go in the -1 day.

> And, rarely, you may have to open stuff even earlier.

Yup.
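
A minimal sketch of that idea in Python, purely as an illustration: entries are appended to one file per day of the article's Date, with a small pool of files kept open. The class name, the "history.YYYYMMDD" naming, bucketing by UTC day, and the least-recently-used eviction are all assumptions for the example; it also assumes the caller passes a timezone-aware datetime.

```python
import os
from collections import OrderedDict
from datetime import datetime, timezone


class DatedHistory:
    """Append history entries to one file per UTC day of the article's Date."""

    def __init__(self, directory, max_open=3):
        self.directory = directory
        self.max_open = max_open
        # Day string -> open file object, least recently used first.
        self.open_files = OrderedDict()

    def _file_for(self, day):
        if day in self.open_files:
            self.open_files.move_to_end(day)
        else:
            if len(self.open_files) >= self.max_open:
                # Close the least recently used day to make room.
                _, oldest = self.open_files.popitem(last=False)
                oldest.close()
            path = os.path.join(self.directory, "history." + day)
            self.open_files[day] = open(path, "a", encoding="utf-8")
        return self.open_files[day]

    def write(self, message_id, article_time):
        """Append message_id to the file for article_time's day; return the day."""
        day = article_time.astimezone(timezone.utc).strftime("%Y%m%d")
        self._file_for(day).write(message_id + "\n")
        return day

    def close(self):
        for f in self.open_files.values():
            f.close()
        self.open_files.clear()
```

With max_open at 2 or 3, the current day and the day before stay open all the time, and the rare much older article just costs one extra open and close.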

You also need a different cache model than we have currently if you want
to make sure that you have *every* article you've seen in the past N
hours.  Right now, if there happens to be a hash collision in the history
cache, the new entry just kicks the message ID it collides with out of the
cache.
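
One possible shape for such a cache, as a hedged Python sketch rather than a description of any existing INN code: entries leave only when they age out of the window, never because of a collision. The class and method names are made up for the example, and it assumes the caller supplies arrival times in seconds that never go backwards.

```python
from collections import deque


class RecentIdCache:
    """Remember every message ID added within the last window_seconds."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.seen = {}        # message ID -> arrival time in seconds
        self.order = deque()  # (arrival time, message ID), oldest first

    def _expire(self, now):
        # Drop entries older than the window; nothing else is ever evicted.
        while self.order and now - self.order[0][0] > self.window:
            _, old_id = self.order.popleft()
            del self.seen[old_id]

    def add(self, message_id, now):
        """Record message_id; return False if it is already in the window."""
        self._expire(now)
        if message_id in self.seen:
            return False
        self.seen[message_id] = now
        self.order.append((now, message_id))
        return True

    def __contains__(self, message_id):
        return message_id in self.seen
```

The price is that memory use grows with the arrival rate times N instead of being fixed, which is the trade against a fixed-size cache that overwrites on collision.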

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>

