I would really like to see expire gone

Alex Kiernan alexk at demon.net
Wed Jan 29 08:51:43 UTC 2003


Russ Allbery <rra at stanford.edu> writes:

> bill davidsen <davidsen at tmr.com> writes:
> 
> >   Has someone looked into this, and if so are there any notes on why it
> > isn't (shouldn't be) done with a database allowing nice gradual
> > deletions as the articles vanish, and no changes in service centered on
> > one part of the day?
> 
> Figure out how to do it with acceptable performance, and I think it's a
> great idea.  We have a history API that should allow experiments with new
> history formats without breaking the world.
> 

We'd need to figure out a way to communicate the article's expiry into
the DB, which I think would mean pulling an article back out of CNFS
for every one we write.

> I've not managed to figure out how to do it with acceptable performance.
> My intuition says that using something like BerkeleyDB will be *far* too
> slow.  But the BerkeleyDB folks didn't really agree with me, so I could
> well be wrong.  Someone may have to run an experiment to see.
> 

I started doing some testing (which broke overnight because the
Berkeley DB I'd built had broken large file support).

I had a history file with ~46M entries on a Sun E250 with one 250MHz
CPU, 4 disks in a RAID 1 stripe, and 1.5GB of RAM (1.25GB of it given
over to the cache).

I used a single database with the binary hash as the key and a binary
structure for the data, holding the three timestamps and the token.
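Concretely, the layout I'm describing might be sketched like this (a
hedged illustration in Python rather than the C I'm actually using;
the MD5 hash, the field order, and the token value here are
assumptions for illustration, not the exact on-disk format):

```python
import hashlib
import struct

def make_record(msgid, arrived, expires, posted, token):
    """Pack one history entry as a (key, value) pair.

    key:   16-byte binary hash of the message-ID (MD5 assumed here)
    value: three 32-bit timestamps followed by the storage token
    """
    key = hashlib.md5(msgid.encode()).digest()
    value = struct.pack("!III", arrived, expires, posted) + token
    return key, value

# Hypothetical message-ID and token, purely for illustration:
key, value = make_record("<example@host>",
                         1043830303, 1044435103, 1043830000,
                         b"@0502000005A4000000010000@")
```

The point of the fixed-size binary key and value is that every insert
and lookup costs one B-tree probe with no parsing.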

The first ~15M articles inserted in ~30 minutes; then the cache ran
out and performance slowed to the speed of the disks (one read/write
per insert). It got as far as 24M articles, which took ~15 hours,
before the database file overflowed the 2GB mark.
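Back-of-envelope, those figures work out to roughly the following
insert rates (approximate, since the timings above are themselves
rough):

```python
# While the working set fit in the 1.25GB cache:
in_cache = 15_000_000 / (30 * 60)        # inserts per second, cached

# After the cache ran out: the remaining ~9M inserts took the rest of
# the ~15 hours, i.e. roughly 14.5 hours of disk-bound operation.
on_disk = (24_000_000 - 15_000_000) / (14.5 * 3600)

print(round(in_cache), round(on_disk))   # ~8333/s cached vs ~172/s disk-bound
```

So the disk-bound rate is nearly two orders of magnitude slower, which
is why parallelising reception with the writes matters.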

So long as we can parallelise article reception with writing to disk,
that sounds OK to me (it's not as if the machine I was using is
particularly "hot").

I'm going to fix my Berkeley DB build, move the history file I'm
reading from off of the test device, and try again (also playing with
directio and different block sizes).

Something else that occurred to me whilst I was testing: I started off
thinking I'd just use the history traversal routines, but those don't
give you the message-ID; indeed, the current history implementation
doesn't store the underlying message-ID at all. In the future I'd be
really tempted to change the history interface (and by implication the
rest of innd) so that it only ever deals with message-ID hashes.
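Roughly what I have in mind, sketched in Python (the class and
function names are hypothetical, innd itself is C, and MD5 is an
assumption for the hash):

```python
import hashlib

def msgid_hash(msgid):
    """Reduce a message-ID to the fixed-size hash the interface uses."""
    return hashlib.md5(msgid.encode()).digest()

class History:
    """Toy hash-keyed history: callers pass only the 16-byte hash.

    The dict here stands in for the Berkeley DB table; note there is
    deliberately no way to recover a message-ID from the store.
    """
    def __init__(self):
        self._db = {}

    def store(self, h, record):
        self._db[h] = record

    def lookup(self, h):
        return self._db.get(h)
```

Traversal over such a store hands back hashes and records only, which
matches what the on-disk data can actually provide.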

-- 
Alex Kiernan, Principal Engineer, Development, THUS plc


More information about the inn-workers mailing list