expire info in history when using CNFS

Mon Jan 22 08:07:12 UTC 2001

Rich Skrenta <skrenta at textport.net> writes:

> Use a mask of MD5(message-ID) to address into a file array of 512 byte
> blocks.  The number of blocks must be a power of 2.  Each block contains
> an offset to the next hash within the block to be written (0-31),
> followed by 32 MD5's.

> To see if a message ID is in the file, read the block addressed by the
> hash of the message ID and linearly scan for the message ID.  4 bytes of
> the hash may be compared at time; fail scan would be 32 or so int
> compares.

> To add a new message ID, read the block, store the new hash, increment
> the pointer, and write the block.

> The file can be mmaped to avoid read/writes and for speed.

That's basically the structure of the existing history.hash file.  :)  The
only difference between the current structure and your proposed structure
is that it's currently not block-oriented and instead just uses straight
linear probing (which I think is actually more efficient space-wise),
although it's a bit slower to write if you have to search forward for a
slot to write into.

> Locking (esp for a single writer, multiple readers) can be avoided;
> readers can ignore the offset, and a partial write of a hash would be
> unlikely to match a real hash.

> A 134MB file could hold 8 million message-IDs this way, quasi-cyclicly.
> A 262MB file could hold 16 million.

However, you've dropped the storage token out of the history file, which
makes article retrieval by message ID impossible.  You have to keep the
storage token, which reduces the number of message IDs you can handle.

Apart from that, it's basically a design for a binary history format,
which is definitely something I want to try after the new history API
stuff is in place (which I'm still thinking about in the background while
I'm catching up on pending INN mail).

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>