expire info in history when using CNFS

Rich Skrenta skrenta at textport.net
Fri Jan 19 03:36:20 UTC 2001


> Yes, I understand that expiration data would
> be needed to allow /remember/ to work precisely, but... what if
> history were to "cyclicalized" as well? (e.g., pick a fixed
> size history database, and then reuse entries when that max
> size has been reached)

How about something like this:

Use a mask of MD5(message-ID) to address into a file array of 512
byte blocks.  The number of blocks must be a power of 2.  Each block
contains an offset to the next hash within the block to be written
(0-31), followed by 32 MD5's.

To see if a message ID is in the file, read the block addressed by the
hash of the message ID and linearly scan for the message ID.  4 bytes of
the hash may be compared at time; fail scan would be 32 or so int compares.

To add a new message ID, read the block, store the new hash, increment
the pointer, and write the block.

The file can be mmaped to avoid read/writes and for speed.

Locking (esp for a single writer, multiple readers) can be avoided;
readers can ignore the offset, and a partial write of a hash would be
unlikely to match a real hash.

A 134MB file could hold 8 million message-IDs this way, quasi-cyclicly.
A 262MB file could hold 16 million.

If my stats were better I could say the odds of a bucket wrapping in
half the expected time.  I suspect it is quite small, like the odds of
16 hashes in a row going to the same bucket or something, which should
be very unlikely.




More information about the inn-workers mailing list