expire info in history when using CNFS
Rich Skrenta
skrenta at textport.net
Fri Jan 19 03:36:20 UTC 2001
> Yes, I understand that expiration data would
> be needed to allow /remember/ to work precisely, but... what if
> history were to be "cyclicalized" as well? (e.g., pick a fixed-
> size history database, and then reuse entries when that max
> size has been reached)
How about something like this:
Use a mask of MD5(message-ID) to address into a file array of 512-byte
blocks. The number of blocks must be a power of 2. Each block contains
an offset to the next hash slot within the block to be written (0-31),
followed by 32 MD5s.
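The addressing step above might look like this (a minimal sketch; the
block count and the choice of which digest bytes to mask are arbitrary
illustrations, not anything from INN):

```python
import hashlib

NBLOCKS = 1 << 18     # number of 512-byte blocks; must be a power of 2
BLOCK_SIZE = 512

def block_index(message_id: str) -> int:
    # The low bits of MD5(message-ID) select the block: a simple mask,
    # since NBLOCKS is a power of 2.
    digest = hashlib.md5(message_id.encode()).digest()
    return int.from_bytes(digest[:4], "little") & (NBLOCKS - 1)
```

The byte offset of the block in the file is then just
`block_index(mid) * BLOCK_SIZE`.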
To see if a message-ID is in the file, read the block addressed by the
hash of the message-ID and linearly scan it for the hash. 4 bytes of
the hash can be compared at a time, so a failed scan costs roughly 32
int compares.
To add a new message-ID, read the block, store the new hash at the
current offset, increment the offset (wrapping at 32), and write the
block back.
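A toy in-memory version of the lookup and insert steps, assuming the
file is a flat byte array. One caveat: one offset byte plus 32
sixteen-byte MD5s is 513 bytes, slightly more than the stated 512, so
this sketch uses 31 slots per block to make the layout fit exactly:

```python
import hashlib

NBLOCKS = 1 << 10     # toy size; must be a power of 2
BLOCK_SIZE = 512
SLOT_SIZE = 16        # one MD5 digest
SLOTS = 31            # 1 offset byte + 31 digests = 497 bytes <= 512

store = bytearray(NBLOCKS * BLOCK_SIZE)   # stands in for the mmaped file

def _digest_and_base(message_id: str):
    d = hashlib.md5(message_id.encode()).digest()
    idx = int.from_bytes(d[:4], "little") & (NBLOCKS - 1)
    return d, idx * BLOCK_SIZE

def lookup(message_id: str) -> bool:
    # Linearly scan every slot in the addressed block for the digest.
    d, base = _digest_and_base(message_id)
    for i in range(SLOTS):
        off = base + 1 + i * SLOT_SIZE
        if store[off:off + SLOT_SIZE] == d:
            return True
    return False

def insert(message_id: str) -> None:
    # Overwrite the slot named by the block's offset byte, then bump
    # the offset; wrapping reuses the oldest entry (the cyclic part).
    d, base = _digest_and_base(message_id)
    slot = store[base]
    off = base + 1 + slot * SLOT_SIZE
    store[off:off + SLOT_SIZE] = d
    store[base] = (slot + 1) % SLOTS
```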
The file can be mmaped to avoid read/writes and for speed.
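For instance, with Python's mmap module (file name and sizes are
hypothetical; a real history file would be created once at full size
and mapped on every run):

```python
import hashlib, mmap, os, tempfile

BLOCK_SIZE = 512
NBLOCKS = 1 << 10   # toy size

# Create a fixed-size file and map it into memory. After this, block
# reads and writes are plain memory accesses through the page cache,
# with no explicit read()/write() system calls.
fd, path = tempfile.mkstemp(suffix=".hash")
os.ftruncate(fd, NBLOCKS * BLOCK_SIZE)
mm = mmap.mmap(fd, NBLOCKS * BLOCK_SIZE)

digest = hashlib.md5(b"<example@msgid>").digest()
mm[0:16] = digest               # store a hash into block 0
roundtrip = bytes(mm[0:16])     # read it back through the mapping

mm.close()
os.close(fd)
os.unlink(path)
```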
Locking (especially for a single writer with multiple readers) can be
avoided: readers can ignore the offset, and a partially written hash
would be unlikely to match a real hash.
A 134MB file could hold 8 million message-IDs this way, quasi-cyclically.
A 268MB file could hold 16 million.
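Checking the arithmetic at 32 hashes per 512-byte block:

```python
BLOCK_SIZE = 512
HASHES_PER_BLOCK = 32

def capacity(file_bytes: int) -> int:
    # Each 512-byte block holds 32 sixteen-byte MD5s in this scheme.
    return (file_bytes // BLOCK_SIZE) * HASHES_PER_BLOCK

# 2**27 bytes (~134 MB) -> 2**18 blocks -> 8,388,608 message-IDs;
# doubling the file doubles the capacity to 16,777,216.
```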
If my stats were better I could give the odds of a bucket wrapping in
half the expected time. I suspect they are quite small, something like
the odds of 16 extra hashes in a row landing in the same bucket, which
should be very unlikely.
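A rough back-of-envelope, not the exact combinatorics: treat the number
of hashes landing in one block as Poisson with mean 16 (half of its 32
slots) and ask for the chance that the block has already received 32 or
more, i.e. has already wrapped at half the expected time:

```python
from math import exp

SLOTS = 32          # hash slots per block
MEAN = SLOTS / 2    # expected arrivals per block at half the file's capacity

def poisson_tail(lam: float, k: int) -> float:
    # P(X >= k) for X ~ Poisson(lam), via 1 minus the lower sum.
    term = exp(-lam)          # P(X = 0)
    total = term
    for i in range(1, k):
        term *= lam / i       # P(X = i) from P(X = i - 1)
        total += term
    return 1.0 - total

p = poisson_tail(MEAN, SLOTS)   # chance a given block has already wrapped
```

Under this approximation the probability comes out well under one in a
thousand per block, consistent with the hunch above.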
More information about the inn-workers mailing list