pure transit server
rra at stanford.edu
Sat Mar 3 02:10:56 UTC 2001
Kai Henningsen <kaih at khms.westfalen.de> writes:
> As for MD5 not being slow, well, compared to a lot of other hash
> functions it certainly is.
It's not *horrible*... see the table at the end of:
although that's MD4; IIRC, MD5 is slightly slower. The problem is that it
has a large constant factor, so its worst case is short strings, namely
exactly what we're doing. For longer data, like entire files, it's
actually pretty competitive.
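
A rough way to see the constant-factor effect is to time MD5 over a message-ID-sized string and over a large buffer, then compare the cost per input byte. The sketch below uses Python's hashlib; the sample message ID, the buffer size, and the function name are made up for illustration. The short-input figure also includes Python's own call overhead, so treat the numbers as indicative only.

```python
import hashlib
import timeit


def md5_ns_per_byte(data: bytes, repeats: int) -> float:
    """Average cost of one MD5 over `data`, in nanoseconds per input byte."""
    seconds = timeit.timeit(lambda: hashlib.md5(data).digest(), number=repeats)
    return seconds / repeats / len(data) * 1e9


# Hypothetical inputs: a message-ID-sized string and a 1 MiB buffer.
short_input = b"<example-id-12345@news.example.com>"
long_input = b"x" * (1 << 20)

if __name__ == "__main__":
    print("short: %.2f ns/byte" % md5_ns_per_byte(short_input, 100000))
    print("long:  %.2f ns/byte" % md5_ns_per_byte(long_input, 100))
```

The short string fits in a single 64-byte MD5 block, so the fixed setup, padding, and finalization work is spread over very few input bytes.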
The advantages of MD5 are that it's widely used, widely analyzed, has no
known major flaws for this purpose, produces long hashes without requiring
host support for >32-bit data types, and is readily available as freely
reusable source. I'd be happy to try out some other hashes, but it would
be good if whatever we used had those properties.
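
To illustrate the point about data types: an MD5 digest is 128 bits, but it can be held as four 32-bit words, so nothing wider than 32 bits is needed to store or compare it. A minimal sketch in Python, where md5_words is a made-up helper name:

```python
import hashlib
import struct


def md5_words(data: bytes) -> tuple:
    """Return the 16-byte MD5 digest of `data` as four unsigned 32-bit words."""
    digest = hashlib.md5(data).digest()  # always 16 bytes
    # MD5 serializes its four state words in little-endian order.
    return struct.unpack("<4I", digest)


if __name__ == "__main__":
    words = md5_words(b"<example-id-12345@news.example.com>")
    print(" ".join("%08x" % w for w in words))
```

Comparing two hashes is then four 32-bit comparisons, which works on any host regardless of whether it has a 64-bit integer type.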
As for a test set of message IDs, seems like the easiest thing to do would
be for someone who's taking a full feed to dump all the message IDs they
see into a file for a month or so and then stick that somewhere... it's
likely to be a pretty huge file, though, even compressed.
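
As a sketch of how such a dump could be collected, assuming the message IDs are already available as strings from whatever is processing the feed, one option is to append them one per line to a gzip-compressed file. The function names and file layout here are hypothetical, not anything INN provides.

```python
import gzip


def append_message_ids(path, message_ids):
    """Append message IDs, one per line, to a gzip-compressed log file."""
    # Append mode adds a new gzip member each time; readers see one stream.
    with gzip.open(path, "at", encoding="ascii", errors="replace") as log:
        for message_id in message_ids:
            log.write(message_id + "\n")


def read_message_ids(path):
    """Read every message ID back from the compressed log, in order."""
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as log:
        return [line.rstrip("\n") for line in log]
```

Appending in large batches rather than one ID at a time should compress better, since each append starts a fresh gzip member.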
Russ Allbery (rra at stanford.edu) <http://www.eyrie.org/~eagle/>
More information about the inn-workers mailing list