History hash collisions

Fri Nov 12 15:59:37 UTC 1999

> To:            inn-workers at isc.org
> Subject:       Re: History hash collisions
> From:          Russ Allbery <rra at stanford.edu>

> Yeah, I wasn looking at this the other day too, and I think this explains
> some of the duplicate article messages that people get.  We should
> consider increasing it as part of the history API rework.
> 

Any history API work should be so major that it is not
rework, but a ground-up rewrite.  In this case, "increasing"
the size of the stored hash doesn't ensure correctness, it
just makes failure less likely.  I don't like solutions like
that.

I apologize in advance that I did not fully follow the history
API discussion on n.s.n.  Here are my thoughts....

There are three fundamental problems with the INN history
architecture.  They have been self-evident for quite some
time now.

In no particular order...
   1. Deletions are expensive.  You have to rewrite the data file,
      and rebuild the .pag hash table.

   2. The locations stored in the hash table are 4 byte offsets to
      a single file.

   3. Linear probing for empty slots degrades performance quickly,
      even though the probing is run-length limited, with the 
      spillover/expansion hash tables.

You can't solve all of these problems by making adjustments to dbz.

Obviously, there other problems which you would want to tackle at
the same time, like integration of cache, pre-commit cache, memory
mapping, the problem of hash collision I mentioned, and probably
more I can't remember off the top of my head.  Is there a list
you are working on, Russ, Katsuhiro?  I guess I should slog through
that thread on n.s.n....

Forrest