hash buckets

Brian J. Murrell brian_murrell at ssd.bctel.net
Wed Jan 5 00:01:52 UTC 2000


from the quill of Ted Lemon <mellon at isc.org> on scroll
<200001042330.SAA09897 at grosse.manhattan.fugue.com>
> 
> > Meaning a number of hash buckets as high as 10k, right?
> 
> Right.
> 
> > > I'd have to think about it before recommending that you go higher.
> > 
> > Than 10k hash buckets, or 97?  I am pretty sure you mean 10k.
> 
> 10k.

Cool.  I will look into upping the number of buckets when I can more
closely determine how many leases we do keep outstanding.

> It's superstition.   I read somewhere once that it was a good idea.

Ahhhh.  Interesting, because the idea of using a prime number of buckets is
not all that foreign to me.
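
For the curious, picking a prime bucket count is cheap enough to do once at
startup.  A quick sketch (the trial-division helper is entirely mine, not
anything from the dhcpd source):

/* Round a requested bucket count up to the next prime.  Trial
 * division is plenty fast for a one-time table-sizing step. */
static int is_prime(unsigned n)
{
    unsigned d;

    if (n < 2)
        return 0;
    for (d = 2; d * d <= n; d++)
        if (n % d == 0)
            return 0;
    return 1;
}

unsigned next_prime(unsigned n)
{
    while (!is_prime(n))
        n++;
    return n;        /* e.g. next_prime(10000) == 10007 */
}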

> I have no idea whether or not it makes sense mathematically, but
> intuitively it seems sensible.   The hash algorithm could probably use
> some tuning if you're feeling like being scientific about it... :')

Maybe, but likely not.  I am not particularly a math genius, nor do I
enjoy it much.

> Given that leases are the biggest database being hashed, it would
> probably make sense to just count how many leases are created during
> the parsing of the configuration file, and then resize all the hashes
> based on that.

Oh man!  You do throw me a looper every now and then, Ted.  I was thinking
of a hashing API that would resize a table when it determined the load was
too high, but I thought you would skin me for suggesting it.  :-)

> You don't need to reparse - just
> create a new hash structure with more buckets and re-hash what's in
> the old structure into the new.

That was exactly what I was thinking of.  You would have to be careful not
to resize too often and chew up your savings, though.  This would make an
interesting case study in C++ (or any OO language, I guess) class design,
methinks: a hashing class that manages itself to stay efficient.
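
Something like the following is what I have in mind.  It is only a rough
sketch; the structure and function names are made up for illustration, not
taken from the dhcpd hash code:

#include <stdlib.h>

/* Sketch of a chained hash table that grows itself when the load
 * (average chain length) gets too high.  Entries are relinked into
 * the new bucket array, not copied, so a resize is cheap. */
struct entry {
    struct entry *next;
    const char *key;
    void *value;
};

struct table {
    struct entry **buckets;
    unsigned nbuckets;
    unsigned nentries;
};

#define MAX_LOAD 5    /* average chain length that triggers a resize */

static unsigned hash(const char *key, unsigned nbuckets)
{
    unsigned h = 0;

    while (*key)
        h = h * 31 + (unsigned char)*key++;
    return h % nbuckets;
}

static void grow(struct table *t)
{
    unsigned newsize = t->nbuckets * 2 + 1;    /* or next_prime(...) */
    struct entry **newbuckets = calloc(newsize, sizeof *newbuckets);
    unsigned i;

    if (!newbuckets)
        return;    /* out of memory: keep limping along with the old table */
    for (i = 0; i < t->nbuckets; i++) {
        struct entry *e = t->buckets[i], *next;

        for (; e != NULL; e = next) {
            unsigned b = hash(e->key, newsize);

            next = e->next;
            e->next = newbuckets[b];
            newbuckets[b] = e;
        }
    }
    free(t->buckets);
    t->buckets = newbuckets;
    t->nbuckets = newsize;
}

void table_add(struct table *t, struct entry *e)
{
    unsigned b;

    if (t->nentries >= t->nbuckets * MAX_LOAD)
        grow(t);
    b = hash(e->key, t->nbuckets);
    e->next = t->buckets[b];
    t->buckets[b] = e;
    t->nentries++;
}

Doubling (plus one, to stay odd) rather than growing in small steps is what
keeps the resizes rare, so the cost of re-linking everything amortizes away.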

> It's probably reasonably winning at high volumes, yes.

Sounds like me.  :-)

> If you have a
> hundred entries in a bucket, it's probably taking about the same
> amount of time to find a given entry as it is to do the rest of the
> processing for that entry.

Hmmmmm.
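
To put rough numbers on it (guessing at 10,000 outstanding leases, which is
purely my assumption): 10,000 leases over the stock 97 buckets means chains
of roughly 10000/97 = 103 entries, so a successful lookup walks about 50 of
them on average; over 10,007 buckets the chains average about one entry
each.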

> If you have ten entries in a bucket, it's
> probably not worth the effort, because theoretically we only do a hash
> lookup between one and three times for each packet processed, and
> there are other things in packet processing that can be equally
> time-consuming.

Ahhhhh, cool.  It was this analysis that I was looking for.  Another
interesting case study would be to put timers into the server like Joe
Grecco's (I hope I have the reference correct) INND timers.  I did this
with my SPF packet filter code and it was quite educational to discover
where all the time was being spent.  I realize you can do this with
"profiling", but it's more interesting to see the time broken down by role
in the code rather than by function.
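
What I mean is just a handful of named stopwatches around the interesting
stretches of code.  A sketch of the idea (the role names and all of the
code below are invented for illustration; none of it is from INN or dhcpd):

#include <stdio.h>
#include <sys/time.h>

/* Accumulate wall-clock time per named role in the code, rather
 * than per function the way a profiler does. */
enum role { T_HASH_LOOKUP, T_PARSE, T_DB_WRITE, T_NROLES };

static const char *role_name[T_NROLES] = { "hash lookup", "parse", "db write" };
static double role_total[T_NROLES];
static struct timeval role_start[T_NROLES];

void timer_start(enum role r)
{
    gettimeofday(&role_start[r], NULL);
}

void timer_stop(enum role r)
{
    struct timeval now;

    gettimeofday(&now, NULL);
    role_total[r] += (now.tv_sec - role_start[r].tv_sec)
                   + (now.tv_usec - role_start[r].tv_usec) / 1e6;
}

void timer_report(void)
{
    int i;

    for (i = 0; i < T_NROLES; i++)
        fprintf(stderr, "%-12s %10.3f s\n", role_name[i], role_total[i]);
}

Wrap each interesting stretch in timer_start()/timer_stop() and dump
timer_report() from the main loop every so often; you find out very quickly
which roles the time is actually going to.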

b.



--
Brian J. Murrell                                    Brian_Murrell at ssd.bctel.net
BCTel Advanced Communications                                      604 454 5279
Vancouver, B.C.


