BIND 10 #1603: replace name compression with more efficient one

Thu Mar 8 11:29:05 UTC 2012

#1603: replace name compression with more efficient one
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:  jinmei
  jinmei                             |                Status:  reviewing
                       Type:  task   |             Milestone:
                   Priority:         |  Sprint-20120320
  medium                             |            Resolution:
                  Component:         |             Sensitive:  0
  libdns++                           |           Sub-Project:  DNS
                   Keywords:         |  Estimated Difficulty:  7
            Defect Severity:  N/A    |           Total Hours:  0
Feature Depending on Ticket:  auth   |
  performance                        |
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------
Changes (by vorner):

 * owner:  vorner => jinmei

Comment:

 Hello

 First, on my system the new one is also slower on NXDOMAINs. I fear it's
 because the clear or initialization is quite heavyweight. Maybe we could
 just call clear() on each vector that is not empty? Or, can the renderer
 know how many names there will be, to create large enough hash table and
 use open addressing instead of separate chaining? We could get rid of so
 many vectors.

 Speaking about the hash table and hashing, I have few questions:
  * Why does the hash function take 16 characters only? What is the meaning
 of 16? Why not hash the whole thing?
  * Why do we create so „high“ hash table, 64 buckets and 16 slots in each
 bucket? If we expect so many collisions, shouldn't we create more buckets
 instead?
  * Why do we use 64 buckets, when 64 is nice round number? AFAIK there are
 less collisions with prime number sizes.
  * Isn't there a hash table in the boost library to just reuse?

 What does the swap do here? Why is the thing copied and new vector
 created?
 {{{#!c++
     for (size_t i = 0; i < MessageRendererImpl::BUCKETS; ++i) {
         if (impl_->table_[i].size() > MessageRendererImpl::RESERVED_ITEMS)
 {
             impl_->table_[i].reserve(MessageRendererImpl::RESERVED_ITEMS);
 vector<MessageRendererImpl::OffsetItem>(impl_->table_[i].begin(),
 impl_->table_[i].end()).
                 swap(impl_->table_[i]);
         }
 }}}

 About the map used for lower case ‒ the whole thing with private header
 seems not really nice. For one, why is the variable declared in both
 name_internal.h and name.h? And, could we put it as a private static
 member of Name, so others could really not get to it, but the
 !LabelSequence (being a friend) could?

 About the !LabelSequence changes, I'd have few suggestions:
  * Currently, a default-constructed label sequence is kind of broken. And
 the pointer there looks hairy too. Could it be initialized with
 Name::ROOT_NAME() instead? That way it would kind of correspond to a
 default-constructed Name.
  * As the hash of label sequence is such a lightweight type (`size_t`),
 would it make sense to cache it for future use? Like having two size_t
 variables (array) and two bools, each for one of sensitive and
 insensitive, the bool saying if the size_t is filled in and when the hash
 is computed, it would be stored. Whenever the sequence would strip on
 either end, it would reset them to false. This would speed up computing
 the hash again (which is AFAIK happening ‒ it is first computed and looked
 in the hash table, then it is computed again when adding to the table). It
 would also could speed up comparison in the negative case ‒ if they are
 both computed and they are not the same (the hashes), nor the sequences
 can be the same. If this case doesn't happen, the real comparison would
 happen. This could speed up lookup in case of collision.

 Also, the old renderer. Would it make sense to preserve it, so future
 benchmarks still work? To put it under the benchmark directory, so we
 don't poison the namespace of the libdns++?

-- 
Ticket URL: <http://bind10.isc.org/ticket/1603#comment:7>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development