BIND 10 #1603: replace name compression with more efficient one

Fri Mar 9 10:41:25 UTC 2012

#1603: replace name compression with more efficient one
-------------------------------------+-------------------------------------
                   Reporter:         |                 Owner:  jinmei
  jinmei                             |                Status:  reviewing
                       Type:  task   |             Milestone:
                   Priority:         |  Sprint-20120320
  medium                             |            Resolution:
                  Component:         |             Sensitive:  0
  libdns++                           |           Sub-Project:  DNS
                   Keywords:         |  Estimated Difficulty:  7
            Defect Severity:  N/A    |           Total Hours:  0
Feature Depending on Ticket:  auth   |
  performance                        |
        Add Hours to Ticket:  0      |
                  Internal?:  0      |
-------------------------------------+-------------------------------------
Changes (by vorner):

 * owner:  vorner => jinmei

Comment:

 Hello

 Replying to [comment:10 jinmei]:
 > I suspect one reason for the poorer performance with fewer names
 > was because we needed to make copies of names.  The new version
 > eliminates the need, so I think it's now better.  At least in my
 > environment the new version is more than 3 times faster in the case of
 > NXDOMAIN.

 I found the reason. I use -O0 for faster compilation O:-). Anyway, with
 -O2, the new one is several times faster now, which is good.

 > These are all good questions, and regarding the parameters, the short
 > answer is they were derived from the BIND 9 implementation (I also
 > noted that in the doxygen comment).  That does not necessarily mean
 > these are the best choice, of course, and I don't really know how
 > these parameters were chosen in BIND 9.  But if we don't have a strong
 > reason for a particular choice I think it's generally a good start.

 Well, I fear they might be chosen randomly at some point in history and
 now everybody copies them „because that's how it was done all the time“.
 But it is probably OK to keep it this way, it might not matter much.

 > Regarding a prime number vs 64, again, I don't know why the BIND 9
 > developer chose it, but in this case I actually thought about it
 > myself.  In this case 64 should be as good/bad as a prime or any other
 > number close it because we don't really know the distribution of the
 > hash results.  Choosing a prime number is a good practice when we know
 > the input has a particular pattern of bias (such as if we use one
 > character of a domain name as a hash key directly - when most of the
 > input would be between 'a' to 'z' and '0' to '9' while the input space
 > is the whole 8 bits), but since our input is a result of hash
 > function, we cannot simply know (we may, after analyzing the hash
 > function with the knowledge of typical bias of inputs, but that's not
 > obvious).  So I didn't bother to tweak it.

 Well, the hash function might need a pattern. The prime number should help
 no matter what pattern length it is (chance of it being prime is lower),
 but it might not be that important anyway.

 > > About the map used for lower case ‒ the whole thing with private
 header seems not really nice. For one, why is the variable declared in
 both name_internal.h and name.h? And, could we put it as a private static
 member of Name, so others could really not get to it, but the
 !LabelSequence (being a friend) could?
 >
 > Yeah I know, and I didn't like it either.  But in the end, the new
 > version still keeps it: First, the garbage in name.h was just a
 > leftover.  I removed it.  Second, as a result of revision
 > MessageRenderer also uses it (otherwise it would endure a bit more
 > poorer performance), so it cannot be hidden in a Name object.  But if
 > you have a better idea of achieving the goal: sharing this for Name,
 > LabelSequence and MessageRenderer, I'm happy to hear and adopt it.

 Maybe as it is getting to be more generally used, it would make sense to
 make it generally available utility ‒ make it public instead.

 > >  * As the hash of label sequence is such a lightweight type
 (`size_t`), would it make sense to cache it for future use? Like having
 two size_t variables (array) and two bools, each for one of sensitive and
 insensitive, the bool saying if the size_t is filled in and when the hash
 is computed, it would be stored. Whenever the sequence would strip on
 either end, it would reset them to false. This would speed up computing
 the hash again (which is AFAIK happening ‒ it is first computed and looked
 in the hash table, then it is computed again when adding to the table). It
 would also could speed up comparison in the negative case ‒ if they are
 both computed and they are not the same (the hashes), nor the sequences
 can be the same. If this case doesn't happen, the real comparison would
 happen. This could speed up lookup in case of collision.
 >
 > That seemed to be a good idea, so I implemented it, although not in
 > the LabelSequence class, but in the logic of MessageRenderer
 > implementation.  I compared the resulting performance; it was not
 > super significant, but I saw some non negligible improvement, so I
 > kept it.

 Hmm, still, I wanted to avoid some comparisons in case of bucket
 collision. Maybe storing the hash in the OffsetItem as well and looking
 into it first before doing the real comparison?

 Also, a typo here in the comment (thrower should be slower):
 {{{#!c++
     /// \brief A common helper to throw an exception on invalid operation.
     ///
     /// Experiments showed that throwing from each method makes the buffer
     /// operation thrower, so we consolidate it here, and let the methods
     /// call this.
 }}

-- 
Ticket URL: <http://bind10.isc.org/ticket/1603#comment:13>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development