BIND 10 #1603: replace name compression with more efficient one
BIND 10 Development
do-not-reply at isc.org
Fri Mar 9 19:12:22 UTC 2012
#1603: replace name compression with more efficient one
-------------------------------------+-------------------------------------
            Reporter:  jinmei        |               Owner:  jinmei
                Type:  task          |              Status:  reviewing
            Priority:  medium        |           Milestone:  Sprint-20120320
           Component:  libdns++      |          Resolution:
            Keywords:                |           Sensitive:  0
     Defect Severity:  N/A           |         Sub-Project:  DNS
 Feature Depending on Ticket:  auth  |Estimated Difficulty:  7
          performance                |         Total Hours:  0
 Add Hours to Ticket:  0             |
           Internal?:  0             |
-------------------------------------+-------------------------------------
Comment (by jinmei):
Replying to [comment:13 vorner]:
> Hello
> > I suspect one reason for the poorer performance with fewer names
> > was because we needed to make copies of names. [...]
>
> I found the reason. I use -O0 for faster compilation O:-). Anyway, with
> -O2, the new one is several times faster now, which is good.
Okay, good to know that.
> > These are all good questions, and regarding the parameters, the short
> > answer is they were derived from the BIND 9 implementation (I also
> > noted that in the doxygen comment). [...]
>
> Well, I fear they might be chosen randomly at some point in history and
> now everybody copies them „because that's how it was done all the time“.
> But it is probably OK to keep it this way, it might not matter much.
I have to admit you're probably right that these were arbitrary
choices. But unless we can suggest something better with a reason, I'm
afraid all we can do is note that it was derived from the prior
implementation.
BTW I realized I didn't answer one specific question: "why up to 16".
I still don't know why BIND 9 does this, but I guess it's to avoid
spending too much time calculating an ideal hash for some unusually
long names. I think the motivation is okay for the intended purpose
here, although it may be a marginal optimization in practice.
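Just to illustrate the kind of cap I mean (a simplified sketch; the
helper name and exact mixing are made up here, not the actual BIND 9 or
branch code):
{{{#!c++
#include <cstddef>
#include <cstdint>

// Hypothetical helper: hash at most the first 16 bytes of the name data.
// Longer names are hashed from their prefix only, trading a little hash
// quality for a bounded per-name cost.
size_t
hashNameData(const uint8_t* data, size_t length) {
    const size_t max_hash_len = 16; // the "up to 16" cap discussed above
    if (length > max_hash_len) {
        length = max_hash_len;
    }
    size_t hash = 0;
    for (size_t i = 0; i < length; ++i) {
        hash = hash * 31 + data[i]; // simple multiplicative mix (illustrative)
    }
    return (hash);
}
}}}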
> > Regarding a prime number vs 64, again, I don't know why the BIND 9
> > developer chose it, but in this case I actually thought about it
> > myself. In this case 64 should be as good/bad as a prime or any other
> > number close to it [...]
>
> Well, the hash function might need a pattern. The prime number should
> help no matter what pattern length it is (chance of it being prime is
> lower),
I don't think so; when we use a bucket size of n and represent the
resulting values from the hash function in the form of nX + Y (0 <= Y
< n), the probability of collisions depends on how many values are
produced for each Y by the combination of the original input and the
algorithm of the hash function. It all depends on the details of the
function and bias in the original data, and just choosing a prime
number for n doesn't help in itself (consider, e.g., a stupid "hash"
function f(x) = 31 * x, ignoring overflow for brevity: if we choose 31,
which is a prime number, for the hash bucket size, we get 100%
collisions).
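To make that concrete, here's a toy example (not code from the branch)
showing that a prime bucket count doesn't save us from that degenerate
hash:
{{{#!c++
#include <iostream>

int main() {
    const unsigned int bucket_count = 31; // a prime number
    // A deliberately bad "hash": f(x) = 31 * x.  Every result is a
    // multiple of 31, so with 31 buckets everything lands in bucket 0.
    for (unsigned int x = 1; x <= 5; ++x) {
        const unsigned int hash = 31 * x;
        std::cout << "x=" << x << " -> bucket "
                  << (hash % bucket_count) << std::endl;
    }
    return (0); // prints bucket 0 five times: 100% collisions
}
}}}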
That said, it doesn't mean 64 is better than a prime number. As I
said, both are just equally bad. So,
> but it might not be that important anyway.
I don't mind changing it to a prime number. I just pointed out it's
not an improvement.
> > > About the map used for lower case ‒ [...]
> > Yeah I know, and I didn't like it either. But in the end, the new
> > version still keeps it: [...]
>
> Maybe as it is getting to be more generally used, it would make sense to
> make it a generally available utility ‒ make it public instead.
For now, I'd still keep it semi-private. After all, normal
applications can (and should) just use std::tolower() for this
purpose. If we see more cases where this level of micro-optimization
is required, we can consider making it more public. I noted this
point in the header file.
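For reference, the table in question is essentially just this (a
simplified sketch with made-up identifiers, not the actual header):
{{{#!c++
#include <cctype>
#include <cstdint>

namespace {
// Hypothetical semi-private lowercase map: a 256-entry table indexed by the
// raw byte value, precomputed once so the hot comparison/hash loop avoids a
// per-character function call.
struct LowerCaseMap {
    uint8_t map[256];
    LowerCaseMap() {
        for (unsigned int i = 0; i < 256; ++i) {
            map[i] = static_cast<uint8_t>(std::tolower(i));
        }
    }
};
const LowerCaseMap lowercase_map;

inline uint8_t
toLower(uint8_t ch) {
    return (lowercase_map.map[ch]);
}
}
}}}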
> > > * As the hash of label sequence is such a lightweight type
> > > (`size_t`), would it make sense to cache it for future use? [...]
> > That seemed to be a good idea, so I implemented it, although not in
> > the LabelSequence class, but in the logic of MessageRenderer
> > implementation. I compared the resulting performance; it was not
> > super significant, but I saw some non-negligible improvement, so I
> > kept it.
>
> Hmm, still, I wanted to avoid some comparisons in case of bucket
> collision. Maybe storing the hash in the OffsetItem as well and looking
> into it first before doing the real comparison?
Ah, okay, I didn't fully understand what you meant. I did it in the
latest version. As you probably realized, however, this would be
additional overhead rather than an optimization when we have no or
very few collisions (and we'd probably expect that in usual cases;
when we need to render so many names that collisions become common,
as in AXFR-out, other overhead due to the larger number of RRs, the
message size, and the use of TCP would make this optimization moot).
The benchmark showed that storing and comparing the full hash indeed
makes it a bit slower, but at the micro benchmark level it was not so
significant either. So for now I've kept the code in the branch. I'm
okay either with or without it.
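Roughly, the idea in the latest revision looks like this (simplified;
the member and function names here are illustrative, not the exact
branch code):
{{{#!c++
#include <cstddef>
#include <cstdint>
#include <vector>

// Each entry remembers the offset of a previously rendered name and the hash
// computed for it.  On lookup we compare the cached hashes first, and only
// fall back to the (more expensive) case-insensitive name comparison when
// the hashes match.
struct OffsetItem {
    OffsetItem(size_t hash_param, uint16_t pos_param) :
        hash(hash_param), pos(pos_param)
    {}
    size_t hash;   // cached hash of the name stored at 'pos'
    uint16_t pos;  // offset of the name in the rendered message
};

template <typename NameComparator>
int
findOffset(const std::vector<OffsetItem>& bucket, size_t hash,
           NameComparator match) {
    for (size_t i = 0; i < bucket.size(); ++i) {
        // Cheap check first: a different hash means it cannot be the same
        // name, so we skip the real comparison entirely.
        if (bucket[i].hash == hash && match(bucket[i].pos)) {
            return (bucket[i].pos);
        }
    }
    return (-1); // not found; the caller renders the name in full
}
}}}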
> Also, a typo here in the comment (thrower should be slower):
> {{{#!c++
> /// \brief A common helper to throw an exception on invalid operation.
> ///
> /// Experiments showed that throwing from each method makes the buffer
> /// operation thrower, so we consolidate it here, and let the methods
> /// call this.
> }}}
Ah, thanks for catching it. Fixed.
The branch was revised with the suggested hash update and some other
minor cleanups.
--
Ticket URL: <http://bind10.isc.org/ticket/1603#comment:14>
BIND 10 Development <http://bind10.isc.org>