BIND 10 #1603: replace name compression with more efficient one
BIND 10 Development
do-not-reply at isc.org
Fri Mar 9 19:12:22 UTC 2012
#1603: replace name compression with more efficient one
-------------------------------------+-------------------------------------
            Reporter:  jinmei        |               Owner:  jinmei
                Type:  task          |              Status:  reviewing
            Priority:  medium        |           Milestone:  Sprint-20120320
           Component:  libdns++      |          Resolution:
            Keywords:                |           Sensitive:  0
     Defect Severity:  N/A           |         Sub-Project:  DNS
 Feature Depending on Ticket:  auth  |Estimated Difficulty:  7
          performance                |         Total Hours:  0
 Add Hours to Ticket:  0             |
           Internal?:  0             |
-------------------------------------+-------------------------------------
Comment (by jinmei):
Replying to [comment:13 vorner]:
> Hello
> > I suspect one reason for the poorer performance with fewer names
> > was because we needed to make copies of names. [...]
>
> I found the reason. I use -O0 for faster compilation O:-). Anyway, with
> -O2, the new one is several times faster now, which is good.
Okay, good to know that.
> > These are all good questions, and regarding the parameters, the short
> > answer is they were derived from the BIND 9 implementation (I also
> > noted that in the doxygen comment). [...]
>
> Well, I fear they might be chosen randomly at some point in history and
> now everybody copies them „because that's how it was done all the time“.
> But it is probably OK to keep it this way, it might not matter much.
I have to admit you're probably right that these were arbitrary
choices. But unless we can suggest something better with a reason, I'm
afraid all we can do is note that it was derived from the prior
implementation.
BTW I realized I didn't answer one specific question: "why up to 16".
I still don't know why BIND 9 does this, but I guess it's to avoid
spending too much time calculating an ideal hash for some unusually
long names. I think the motivation is okay for the intended purpose
here, although it may be a marginal optimization in practice.
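Just to illustrate the kind of cap I mean (a simplified sketch; the
helper name and exact mixing are made up here, not the actual BIND 9 or
branch code):
{{{#!c++
#include <cstddef>
#include <cstdint>

// Hypothetical helper: hash at most the first 16 bytes of the name data.
// Longer names are hashed from their prefix only, trading a little hash
// quality for a bounded per-name cost.
size_t
hashNameData(const uint8_t* data, size_t length) {
    const size_t max_hash_len = 16; // the "up to 16" cap discussed above
    if (length > max_hash_len) {
        length = max_hash_len;
    }
    size_t hash = 0;
    for (size_t i = 0; i < length; ++i) {
        hash = hash * 31 + data[i]; // simple multiplicative mix (illustrative)
    }
    return (hash);
}
}}}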
> > Regarding a prime number vs 64, again, I don't know why the BIND 9
> > developer chose it, but in this case I actually thought about it
> > myself. In this case 64 should be as good/bad as a prime or any other
> > number close to it [...]
>
> Well, the hash function might need a pattern. The prime number should
> help no matter what pattern length it is (chance of it being prime is
> lower),
I don't think so; when we use a bucket size of n and represent the
resulting values from the hash function in the form of nX + Y (0 <= Y
< n), the probability of collisions depends on how many values are
produced for each Y by the combination of the original input and the
algorithm of the hash function. It all depends on the details of the
function and bias in the original data, and just choosing a prime
number for n doesn't help in itself (consider, e.g., a stupid "hash"
function f(x) = 31 * x, ignoring overflow for brevity: if we choose 31,
which is a prime number, for the hash bucket size, we get 100%
collisions).
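To make that concrete, here's a toy example (not code from the branch)
showing that a prime bucket count doesn't save us from that degenerate
hash:
{{{#!c++
#include <iostream>

int main() {
    const unsigned int bucket_count = 31; // a prime number
    // A deliberately bad "hash": f(x) = 31 * x.  Every result is a
    // multiple of 31, so with 31 buckets everything lands in bucket 0.
    for (unsigned int x = 1; x <= 5; ++x) {
        const unsigned int hash = 31 * x;
        std::cout << "x=" << x << " -> bucket "
                  << (hash % bucket_count) << std::endl;
    }
    return (0); // prints bucket 0 five times: 100% collisions
}
}}}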
That said, it doesn't mean 64 is better than a prime number. As I
said, both are just equally bad. So,
> but it might not be that important anyway.
I don't mind changing it to a prime number. I just pointed out it's
not an improvement.
> > > About the map used for lower case ‒ [...]
> > Yeah I know, and I didn't like it either. But in the end, the new
> > version still keeps it: [...]
>
> Maybe as it is getting to be more generally used, it would make sense to
> make it a generally available utility ‒ make it public instead.
For now, I'd still keep it semi-private. After all, normal
applications can (and should) just use std::tolower() for this
purpose. If we see more cases where this level of micro-optimization
is required, we can consider making it more public. I noted this
point in the header file.
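For reference, the table in question is essentially just this (a
simplified sketch with made-up identifiers, not the actual header):
{{{#!c++
#include <cctype>
#include <cstdint>

namespace {
// Hypothetical semi-private lowercase map: a 256-entry table indexed by the
// raw byte value, precomputed once so the hot comparison/hash loop avoids a
// per-character function call.
struct LowerCaseMap {
    uint8_t map[256];
    LowerCaseMap() {
        for (unsigned int i = 0; i < 256; ++i) {
            map[i] = static_cast<uint8_t>(std::tolower(i));
        }
    }
};
const LowerCaseMap lowercase_map;

inline uint8_t
toLower(uint8_t ch) {
    return (lowercase_map.map[ch]);
}
}
}}}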
> > > * As the hash of label sequence is such a lightweight type
> > > (`size_t`), would it make sense to cache it for future use? [...]
> > That seemed to be a good idea, so I implemented it, although not in
> > the LabelSequence class, but in the logic of MessageRenderer
> > implementation. I compared the resulting performance; it was not
> > super significant, but I saw some non-negligible improvement, so I
> > kept it.
>
> Hmm, still, I wanted to avoid some comparisons in case of bucket
> collision. Maybe storing the hash in the OffsetItem as well and looking
> into it first before doing the real comparison?
Ah, okay, I didn't fully understand what you meant. I did it in the
latest version. As you probably realized, however, this would be
additional overhead rather than an optimization when we have no or
very few collisions (and we'd probably expect that in usual cases;
when we need to render so many names that collisions become common,
as in AXFR-out, other overhead due to the larger number of RRs, the
message size, and the use of TCP would make this optimization moot).
The benchmark showed that storing and comparing the full hash indeed
makes it a bit slower, but at the micro benchmark level it was not so
significant either. So for now I've kept the code in the branch. I'm
okay either with or without it.
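Roughly, the idea in the latest revision looks like this (simplified;
the member and function names here are illustrative, not the exact
branch code):
{{{#!c++
#include <cstddef>
#include <cstdint>
#include <vector>

// Each entry remembers the offset of a previously rendered name and the hash
// computed for it.  On lookup we compare the cached hashes first, and only
// fall back to the (more expensive) case-insensitive name comparison when
// the hashes match.
struct OffsetItem {
    OffsetItem(size_t hash_param, uint16_t pos_param) :
        hash(hash_param), pos(pos_param)
    {}
    size_t hash;   // cached hash of the name stored at 'pos'
    uint16_t pos;  // offset of the name in the rendered message
};

template <typename NameComparator>
int
findOffset(const std::vector<OffsetItem>& bucket, size_t hash,
           NameComparator match) {
    for (size_t i = 0; i < bucket.size(); ++i) {
        // Cheap check first: a different hash means it cannot be the same
        // name, so we skip the real comparison entirely.
        if (bucket[i].hash == hash && match(bucket[i].pos)) {
            return (bucket[i].pos);
        }
    }
    return (-1); // not found; the caller renders the name in full
}
}}}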
> Also, a typo here in the comment (thrower should be slower):
> {{{#!c++
> /// \brief A common helper to throw an exception on invalid operation.
> ///
> /// Experiments showed that throwing from each method makes the buffer
> /// operation thrower, so we consolidate it here, and let the methods
> /// call this.
> }}}
Ah, thanks for catching it. Fixed.
The branch was revised with the suggested hash update and some other
minor cleanups.
--
Ticket URL: <http://bind10.isc.org/ticket/1603#comment:14>
BIND 10 Development <http://bind10.isc.org>