BIND 10 #1603: replace name compression with more efficient one
BIND 10 Development
do-not-reply at isc.org
Fri Mar 9 07:54:43 UTC 2012
#1603: replace name compression with more efficient one
-------------------------------------+-------------------------------------
Reporter: | Owner: jinmei
jinmei | Status: reviewing
Type: task | Milestone:
Priority: | Sprint-20120320
medium | Resolution:
Component: | Sensitive: 0
libdns++ | Sub-Project: DNS
Keywords: | Estimated Difficulty: 7
Defect Severity: N/A | Total Hours: 0
Feature Depending on Ticket: auth |
performance |
Add Hours to Ticket: 0 |
Internal?: 0 |
-------------------------------------+-------------------------------------
Comment (by jinmei):
Replying to [comment:7 vorner]:
> First, on my system the new one is also slower on NXDOMAINs. I fear it's
because the clear or initialization is quite heavyweight. Maybe we could
just call clear() on each vector that is not empty? Or, can the renderer
know how many names there will be, to create large enough hash table and
use open addressing instead of separate chaining? We could get rid of so
many vectors.
I suspect one reason for the poorer performance with fewer names
was because we needed to make copies of names. The new version
eliminates the need, so I think it's now better. At least in my
environment the new version is more than 3 times faster in the case of
NXDOMAIN.
> Speaking about the hash table and hashing, I have few questions:
> * Why does the hash function take 16 characters only? What is the
meaning of 16? Why not hash the whole thing?
> * Why do we create so „high“ hash table, 64 buckets and 16 slots in
each bucket? If we expect so many collisions, shouldn't we create more
buckets instead?
> * Why do we use 64 buckets, when 64 is nice round number? AFAIK there
are less collisions with prime number sizes.
> * Isn't there a hash table in the boost library to just reuse?
These are all good questions, and regarding the parameters, the short
answer is they were derived from the BIND 9 implementation (I also
noted that in the doxygen comment). That does not necessarily mean
these are the best choice, of course, and I don't really know how
these parameters were chosen in BIND 9. But if we don't have a strong
reason for a particular choice I think it's generally a good start.
Regarding a prime number vs 64, again, I don't know why the BIND 9
developer chose it, but in this case I actually thought about it
myself. In this case 64 should be as good/bad as a prime or any other
number close it because we don't really know the distribution of the
hash results. Choosing a prime number is a good practice when we know
the input has a particular pattern of bias (such as if we use one
character of a domain name as a hash key directly - when most of the
input would be between 'a' to 'z' and '0' to '9' while the input space
is the whole 8 bits), but since our input is a result of hash
function, we cannot simply know (we may, after analyzing the hash
function with the knowledge of typical bias of inputs, but that's not
obvious). So I didn't bother to tweak it.
> What does the swap do here? Why is the thing copied and new vector
created?
> {{{#!c++
> for (size_t i = 0; i < MessageRendererImpl::BUCKETS; ++i) {
> if (impl_->table_[i].size() >
MessageRendererImpl::RESERVED_ITEMS) {
>
impl_->table_[i].reserve(MessageRendererImpl::RESERVED_ITEMS);
>
vector<MessageRendererImpl::OffsetItem>(impl_->table_[i].begin(),
>
impl_->table_[i].end()).
> swap(impl_->table_[i]);
> }
> }}}
It's for trimming an excessive capacity. I added some more comments
about it.
> About the map used for lower case ‒ the whole thing with private header
seems not really nice. For one, why is the variable declared in both
name_internal.h and name.h? And, could we put it as a private static
member of Name, so others could really not get to it, but the
!LabelSequence (being a friend) could?
Yeah I know, and I didn't like it either. But in the end, the new
version still keeps it: First, the garbage in name.h was just a
leftover. I removed it. Second, as a result of revision
MessageRenderer also uses it (otherwise it would endure a bit more
poorer performance), so it cannot be hidden in a Name object. But if
you have a better idea of achieving the goal: sharing this for Name,
LabelSequence and MessageRenderer, I'm happy to hear and adopt it.
> About the !LabelSequence changes, I'd have few suggestions:
> * Currently, a default-constructed label sequence is kind of broken.
And the pointer there looks hairy too. Could it be initialized with
Name::ROOT_NAME() instead? That way it would kind of correspond to a
default-constructed Name.
As a result of revise, at least this branch doesn't need the default
constructor any more. I suspect we'll have a similar need in future,
but for now I revered this part to the original implementation.
> * As the hash of label sequence is such a lightweight type (`size_t`),
would it make sense to cache it for future use? Like having two size_t
variables (array) and two bools, each for one of sensitive and
insensitive, the bool saying if the size_t is filled in and when the hash
is computed, it would be stored. Whenever the sequence would strip on
either end, it would reset them to false. This would speed up computing
the hash again (which is AFAIK happening ‒ it is first computed and looked
in the hash table, then it is computed again when adding to the table). It
would also could speed up comparison in the negative case ‒ if they are
both computed and they are not the same (the hashes), nor the sequences
can be the same. If this case doesn't happen, the real comparison would
happen. This could speed up lookup in case of collision.
That seemed to be a good idea, so I implemented it, although not in
the LabelSequence class, but in the logic of MessageRenderer
implementation. I compared the resulting performance; it was not
super significant, but I saw some non negligible improvement, so I
kept it.
> Also, the old renderer. Would it make sense to preserve it, so future
benchmarks still work? To put it under the benchmark directory, so we
don't poison the namespace of the libdns++?
Okay. I've moved it to the benchmark directory with some note.
--
Ticket URL: <http://bind10.isc.org/ticket/1603#comment:10>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list