BIND 10 #1603: replace name compression with more efficient one
BIND 10 Development
do-not-reply at isc.org
Thu Mar 8 11:29:05 UTC 2012
#1603: replace name compression with more efficient one
-------------------------------------+-------------------------------------
Reporter: | Owner: jinmei
jinmei | Status: reviewing
Type: task | Milestone:
Priority: | Sprint-20120320
medium | Resolution:
Component: | Sensitive: 0
libdns++ | Sub-Project: DNS
Keywords: | Estimated Difficulty: 7
Defect Severity: N/A | Total Hours: 0
Feature Depending on Ticket: auth |
performance |
Add Hours to Ticket: 0 |
Internal?: 0 |
-------------------------------------+-------------------------------------
Changes (by vorner):
* owner: vorner => jinmei
Comment:
Hello
First, on my system the new one is also slower on NXDOMAINs. I fear it's
because the clear or initialization is quite heavyweight. Maybe we could
just call clear() on each vector that is not empty? Or, can the renderer
know how many names there will be, to create large enough hash table and
use open addressing instead of separate chaining? We could get rid of so
many vectors.
Speaking about the hash table and hashing, I have few questions:
* Why does the hash function take 16 characters only? What is the meaning
of 16? Why not hash the whole thing?
* Why do we create so „high“ hash table, 64 buckets and 16 slots in each
bucket? If we expect so many collisions, shouldn't we create more buckets
instead?
* Why do we use 64 buckets, when 64 is nice round number? AFAIK there are
less collisions with prime number sizes.
* Isn't there a hash table in the boost library to just reuse?
What does the swap do here? Why is the thing copied and new vector
created?
{{{#!c++
for (size_t i = 0; i < MessageRendererImpl::BUCKETS; ++i) {
if (impl_->table_[i].size() > MessageRendererImpl::RESERVED_ITEMS)
{
impl_->table_[i].reserve(MessageRendererImpl::RESERVED_ITEMS);
vector<MessageRendererImpl::OffsetItem>(impl_->table_[i].begin(),
impl_->table_[i].end()).
swap(impl_->table_[i]);
}
}}}
About the map used for lower case ‒ the whole thing with private header
seems not really nice. For one, why is the variable declared in both
name_internal.h and name.h? And, could we put it as a private static
member of Name, so others could really not get to it, but the
!LabelSequence (being a friend) could?
About the !LabelSequence changes, I'd have few suggestions:
* Currently, a default-constructed label sequence is kind of broken. And
the pointer there looks hairy too. Could it be initialized with
Name::ROOT_NAME() instead? That way it would kind of correspond to a
default-constructed Name.
* As the hash of label sequence is such a lightweight type (`size_t`),
would it make sense to cache it for future use? Like having two size_t
variables (array) and two bools, each for one of sensitive and
insensitive, the bool saying if the size_t is filled in and when the hash
is computed, it would be stored. Whenever the sequence would strip on
either end, it would reset them to false. This would speed up computing
the hash again (which is AFAIK happening ‒ it is first computed and looked
in the hash table, then it is computed again when adding to the table). It
would also could speed up comparison in the negative case ‒ if they are
both computed and they are not the same (the hashes), nor the sequences
can be the same. If this case doesn't happen, the real comparison would
happen. This could speed up lookup in case of collision.
Also, the old renderer. Would it make sense to preserve it, so future
benchmarks still work? To put it under the benchmark directory, so we
don't poison the namespace of the libdns++?
--
Ticket URL: <http://bind10.isc.org/ticket/1603#comment:7>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development
More information about the bind10-tickets
mailing list