[bind10-dev] Friday mail: radix trees for in-memory lookups

Fri Jun 19 23:12:27 UTC 2009

At Fri, 19 Jun 2009 17:58:45 +0000,
Evan Hunt <each at isc.org> wrote:

> > But then I thought... the *fastest* thing might be a radix tree, with
> > each layer being a character in the label. So, something like:
> 
> I had that same notion a year or so ago when I was working on the
> radix-based ACLs in BIND 9.5.  My next crazy thought was to do the
> comparison backwards (i.e., scan "www.isc.org." as ".gro.csi.www"),
> so that the most significant label is scanned for a match first.

My past profiling tests for BIND9 have showed one big performance
bottleneck is in the DB search (via dns_db_find()), so it's indeed one
promising area to optimize.

Name search in DNS is similar to IP address lookup in that they both
search for the longest matching entry, and since finding faster
algorithms for the latter is a very popular research area we might be
able to exploit research results in this field (of course, there are
substantial differences between these two, too, such as that
possible key lengths are much longer in DNS, etc).

We should also probably be careful about memory layout of DB in
addition to, or even rather than, the search algorithm.  The current
rbtdb links each tree node via pointers, where each node is allocated
separately (even for an authoritative zone not allowing dynamic
updates).  So, search in a large zone would require many random
accesses, reducing efficiency of CPU caches or even causing frequent
paging.  If, for example, we could make the portion of DB structure
where the search runs sufficiently compact, we may be able to improve
lookup speed substantially.

One other crazy idea: I've noticed many queries to caching servers (at
least those in the data sets I often used for performance tests) are
reverse lookup queries.  Since DNS search for a reverse lookup query
is essentially a longest prefix match on IP prefixes, we may be able
to apply efficient algorithm for routers directly.  For example, when
we search for 192.187.152.204.in-addr.arpa, we might look for a
longest matching prefix for 204.152.187.192 (the key is a 32-bit
integer rather than a string-based DNS name) in a "forwarding table"
using some magically efficient lookup algorithm.  Again, of course,
there are non trivial differences between these two.  For example,
"classless" reverse query (a la RFC2317) would be very tricky, if not
impossible, in this approach.

One other trivial optimization on the current BIND9 implementation: it
always makes full-compare even in search within a single domain.  For
example, when a query name "www.isc.org" is looked up in the "isc.org"
zone DB, the common "isc.org" part is always compared as the search
algorithm traverses the RBT, even if it should have been known that
the search key shares the zone name.  This is an obvious waste and
could be omitted.  I don't know how effective it is, though.

---
JINMEI, Tatuya