[bind10-dev] cache effectiveness
JINMEI Tatuya / 神明達哉
jinmei at isc.org
Wed Feb 27 03:43:47 UTC 2013
For ticket #2777 I've updated the documentation in the branch with my
analysis of the effectiveness of the DNS cache (in terms of response
performance). It suggests that it would be reasonable to assume 50%
or more of the server's running time will be spent answering queries
directly from the cache, and that by highly optimizing the cache it
wouldn't be too difficult to achieve 100Kqps or more at a cache hit
rate of 90%, including the overhead of cache misses.
I guess that's a good subject for wider discussion (and it's quite
possible I'm wrong in some part of the analysis), so I'm copying the
text below. Any comments/suggestions/corrections are welcome.
---
JINMEI, Tatuya
In a simplified model, we can express the fraction of the server's
total running time spent answering queries directly from the cache
(the rest being spent on recursive resolution due to cache misses) as
follows:
A = r*Q2 / (r*Q2 + (1-r)*Q1)
where
A: fraction of running time spent answering queries directly from the
cache (0<=A<=1)
r: cache hit rate (0<=r<=1)
Q1: max qps of the server with 100% cache hit
Q2: max qps of the server with 0% cache hit
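As a quick illustration of the model (this is just my own sketch, not
part of any BIND 10 code), the formula can be written as a small
Python function:

    def cache_hit_time_fraction(r, q1, q2):
        """Fraction of running time spent answering directly from the cache.

        r:  cache hit rate (0 <= r <= 1)
        q1: max qps of the server with 100% cache hit
        q2: max qps of the server with 0% cache hit
        """
        return r * q2 / (r * q2 + (1 - r) * q1)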
Q1 can be measured easily for a given data set; measuring Q2 is
tricky in general (it requires many external queries with unreliable
results), but we can still get some not-so-unrealistic numbers
through controlled simulation.
As a data point for these values, see previous experimental results
of mine:
https://lists.isc.org/pipermail/bind10-dev/2012-July/003628.html
Looking at the "ideal" server implementation (no protocol overhead)
with the setups of 90% and 85% cache hit rates, one recursion per
cache miss, and the maximum possible total throughput, we can deduce
Q1 and Q2: 170591 qps and 60138 qps, respectively.
This means that with a 90% cache hit rate (r = 0.9), the server would
spend 76% of its running time receiving queries and answering them
directly from the cache: 0.9*60138 / (0.9*60138 + 0.1*170591) = 0.76.
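For reference, here is one plausible way to back Q1 and Q2 out of two
such measurements (my own reconstruction, not necessarily how the
referenced numbers were derived): if per-query costs add linearly and
each miss triggers exactly one recursion, the maximum total
throughput at hit rate r is 1/(r/Q1 + (1-r)/Q2), so two measured
throughputs give two linear equations in 1/Q1 and 1/Q2:

    def deduce_q1_q2(r_a, t_a, r_b, t_b):
        # t_a, t_b: measured max total throughput at hit rates r_a, r_b
        # (e.g. the 90% and 85% setups; the raw numbers are in the
        # referenced message and not repeated here).
        # Solve r*x + (1-r)*y = 1/t for x = 1/Q1, y = 1/Q2.
        det = r_a * (1 - r_b) - (1 - r_a) * r_b
        x = ((1 - r_b) / t_a - (1 - r_a) / t_b) / det
        y = (r_a / t_b - r_b / t_a) / det
        return 1 / x, 1 / y  # (Q1, Q2)

Plugging the deduced values back in, cache_hit_time_fraction(0.9,
170591, 60138) gives the 0.76 above.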
I also ran more realistic experiments: using BIND 9.9.2 and unbound
1.4.19 in "forward only" mode, with crafted query data and a crafted
forwarded-to server to emulate 100% and 0% cache hit rates. I then
measured the maximum response throughput using a queryperf-like tool.
In both cases Q2 is about 28% of Q1 (I'm not showing specific numbers
to avoid unnecessary discussion about the specific performance of
existing servers; that's out of scope for this memo). Using Q2 =
0.28*Q1, the above equation with a 90% cache hit rate gives: A =
0.9*0.28 / (0.9*0.28 + 0.1) = 0.716. So the server would spend about
72% of its running time answering queries directly from the cache.
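Note that when Q2 is a fixed fraction k of Q1, as in the 28% case
above, Q1 cancels out and A depends only on r and k; a one-line check
(again just an illustration):

    def fraction_from_ratio(r, k):
        # k = Q2/Q1; Q1 cancels out of the formula for A
        return r * k / (r * k + (1 - r))

    print(fraction_from_ratio(0.9, 0.28))  # ~0.716, as above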
Of course, these experimental results are oversimplified. First, in
these experiments we assumed only one external query is needed on a
cache miss. In general it can be more; however, the assumption may
not actually be too optimistic either. In another research result of
mine:
http://bind10.isc.org/wiki/ResolverPerformanceResearch
a more detailed analysis using real query samples, tracing what an
actual resolver would do, suggested we'd need about 1.44 to 1.63
external queries per cache miss on average.
Still, of course, real-world cases are not that simple: in reality
we'd need to deal with timeouts, slower remote servers, unexpected
intermediate results, etc. DNSSEC-validating resolvers will clearly
need to do more work.
So, in real-world deployments Q2 should be much smaller than Q1.
Here are some specific cases of the relationship between Q1 and Q2
for a given A (assuming r = 0.9):
A = 70%: Q2 = 0.26 * Q1
A = 60%: Q2 = 0.17 * Q1
A = 50%: Q2 = 0.11 * Q1
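These ratios come from rearranging the formula for A into
Q2/Q1 = A*(1-r) / (r*(1-A)); a short check that reproduces the table
(my own verification):

    def ratio_for_fraction(a, r=0.9):
        # Invert A = r*(Q2/Q1) / (r*(Q2/Q1) + (1-r)) for Q2/Q1
        return a * (1 - r) / (r * (1 - a))

    for a in (0.7, 0.6, 0.5):
        print(a, round(ratio_for_fraction(a), 2))  # 0.26, 0.17, 0.11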
So, even if recursive resolution is roughly 10 times heavier than the
cache-only case, we can assume the server spends about half of its
running time answering queries directly from the cache at a cache hit
rate of 90%. I think this is a reasonably safe assumption.
Now, assuming a figure of 50% or more, does this suggest we should
highly optimize the cache? Opinions may vary on this point, but I
personally think the answer is yes. I've written an experimental
cache-only implementation that employs the idea of fully-rendered
cached data. On one test machine (2.20GHz AMD64, using a single
core), a queryperf-like benchmark shows it can handle over 180Kqps,
while BIND 9.9.2 handles only 41Kqps. The experimental implementation
omits some features necessary for a production server, and cache
management itself is an inevitable bottleneck, so the production
version wouldn't be that fast, but this still suggests it may not be
very difficult to exceed 100Kqps in a production environment,
including recursive resolution overhead.
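To illustrate what I mean by fully-rendered cached data (a toy sketch
of the general idea only, not the actual experimental code): the
cache maps a query tuple directly to a complete wire-format response,
so a cache hit only has to patch the message ID and send the stored
bytes, rather than assembling the response from individual RRsets for
every query:

    import struct

    class RenderedCache:
        # Toy cache keyed by (qname, qtype, qclass), storing complete
        # pre-rendered wire-format responses.
        def __init__(self):
            self._store = {}

        def add(self, qname, qtype, qclass, wire_response):
            # wire_response: a full DNS response message in wire format,
            # rendered once when the data enters the cache.
            self._store[(qname.lower(), qtype, qclass)] = wire_response

        def answer(self, qname, qtype, qclass, query_id):
            entry = self._store.get((qname.lower(), qtype, qclass))
            if entry is None:
                return None  # cache miss: fall back to recursive resolution
            # Patch the 16-bit message ID; the rest is reused as-is.
            return struct.pack("!H", query_id) + entry[2:]

A real implementation would of course also have to handle TTL
adjustment, cache replacement, EDNS/DNSSEC variations, and case
preservation of the query name, which is part of why a production
version wouldn't be as fast as the experimental one.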