[bind10-dev] cache effectiveness
Michal 'vorner' Vaner
michal.vaner at nic.cz
Wed Feb 27 08:08:45 UTC 2013
Hello
On Tue, Feb 26, 2013 at 07:43:47PM -0800, JINMEI Tatuya / 神明達哉 wrote:
> I also ran more realistic experiments: using BIND 9.9.2 and unbound
> 1.4.19 in the "forward only" mode with crafted query data and the
> forwarded server to emulate the situation of 100% and 0% cache hit
> rates. I then measured the max response throughput using a
> queryperf-like tool. In both cases Q2 is about 28% of Q1 (I'm not
> showing specific numbers to avoid unnecessary discussion about
> specific performance of existing servers; it's out of scope of this
> memo). Using Q2 = 0.28*Q1, above equation with 90% cache hit rate
> will be: A = 0.9 * 0.28 / (0.9*0.28 + 0.1) = 0.716. So the server will
> spend about 72% of its running time to answer queries directly from
> the cache.
That 28% somewhat surprises me. It means recursive lookups are only about four times slower (1/0.28 ≈ 3.6) than answering directly from the cache. I would expect a factor closer to 50, at least if the cache were well optimised.
It may mean that a lot of time is spent on processing unrelated to DNS itself, such as network I/O, or possibly on things like message rendering.
It's not really related to the cache, but I discovered the sendmmsg and recvmmsg functions yesterday. Unfortunately, they are Linux-specific (and one of them is available only in Linux 3.0 and newer), but they look as if they were designed directly for DNS servers. I'll run some experiments with them today; I want to see whether they can help.
> Now, assuming the number of 50% or more, does this suggest we should
> highly optimize the cache? Opinions may vary on this point, but I
> personally think the answer is yes. I've written an experimental
> cache only implementation that employs the idea of fully-rendered
> cached data. On one test machine (2.20GHz AMD64, using a single
> core), queryperf-like benchmark shows it can handle over 180Kqps,
> while BIND 9.9.2 can just handle 41K qps. The experimental
> implementation skips some necessary features for a production server,
> and cache management itself is always inevitable bottleneck, so the
> production version wouldn't be that fast, but it still suggests it may
> not be very difficult to reach over 100Kqps in production environment
> including recursive resolution overhead.
Well, I guess we need a fast resolver. And if it's 50% or more, the cache is probably the easiest place to optimise. I don't think we want to go as far as tuning machine instructions, but I don't think we want to use std::map either.
As a side question, what did you use for the cache in your experiment? Was it some kind of tree, or a hash table? I believe we can use a hash table in the case of the cache, since we don't need the ordering.
With regards
--
This side up =>
Michal 'vorner' Vaner