DNS server caching performance test results.

Matt Simerson mpsimerson at hostpro.com
Thu May 17 17:59:40 UTC 2001



> -----Original Message-----
> From: Brad Knowles [mailto:brad.knowles at skynet.be]
> Sent: Wednesday, May 16, 2001 6:45 PM
> To: Matt Simerson; 'bind-users at isc.org'
> Subject: Re: DNS server caching performance test results.
> 
> At 6:12 PM -0600 5/16/01, Matt Simerson wrote:
> 
> >  Configuring the caches was pretty basic. <configure them both as
> >  forwarders...>.
> 
> 	IMO, forwarding is almost always a problem. 

It's possible, but it shouldn't be. Many sites use forwarding, so the
forwarding mechanism had darned well better work. If it doesn't, that's
something I'd rather learn about in the testing rounds than in production.
As it turns out, it worked just fine.

> I think a better 
> trick would have been to configure the caching nameservers to be 
> authoritative for the 216.in-addr.arpa zone, and delegate 
> 122.216.in-addr.arpa to the "walldns" machine, at which point the 
> data would be cached. 

That seems to be a moot point, because after they query (via forwarding) the
data once from walldns, they have it cached anyway. Either way works in
pretty much the same fashion; my way just removes the additional query for
216.in-addr.arpa.

> So long as you restrict yourself to just PTR 
> queries within this zone, I believe that this would be a cleaner 
> test, and would avoid potentially problematical forwarding code or 
> accidental improper configuration of forwarding.

I haven't done A queries yet, but the output of the PTR queries includes the
hostname. I'll pipe that output through sed to get a list of hostnames that
I can do A queries on. My theory is that the results will be nearly
identical. My only problem is that I don't have a tool that can generate
that many A queries. :-(
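A minimal sketch of that sed pipeline, assuming dnsfilter's usual output
format of "IP=hostname" per line (out1 is a hypothetical results file from
an earlier run):

```shell
# Pull hostnames out of dnsfilter output. dnsfilter appends "=hostname"
# to each address it resolves; addresses with no PTR answer stay bare
# and are skipped here by requiring a non-empty name after the '='.
sed -n 's/^[0-9][0-9.]*=\(..*\)$/\1/p' out1 | sort -u > hostlist
```

Feeding hostlist back through a forward-resolving tool would then give the
matching A-query run.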

> 	I'd be real interested to see a graph of number of parallel 
> threads against total time to resolve the queries.  You might be 
> surprised to find that the optimal number of threads was five, or it 
> might turn out that the optimal number of threads is twenty-five. 
> But only by testing the whole range of possibilities in the various 
> test cases below, would you be likely to find out.

Funny you should ask. :-)  I actually did this while testing dnsfilter, and
posted the results to the djbdns mailing list because the behavior didn't
seem quite right to me. I would have expected that increasing the number
of parallel queries would continually increase the qps rating until the
maximum capacity of the name server was reached. That turned out not to be
the case. I tested with values ranging from 1 to 10,000 and found any value
between 5 and 20 to be optimal. Not coincidentally, 10 is the default.

However, when you start throwing in queries that time out (in a 50/50 ratio),
you need to crank that value way up to achieve the highest levels of
performance. So it basically means more work for me: before each test run I
have to tune dnsfilter for that test to make sure it's keeping the name
server as busy as it possibly can.
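For reference, the sweep looked roughly like this; it assumes djbdns's
dnsfilter, whose -c option sets the number of parallel queries (10 being
the default mentioned above):

```shell
# Sweep dnsfilter's parallelism and time each full pass over the same
# input. iplist.wall is the 65,536-address list used in the tests below.
for c in 1 5 10 20 50 100 1000 10000; do
    echo "== $c parallel queries =="
    time dnsfilter -c $c < iplist.wall > /dev/null
done
```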

> >  What follows is the output of my first batch of tests. I ran the
> >  following command 3 times for each dns cache: "time dnsfilter <
> >  iplist.wall > out[1-3]". The first test reflects the cache's need to
> >  fetch the results from the dns wall and return them to the client. The
> >  two subsequent tests reflect the cache's ability to serve results from
> >  its cache. The file iplist.wall simply contains 65,536 ip addresses
> >  representing the class B address space of 216.122.0.0.
> 
> 	This is with just one client, right?

Correct.
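For anyone wanting to reproduce the input file, here's one way to generate
it (a sketch; iplist.wall and the 216.122.0.0/16 range are as described
above):

```shell
# Emit every address in 216.122.0.0/16, one per line: 65,536 total.
awk 'BEGIN {
    for (b = 0; b < 256; b++)
        for (c = 0; c < 256; c++)
            printf "216.122.%d.%d\n", b, c
}' > iplist.wall
```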

> >  Name Server               time(s)   qps
> >  dnscache - 290MB RAM      63        1040
> >                            29        2260
> >                            29        2260
> >
> >  BIND 8 - 8MB RAM          21        3120
> >                            40        1638
> >                            39        1680
> >
> >  BIND 9 - 12MB RAM         81        809
> >                            29        2260
> >                            29        2260
> 
> 	Fascinating.  I wonder if BIND 8 is returning the answer before 
> storing it in the cache, while the other two programs are storing it 
> in the cache first and then returning the answer?

I'm guessing that's exactly what BIND 8 does. How else could you explain
numbers like that?

> 	Have you looked at "dents" as another alternative 
> nameserver program?

Briefly. I don't recall why I stopped looking at it, but we determined early
on that it wasn't well suited for what we're doing.

> 	I agree that it is better to specify the maximum, but IMO it 
> would be less confusing to display the starting memory utilized, the 
> ending memory utilized, and then in a footnote indicate what memory 
> limitation parameters there may have been supplied.  I believe that 
> the focus should be on the actual utilization, and not on the 
> parameters.

I mostly agree with you. However, I think most people don't care exactly
how efficient it is; they want to know how much memory it's going to use.
What I've provided gives them some decent rules of thumb (like take
whatever BIND 8 is using and add 20% for BIND 9). Your question has made me
curious, though, so I've added a new field to my test spreadsheet. I can
watch BIND via top or ps aux to see how much RAM it has grown by, and
dnscache logs how many bytes of data it has written to cache, so I'll
record that data next time.
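Something like this before-and-after snapshot is what I have in mind (a
sketch; "named" is the BIND process name, and it assumes pgrep and a ps
with -o rss= are available -- swap in dnscache as appropriate):

```shell
# Record the name server's resident set size (KB) before and after a
# test run; the difference approximates how much the cache has grown.
pid=$(pgrep -n named)
before=$(ps -o rss= -p "$pid")
# ... run the query load here ...
after=$(ps -o rss= -p "$pid")
echo "RSS growth: $((after - before)) KB"
```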

This is all still "pre-test" testing: I'm still defining the criteria by
which QA will run the actual tests and record the values I ask them for.
So questions like this, asking for more data, are very helpful.

> 	To get a better idea of the steady-state performance of these 
> programs, I'd like to see two clients continually running through the 
> entire list of IP addresses, as quickly as they possibly can (no 
> timing, direct all output to /dev/null, etc...).  Then, once they've 
> cycled through at least a couple of times, fire up a third client 
> which runs through the entire list of IP addresses several times, but 
> this time you keep track of the output and the times.  You would then 
> throw out the highest and lowest numbers, and average the rest.  I'd 
> encourage at least seven runs like this, and the more the better.

Can you define this test methodology better?  Am I just keeping track of the
third client's output?  Am I running the test on client 3 at the same time
as clients 1 and 2 are cycling through? If so, I'm just going to see
extended times for client 3 to resolve the data. I have to keep track (on
all clients) of how many queries are being answered and correlate that
number with elapsed test time to get a meaningful qps rating.
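Here's how I'd read that methodology, sketched as shell (dnsfilter and
iplist.wall as in the earlier tests; the warm-up time and pass count are my
assumptions):

```shell
# Two untimed clients loop forever to hold the cache in steady state.
( while :; do dnsfilter < iplist.wall > /dev/null; done ) & warm1=$!
( while :; do dnsfilter < iplist.wall > /dev/null; done ) & warm2=$!

sleep 120    # let the warmers cycle through the list a couple of times

# Third client: seven measured passes, per the suggestion above.
for run in 1 2 3 4 5 6 7; do
    time dnsfilter < iplist.wall > out$run
done
kill $warm1 $warm2
```

Throwing out the highest and lowest of the seven and averaging the rest
would then give the steady-state figure.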

> >  Name Server                client-time    time      qps
> >      dnscache - 290MB RAM   200             67       981
> >                             93              31       2112
> >                             86              29       2286
> >
> >      BIND 8 - 8MB RAM       51              17       3855
> >                             114             38       1725
> >                             114             38       1725
> >
> >      BIND 9 - 12MB RAM      239             80       822
> >                             82              27       2397
> >                             81              27       2427
> 
> 	Previously, you gave us information about system utilization on 
> both client and server during the tests -- is this information 
> available for this run?  I'd be very interested to know how hard the 
> various programs ended up pushing the system in order to deliver the 
> numbers you've gotten.

Nope, but it can be for the next one...  I think I'll write a little script
that records the CPU % of the name server process every second during the
run. So what's the best number to record, MAX utilization? Disk activity is
pretty meaningless since each of the systems has 32MB of cache on the RAID
controller and the output files are only 3MB in size.
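The sampler I have in mind would be something like this (a sketch; assumes
pgrep and a ps that accepts -o %cpu):

```shell
#!/bin/sh
# Sample a process's CPU % once a second until interrupted.
# Hypothetical usage: ./cpusample.sh named > cpu.log
name=$1
pid=$(pgrep -n "$name")
sec=0
while :; do
    echo "$sec $(ps -o %cpu= -p "$pid")"
    sec=$((sec + 1))
    sleep 1
done
```

Then `sort -k2 -n cpu.log | tail -1` pulls out the MAX sample, and
averaging the second column gives the other obvious summary number.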

> 	Also, have you considered running this kind of test where the 
> nameserver was running on a multi-processor machine (ideally both 
> dual-processor and quad-processor)?  I'd be really interested to see 
> how these numbers change when you throw extra processors into the mix.

The test machines I have for this are all single procs. :-(  Uh, hmmm,
what's this dual 700 under my other desk doing next week?  Hmmm, maybe that
will generate some fun numbers. :-) Hi ho, hi ho, off to the NOC I go.

Matt
