[bind10-dev] Resolver testing
Shane Kerr
shane at isc.org
Tue Nov 2 13:30:22 UTC 2010
All,
Here are some ideas about how to functionally test resolver code:
https://bind10.isc.org/wiki/TestingRecursion
And in text (for commenting here if you want):
Problem
=======
While the authoritative side of DNS is fairly heavily tested, the
recursive side is basically completely untested.
The reason for this is that testing an authoritative server is
relatively simple - the server is basically doing a lookup into a
database and returning a result, so you can check the contents of the
database against various mixes of query type, concurrency, and rate.
However, testing the recursive side involves not just a single
authoritative server (or a cluster of computers acting as one), but
the recursive server AND the set of servers that are authoritative
for the domains being queried. The presence of a cache complicates
testing further.
However, in order both to verify that resolution is working correctly
and to optimize the entire recursive resolution process, we should
create a system that does functional testing of the recursive side of
DNS.
Test Solution Space
===================
In order to test the system systematically, we create a test platform
that works like any other test system, with the following
characteristics:
* System base state
* Test inputs
* Expected test outputs
A successful test is one where the ACTUAL test outputs match the
expected test outputs. A failed test is one where these do not match.
There are 3 types of data that are involved in a resolution:
1. The DNS computers. This includes the clients, recursive servers,
and authoritative servers.
2. The network connecting the DNS computers.
3. The DNS zone contents. This includes the root zone definition,
and all other domains under that.
Test DNS Computers
------------------
We do not need separate computers for the DNS computers, but we do
need to simulate them.
For the clients, this means something that sends queries. We can
possibly use queryperf (which makes DNS queries based on various
input parameters) for this, although something like tcpreplay (which
plays network traffic from pcap files) may be necessary for proper
control.
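If queryperf or tcpreplay turn out to be too coarse, even a tiny
Python client is enough for a single, controlled test query. Just a
sketch, assuming the dnspython library rather than our own, with
made-up names and addresses:

# Hypothetical test client: send one query to the resolver under test.
# Assumes dnspython; resolver address and query name are made-up examples.
import dns.message
import dns.query

def send_test_query(resolver_ip, qname, qtype="A", timeout=2.0):
    query = dns.message.make_query(qname, qtype)
    # Send over UDP and return the resolver's response message.
    return dns.query.udp(query, resolver_ip, timeout=timeout)

# Example: response = send_test_query("10.53.0.100", "www.domain-1.test")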
For the recursive server, we should use the server software that we
are testing. For us, this will probably be BIND 10 and BIND 9,
although we may also wish to test other products, like Unbound,
PowerDNS Recursor, or dnsmasq.
For the authoritative servers, the minimum for this is something
answering DNS queries (port 53) at unique IP addresses. Probably the
best thing for this is a simple Python program using our DNS library,
running on IP addresses configured for the test. These can be started
quickly, and can also implement any special handling necessary (for
example sending duplicate replies, or sending incoherent responses).
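To give an idea, such a simulated server could be as small as the
sketch below. It uses dnspython instead of our own library purely for
illustration, and the fixed answer table and delay parameter are my
assumptions, not a design decision:

# Sketch of a fake authoritative server for one test address; the library
# choice (dnspython), the fixed answer table, and the delay handling are
# all illustrative assumptions.
import socket
import time
import dns.message
import dns.rdatatype
import dns.rrset

def run_fake_auth(listen_ip, answers, delay=0.0, port=53):
    """Answer UDP queries on listen_ip:port from a fixed answer table.

    answers maps (qname, qtype) keys to rdata strings, for example
    {("www.domain-1.test.", "A"): "10.0.0.1"}.  delay (in seconds)
    simulates network latency by waiting before each reply.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((listen_ip, port))
    while True:
        wire, client = sock.recvfrom(4096)
        query = dns.message.from_wire(wire)
        question = query.question[0]
        reply = dns.message.make_response(query)
        key = (question.name.to_text(),
               dns.rdatatype.to_text(question.rdtype))
        if key in answers:
            reply.answer.append(dns.rrset.from_text(
                question.name.to_text(), 300, "IN", key[1], answers[key]))
        # Special handling (duplicate replies, incoherent responses,
        # drops) would hook in here.
        time.sleep(delay)       # simulated network latency
        sock.sendto(reply.to_wire(), client)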
Test Network
------------
We do not need to actually build a network, but can simulate the
effects of the network on the DNS resolution process. For example,
if we want to test the RTT algorithm, we can insert artificial network
latency by configuring specific simulated authoritative servers to add
a delay before answering.
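With the server sketch above, this could be nothing more than a
per-server table of delays (the addresses and values here are
invented):

# Hypothetical per-server latency profile for one test; values are made up.
AUTH_DELAYS = {
    "10.53.0.1": 0.010,   # a.root - fast
    "10.53.0.2": 0.010,   # b.root - fast
    "10.53.0.11": 0.300,  # ns1.test - slow, should lose the RTT race
    "10.53.0.12": 0.030,  # ns2.test - should be preferred
}
# Each simulated server is then started with its own delay, e.g.
# run_fake_auth("10.53.0.11", answers, delay=AUTH_DELAYS["10.53.0.11"]).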
All types of network issues only need to be simulated on the
authoritative server side. The reason for this is that we are testing
the recursive server, and that the recursive server does not maintain
state when communicating with clients, so its behavior is not affected
by networking between itself and its clients, beyond the arrival
pattern of queries.
A full list of tweakable parameters will appear below.
Test DNS Zone Contents
----------------------
We need to test all manner of DNS zone contents. This includes
properly configured zones that include things like out-of-zone name
servers, as well as broken configurations like lame delegations or
zones with a CNAME and another RRTYPE at the same owner name.
These can be expressed as normal text zone files.
It may be necessary for a zone to change contents during the
delegation process, but it may also be that no conditions arise from
this that are different from a carefully-configured set of zones.
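For example, a zone file for DOMAIN-1.TEST with out-of-zone name
servers (names and addresses chosen to match the packet examples later
in this mail) might look like:

$TTL 300
$ORIGIN domain-1.test.
@       IN      SOA     ns1.test. hostmaster.domain-1.test. (
                        1 3600 900 604800 300 )
        IN      NS      ns1.test.       ; out-of-zone name server
        IN      NS      ns2.test.       ; out-of-zone name server
www     IN      A       10.0.0.1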
Proposed Solution
=================
What we need is both a test framework and the actual tests.
The test framework should be a program that works by reading a set of
desired tests and then for each one of them:
1. Configures the IP addresses needed for the test.
2. Starts up the authoritative servers with the correct zones.
3. Starts up the recursive server.
4. Executes a set of DNS queries to get the recursive server in the
correct state. (For example to populate the cache.)
5. Begins recording network traffic from the recursive server (both
to the authoritative servers and to the clients).
6. Executes a set of DNS queries (the actual test).
7. Compares the recorded network traffic to the expected network
traffic.
8. Shuts the recursive server down.
9. Stops the authoritative servers.
10. Deconfigures the IP addresses.
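In code, the framework's main loop would be little more than these
steps in sequence. A rough sketch, where every helper name is
hypothetical rather than existing code:

# Hypothetical skeleton of the test framework; every helper named here is
# an assumption used to illustrate the ten steps above, not existing code.
def run_test(test):
    configure_addresses(test.addresses)              # 1. add test IPs
    auth_servers = start_auth_servers(test.zones)    # 2. fake authoritatives
    resolver = start_resolver(test.resolver_config)  # 3. server under test
    send_queries(resolver, test.setup_queries)       # 4. e.g. warm the cache
    capture = start_capture(resolver)                # 5. record traffic
    send_queries(resolver, test.test_queries)        # 6. the actual test
    passed = compare_traffic(capture.stop(),         # 7. actual vs. expected
                             test.expected_traffic)
    resolver.stop()                                  # 8. shut down resolver
    for server in auth_servers:                      # 9. shut down fakes
        server.stop()
    deconfigure_addresses(test.addresses)            # 10. remove test IPs
    return passed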
Note that the test must be run as root, because we need to configure
and deconfigure IP addresses, and also to bind to port 53.
The tests need to define a number of things:
* A list of IPv4 and IPv6 addresses for authoritative servers
* Whether UDP works for a given IP address + domain
* Whether TCP works for a given IP address + domain
* How EDNS works for a given IP address + domain (fully, only for
packets of X bytes or less, or not at all)
* Response delay for a given IP address
* Drops for a given IP address (should be a pattern, not a
percentage, for reproducibility - so perhaps "10111" if we want to
drop the 2nd packet of a 5-packet sequence; see the sketch after
this list)
* Zone contents for each IP address (this allows us to test for
things like SOA & other zone mismatches, some lame delegations,
and so on)
* DNSSEC parameters (mostly T.B.D., but for example using the NSEC3
RFC as a starting point is a good idea)
* Queries, including which queries occur at the same time (needed to
check behavior of simultaneous queries)
* The important data from packets sent (more below)
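For the drop pattern mentioned above, the interpretation could be as
simple as the following sketch; the "1 = deliver, 0 = drop" reading is
my assumption based on the "10111" example:

# Interpret a drop pattern such as "10111": position N of the pattern says
# whether the Nth packet to this address is delivered ("1") or dropped ("0").
# Repeating the pattern for longer sequences is an assumption.
def should_drop(pattern, packet_number):
    """packet_number counts from 1; returns True if this packet is dropped."""
    return pattern[(packet_number - 1) % len(pattern)] == "0"

# Example: should_drop("10111", 2) -> True (the 2nd packet is dropped).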
Given the amount of data per test, it probably makes sense to use a
directory to store a set of files that define everything. (A database
might make sense, but that is probably more trouble than it is worth.)
As for the "important data" from each packet, this depends on the
particular test. For example:
* Malformed query from client should reply with an error to that
client.
* Bogus TLD query should send a packet to one of the root servers,
and then a reply to the client.
* "Normal" WWW.DOMAIN-1.TEST query should send a packet to one of
the root servers, then to one of the TEST TLD servers, then to one
of the authoritative servers for DOMAIN-1.TEST, and finally send a
reply to the client.
* Simple cache test should immediately send a reply to the client
from cache. (Here the setup for the test involves sending a query
to populate the cache on the resolver, but that is not the test
itself.)
* Lame delegation test for DOMAIN-2.TEST should send a query to the
root, then to one of the TEST TLD servers, then to one of the
authoritative servers for DOMAIN-2.TEST, then try again at a
different authoritative server for DOMAIN-2.TEST, then finally
send an answer to the client. (Note that here the authoritative
servers should work together to ensure that the first reply is
always an error!)
Probably a simple language to define how these packets look is needed,
so these can be defined via data files and not require programming for
each test. A file defining packets in this language may look something
like this:
# define the packets in a simple A lookup of www.domain-1.test
# target(s) time query/answer contents
a.root,b.root * q www.domain-1.test a
ns1.test,ns2.test * q www.domain-1.test a
$client * a www.domain-1.test a 10.0.0.1
# define the packets with a non-responding name server
# target(s) time query/answer contents
a.root,b.root * q domain-2.test a
ns1.test,ns2.test * q domain-2.test a
$last 100 q domain-2.test a
(ns1.test,ns2.test)-$last 100 q domain-2.test a
$client * a domain-2.test a 10.0.0.2
I'm not sure about the exact retry behavior, but this last one means
we retry a server once after 0.1 sec (100 msec) and then try a
different server from the set. Note this language is probably not the
best, just something to illustrate the basic idea.
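Whatever the final syntax is, parsing a line of such a file should
stay trivial. A sketch, with the field meanings taken from the
"# target(s) time query/answer contents" comment above and everything
else assumed:

# Hypothetical parser for one line of the packet-expectation format above.
# Interpreting special targets such as $client, $last or the
# (ns1.test,ns2.test)-$last retry syntax is left to the test framework.
def parse_expectation_line(line):
    line = line.split("#", 1)[0].strip()   # drop comments and blank lines
    if not line:
        return None
    target, when, kind, *contents = line.split()
    return {
        "target": target,                  # e.g. "a.root,b.root" or "$client"
        "time": None if when == "*" else int(when),  # msec; "*" = don't care
        "kind": kind,                      # "q" = expected query, "a" = answer
        "contents": contents,              # qname, qtype, optional answer data
    }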
--
Shane