BIND 10 master, updated. 0fdc68bb68b017223f35a3dc39e5255a5f255d10 Merge #2776
BIND 10 source code commits
bind10-changes at lists.isc.org
Thu Jun 13 09:23:43 UTC 2013
The branch, master has been updated
via 0fdc68bb68b017223f35a3dc39e5255a5f255d10 (commit)
via 3a98bed98afbf27ec36a941936d1bf11c656178d (commit)
via 668a507bdb0dd52368621fe7b8aed41b796a9529 (commit)
via b14e0fb808dd161c89a094924f153393b3fc1348 (commit)
via fc735da7e6210f87e4e5a36fce91eb990b8bbbc8 (commit)
via 3a0d2b90ef3766a8a105a8a9b04c92b8620eaeb1 (commit)
via edfad4a9bb65049599adaf9cf68baedf7bc072b7 (commit)
from e93ef80d1c7e0f431f37f75c68f45c3aa78b70f7 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commit 0fdc68bb68b017223f35a3dc39e5255a5f255d10
Merge: e93ef80 3a98bed
Author: Michal 'vorner' Vaner <vorner at vorner.cz>
Date: Thu Jun 13 11:21:19 2013 +0200
Merge #2776
Resolver research document about mixed resolver/authoritative mode
commit 3a98bed98afbf27ec36a941936d1bf11c656178d
Author: Mukund Sivaraman <muks at isc.org>
Date: Thu Jun 13 11:13:08 2013 +0200
[2776] Minor tweaks and fixes
commit 668a507bdb0dd52368621fe7b8aed41b796a9529
Author: Michal 'vorner' Vaner <michal.vaner at nic.cz>
Date: Tue Mar 5 10:07:42 2013 +0100
[2776] Provide a proposal for the design
As a result of the experiment, we come to the conclusion that the
receptionist is easy enough to implement, and flexible and fast enough,
that we should try to implement it.
commit b14e0fb808dd161c89a094924f153393b3fc1348
Author: Michal 'vorner' Vaner <michal.vaner at nic.cz>
Date: Tue Feb 26 12:45:12 2013 +0100
[2776] Some design considerations for sharing a socket
Write down some advantages and disadvantages of each method of solving it.
commit fc735da7e6210f87e4e5a36fce91eb990b8bbbc8
Author: JINMEI Tatuya <jinmei at isc.org>
Date: Mon Feb 25 20:43:29 2013 -0800
[res-design] added some small idea for the hybrid auth/resolver server
commit 3a0d2b90ef3766a8a105a8a9b04c92b8620eaeb1
Author: Michal 'vorner' Vaner <michal.vaner at nic.cz>
Date: Thu Feb 21 09:35:33 2013 +0100
Some random suggestions on sharing the port
commit edfad4a9bb65049599adaf9cf68baedf7bc072b7
Author: Shane Kerr <shane at isc.org>
Date: Tue Feb 19 14:58:35 2013 +0100
Some starting questions for recursive resolver research.
-----------------------------------------------------------------------
Summary of changes:
doc/design/resolver/01-scaling-across-cores | 21 +++
.../resolver/02-mixed-recursive-authority-setup | 150 ++++++++++++++++++++
doc/design/resolver/03-cache-algorithm | 22 +++
doc/design/resolver/README | 5 +
4 files changed, 198 insertions(+)
create mode 100644 doc/design/resolver/01-scaling-across-cores
create mode 100644 doc/design/resolver/02-mixed-recursive-authority-setup
create mode 100644 doc/design/resolver/03-cache-algorithm
create mode 100644 doc/design/resolver/README
-----------------------------------------------------------------------
diff --git a/doc/design/resolver/01-scaling-across-cores b/doc/design/resolver/01-scaling-across-cores
new file mode 100644
index 0000000..8fc376b
--- /dev/null
+++ b/doc/design/resolver/01-scaling-across-cores
@@ -0,0 +1,21 @@
+01-scaling-across-cores
+
+Introduction
+------------
+The general issue is how to ensure that the resolver scales.
+
+Currently resolvers are CPU bound, and since it seems likely that neither
+instructions-per-cycle nor CPU frequency will increase radically,
+scaling will need to be across multiple cores.
+
+How can we best scale a recursive resolver across multiple cores?
+
+Some possible solutions:
+
+a. Multiple processes with independent caches
+b. Multiple processes with shared cache
+c. A mix of independent/shared cache
+d. Thread variations of the above
+
+All of these may be complicated by NUMA architectures (with
+faster/slower access to specific RAM).
diff --git a/doc/design/resolver/02-mixed-recursive-authority-setup b/doc/design/resolver/02-mixed-recursive-authority-setup
new file mode 100644
index 0000000..a1cc5f6
--- /dev/null
+++ b/doc/design/resolver/02-mixed-recursive-authority-setup
@@ -0,0 +1,150 @@
+Mixed recursive & authoritative setup
+=====================================
+
+Ideally we will run the authoritative server independently of the
+recursive resolver.
+
+We need a way to run both an authoritative and a recursive resolver on
+the same machine, listening on the same IP/port. But we also need a way
+to run only one of them.
+
+This is mostly the same problem as we have with DDNS packets and xfr-out
+requests, but those aren't as performance sensitive as auth & resolver.
+
+There are a number of possible approaches to this:
+
+One fat module
+--------------
+
+With some build system or dynamic linker tricks, we create three modules:
+
+ * Stand-alone auth
+ * Stand-alone resolver
+ * Compound module containing both
+
+The user then chooses either one of the stand-alone modules or the compound
+one, depending on the requirements.
+
+Advantages
+~~~~~~~~~~
+
+ * It is easier to switch between the two kinds of processing and to ask
+ authoritative questions from within the resolver processing.
+
+Disadvantages
+~~~~~~~~~~~~~
+
+ * The code is not separated (one bug takes down both, and the admin can't
+ see how much CPU each one uses).
+ * BIND 9 does this and its code is a jungle. Maybe it's not just a
+ coincidence.
+ * Limits flexibility -- for example, we can't then decide to make the resolver
+ threaded (or we would have to make sure the auth processing doesn't break
+ with threads, which will be hard).
+
+There's also the idea of putting the auth into a loadable library that the
+resolver could load and use somehow. But the advantages and disadvantages
+are probably the same.
+
+Auth first
+----------
+
+We do the same as with xfrout and ddns. When a query comes, it is examined and
+if the `RD` bit is set, it is forwarded to the resolver.
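+
+As a minimal sketch of this dispatch decision (assuming raw wire-format
+access to the query; the helper name and buffer handling are invented for
+this document, not taken from the BIND 10 sources):
+
+    // Hypothetical helper: decide whether a wire-format query should be
+    // forwarded to the resolver.  Per RFC 1035 the RD flag is the least
+    // significant bit of the third byte of the DNS header.
+    #include <cstddef>
+    #include <cstdint>
+
+    bool should_forward_to_resolver(const uint8_t* msg, std::size_t len) {
+        if (len < 12) {               // shorter than a DNS header; let auth
+            return false;             // handle (and reject) it
+        }
+        return (msg[2] & 0x01) != 0;  // RD bit set => recursive query
+    }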
+
+Advantages
+~~~~~~~~~~
+
+ * Separate auth and resolver modules
+ * Minimal changes to auth
+ * No slowdown on the auth side
+
+Disadvantages
+~~~~~~~~~~~~~
+
+ * Counter-intuitive asymmetric design
+ * Possible slowdown on the resolver side
+ * Resolver needs to know both modes (for running stand-alone too)
+
+There's also the possibility of the reverse -- resolver first. It may make
+more sense for performance (the more usual scenario would probably be a
+high-load resolver with just a few low-volume authoritative zones). On the
+other hand, auth already has some forwarding tricks.
+
+Auth with cache
+---------------
+
+This is mostly the same as ``Auth first'', except that the cache is in the
+auth server. If the answer is in the cache, the query is answered right away;
+if not, it is forwarded to the resolver, which then updates the cache too.
+
+Advantages
+~~~~~~~~~~
+
+ * Probably good performance
+
+Disadvantages
+~~~~~~~~~~~~~
+
+ * Cache duplication (with several auth modules; it doesn't feel like it
+ would work with shared memory without locking).
+ * The cache is probably very different from the authoritative zones, so it
+ would complicate auth processing.
+ * The resolver needs its own copy of the cache (to be able to get partial
+ results), probably a different one than the auth server's.
+
+Receptionist
+------------
+
+One module does only the listening. It doesn't process the queries itself;
+it only looks into them and forwards them to the processing modules.
+
+Advantages
+~~~~~~~~~~
+
+ * Clean design with separated modules
+ * Easy to run modules stand-alone
+ * Allows for solving the xfrout & ddns forwarding without auth running
+ * Allows for views (different auths with different configurations)
+ * Allows balancing/clustering across multiple machines
+ * Easy to create new modules for different kinds of DNS handling and share
+ port with them too
+
+Disadvantages
+~~~~~~~~~~~~~
+
+ * Need to set up another module (not a problem if we have inter-module
+ dependencies in b10-init)
+ * Possible performance impact. However, experiments show this is not an
+ issue: with some tuning the receptionist can actually increase the
+ throughput, and the increase in RTT is not big.
+
+Implementation ideas
+~~~~~~~~~~~~~~~~~~~~
+
+ * Let's have a new TCP transport, where we send not only the DNS messages,
+ but also the source and destination ports and addresses (two reasons --
+ ACLs in the target module and not keeping state in the receptionist). It
+ would allow transferring a batch of messages at once, to save some calls
+ to the kernel (e.g. a length-prefixed block of messages is read at once,
+ the messages are parsed one by one, and the whole block of answers is
+ sent back). A sketch of a possible framing header follows this list.
+ * A module creates a listening socket (UNIX by default) on startup and
+ contacts all the receptionists. It tells them what kinds of packets to
+ send to the module and the address of its UNIX socket. All the
+ receptionists connect to the module. This allows for auto-configuring the
+ receptionist.
+ * The queries are sent from the receptionist in batches, and the answers
+ are sent back to the receptionist in batches too.
+ * It is possible to fine-tune and use OS-specific tricks (like epoll or
+ sending multiple UDP messages by a single call to sendmmsg()).
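+
+The exact wire format of the receptionist protocol is an open question; the
+following is only an illustrative sketch of the per-message framing described
+in the first item above (names and field sizes are invented for this
+document):
+
+    // Purely hypothetical framing of one forwarded message on the
+    // receptionist <-> module stream; not an agreed-upon format.
+    #include <cstdint>
+
+    struct ForwardedMessageHeader {
+        uint8_t  src_addr[16];  // client source address (IPv4 mapped to IPv6)
+        uint8_t  dst_addr[16];  // address the client sent the query to
+        uint16_t src_port;      // network byte order
+        uint16_t dst_port;      // network byte order
+        uint8_t  protocol;      // original transport (UDP or TCP)
+        uint16_t dns_length;    // length of the DNS message that follows
+    };
+
+    // A batch would then be a length-prefixed block: a total byte count,
+    // followed by that many header+message pairs, read from the kernel in
+    // one call and parsed one by one.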
+
+Proposal
+--------
+
+Implement the receptionist in such a way that we can still work without it
+(not throwing away the current UDPServer and TCPServer in asiodns).
+
+The way we handle xfrout and DDNS needs some changes, since we can't forward
+sockets for the query. We would implement the receptionist protocol on them,
+which would allow the receptionist to forward messages to them. We would then
+modify auth to be able to forward the queries over the receptionist protocol,
+so ordinary users don't need to start the receptionist.
diff --git a/doc/design/resolver/03-cache-algorithm b/doc/design/resolver/03-cache-algorithm
new file mode 100644
index 0000000..42bfa09
--- /dev/null
+++ b/doc/design/resolver/03-cache-algorithm
@@ -0,0 +1,22 @@
+03-cache-algorithm
+
+Introduction
+------------
+Cache performance may be important for the resolver. It might not be
+critical. We need to research this.
+
+One key question is: given a specific cache hit rate, how much of an
+impact does cache performance have?
+
+For example, if we have a 90% cache hit rate, will we still be spending
+most of our time in system calls or in looking things up in our cache?
+
+There are several ways we can consider figuring this out, including
+measuring this in existing resolvers (BIND 9, Unbound) or modeling
+with specific values.
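+
+As an illustration of such modeling, with purely invented numbers: suppose a
+cache lookup costs about 1 microsecond of CPU, each client query needs two
+system calls (receive and send) at roughly 2 microseconds each, and a cache
+miss adds two upstream round trips, i.e. four more system calls. At a 90%
+hit rate the per-query system-call cost is then about
+0.9 * 4 + 0.1 * (4 + 8) = 4.8 microseconds, against roughly 1 microsecond of
+lookup work, so under these assumptions system calls would still dominate
+and a moderately fast cache would be good enough. The real numbers have to
+be measured, as noted above.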
+
+Once we know how critical the cache performance is, we can consider
+which algorithm is best for that. If it is very critical, then a
+custom algorithm designed for DNS caching makes sense. If it is not,
+then we can consider using an STL-based data structure.
+
diff --git a/doc/design/resolver/README b/doc/design/resolver/README
new file mode 100644
index 0000000..b6e9285
--- /dev/null
+++ b/doc/design/resolver/README
@@ -0,0 +1,5 @@
+This directory contains research and design documents for the BIND 10
+resolver reimplementation.
+
+Each file contains a specific issue and discussion surrounding that
+issue.