* Proposal for a BIND 10 shared cache resolver

** Design goals:

- simple, but powerful design
- possible to assign a cache module to one or more resolvers in a NUMA
  architecture to make sure the DNS cache data resides locally in the
  same CPU cache that the resolver uses
- possible to operate cache modules on dedicated machines in a BIND 10
  cluster environment
- reuse proven existing (DNS) technologies and tools (known to DNS
  operators), reducing support and training investment
- flexible configuration (all local, mixed local and remote caches)

** Design summary:

- communication between cache module and resolver modules using the
  DNS protocol (over IP or other IPC)
- cache module implemented as a special purpose DNS server

** Design detail

*** Cache module

- the cache module is a dedicated DNS server, similar to the "Auth"
  module, but instead of serving resource records from files, it
  serves DNS records from a cache memory structure
- the cache module "listens" for DNS queries on a regular DNS INET
  socket (UDP/TCP port 53), but can also be configured to use fast IPC
  methods available on the OS platform (unix domain sockets, named
  pipes, shared memory ...)
- the cache module does not implement any resolver function. It can
  only look up records in the cache. Successful lookups are returned
  to the requester (NOERROR or NXDOMAIN); cache misses return an
  RCODE 9 (NotAuth) (or another RCODE that makes sense to indicate a
  cache miss)
- the cache module can receive dynamic DNS updates. These updates
  modify the internal DNS cache memory structure. Dynamic DNS is how
  resolvers (and operators) maintain the cache data.
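
The lookup and update behaviour described above can be sketched in a
few lines. This is only an illustration, not BIND 10 code: the class
name CacheModule, its method names, and the TTL-based expiry are
assumptions made for the example, and the DNS wire protocol and IPC
transport are left out entirely.

```python
import time

# RCODE values; 9 (NotAuth) is the cache-miss signal proposed above.
NOERROR, NXDOMAIN, NOTAUTH = 0, 3, 9


class CacheModule:
    """Hypothetical in-memory cache: lookups and updates, no resolution."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # (name, rrtype) -> (rcode, rdata, expiry)

    @staticmethod
    def _key(name, rrtype):
        # DNS names compare case-insensitively; ignore a trailing dot.
        return (name.lower().rstrip("."), rrtype.upper())

    def update(self, name, rrtype, rdata, ttl, rcode=NOERROR):
        """Store a positive (NOERROR) or negative (NXDOMAIN) answer."""
        expiry = self._clock() + ttl
        self._entries[self._key(name, rrtype)] = (rcode, list(rdata), expiry)

    def lookup(self, name, rrtype):
        """Return (rcode, rdata); unknown or expired entries are misses."""
        key = self._key(name, rrtype)
        entry = self._entries.get(key)
        if entry is None or entry[2] <= self._clock():
            self._entries.pop(key, None)  # drop expired data, if any
            return NOTAUTH, []
        return entry[0], entry[1]
```

The cache never tries to resolve anything on a miss; deciding what to
do next is left to the resolver.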

*** Resolver module

- a resolver module can have one or more cache modules configured. The
  cache module configuration has an order ("local" cache modules are
  queried first, "remote" caches later)
- cache modules can be configured to be asynchronous, so that the
  resolver can dispatch queries to more than one cache module at a
  time (concurrent cache queries)
- the resolver will "forward" DNS queries to the cache modules in
  order or in parallel. If no cache module responds with a positive
  (NOERROR or NXDOMAIN) answer, the resolver will start iterative
  resolution. For performance reasons, it might be possible to
  configure a resolver module to do "speculative" recursion, that is,
  to start recursion at the same time that the caches are queried. The
  first positive answer received (either from cache or from recursion)
  will be used; all other queries (cache or recursion) will be stopped
- the resolver will do DNSSEC validation
- on successful iterative resolution, the resolver will update the
  cache using dynamic DNS update
- if both the cache module and the resolver are on the same physical
  machine, they should use the fast IPC available on that OS platform
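
The query flow above (ordered caches, fallback to recursion, optional
speculative recursion, cache update after recursion) might look
roughly like the following asyncio sketch. All names here (resolve,
query_caches, and the caches, recurse and update callables) are
hypothetical; real cache queries and DDNS updates would be DNS
messages, which this sketch replaces with plain coroutines.

```python
import asyncio

NOERROR, NXDOMAIN, NOTAUTH = 0, 3, 9
POSITIVE = (NOERROR, NXDOMAIN)


async def query_caches(name, caches):
    """Ask each cache in configured order; return the first positive answer."""
    for query_cache in caches:  # "local" caches first, "remote" later
        rcode, rdata = await query_cache(name)
        if rcode in POSITIVE:
            return rcode, rdata
    return None  # every cache reported a miss


async def resolve(name, caches, recurse, update, speculative=False):
    """caches: ordered async callables; recurse/update: async callables."""
    if not speculative:
        hit = await query_caches(name, caches)
        if hit is not None:
            return hit
        answer = await recurse(name)
        await update(name, answer)  # stands in for the DDNS update
        return answer

    # Speculative mode: start recursion while the caches are queried.
    cache_task = asyncio.create_task(query_caches(name, caches))
    rec_task = asyncio.create_task(recurse(name))
    done, _ = await asyncio.wait(
        {cache_task, rec_task}, return_when=asyncio.FIRST_COMPLETED
    )
    if cache_task in done and cache_task.result() is not None:
        rec_task.cancel()  # cache answered first; stop recursion
        return cache_task.result()
    answer = await rec_task  # cache missed, or recursion won the race
    cache_task.cancel()
    await update(name, answer)
    return answer
```

In this sketch the cache is only updated when the answer came from
recursion, so a cache hit never causes a redundant update.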

*** Cache maintenance

Assumption: DNS wildcards "*" in resource records only appear in
authoritative zones, never in cache data (as the wildcard is expanded
by the authoritative server). Please correct me if a wildcard can be
seen in a cache.

- the cache content can be inspected by normal DNS queries (dig or
  similar tools)
- the cache content can be listed by a "wildcard zone transfer" (dig
  @server "*.domain.tld" AXFR)
- the cache content can be updated by dynamic updates (implements the
  equivalent to "rndc flushname": "update delete domain.tld")
- a DNS tree hierarchy in the cache can be removed by using a special
  "wildcard update": "update delete *.domain.tld" (implements the
  equivalent to "rndc flushtree")
- operators can "spoof" their cache by entering records into the cache
  with high TTLs
- security for dynamic updates over network is provided by TSIG
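
The difference between the plain and the wildcard forms of "update
delete", and the wildcard listing, can be illustrated on a plain
dictionary. The function names are hypothetical, and one detail is an
assumption: "*.domain.tld" is treated here as matching domain.tld
itself plus everything below it, since "rndc flushtree" flushes the
given name and all of its subdomains. Matching is done per label, so
"notdomain.tld" is not mistaken for part of "domain.tld".

```python
def _labels(name):
    """Split a name into lowercase labels, ignoring a trailing dot."""
    return name.lower().rstrip(".").split(".")


def in_tree(name, apex):
    """True if name equals apex or lies below it (label-wise suffix)."""
    n, a = _labels(name), _labels(apex)
    return len(n) >= len(a) and n[-len(a):] == a


def list_tree(cache, pattern):
    """Names a wildcard transfer of '*.domain.tld' would list."""
    if not pattern.startswith("*."):
        raise ValueError("expected a pattern of the form *.domain.tld")
    return sorted(n for n in cache if in_tree(n, pattern[2:]))


def update_delete(cache, target):
    """Apply 'update delete TARGET'; '*.NAME' removes the whole tree."""
    if target.startswith("*."):
        doomed = [n for n in cache if in_tree(n, target[2:])]  # flushtree
    else:
        doomed = [n for n in cache if _labels(n) == _labels(target)]  # flushname
    for n in doomed:
        del cache[n]
    return sorted(doomed)
```

The cache keys are assumed to be lowercase names without a trailing
dot; a real cache module would key on name, type and class.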

** Benefits

- DNS operators and developers can use existing DNS lookup tools (dig)
  to examine the content of the cache and troubleshoot the operation
  of the cache module
- DNS operators and developers can use existing DDNS update tools
  (nsupdate) to maintain the cache
- existing sniffing tools (tcpdump, Ethereal, snoop) that understand
  the DNS protocol can be used to inspect the communication between
  resolver modules and caches

** Example for a NUMA architecture configuration:

A machine with 32 cores, arranged as 8 groups of 4 cores, where each
group shares a local CPU cache:

- 24 resolvers started (3 pinned to each CPU group)
- 8 cache modules started (1 pinned to each CPU group)
- resolvers are configured to query their local (on the same CPU
  group) cache module first, then a cache that is on a different CPU
  group
- if none of the local cache modules knows an answer, the resolvers
  could be configured to dispatch a query to an external cache system
  in the same datacenter, which might still be faster than recursive
  resolution
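
As a rough illustration, the layout above can be computed as a
placement plan. The function plan_layout and its dictionary keys are
made up for this sketch. It also assumes that the cores of a group are
numbered contiguously and that a resolver tries the caches of all
other groups after its local one; neither is stated above. Actual
pinning is platform specific (on Linux, os.sched_setaffinity could be
used) and is not shown.

```python
def plan_layout(groups=8, cores_per_group=4, resolvers_per_group=3):
    """Return one entry per process: role, CPU group, core, cache order."""
    if resolvers_per_group + 1 > cores_per_group:
        raise ValueError("not enough cores per group for 1 cache + resolvers")
    plan = []
    for g in range(groups):
        first_core = g * cores_per_group
        # One cache module per group, on the group's first core.
        plan.append({"role": "cache", "group": g, "core": first_core})
        # Local cache first, then the caches of the other groups.
        order = [g] + [(g + k) % groups for k in range(1, groups)]
        for r in range(resolvers_per_group):
            plan.append({
                "role": "resolver",
                "group": g,
                "core": first_core + 1 + r,
                "cache_order": order,
            })
    return plan


if __name__ == "__main__":
    for entry in plan_layout():
        print(entry)
```

With the default arguments this yields 8 cache modules and 24
resolvers, one process per core.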

** Problems

- DNSSEC validation is done in the resolver module. The result of a
  DNSSEC validation should also be stored in the cache, to prevent
  costly "re-validation" of already validated data. The DNS protocol
  does not have a standard mechanism to convey the state of DNSSEC
  validation in a DDNS update. It might be possible to augment the
  record update with a special, private record type that causes the
  result of DNSSEC validation to be stored in the cache. For
  signalling successful DNSSEC validation from the cache to the
  resolver, the AD flag could be used.
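
For the cache-to-resolver direction, the AD flag and the RCODE both
live in the 16-bit flags word of the DNS header, so a cache response
can carry "validated" and "cache miss" without any protocol extension.
The sketch below packs and parses only the 12-byte header; the helper
names are hypothetical, and the private record type for the update
direction is deliberately not modelled because it is not defined yet.

```python
import struct

QR_FLAG = 0x8000     # message is a response
AD_FLAG = 0x0020     # Authentic Data bit
RCODE_MASK = 0x000F  # low four bits of the flags word
NOTAUTH = 9          # proposed cache-miss RCODE


def pack_header(msg_id, rcode, validated, ancount=0):
    """Build a 12-byte DNS response header; AD is set if validated."""
    flags = QR_FLAG | (rcode & RCODE_MASK)
    if validated:
        flags |= AD_FLAG
    # Field order: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    return struct.pack("!HHHHHH", msg_id, flags, 0, ancount, 0, 0)


def parse_header(header):
    """Extract the message ID, RCODE and AD state from a DNS header."""
    msg_id, flags = struct.unpack("!HH", header[:4])
    return {
        "id": msg_id,
        "rcode": flags & RCODE_MASK,
        "validated": bool(flags & AD_FLAG),
    }
```

A resolver receiving a cached answer with AD set could skip
validation, provided it trusts the channel to the cache module.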