[bind10-dev] DNS cache persistence, was Resolver Address Database - Requirements and Design

Shane Kerr shane at isc.org
Fri Oct 8 08:31:31 UTC 2010


Stephen, 

On Thu, 2010-10-07 at 16:11 +0100, Stephen Morris wrote:
> On 6 Oct 2010, at 15:56, Shane Kerr wrote:
> 
> > Does it make sense to add a requirement to allow the database to be
> > serialized in some way? This will allow us to store it to disk between
> > boots, and migrate it between machines.
> > 
> > This has certain design implications, depending on how it is done.
> > 
> > I suppose this may be feature bloat though, and perhaps added to a
> > wishlist.

I think for this we can view the Nameserver Address Store (NSAS?) as a
special-case DNS cache, and so think about the general problem of DNS
cache persistence.
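
To make the idea a bit more concrete (this is only a sketch with made-up
names and a made-up format, not anything that exists in BIND 10):
"serializing" the cache really just means writing each record out with
an absolute expiry time instead of a remaining TTL, so that entries keep
ageing while the process is down. Something along these lines:

    # Hypothetical sketch of dumping cache entries to disk. Each entry
    # carries an absolute expiry timestamp rather than a remaining TTL,
    # so ageing continues across the restart.
    import json

    def dump_cache(entries, path):
        # entries: a list of dicts such as
        #   {"name": "www.example.org.", "rrtype": "A",
        #    "rdata": "192.0.2.1", "expires": 1286520000.0}
        # where "expires" is an absolute UNIX timestamp.
        with open(path, "w") as f:
            json.dump(entries, f)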

> I had wondered about that, but had two questions:
> 
> a) How long will it take for the resolver to build up a full address
> database?  If only a short time, is there much to be gained by loading
> an old one from disk?

I am told that it can take 15 minutes or so for a busy name server to
build up its cache. This doesn't sound like much, but if you have a
pair of resolvers at your ISP (for redundancy), a restart for whatever
reason means roughly half an hour during which some of your users get
degraded service.

Less busy servers take even longer to fill their caches - which
actually means degraded service for longer. Although less visible on
the server side, the delays are just as real for the users!

> b) How long would the system be down for?  The addresses "age" - after
> a couple of days (a typical A/AAAA record TTL) the addresses will need
> to be re-fetched anyway.  (Although the TTL is much shorter for the
> larger sites:  www.google.com points to the CNAME www.l.google.com
> which has an A record with a TTL of 300s.  www.facebook.com has a TTL
> of 120s.  And even the BBC is fairly short at 900s.)

I vaguely remember some studies of TTLs - perhaps by KC at CAIDA? -
which showed that TTLs tend to fall into a few obvious "buckets": 5
minutes, 15 minutes, 1 hour, 1 day, and so on.

For my own systems, which I suspect are not unusual, the main time a
system is down is when it has to be rebooted to run a new kernel
(either due to a security patch or a system upgrade). This takes a few
minutes (ironically longer on a server than on a netbook), but I would
guess that a large percentage of the cache is still usable after this.
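
Continuing the sketch above (again purely hypothetical), reloading the
dump after a reboot would just drop whatever expired while the box was
down - and after only a few minutes of downtime that should mostly be
the very short-TTL records:

    # Hypothetical sketch of reloading the dumped cache: anything whose
    # absolute expiry time has already passed is silently discarded.
    import json
    import time

    def load_cache(path, now=None):
        now = time.time() if now is None else now
        with open(path) as f:
            entries = json.load(f)
        return [e for e in entries if e["expires"] > now]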

> In some circumstances though I can see users wanting to do it.  I
> think that adding it to a feature backlog is the best approach.

Makes sense.

--
Shane



