Really odd one: parts of global DNS just dropped off the map

Jim Reid jim at rfc1035.com
Thu Nov 25 07:59:44 UTC 2004


>>>>> "Andy" == Andy Holyer <andy at holyer.org> writes:

    Andy> It seemed a bunch of (mainly US-based) sites were failing
    Andy> DNS. Other (UK-based) worked fine. Trying another server in
    Andy> the same facility gave the same result. Finally I switched
    Andy> forwarders to another ISP and called it a day.

Sounds like there was some sort of routing or connectivity problem:
nothing to do with the DNS. Take this up with your ISP.

Why are you using forwarding? This is silly, dangerous and
pointless. Consult the list archives for an explanation. Perhaps your
DNS infrastructure has been forwarding queries to servers that were
broken or had connectivity problems? This is one of the reasons why
people should run their own name servers: when something goes wrong,
there are fewer links in the chain to troubleshoot.

    Andy> This morning I switched things back and all appears
    Andy> fine. However, digging around, I don't get ping response from
    Andy> about half the hosts in named.root. b.root-servers.net, for
    Andy> example. Now, I can understand that root servers would just
    Andy> turn off ICMP echo since they're busy enough as it is, but
    Andy> it still worries me a bit.

Why? If you want to know if a name server is running, query it!
Sending a ping only establishes if there's connectivity: it doesn't
prove the target is running a working name server. And many busy name
servers (and networks) rate-limit inbound ICMP traffic or don't let
ping traffic through at all.
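
To illustrate "query it" in code: the sketch below sends one
non-recursive NS query for the root zone over UDP and reports whether
anything that looks like a DNS response comes back. It uses only the
Python standard library and the RFC 1035 wire format. The function
names (build_query, parse_header, server_answers) are made up for this
example, and it does no retries, TCP fallback or EDNS, so treat it as
a rough check rather than a monitoring tool.

```python
import secrets
import socket
import struct


def build_query(name: str, qtype: int = 2, qclass: int = 1) -> bytes:
    """Build a minimal DNS query packet (qtype 2 = NS, qclass 1 = IN)."""
    query_id = secrets.randbits(16)
    # Header: ID, flags (0 = standard query, recursion not requested),
    # QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero-length label.
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in labels
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, qclass)


def parse_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte DNS header."""
    query_id, flags, _qd, answers, _ns, _ar = struct.unpack(
        "!HHHHHH", packet[:12]
    )
    return {
        "id": query_id,
        "is_response": bool(flags & 0x8000),    # QR bit
        "authoritative": bool(flags & 0x0400),  # AA bit
        "rcode": flags & 0x000F,
        "answers": answers,
    }


def server_answers(server_ip: str, name: str = ".", timeout: float = 3.0) -> bool:
    """Return True if server_ip sends back a DNS response to our query."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, 53))
            reply, _ = sock.recvfrom(4096)
        except OSError:
            # Covers timeouts and unreachable-network errors alike.
            return False
    if len(reply) < 12:
        return False
    reply_header = parse_header(reply)
    return reply_header["is_response"] and reply_header["id"] == parse_header(query)["id"]
```

Call it as server_answers("192.0.2.53"), where 192.0.2.53 is only a
documentation placeholder: substitute the address of the server you
want to test. A False result still does not say whether the server or
the path to it is at fault.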

    Andy> I've never seen this sort of behaviour before, and I'm not
    Andy> at all sure where to start in finding out what's going on,
    Andy> and whether there's some subtle mis-configuration on my
    Andy> part. From my part, the serial number in my root db files
    Andy> tells me that I haven't touched the named config since early
    Andy> June,

Serial numbers in zone files tell you nothing. They're only used for
comparisons: i.e. which version of a zone is most recent. They don't
(have to) relate to dates. They have nothing to do with the
configuration of a name server. Or the network. If what you meant to
say was your DNS configuration hasn't changed for a while, then the
problem you describe was most likely caused by something else. So
check those things: routing, firewalls, peering at upstream providers,
etc.
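
On the point that serial numbers are only used for comparisons: the
comparison is the wrap-around arithmetic described in RFC 1982, not a
plain "bigger number wins". A minimal sketch, with a made-up function
name (serial_newer), might look like this:

```python
SERIAL_MODULUS = 1 << 32  # zone serials are unsigned 32-bit values
HALF_RANGE = 1 << 31


def serial_newer(candidate: int, current: int) -> bool:
    """Return True if candidate is newer than current per RFC 1982.

    Serials exactly 2**31 apart have no defined ordering in the RFC;
    this sketch reports False for that case in both directions.
    """
    for value in (candidate, current):
        if not 0 <= value < SERIAL_MODULUS:
            raise ValueError("serial must fit in 32 bits")
    # Distance from current forward to candidate, wrapping at 2**32.
    distance = (candidate - current) % SERIAL_MODULUS
    return 0 < distance < HALF_RANGE
```

So serial_newer(1, 4294967295) is true because the counter has
wrapped, while serial_newer(42, 2004112500) is false. Nothing in the
comparison cares whether either number looks like a date.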

    Andy> *Any* advice as to where I could go from here to ensure
    Andy> integrity of DNS is most gratefully received.

Are you volunteering to fix every broken name server configuration and
zone file on the planet? :-) Good for you! :-)

All you can do is make sure your own servers are working correctly:

[1] Don't use forwarding. Ever.
[2] Always run up-to-date DNS software.
[3] Put the zone files and config files under version control.
[4] Check these files before feeding them to a name server.
    named-checkzone and named-checkconf are your friends.
[5] Disable recursive service for non-local users.
[6] Monitor the name server's logs and act on any errors.
[7] Document your DNS installation & management processes: zone file
    changes, upgrades, where name servers are located and which users
    they serve, contact names and addresses, SLAs with slave server
    operators, problem escalation procedures, support arrangements, etc.
[8] Follow the advice in RFC 2182 and BCPs on DNS operations.
    Google will point you at these.
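
As a possible way to automate point [4], here is a small sketch that
runs named-checkconf on the config file and named-checkzone on each
zone before anything gets loaded. The function names (check_commands,
run_checks) and the example paths are hypothetical; it assumes both
BIND tools are on the PATH and will raise FileNotFoundError if they
are not.

```python
import subprocess


def check_commands(conf_path: str, zones: dict) -> list:
    """Build the argv lists: one named-checkconf, one named-checkzone per zone.

    zones maps a zone name to the path of its zone file.
    """
    commands = [["named-checkconf", conf_path]]
    for zone_name, zone_file in zones.items():
        commands.append(["named-checkzone", zone_name, zone_file])
    return commands


def run_checks(conf_path: str, zones: dict) -> bool:
    """Run every check; return True only if all of them exit with status 0."""
    all_ok = True
    for argv in check_commands(conf_path, zones):
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode != 0:
            all_ok = False
            print("FAILED:", " ".join(argv))
            print(result.stdout + result.stderr)
    return all_ok
```

For example, run_checks("/etc/named.conf", {"example.com":
"/var/named/example.com.zone"}) with your own paths substituted. Run
from a pre-commit hook, it also ties in with point [3].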



More information about the bind-users mailing list