Help with unresolvable domain (subdomain, actually)

Kevin Darcy kcd at chrysler.com
Tue Mar 1 22:25:44 UTC 2011


I got a trouble ticket on this too.

From the looks of things, Cisco is using GSSes (Global Site Selectors) to
load-balance this site. A GSS returns SERVFAIL if all of the resources
behind the load-balancer are down (which it determines via a heartbeat
mechanism). So I think this is a "simple" case of a website (or cluster)
going down. It was down earlier today, then up again; as of this writing,
it is down again.
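
To make that behaviour concrete, here is a toy model of the decision, not Cisco's actual implementation. The Backend class, the answer() function and the 192.0.2.x addresses are all made up for illustration; the only real values are the RCODE numbers from RFC 1035.

```python
from dataclasses import dataclass

# RCODE values as defined in RFC 1035.
NOERROR = 0
SERVFAIL = 2


@dataclass
class Backend:
    """Hypothetical load-balanced resource; 'healthy' would be set by a heartbeat probe."""
    address: str
    healthy: bool


def answer(backends):
    """Return (rcode, addresses): SERVFAIL when no backend passes its health check."""
    live = [b.address for b in backends if b.healthy]
    if not live:
        # Nothing usable to hand out, so signal failure at the DNS level.
        return SERVFAIL, []
    return NOERROR, live


# One backend alive: the live address is returned.
print(answer([Backend("192.0.2.1", True), Backend("192.0.2.2", False)]))
# Everything down: the resolver sees SERVFAIL, as in the dig output quoted below.
print(answer([Backend("192.0.2.1", False), Backend("192.0.2.2", False)]))
```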

DNS doesn't really have a response code of "requested resource not 
available", so SERVFAIL is Cisco's closest approximation. It has the 
drawback, however, of often making other sorts of problems appear to be 
DNS problems. That's just a cross that we DNS admins have to bear...
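
For anyone who wants to see where the status actually lives, here is a minimal sketch that rebuilds a reply shaped like the one in the dig output quoted below (id 5165, flags qr rd, one question, no records) and decodes its header. The helper names are my own, and the packet is constructed locally rather than captured off the wire, so treat it as an illustration of the RFC 1035 header layout and nothing more.

```python
import struct

# A few of the RCODE values from RFC 1035; note there is no
# "requested resource not available" among them.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def encode_name(name):
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_response(msg_id, flags, name, qtype=1, qclass=1):
    """Build a reply with one question and empty answer/authority/additional sections."""
    header = struct.pack("!HHHHHH", msg_id, flags, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!HH", qtype, qclass)


def parse_header(message):
    """Decode the fixed 12-byte DNS header."""
    msg_id, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    return {
        "id": msg_id,
        "qr": bool(flags & 0x8000),   # this is a response
        "rd": bool(flags & 0x0100),   # recursion desired (copied from the query)
        "ra": bool(flags & 0x0080),   # recursion available
        "rcode": RCODES.get(flags & 0x000F, "OTHER"),
        "counts": (qd, an, ns, ar),
    }


# 0x8102 = QR + RD + RCODE 2 (SERVFAIL)
reply = build_response(5165, 0x8102, "tools.cisco.com")
info = parse_header(reply)
print(len(reply), info["rcode"], info["counts"])
# prints: 33 SERVFAIL (1, 0, 0, 0)
```

The 33 bytes are the 12-byte header plus the 21-byte question, which matches the "MSG SIZE rcvd: 33" that dig reports.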

                                                                         
                                                                         
                                             - Kevin

On 3/1/2011 4:08 PM, Mike Bernhardt wrote:
> I should add that tools.cisco.com was resolvable at one time, so either
> Cisco's behavior has changed, or our firewall's behavior has changed. We
> obviously haven't upgraded our BIND version in a while (9.4.3P3), so I don't
> think the problem is BIND.
>
> -----Original Message-----
> From: Mike Bernhardt [mailto:bernhardt at bart.gov]
> Sent: Tuesday, March 01, 2011 12:40 PM
> To: bind-users at lists.isc.org
> Subject: Help with unresolvable domain (subdomain, actually)
>
> For some reason, we can no longer resolve tools.cisco.com. There are several
> clues to the problem but I can't put them together. Here is some dig output.
> I know that the time stamps don't all match up below, but the results are
> typical:
>
> [root@ns1 ~]# dig +trace -b 148.165.3.10 tools.cisco.com
>
> ; <<>> DiG 9.4.3-P3 <<>> +trace -b 148.165.3.10 tools.cisco.com
> ;; global options:  printcmd
> .                       90550   IN      NS      i.root-servers.net.
> .                       90550   IN      NS      h.root-servers.net.
> .                       90550   IN      NS      e.root-servers.net.
> .                       90550   IN      NS      d.root-servers.net.
> .                       90550   IN      NS      j.root-servers.net.
> .                       90550   IN      NS      k.root-servers.net.
> .                       90550   IN      NS      l.root-servers.net.
> .                       90550   IN      NS      g.root-servers.net.
> .                       90550   IN      NS      f.root-servers.net.
> .                       90550   IN      NS      a.root-servers.net.
> .                       90550   IN      NS      m.root-servers.net.
> .                       90550   IN      NS      c.root-servers.net.
> .                       90550   IN      NS      b.root-servers.net.
> ;; Received 512 bytes from 148.165.3.10#53(148.165.3.10) in 0 ms
>
> com.                    172800  IN      NS      l.gtld-servers.net.
> com.                    172800  IN      NS      e.gtld-servers.net.
> com.                    172800  IN      NS      k.gtld-servers.net.
> com.                    172800  IN      NS      i.gtld-servers.net.
> com.                    172800  IN      NS      m.gtld-servers.net.
> com.                    172800  IN      NS      j.gtld-servers.net.
> com.                    172800  IN      NS      a.gtld-servers.net.
> com.                    172800  IN      NS      g.gtld-servers.net.
> com.                    172800  IN      NS      c.gtld-servers.net.
> com.                    172800  IN      NS      f.gtld-servers.net.
> com.                    172800  IN      NS      b.gtld-servers.net.
> com.                    172800  IN      NS      d.gtld-servers.net.
> com.                    172800  IN      NS      h.gtld-servers.net.
> ;; Received 505 bytes from 198.41.0.4#53(a.root-servers.net) in 13 ms
>
> cisco.com.              172800  IN      NS      ns1.cisco.com.
> cisco.com.              172800  IN      NS      ns2.cisco.com.
> ;; Received 101 bytes from 192.54.112.30#53(h.gtld-servers.net) in 154 ms
>
> tools.cisco.com.        86400   IN      NS      rcdn9-14p-dcz05n-gss1.cisco.com.
> tools.cisco.com.        86400   IN      NS      rtp5-dmz-gss1.cisco.com.
> tools.cisco.com.        86400   IN      NS      sjck-dmz-gss1.cisco.com.
> tools.cisco.com.        86400   IN      NS      cax01-bb14-dcz01n-gss1.cisco.com.
> ;; Received 226 bytes from 64.102.255.44#53(ns2.cisco.com) in 75 ms
>
> ;; Received 33 bytes from 72.163.4.28#53(rcdn9-14p-dcz05n-gss1.cisco.com) in 47 ms
>
> Now, focusing in on rtp5-dmz-gss1.cisco.com for further analysis (just
> picked it out of the group):
> [root@ns1 ~]# dig -b 148.165.3.10 @rtp5-dmz-gss1.cisco.com tools.cisco.com
>
> ; <<>> DiG 9.4.3-P3 <<>> -b 148.165.3.10 @rtp5-dmz-gss1.cisco.com tools.cisco.com
> ; (1 server found)
> ;; global options:  printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 5165
> ;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
> ;; WARNING: recursion requested but not available
>
> ;; QUESTION SECTION:
> ;tools.cisco.com.               IN      A
>
> ;; Query time: 75 msec
> ;; SERVER: 64.102.246.5#53(64.102.246.5)
> ;; WHEN: Tue Mar  1 12:22:57 2011
> ;; MSG SIZE  rcvd: 33
>
>
> Here is the output of tcpdump on my server, querying the same server via
> nslookup elsewhere:
> [root@ns1 ~]# tcpdump host -i bond0 64.102.246.5 -n -p -vvv
> tcpdump: listening on bond0, link-type EN10MB (Ethernet), capture size 96 bytes
> 12:14:53.373614 IP (tos 0x0, ttl  64, id 45237, offset 0, flags [none], proto: UDP (17), length: 61) 148.165.3.10.18673 > 64.102.246.5.domain: [bad udp cksum a78b!]  26095 A? tools.cisco.com. (33)
> 12:14:53.455684 IP (tos 0x0, ttl  54, id 7623, offset 0, flags [DF], proto: UDP (17), length: 61) 64.102.246.5.domain > 148.165.3.10.18673: [udp sum ok] 26095 ServFail- q: A? tools.cisco.com. 0/0/0 (33)
>
> Lastly, I see on our firewall log that we have a Checkpoint Smart Defense
> log entry due to its belief that Cisco is sending us a malformed query
> packet, and it's being dropped. I don't know why they're sending the query
> in the first place.
> Number:                	2595791
> Date:                      	1Mar2011
> Time:                     	12:22:53
> Type:                     	Log
> Action:                   	Drop
> Service:                 	domain-udp (53)
> Source Port:          	domain-udp
> Source:                  	rtp5-dmz-gss1.cisco.com
> Destination:           	ns
> Protocol:                	udp
> Information:           	Packet info: Packet data size: 28
> Attack:                    	Malformed Packet
> Attack Information:	UDP length error
>
>
> Any ideas as to where the problem lies so I can pursue it further?
>
>
>
> _______________________________________________
> bind-users mailing list
> bind-users at lists.isc.org
> https://lists.isc.org/mailman/listinfo/bind-users
>
>
>




