Strange problem with a query deleting a record...

Fri Aug 23 15:19:29 UTC 2013

On 8/22/2013 12:55 PM, johnh at primebuchholz.com wrote:
> Greetings All,
>
> First of all, I apologize if this is out of place - I'm having a very
> strange issue that is either a problem with bind itself, or at least,
> affecting it.  Summary:
>
> For only ONE address, whenever I attempt to access it through my squid
> proxy, the record disappears from DNS, and the retry time changes too.
> Essentially, accessing www.thisdomain.com works, but a link to a portal on
> that page to the subdomain login.thisdomain.com causes the problem.  I'm
> willing to bet the problem lies with squid, but as to how it could
> possibly change a record in bind... Well, I'm stumped.  If you don't go
> through squid, everything works.  All other requests to bind for the
> address of the host in question work fine. Here's the output of dig from
> before accessing the page through squid:
>
> ; <<>> DiG 9.4.1-P1 <<>> login.thisdomain.com
> ;; global options:  printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 45037
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;login.thisdomain.com.            IN      A
>
> ;; ANSWER SECTION:
> login.thisdomain.com.     17      IN      A       111.222.333.123
>
> ;; AUTHORITY SECTION:
> thisdomain.com.         168319  IN      NS      ns1.thisdomain.com.
> thisdomain.com.         168319  IN      NS      ns2.thisdomain.com.
>
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#53(127.0.0.1)
> ;; WHEN: Thu Aug 22 12:29:57 2013
> ;; MSG SIZE  rcvd: 88
>
> You can do anything to request the address from bind and it works,
> *except* try to access it through squid.  Bypassing squid and going
> directly through the firewall works fine.
>
> Now, immediately after you try to access it through squid:
>
> ; <<>> DiG 9.4.1-P1 <<>> login.thisdomain.com
> ;; global options:  printcmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 43943
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
>
> ;; QUESTION SECTION:
> ;login.thisdomain.com.            IN      A
>
> ;; AUTHORITY SECTION:
> thisdomain.com.         298     IN      SOA     ns1.thisdomain.com. serv.anotherdomain.com. 2006062510 3600 3600 2592000 300
>
> ;; Query time: 0 msec
> ;; SERVER: 127.0.0.1#53(127.0.0.1)
> ;; WHEN: Thu Aug 22 12:30:06 2013
> ;; MSG SIZE  rcvd: 95
>
> After the 5-minute retry shown above expires, the original record
> reappears.
>
> Ideas?  I'm stumped.  It seems like squid is somehow able to corrupt
> bind's info, but I can't imagine how.

I have a theory. If this name is hosted on a stupid load-balancer that
doesn't understand non-A-record query types, and Squid is sending a
non-A query type (e.g. SRV, possibly even AAAA, if the load-balancer is
*really* stupid), then the load-balancer may be erroneously "poisoning"
your cache with an NXDOMAIN response.
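
To make the theory concrete, here is a rough Python sketch of how a caching resolver might treat the two kinds of negative answer. It is a toy model, not BIND's implementation: the ToyCache class, its method names, and the 192.0.2.10 address are all made up for illustration, and TTLs are left out. The point is only that NXDOMAIN is a statement about the whole name, while NODATA is a statement about one name/type pair.

```python
NXDOMAIN = "NXDOMAIN"
NODATA = "NODATA"


class ToyCache:
    """Hypothetical, heavily simplified resolver cache (no TTL handling)."""

    def __init__(self):
        self.positive = {}     # (name, qtype) -> rdata
        self.nodata = set()    # (name, qtype) pairs known to be empty
        self.nxdomain = set()  # names believed not to exist at all

    def store_answer(self, name, qtype, rdata):
        self.positive[(name, qtype)] = rdata

    def store_negative(self, name, qtype, rcode):
        if rcode == NXDOMAIN:
            # NXDOMAIN claims the name has no records of ANY type,
            # so it shadows everything already cached for that name.
            self.nxdomain.add(name)
            for key in [k for k in self.positive if k[0] == name]:
                del self.positive[key]
        else:
            # NODATA only says "no records of this one type".
            self.nodata.add((name, qtype))

    def lookup(self, name, qtype):
        if name in self.nxdomain:
            return NXDOMAIN
        if (name, qtype) in self.positive:
            return self.positive[(name, qtype)]
        if (name, qtype) in self.nodata:
            return NODATA
        return None  # cache miss; a real resolver would go ask upstream


name = "login.thisdomain.com."

# Case 1: the load-balancer wrongly answers NXDOMAIN to an AAAA query.
poisoned = ToyCache()
poisoned.store_answer(name, "A", "192.0.2.10")
poisoned.store_negative(name, "AAAA", NXDOMAIN)
poisoned_result = poisoned.lookup(name, "A")

# Case 2: the same query gets a correct NODATA answer instead.
healthy = ToyCache()
healthy.store_answer(name, "A", "192.0.2.10")
healthy.store_negative(name, "AAAA", NODATA)
healthy_result = healthy.lookup(name, "A")
```

In this model the A lookup in the first case comes back as NXDOMAIN even though the A record was cached a moment earlier, which matches the before/after dig output in the original message. In the second case the A record is untouched.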

We ran into this many years ago with Cisco GSSes (Global Site Selectors),
and we work around it by having a "shadow" version of the zone, which the
GSSes proxy to for QTYPEs they don't handle. That "shadow" version of 
the zone has a wildcard entry in it which forces responses to be NODATA 
instead of NXDOMAIN, and this prevents the cache poisoning.
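
As a rough illustration of why the wildcard changes the response, here is a simplified Python sketch of an authoritative lookup. Everything in it is an assumption made for the example: the answer() function, the dict layout of the zone, and the choice of a TXT record at the wildcard (any record type would do for this purpose). Real wildcard matching is more involved than "try *.origin" (see RFC 4592), so treat this as a sketch of the idea only.

```python
def answer(zone, origin, qname, qtype):
    """Toy authoritative lookup.

    zone maps an owner name to a dict of {record type: rdata}.
    Only a single wildcard directly under the origin is considered.
    """
    node = zone.get(qname)
    if node is None:
        # No exact match: fall back to the wildcard, if the zone has one.
        node = zone.get("*." + origin)
        if node is None:
            return "NXDOMAIN"  # the name does not exist at all
    if qtype in node:
        return node[qtype]
    # The name exists (exactly or via wildcard) but not with this type.
    return "NODATA"


origin = "thisdomain.com."

# Hypothetical shadow zone with no wildcard: unknown names are NXDOMAIN.
plain_shadow = {}

# Hypothetical shadow zone with a wildcard holding an arbitrary record.
wildcard_shadow = {"*.thisdomain.com.": {"TXT": "placeholder"}}

without_wildcard = answer(plain_shadow, origin, "login.thisdomain.com.", "AAAA")
with_wildcard = answer(wildcard_shadow, origin, "login.thisdomain.com.", "AAAA")
```

Without the wildcard the AAAA query yields NXDOMAIN, which a cache applies to the whole name. With the wildcard the same query yields NODATA, which only affects the AAAA type and leaves the A record alone.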

                                                             - Kevin