Ross.Boylan at ucsf.edu
Mon Nov 16 22:58:22 UTC 2020
I have been getting NXDOMAIN errors persistently, though not every time, when looking up a machine I am trying to reach. These lookups worked fine before today. I don't know what's causing the failures, and I'm also having trouble tracing what's going on inside BIND. I'd be grateful for help on either front: getting DNS to work, or debugging it.
There are a lot of complications. In brief: the machine, and name resolution for it, are only available through VPN; I have a search list, which should generate some additional (failing) lookups if the original lookup fails; and I'm using views. Some details follow, and then a discussion of my debugging attempts.
The remote machine is only accessible through VPN, and the nameserver that knows how to find it is also accessible only through VPN. The IP of that nameserver is first on the forwarders list on my local machine. When failures happen, the replies indicate that the request went to the public-facing ucsf.edu nameservers. It is good that they don't provide any information, but they shouldn't be getting the request at all.
I also added the target domain (ucsf.edu) to my search list. So when I ask for mymachine.ucsf.edu, a failure of that query will also generate a query for mymachine.ucsf.edu.ucsf.edu. The second query asks for a nonexistent name, so perhaps that is the proximate source of the NXDOMAIN.
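The search list is applied by the client's stub resolver, not by BIND, so BIND just sees a sequence of separate queries. A minimal sketch of the order in which those candidate names are tried, assuming glibc-style behavior with the default ndots of 1 (the function name `candidate_names` is hypothetical):

```python
from typing import List


def candidate_names(name: str, search: List[str], ndots: int = 1) -> List[str]:
    """Return the names a stub resolver would try, in order (a sketch, not glibc's code)."""
    # A trailing dot marks the name as absolute: no search list is applied.
    if name.endswith("."):
        return [name.rstrip(".")]
    # Each search domain is appended to the name as typed.
    expanded = [f"{name}.{domain.strip('.')}" for domain in search]
    # With at least `ndots` dots, the name is tried as-is before the search domains;
    # otherwise the search domains are tried first and the bare name last.
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]


print(candidate_names("mymachine.ucsf.edu", ["ucsf.edu"]))
```

With `search ucsf.edu` this yields `mymachine.ucsf.edu` followed by `mymachine.ucsf.edu.ucsf.edu`, and the second name is only queried if the first fails. Note that dig does not use the search list unless given `+search`, so dig tests exercise only the first name.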
The machine I'm making the query from is in my own domain, which is why I'm running BIND. I use views, and the query is processed through my "inside" view according to the logs. ucsf.edu is NOT a domain I manage.
I have sent all logging channels to a file, either explicitly or by default, and I have tried rndc trace at levels up to 4. But I can't tell whether answers are coming from the cache or where external queries are going. Even after flushing the cache, I saw no information about outbound queries. I tried the query-errors category first, but it didn't seem to capture anything; I guess NXDOMAIN is not considered an error.
Occasionally I've had success, particularly after flushing the cache (though that doesn't always work). But when I try again 30 seconds later, I get NXDOMAIN.
Every query I have directed explicitly (with dig) at the campus nameserver has succeeded.
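To repeat the explicit-server test from a script (for example, in a loop alongside queries to the local BIND), here is a minimal sketch using only the Python standard library. It builds an RFC 1035 query for an A record with recursion desired, sends it over UDP, and reports the RCODE. It does no TCP fallback, EDNS, or retries, and the function names are hypothetical:

```python
import random
import socket
import struct
from typing import Optional

# A few RCODE values from RFC 1035 / RFC 2136.
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, qtype: int = 1, query_id: Optional[int] = None) -> bytes:
    """Build a minimal DNS query for `name` (type A by default) with RD set."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN


def rcode_of(response: bytes) -> int:
    """Extract the 4-bit RCODE from the low bits of the header flags word."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def query_rcode(server: str, name: str, timeout: float = 2.0) -> str:
    """Send one UDP query to `server` port 53 and return the RCODE name."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, 53))
        response, _ = sock.recvfrom(4096)
    if response[:2] != packet[:2]:
        raise ValueError("response ID does not match query ID")
    code = rcode_of(response)
    return RCODE_NAMES.get(code, str(code))
```

For example, `query_rcode(CAMPUS_NS_IP, "mymachine.ucsf.edu")` and `query_rcode("127.0.0.1", "mymachine.ucsf.edu")`, where `CAMPUS_NS_IP` is a placeholder for the VPN-only nameserver's address, would let the two servers' answers be compared side by side over time.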
The VPN connection has always been a bit touchy, and the problem first arose immediately after it went down for somewhere between 30 seconds and a couple of minutes. My initial theory was that this had caused a failure to be cached, but getting failures right after successes is not consistent with that.
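The cached-failure theory can be made concrete with a toy model. Under RFC 2308, a negative answer is cached for the smaller of the SOA record's TTL and the SOA MINIMUM field, and BIND additionally caps this with its `max-ncache-ttl` option (3 hours by default). The sketch below is a hypothetical cache keyed by name, not BIND's implementation; it shows why a single NXDOMAIN cached during the outage would produce an unbroken run of failures until it expires or is flushed, rather than failures alternating with successes:

```python
import time
from typing import Callable, Dict, Optional, Tuple


def negative_ttl(soa_ttl: int, soa_minimum: int, max_ncache_ttl: int = 10800) -> int:
    """RFC 2308 negative-caching TTL, capped by a resolver limit (assumed 3 hours)."""
    return min(soa_ttl, soa_minimum, max_ncache_ttl)


class TinyCache:
    """Hypothetical name -> (answer, expiry) cache; an answer of None means NXDOMAIN."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[Optional[str], float]] = {}

    def store(self, name: str, answer: Optional[str], ttl: float) -> None:
        # A new answer for the same name replaces the old one.
        self._entries[name] = (answer, self._clock() + ttl)

    def lookup(self, name: str) -> Tuple[str, Optional[str]]:
        """Return ('hit', answer) while the entry is live, else ('miss', None)."""
        entry = self._entries.get(name)
        if entry is None or self._clock() >= entry[1]:
            self._entries.pop(name, None)  # drop expired entries
            return ("miss", None)
        return ("hit", entry[0])

    def flush(self) -> None:
        self._entries.clear()
```

In this model, a success can only follow a cached NXDOMAIN after a flush or expiry, and a failure shortly after that success requires a fresh NXDOMAIN to arrive from upstream.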
Thanks for any help.
More information about the bind-users