bind 9.11.3 - resolving troubles running as a caching server

Bind Mailinglist bindbandbund at ggaweb.ch
Thu Nov 21 10:55:17 UTC 2019



On 21.11.2019 at 11:47, Bind Mailinglist wrote:
> Hello Ondřej
> An interesting case, and not so easy to detect. But I was able to get a
> few steps further.
> As I always have to clear the cache for the host
> tm.inregion.waas.oci.oraclecloud.net, I focused my monitoring on that.
> 1.
> On my caching servers I traced this host with Wireshark. In most
> cases my other servers replied to the queries (mostly A, some CNAME)
> with another CNAME.
> When the problem appeared, the last reply was an SOA from my DNS server.
> So why does my DNS server send such an SOA reply to the caching server?
> 2.
> So I tried to do the same on my DNS servers.
> There, all A queries for tm.inregion.waas.oci.oraclecloud.net were
> answered by the authoritative servers with a CNAME pointing to a very
> dynamic host. That may be quite normal for this Oracle cloud.
> But there were a few CNAME queries for the same host, and for CNAME
> queries I always got an SOA answer.
> About 1.5 s later my server queries again for an A record, which is
> answered.
>
> What happens when my cache queries my DNS server for the same host in
> the time between the SOA reply and the next A reply from the authoritative server?
>
> I can reproduce it like this:
>
> The CNAME query:
>
>     $ dig @ns1.p17.dynect.net tm.inregion.waas.oci.oraclecloud.net CNAME
>
>     ; <<>> DiG 9.9.5-3ubuntu0.19-Ubuntu <<>> @ns1.p17.dynect.net tm.inregion.waas.oci.oraclecloud.net CNAME
>     ; (2 servers found)
>     ;; global options: +cmd
>     ;; Got answer:
>     ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24630
>     ;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
>     ;; WARNING: recursion requested but not available
>
>     ;; OPT PSEUDOSECTION:
>     ; EDNS: version: 0, flags:; udp: 4096
>     ;; QUESTION SECTION:
>     ;tm.inregion.waas.oci.oraclecloud.net. IN CNAME
>
>     ;; AUTHORITY SECTION:
>     inregion.waas.oci.oraclecloud.net. 1800 IN SOA ns1.p17.dynect.net. hostmaster.inregion.waas.oci.oraclecloud.net. 1574248545 3600 600 604800 1800
>
>     ;; Query time: 15 msec
>     ;; SERVER: 2001:500:90:1::17#53(2001:500:90:1::17)
>     ;; WHEN: Thu Nov 21 11:44:41 CET 2019
>     ;; MSG SIZE  rcvd: 127
>
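The SOA record in the authority section of this reply is what bounds the lifetime of a negative cache entry: per RFC 2308, a resolver caches a NODATA answer for the lower of the SOA record's own TTL and its MINIMUM field (both 1800 s here). That would be consistent with the NXRRSET entries seen in the dumps with 1697 s and 893 s remaining, far longer than the 30 s TTL of the CNAME. A minimal sketch of that calculation follows; the function name `negative_ttl` is made up for illustration, and it assumes dig prints the SOA record on a single line.

```shell
# negative_ttl (hypothetical helper): read dig output on stdin and print the
# negative-cache TTL implied by the first SOA record, i.e. the lower of the
# SOA record's TTL (field 2) and its MINIMUM (last field), per RFC 2308.
# Assumes the SOA record is on one line, as dig prints it by default.
negative_ttl() {
    awk '$3 == "IN" && $4 == "SOA" {
        ttl = $2 + 0          # TTL of the SOA record itself
        minimum = $NF + 0     # SOA MINIMUM field
        print (ttl < minimum ? ttl : minimum)
        exit
    }'
}
```

It could be fed directly from the query above, for example:

    dig +noall +authority @ns1.p17.dynect.net tm.inregion.waas.oci.oraclecloud.net CNAME | negative_ttl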
>
> The A query:
>
>     $ dig @ns1.p17.dynect.net tm.inregion.waas.oci.oraclecloud.net A
>
>     ; <<>> DiG 9.9.5-3ubuntu0.19-Ubuntu <<>> @ns1.p17.dynect.net tm.inregion.waas.oci.oraclecloud.net A
>     ; (2 servers found)
>     ;; global options: +cmd
>     ;; Got answer:
>     ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55743
>     ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 1
>     ;; WARNING: recursion requested but not available
>
>     ;; OPT PSEUDOSECTION:
>     ; EDNS: version: 0, flags:; udp: 4096
>     ;; QUESTION SECTION:
>     ;tm.inregion.waas.oci.oraclecloud.net. IN A
>
>     ;; ANSWER SECTION:
>     tm.inregion.waas.oci.oraclecloud.net. 30 IN CNAME eu-switzerland.inregion.waas.oci.oraclecloud.net.
>
>     ;; AUTHORITY SECTION:
>     inregion.waas.oci.oraclecloud.net. 86400 IN NS  ns4.p17.dynect.net.
>     inregion.waas.oci.oraclecloud.net. 86400 IN NS  ns3.p17.dynect.net.
>     inregion.waas.oci.oraclecloud.net. 86400 IN NS  ns1.p17.dynect.net.
>     inregion.waas.oci.oraclecloud.net. 86400 IN NS  ns2.p17.dynect.net.
>
>     ;; Query time: 14 msec
>     ;; SERVER: 2001:500:90:1::17#53(2001:500:90:1::17)
>     ;; WHEN: Thu Nov 21 11:45:38 CET 2019
>     ;; MSG SIZE  rcvd: 255
>
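The two replies above differ only in their header: both are NOERROR, but the CNAME query has ANSWER: 0 (a NODATA response, which a resolver caches as NXRRSET) while the A query has ANSWER: 1. When comparing many captured replies, a small filter can label them. This is only a sketch: `classify_response` is a made-up name, and it assumes the reply comes from an authoritative server or a resolver, since a referral would also show NOERROR with an empty answer section.

```shell
# classify_response (hypothetical helper): read full dig output on stdin and
# print one of
#   ANSWER  - NOERROR with at least one record in the answer section
#   NODATA  - NOERROR with an empty answer section
#   <rcode> - any other status, e.g. NXDOMAIN or SERVFAIL
#   UNKNOWN - no header line found in the input
classify_response() {
    awk '
        /->>HEADER<<-/ {
            # e.g. ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24630"
            for (i = 1; i < NF; i++)
                if ($i == "status:") { status = $(i + 1); sub(/,$/, "", status) }
        }
        /^;; flags:/ {
            # e.g. ";; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ..."
            for (i = 1; i < NF; i++)
                if ($i == "ANSWER:") answers = $(i + 1) + 0
        }
        END {
            if (status == "")             print "UNKNOWN"
            else if (status != "NOERROR") print status
            else if (answers == 0)        print "NODATA"
            else                          print "ANSWER"
        }'
}
```

For example:

    dig @ns1.p17.dynect.net tm.inregion.waas.oci.oraclecloud.net CNAME | classify_response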
> But I'm still not sure whether that is my problem.
> Regards, Florian
>
>
>
> On 20.11.2019 at 18:16, Ondřej Surý wrote:
>> The cache shows that the forwarder reported that the upstream resolvers returned no such record.
>>
>> NXRRSET means Non-eXistent Resource Record Set, i.e. your resolvers cached the non-existence of that record set, as reported by the upstream resolvers.
>>
>> The other option would be to run the affected query against the upstream resolvers in a semi-tight loop and log the results.
>>
>> while true; do echo "$(date -R): $(dig +short IN A <domain> @<forwarder>)"; sleep 1; done
>>
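Since the later tests in this thread show that A and CNAME queries for the same name get different kinds of replies, the loop can be extended to log both types side by side. A minimal sketch, assuming GNU date (for `-R`) and dig are installed; `probe_once` is a made-up name, and both arguments are placeholders for the real name and forwarder address.

```shell
# probe_once (hypothetical helper): query one server for both the A and the
# CNAME type of a name and print a single timestamped line.
# Empty brackets mean the server returned no records for that type.
#   $1 = name to query, $2 = server (forwarder) to ask
probe_once() {
    name=$1
    server=$2
    a=$(dig +short "@$server" "$name" A | tr '\n' ' ')
    cname=$(dig +short "@$server" "$name" CNAME | tr '\n' ' ')
    printf '%s A=[%s] CNAME=[%s]\n' "$(date -R)" "$a" "$cname"
}
```

Run in the same loop as above, writing to a log file:

    while true; do probe_once <domain> <forwarder>; sleep 1; done >> probe.log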
>> Ondrej
>> --
>> Ondřej Surý
>> ondrej at isc.org
>>
>>> On 21 Nov 2019, at 01:09, Bind Mailinglist <bindbandbund at ggaweb.ch> wrote:
>>>
>>> Hello Ondřej
>>> Many thanks for your answer. I hope debugging can help me without overloading the servers.
>>> They handle a peak load of around 1500 queries/s during the evening hours. It will take some time to log exactly this effect.
>>> At the moment I have the following lines disabled:
>>>         // forwarders {
>>>         //        213.160.41.2;
>>>         //        213.160.40.34;
>>>         // };
>>> About the AAAA answer: does it matter whether I query A or AAAA if there is only a CNAME as an answer?
>>> My last test shows the following cache entry. This happened around 20 min after restarting bind with my forwarders enabled.
>>> ; answer
>>> tm.inregion.waas.oci.oraclecloud.net. 1697 \-A ;-$NXRRSET
>>> Could a server timeout end up as such a cache entry? Or does it need a valid answer from the forwarders? What do you think?
>>> I tried to force forwarding by adding "forward only" but the result was the same.
>>>
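Negative entries such as the `\-A ;-$NXRRSET` line above can be pulled out of a full cache dump with a small filter, which makes it easier to see which names and types are affected and how long they still have to live. A sketch only: `list_negative_entries` is a made-up name, and it assumes the dump layout shown in this thread (owner, TTL, `\-TYPE`, marker), with the owner name possibly omitted on lines that start with whitespace.

```shell
# list_negative_entries (hypothetical helper): read a cache dump on stdin
# and print "owner ttl type kind" for every negative entry, where kind is
# the marker from the dump (e.g. NXRRSET) without its ";-$" prefix.
list_negative_entries() {
    awk '
        /^;/ || NF == 0 { next }                    # skip comments and blank lines
        {
            if ($0 ~ /^[ \t]/) { ttl = $1; rrtype = $2 }             # owner omitted
            else               { owner = $1; ttl = $2; rrtype = $3 }
            if (rrtype ~ /^\\-/) {                  # "\-TYPE" marks a negative entry
                sub(/^\\-/, "", rrtype)
                kind = $NF
                sub(/^;-\$/, "", kind)
                print owner, ttl, rrtype, kind
            }
        }'
}
```

After `rndc dumpdb -cache` it could be run against the dump file; the path below is an assumption and depends on the `directory` and `dump-file` settings:

    list_negative_entries < /var/cache/bind/named_dump.db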
>>> Regards Florian
>>>
>>>
>>> On 20.11.2019 at 11:58, Ondřej Surý wrote:
>>>> Hi,
>>>>
>>>> you mentioned “forwarders”: what are these, and what does the AAAA answer look like on the upstream forwarders?
>>>>
>>>> I would recommend enabling a higher debug level (start with -d 1) and looking in the logs for the answer from the forwarders that preceded the failure.
>>>>
>>>> Ondrej
>>>> --
>>>> Ondřej Surý — ISC
>>>>
>>>>
>>>>> On 20 Nov 2019, at 18:44, Bind Mailinglist <bindbandbund at ggaweb.ch> wrote:
>>>>>
>>>>> Hello list
>>>>> I'm glad there is such an active list. I hope there is somebody out there
>>>>> who can help me with my little problem. :-)
>>>>> We are running six bind servers (all Ubuntu LTS 18.04 with bind 9.11.3),
>>>>> so they are pretty up to date.
>>>>> Three of them have authoritative zones, one is for testing and two are
>>>>> just caching servers. And that is where my problem starts.
>>>>> 1. It only appears on my caching servers and only if I use my other
>>>>> servers as forwarders.
>>>>> 2. At the moment the problem appears on my caching servers, I'm still
>>>>> able to resolve the names directly through my forwarders.
>>>>> 3. Only one organisation with several newspapers is affected. There may
>>>>> be others, but I don't know of any at the moment.
>>>>>
>>>>> OK, all these newspapers are hosted on Oracle Cloud with short TTLs
>>>>> of around 30 s.
>>>>>
>>>>> # dig www.20min.ch
>>>>>
>>>>> ;; ANSWER SECTION:
>>>>> www.20min.ch.           39      IN      CNAME   tamedia.a.inregion.waas.oci.oraclecloud.net.
>>>>> tamedia.a.inregion.waas.oci.oraclecloud.net. 16 IN CNAME tm.inregion.waas.oci.oraclecloud.net.
>>>>> tm.inregion.waas.oci.oraclecloud.net. 16 IN CNAME eu-london.inregion.waas.oci.oraclecloud.net.
>>>>> eu-london.inregion.waas.oci.oraclecloud.net. 28 IN A 138.1.82.213
>>>>> eu-london.inregion.waas.oci.oraclecloud.net. 28 IN A 147.154.234.67
>>>>> eu-london.inregion.waas.oci.oraclecloud.net. 28 IN A 147.154.228.138
>>>>>
>>>>> # dig www.tagesanzeiger.ch
>>>>>
>>>>> ;; ANSWER SECTION:
>>>>> www.tagesanzeiger.ch.   113     IN      CNAME   cnp-a-cre-p.newsnetz.ch.
>>>>> cnp-a-cre-p.newsnetz.ch. 113    IN      CNAME   tamedia.a.inregion.waas.oci.oraclecloud.net.
>>>>> tamedia.a.inregion.waas.oci.oraclecloud.net. 11 IN CNAME tm.inregion.waas.oci.oraclecloud.net.
>>>>> tm.inregion.waas.oci.oraclecloud.net. 12 IN CNAME eu-switzerland.inregion.waas.oci.oraclecloud.net.
>>>>> eu-switzerland.inregion.waas.oci.oraclecloud.net. 12 IN A 192.29.59.121
>>>>> eu-switzerland.inregion.waas.oci.oraclecloud.net. 12 IN A 192.29.58.46
>>>>> eu-switzerland.inregion.waas.oci.oraclecloud.net. 12 IN A 192.29.58.42
>>>>>
>>>>>
>>>>> Now if I use my caching servers with forwarders enabled, I quite often
>>>>> run into cases where resolving stops working for these two domains at
>>>>> the same time.
>>>>> When I take a cache dump I see the following line:
>>>>> ; answer
>>>>> tm.inregion.waas.oci.oraclecloud.net. 893 \-AAAA ;-$NXRRSET
>>>>>
>>>>> I have to clear this host from the cache to make it work again, for a
>>>>> few minutes.
>>>>> The annoying thing is that this NXRRSET cache entry has a much longer
>>>>> lifetime, so resolving stops working on my caching servers for more than 15 min.
>>>>>
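Until the cause is found, the manual step of clearing the host from the cache can be scripted so that it only fires when a negative entry is actually present. This is a stopgap sketch, not a fix: `flush_if_negative` is a made-up name, it expects a dump file already written by `rndc dumpdb -cache`, and `rndc flushname` removes all cached records for the name, not just the negative one.

```shell
# flush_if_negative (hypothetical helper): flush NAME from the resolver cache
# only if DUMPFILE contains a negative ("\-TYPE") entry owned by NAME.
#   $1 = name (with or without trailing dot), $2 = path to the cache dump
# Returns rndc's exit status if a flush was issued, 1 if nothing was found.
flush_if_negative() {
    name=${1%.}.            # normalise to exactly one trailing dot
    dumpfile=$2
    if awk -v owner="$name" \
        '$1 == owner && $3 ~ /^\\-/ { found = 1 } END { exit !found }' "$dumpfile"
    then
        rndc flushname "$name"
    else
        return 1
    fi
}
```

For example, with the dump path being an assumption that depends on the local configuration:

    rndc dumpdb -cache && sleep 1 && flush_if_negative tm.inregion.waas.oci.oraclecloud.net /var/cache/bind/named_dump.db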
>>>>> Any idea how I could find out why this happens?
>>>>> There must be something going on between my DNS servers. They are in the
>>>>> same network, so there is no firewall in between.
>>>>>
>>>>> Many thanks and regards
>>>>> Florian
>>>>>
>>>>> _______________________________________________
>>>>> Please visit https://lists.isc.org/mailman/listinfo/bind-users to unsubscribe from this list
>>>>>
>>>>> bind-users mailing list
>>>>> bind-users at lists.isc.org
>>>>> https://lists.isc.org/mailman/listinfo/bind-users
>
