"not subdomain of zone {XXXX} -- invalid response" errors found in named.run log

Thu Jan 14 08:00:50 UTC 2021

Thank you so much, Mark.

Based on your input, I found the culprit: one of the LDNS servers. It is supposed to be configured with the zone "xx.node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org", but somehow it has been configured with "node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org", which is not delegated to this LDNS.

What's more, only the newly added root DNS replies with the "real" (incorrect) zone "node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org". The old one's reply is "correct": it is "xx.node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org" (maybe the old one doesn't query the LDNS and answers the query from its own configuration).

When the servers query the newly added DNS and cache the incorrect zone "node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org", the "invalid response" log entries keep appearing.

In summary, the issue requires two conditions:
1. The incorrect zone configuration on the LDNS.
2. The newly added root DNS, whose behavior differs from the old one's.

------------------ Original ------------------
From:  "marka";<marka at isc.org>;
Send time: Thursday, Jan 14, 2021 11:12 AM
To: "同屋"<39223722 at qq.com>; 
Cc: "Bind-users"<Bind-users at lists.isc.org>; 
Subject:  Re: "not subdomain of zone {XXXX} -- invalid response" errors found in named.run log

> On 7 Jan 2021, at 00:57, 同屋 <39223722 at qq.com> wrote:
> 
> Actually, the background is a little complicated. In short, the topology is shown below. dns1 was replaced by a new server (say dns1*), and then the issue appeared. After that, we dropped all AAAA requests from dns1*, and the issue went away.

Well if you stop making requests that result in negative responses (NXDOMAIN or NOERROR/NODATA) you no longer send responses with the incorrect SOA record in the authority section.

> There is no config change during the whole process, no idea why the caching server has such log.

You get such logs because there are servers that are misconfigured.  If you delegate a zone to a server then ALL negative responses for queries in that delegated namespace should be coming back with a SOA record that matches the delegated zone.  Named checks the returned SOA record in the authority section and if it isn't an expected value then named logs the messages you are seeing.
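
As an illustration of the check described above, here is a minimal Python sketch. It is a simplified model, not named's actual code; the function names (labels, is_subdomain, check_negative_response) are hypothetical. It compares the owner name of the SOA record against the zone the resolver believes was delegated, label by label.

```python
def labels(name):
    """Split a domain name into lowercase labels, ignoring a trailing dot."""
    return [label for label in name.lower().rstrip(".").split(".") if label]


def is_subdomain(name, zone):
    """Return True if `name` is equal to or below `zone` (label-wise)."""
    n, z = labels(name), labels(zone)
    return len(n) >= len(z) and n[len(n) - len(z):] == z


def check_negative_response(delegated_zone, soa_owner):
    """Model of the resolver-side check: the SOA owner in a negative
    response must lie within the zone that was delegated."""
    if is_subdomain(soa_owner, delegated_zone):
        return "ok"
    return (f"Name {soa_owner} (SOA) not subdomain of zone "
            f"{delegated_zone} -- invalid response")


if __name__ == "__main__":
    # Correct server: SOA owner matches the delegated zone.
    print(check_negative_response("child.example.com", "child.example.com"))
    # Misconfigured server: SOA owner is the parent zone.
    print(check_negative_response("child.example.com", "example.com"))
```

The second call produces a message of the same shape as the one in the log, because example.com sits above child.example.com rather than at or below it.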

You can reproduce this with the following setup, where example.com is delegated to server1.example.com and child.example.com is delegated to server2.example.com, but server2 is incorrectly configured with a different version of example.com.

server1.example.com(192.0.2.1):
example.com.		SOA	server1.example.com. . 0 0 0 0 0
example.com.		NS	server1.example.com.
server1.example.com.	A	192.0.2.1
server2.example.com.	A	192.0.2.2
child.example.com.	NS	server2.example.com.

server2.example.com(192.0.2.2):
example.com.		SOA	server2.example.com. . 0 0 0 0 0
example.com.		NS	server2.example.com.
server2.example.com.	A	192.0.2.2
child.example.com.	A	192.0.2.3

A proper delegation would have:

server2.example.com(192.0.2.2):
child.example.com.	SOA	server2.example.com. . 0 0 0 0 0
child.example.com.	NS	server2.example.com.
child.example.com.	A	192.0.2.3
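
To see why the two server2 configurations above behave differently, here is a rough Python sketch of how an authoritative server picks the SOA record for a negative response: it uses the apex of the most specific zone it is configured with that encloses the query name. This is a simplified model under that assumption, and the helper names (encloses, negative_soa_owner) are hypothetical.

```python
def labels(name):
    """Split a domain name into lowercase labels, ignoring a trailing dot."""
    return [label for label in name.lower().rstrip(".").split(".") if label]


def encloses(zone, qname):
    """Return True if `qname` is at or below the apex of `zone`."""
    z, q = labels(zone), labels(qname)
    return len(q) >= len(z) and q[len(q) - len(z):] == z


def negative_soa_owner(configured_zones, qname):
    """Return the apex of the most specific configured zone that encloses
    qname, i.e. the owner name of the SOA in a negative response.
    Returns None if the server holds no enclosing zone."""
    candidates = [z for z in configured_zones if encloses(z, qname)]
    return max(candidates, key=lambda z: len(labels(z)), default=None)


# Zones configured on server2 in the two setups from the message.
broken_server2 = ["example.com."]
proper_server2 = ["child.example.com."]

if __name__ == "__main__":
    qname = "child.example.com."  # e.g. an AAAA query with no AAAA data
    print(negative_soa_owner(broken_server2, qname))
    print(negative_soa_owner(proper_server2, qname))
```

With the broken configuration the SOA owner is example.com., which is above the delegation point the resolver followed; with the proper configuration it is child.example.com., which matches the delegation.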

Load balancers often end up with broken configuration because, it appears, the documentation is not clear enough.  The load balancing software knows about A queries and answers them, but punts all other queries to a backing server.  That server, instead of being configured with the zone child.example.com, is configured with the zone example.com, which contains just the SOA and NS records.

example.com.		SOA	server1.example.com. . 0 0 0 0 0
example.com.		NS	server1.example.com.

Client -> load balancer -> backing server.

If you ask for child.example.com/A you get back an A record with the computed value.

If you ask for child.example.com/AAAA the load balancer says "this is not something I deal with" and passes the request on to the backing nameserver which, because it has been configured to serve example.com instead of child.example.com, returns a negative response with example.com as the owner name of the SOA record rather than the child.example.com SOA record that is expected.
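
The Client -> load balancer -> backing server path described above can be modelled in a few lines of Python. This is only a toy sketch: the dictionary response format, the function names, and the BACKING_ZONE_APEX constant are assumptions made for illustration and do not correspond to any real load balancer product.

```python
# Zone the backing server is (mis)configured with. In a correct setup this
# would be "child.example.com.", the zone that was actually delegated.
BACKING_ZONE_APEX = "example.com."


def backing_server(qname, qtype):
    """Toy backing nameserver: it has no data for any name/type, so it
    returns NOERROR/NODATA with the SOA of the zone it is configured with."""
    return {"rcode": "NOERROR", "answer": [], "soa_owner": BACKING_ZONE_APEX}


def load_balancer(qname, qtype):
    """Toy load balancer: answers A queries itself with a computed address
    and punts every other query type to the backing server."""
    if qtype == "A":
        computed = "192.0.2.3"  # stand-in for the balancer's chosen address
        return {"rcode": "NOERROR",
                "answer": [(qname, "A", computed)],
                "soa_owner": None}
    return backing_server(qname, qtype)


if __name__ == "__main__":
    print(load_balancer("child.example.com.", "A"))
    print(load_balancer("child.example.com.", "AAAA"))
```

The A query gets a positive answer and never exposes the problem; the AAAA query comes back with an SOA owned by example.com., which is the response a validating resolver such as named rejects.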

Mark

> +------+      +------+
> | dns1 |      | dns2 |
> +------+      +------+
>     |             |
>     +------+------+
>            |
>   +----------------+
>   | caching server |  (where the log was observed)
>   +----------------+
> 
> ------------------ Original ------------------
> From:  "同屋";<39223722 at qq.com>;
> Send time: Wednesday, Jan 6, 2021 8:43 PM
> To: "同屋"<39223722 at qq.com>; "marka"<marka at isc.org>;
> Cc: "Bind-users"<Bind-users at lists.isc.org>;
> Subject:  re:Re: "not subdomain of zone {XXXX} -- invalid response" errors found in named.run log
> 
> Thanks, Mark, but why is this issue related to a load balancer?
> 
> 
> 
> ------------------ Original Message ------------------
> From: "Mark Andrews";
> Date: 2021-01-06 19:09
> To: "同屋"<39223722 at qq.com>;
> To: "bind-users";
> 
> Subject: Re: "not subdomain of zone {XXXX} -- invalid response" errors found in named.run log
> 
> 
> Complain to the administrators of the zone. They have not properly delegated it.  We see this often with load balancers.
> 
> The zone a.b.example has been delegated but the answer is as if it is from b.example. 
> 
> --
> Mark Andrews
> 
>> On 6 Jan 2021, at 21:02, 同屋 <39223722 at qq.com> wrote:
>> 
>> 
>> The version of bind is BIND 9.10.5-P3 id:7d5676f 
>> 
>> One day, I found that named.run was growing very quickly, and a lot of "invalid response" entries were spotted in the log. Details are as follows (I replaced the sensitive info with {XXXX}, {AAA}, etc.)
>> 
>> DNS format error from {IP}#53 resolving {XXXX}.bf.bf.node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org/AAAA for client 169.254.4.50#51099: Name epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org (SOA) not subdomain of zone node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org -- invalid response
>> 
>> The response related to the above log is as follows:
>> 
>> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50664
>> ;; flags: qr aa rd ra; QUESTION: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
>> 
>> ;; OPT PSEUDOSECTION:
>> ; EDNS: version: 0, flags: do; udp: 4096
>> ;; QUESTION SECTION:
>> ;{XXXX}.bf.bf.node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org. IN AAAA
>> 
>> ;; AUTHORITY SECTION:
>> ;epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org. 86400 IN SOA .mnc{AAA}.mcc{BBB}.gprs. dns-admin. (
>> ;        2020122704 ; serial
>> ;        10800      ; refresh (3 hours)
>> ;        3600       ; retry (1 hour)
>> ;        604800     ; expire (1 week)
>> ;        86400      ; minimum (1 day)
>> ;        )
>> 
>> ============================================
>> 
>> Normally, the FQDN should be cached as a NXRRSET record as follows:
>> 
>> {XXXX}.bf.bf.node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org. 8412 -AAAA ;-$NXRRSET
>> 
>> But when the issue happens, it cannot be cached. I guess this is related to the "invalid response" log.
>> 
>> The error log mentions "zone node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org", but I'm wondering where the zone "node.epc.mnc{AAA}.mcc{BBB}.3gppnetwork.org" comes from. I cannot find a related SOA record in the dump file.
>> 

-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742              INTERNET: marka at isc.org