Testers sought for patch that reduces the number of outgoing queries

Ondřej Surý ondrej at isc.org
Tue Nov 4 12:13:58 UTC 2025


Hi,

A couple of recent changes, forced by security fixes and general hardening, have pulled BIND 9 in two different directions:

1. Limiting the number of outgoing queries: max-recursion-queries (single query without restarts in resolution), max-query-count (total number per single client query), max-query-restarts (how many CNAME/DNAME restarts), and max-recursion-depth (maximum levels of recursion). These are described in more detail in the ARM.

2. Sending more outgoing queries: validating nameserver queries (ADB), and ignoring extra records in incoming DNS messages and asking for them explicitly instead (some types of GLUE are now ignored). BIND 9 was designed from the very beginning to fill up the caches as quickly as possible; even today, cache memory has lower latency than the network.

The upshot is that a recursive server with a cold cache might return SERVFAIL on the first try for some names because of TLDs referring to other TLDs, long CDN CNAME chains jumping from domain to domain, etc.

One of the most recent examples given on the mailing list was teams.microsoft.com.

Asking for this name on a cold-cache BIND 9.20 server ends with:

$ grep -c "sending packet to" named.run
114

On 9.21 (future 9.22):

$ grep -c "sending packet from" named.run
110

As you can see, there are more than 100 outgoing DNS queries for a single name queried, and often this leads to a SERVFAIL. Verisign's Transitive Trust Checker can be used to visualize this: https://trans-trust.verisignlabs.com/?z=teams.microsoft.com

Another example that circulated recently was a reverse name: https://trans-trust.verisignlabs.com/?z=195.5.90.45.in-addr.arpa

$ grep -c "sending packet to" named.run
166
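
The counts above are simply the number of matching lines in the resolver log. If you want to compare your own runs, a minimal sketch of a helper could look like this; the name count_outgoing is made up for illustration, the two patterns are the ones grepped for above, and it assumes your logging is verbose enough for these lines to appear in named.run at all:

```shell
#!/bin/sh
# count_outgoing is a hypothetical helper name, not part of BIND 9.
# It counts log lines that match either of the two patterns used above.
# grep -c counts matching lines, so a line containing both patterns
# is still counted only once.
count_outgoing() {
    grep -c -e "sending packet to" -e "sending packet from" "${1:-named.run}"
}

# Usage (defaults to named.run in the current directory):
#   count_outgoing
#   count_outgoing /path/to/named.run
```

Start from a cold cache for every measurement, otherwise the numbers are not comparable.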

Now, there is a merge request in preparation that reduces the number of outgoing queries by delaying the fan-out across the nameservers. Instead of sending A and AAAA queries for each nameserver in the set, it sends one and waits 100 ms for the response. If no response is received, it continues with the next server, and so on until all nameservers have been tried. For 2 nameservers, this might incur a 100 ms delay if the first one does not respond (or is slow). For 13 nameservers, we suddenly get a 1200 ms delay if all but the last one are unresponsive.
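
The delay figures are plain arithmetic: in the worst case every nameserver except the last one times out, so the wait is paid once per unresponsive server. A small sketch, assuming the 100 ms wait described above stays fixed per server (the function name worst_case_delay_ms is made up for illustration and is not anything in the MR):

```shell
#!/bin/sh
# worst_case_delay_ms is a hypothetical helper, not code from the MR.
# Arguments: number of nameservers, and the per-server wait in ms
# (defaults to the 100 ms mentioned above).
# Worst case: all servers but the last one fail to answer in time,
# so the resolver waits (servers - 1) times before reaching the last one.
worst_case_delay_ms() {
    servers=$1
    wait_ms=${2:-100}
    if [ "$servers" -le 1 ]; then
        echo 0
    else
        echo $(( (servers - 1) * wait_ms ))
    fi
}

worst_case_delay_ms 2     # prints 100
worst_case_delay_ms 13    # prints 1200
```

This only models the added waiting time; it says nothing about how long the last server itself takes to answer.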

This is a big change to the way the resolver operates, and thus we would like to gather some real-world data from people willing to run their resolvers with this patch.

However, there are a couple of requirements. In particular, you must:
1. know how to patch and compile named from source (and perhaps do that more than once).
2. be willing to communicate about this on the GitLab merge request (https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/11205); new updates will be posted there.
3. know how to compile BIND 9 with debug symbols and either keep them inside the binaries or use the detached symbols. Either is fine, but a possible coredump that shows just "???" instead of symbols is mostly unusable.
4. be willing to share test cases, both where it helped and where it didn't.
5. not get angry if named crashes, doesn't work, etc.

The whole MR is still mostly a work in progress; the extra system tests that would exercise the timed fallbacks are still missing. That is also one reason why we are looking for extra testers who might provide us with real-world examples of what is currently broken.

Now, how does the patched version improve things?

- teams.microsoft.com

$ grep -c "sending packet to" named.run
79

- 195.5.90.45.in-addr.arpa

$ grep -c "sending packet to" named.run
45

Much better, right?
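
To put these counts in relative terms, here is a quick sketch. The helper name pct_reduction is made up for illustration, and it assumes the 9.20 count of 114 as the baseline for teams.microsoft.com:

```shell
#!/bin/sh
# pct_reduction is a hypothetical helper: relative drop from BEFORE to
# AFTER, printed as a percentage with one decimal place.
pct_reduction() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

pct_reduction 114 79    # teams.microsoft.com
pct_reduction 166 45    # 195.5.90.45.in-addr.arpa
```

That works out to roughly 31% fewer outgoing queries for teams.microsoft.com and roughly 73% fewer for the reverse name.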

If you have read this far and are still interested in testing this, the latest tarball is always available from the tarball-create job in the "precheck" stage of the latest pipeline, but I've also copied the latest one into a comment in the MR itself: https://gitlab.isc.org/isc-projects/bind9/-/merge_requests/11205#note_611712

Disclaimer: This work might be a bust or it could hit a dead end.

Thanks,
--
Ondřej Surý (He/Him)
ondrej at isc.org

My working hours and your working hours may be different. Please do not feel obligated to reply outside your normal working hours.


