BIND 9.20.23: TLS connection leak (CLOSE_WAIT) when forwarding to DoT servers?
Dennis Neufeld
isc.lists at dneufeld.net
Tue Jun 16 14:47:56 UTC 2026
Hi,
After upgrading from BIND 9.20.21 to 9.20.23 on Debian 13, I am seeing a large accumulation of TCP connections to port 853 stuck in the CLOSE_WAIT state when forwarding queries to DoT upstream servers (tested with Cloudflare and DNS4EU).
After some time under normal load, "ss -tnp | grep 853 | awk '{print $1}' | sort | uniq -c | sort -rn" shows something like:
4465 CLOSE-WAIT
2 ESTAB
Connections in CLOSE_WAIT accumulate continuously across all configured DoT upstream servers:
$ ss -tnp | grep 853 | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn
1321 86.54.11.200 (DNS4EU)
1203 86.54.11.100 (DNS4EU)
1080 1.1.1.1 (Cloudflare)
861 1.0.0.1 (Cloudflare)
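Note that "grep 853" also matches local ports or PIDs that happen to contain 853, so the counts above may be slightly inflated. A minimal sketch of a stricter count, matching only a peer port of exactly 853 and grouping by state and peer in one pass. The function name summarise_dot is my own, and it assumes the default "ss -tn" column layout with the peer address in column 5:

```shell
# Summarise DoT (port 853) connections by state and peer address.
# Reads `ss -tn` style output on stdin, so it also works on saved output.
summarise_dot() {
    awk '
        $5 ~ /:853$/ {                 # peer port must be exactly 853
            peer = $5
            sub(/:853$/, "", peer)     # strip the port, keep the address
            count[$1 " " peer]++       # key: "<state> <peer address>"
        }
        END {
            for (key in count) print count[key], key
        }
    ' | sort -rn
}
```

Run it as "ss -tn | summarise_dot"; each output line is a count followed by the state and the peer address.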
The same error pattern occurs regardless of the queried domain or upstream server. Observed examples:
info: shut down hung fetch while resolving 0xXXXXXXXXX000(<ext-domain>/A)
debug 1: set ede: info-code 22 extra-text (null)
debug 1: client @0xXXXXXXXXX000 <client-ip>#56707 (<ext-domain>): rpz QNAME rewrite <ext-domain> stop on unrecognized qresult in rpz_rewrite() failed: SERVFAIL
debug 1: client @0xXXXXXXXXX000 <client-ip>#56707 (<ext-domain>): query failed (SERVFAIL) for <ext-domain>/IN/A at query.c:7860
debug 2: fetch completed for <ext-domain>/A in 12.000205: SERVFAIL/success [domain:.,referral:0,restart:1,qrysent:1,timeout:0,lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
debug 3: client @0xXXXXXXXXX000 <client-ip>#56707 (<ext-domain>): send failed: operation canceled
query-errors: debug 1: client @0xXXXXXXXXX000 <client-ip>#56402 (<ext-domain>): rpz QNAME rewrite <ext-domain> stop on unrecognized qresult in rpz_rewrite() failed: SERVFAIL
query-errors: info: client @0xXXXXXXXXX000 <client-ip>#56402 (<ext-domain>): query failed (SERVFAIL) for <ext-domain>/IN/A at query.c:7860
query-errors: debug 2: fetch completed for <ext-domain>/A in 12.004205: SERVFAIL/success [domain:.,referral:0,restart:1,qrysent:0,timeout:0,lame:0,quota:0,neterr:0,badresp:0,adberr:0,findfail:0,valfail:0]
Impact:
- Initially there are few or no SERVFAIL errors; over time they increase significantly, and in some cases DNS resolution becomes unusable
- Failing queries time out after almost exactly 12 seconds (12.000 and 12.004 s in the logs above)
- System accumulates thousands of zombie TCP connections
- Issue affects all configured DoT upstream providers simultaneously, ruling out an upstream-side issue
Downgrading to 9.20.21 fully resolves the issue.
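For anyone who wants to compare the two versions, the growth can be sampled over time. A rough sketch, not anything BIND provides: both function names are my own, and it again assumes the default "ss -tn" column layout.

```shell
# Count sockets in CLOSE-WAIT whose peer port is exactly 853,
# reading `ss -tn` style output on stdin.
count_dot_close_wait() {
    awk '$1 == "CLOSE-WAIT" && $5 ~ /:853$/ { n++ } END { print n + 0 }'
}

# Print one UTC-timestamped sample of the current count.
sample_dot_close_wait() {
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(ss -tn | count_dot_close_wait)"
}
```

Calling it from a loop such as "while sleep 60; do sample_dot_close_wait; done >> close_wait.log" gives one line per minute. On an unaffected version the count should stay near zero, whereas here it climbs steadily.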
Has anyone else seen this? Is there a configuration-level workaround that properly closes stale TLS connections? Or is this a bug?
Thanks
Dennis