bind 9.10 fallback to tcp

Mark Andrews marka at isc.org
Thu Jul 9 01:01:33 UTC 2015


In message <559DBE9B.8000204 at lancaster.ac.uk>, Graham Clinch writes:
> Hi Carl,
> 
> > I have a client with 9.10.2-P1-RedHat-9.10.2-2.P1.fc22 on Fedora 22, on
> > a machine with a pppoe link with an mtu of 1492. The routers seem to be
> > properly fragmenting udp - it can receive large packets such as
> >
> > dig www.byington.org +dnssec +bufsiz=4000 +notcp @205.147.40.34
> >
> > which says:
> >
> > ;; MSG SIZE  rcvd: 3790
> >
> > However, a tcpdump for tcp port 53 shows a lot of traffic. In
> > particular,
> >
> > rndc flushtree novell.com
> > dig www.novell.com @localhost
> >
> > shows some tcp traffic to the .com servers. How does one isolate the
> > query or server that is causing that fallback to tcp?
> 
> We saw a similar jump in TCP traffic with 'cold' (not much in the cache) 
> resolvers after switching from 9.9 to 9.10.  The cause seems to be a 
> change to the way edns sizes are advertised to unknown servers.  The 
> gory details are in the ARM for the 'edns-udp-size' option, but here's a 
> simplified version:
> 
> In 9.9, edns-udp-size is advertised initially, and only after problems 
> is it reduced to 512 bytes.
> In 9.10, edns-udp-size sets the *maximum* size that could be advertised, 
> but the first query uses 512 and then it grows up as successes occur.
> 
> The 'Address database dump' section of a cache dump (rndc dumpdb -cache) 
> has 'udpsize' notes along with edns success rates:
> 
> ; [edns success/4096 timeout/1432 timeout/1232 timeout/512 timeout]
> ; [plain success/timeout]
> ;	148.88.65.105 [srtt 1489] [flags 00006000] [edns 50/0/0/0/0] [plain 0/0] [udpsize 1757] [ttl 173]

50 successful EDNS queries.

No timeouts with an advertised EDNS buffer size of 4096
No timeouts with an advertised EDNS buffer size of 1432
	  (ethernet - IPv4+IPv6+UDP headers to allow for 4/6 encapsulation)
No timeouts with an advertised EDNS buffer size of 1232
	  (IPv6 network - IPv6+UDP headers)
No timeouts with an advertised EDNS buffer size of 512
	  (Stupid firewall)

The largest UDP response seen had a size of 1757 bytes.
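
If you want to pull these fields out of a cache dump in bulk, a small
script along the following lines might help. It is only a sketch: the
regular expression is modelled on the single sample line quoted above
(joined onto one line), not on a documented format, and the function
name and dictionary keys are made up for illustration.

```python
import re

# Pattern modelled on the one sample line in this thread (an assumption,
# not a documented format). The [udpsize N] field is treated as optional.
ADB_LINE = re.compile(
    r"^;\s*(\S+)\s+\[srtt (\d+)\]\s+\[flags ([0-9a-fA-F]+)\]\s+"
    r"\[edns (\d+)/(\d+)/(\d+)/(\d+)/(\d+)\]\s+"
    r"\[plain (\d+)/(\d+)\](?:\s+\[udpsize (\d+)\])?"
)

def parse_adb_line(line):
    """Return a dict of the counters on one address line, or None."""
    m = ADB_LINE.match(line.strip())
    if m is None:
        return None
    addr, srtt, flags, ok, t4096, t1432, t1232, t512, p_ok, p_to, udp = m.groups()
    return {
        "addr": addr,
        "srtt": int(srtt),
        "edns_success": int(ok),
        # Timeout counters keyed by the advertised EDNS buffer size.
        "edns_timeouts": {4096: int(t4096), 1432: int(t1432),
                          1232: int(t1232), 512: int(t512)},
        "plain": (int(p_ok), int(p_to)),  # (success, timeout)
        "udpsize": int(udp) if udp is not None else None,
    }

# The sample line from the dump, with the mail wrapping undone.
sample = (";\t148.88.65.105 [srtt 1489] [flags 00006000] "
          "[edns 50/0/0/0/0] [plain 0/0] [udpsize 1757] [ttl 173]")
info = parse_adb_line(sample)
```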

If you time out on a 512 byte advertised size, it counts against all 4 counters.
If you time out on a 1232 byte advertised size, it counts against the first 3 counters.
If you time out on a 1432 byte advertised size, it counts against the first 2 counters.
If you time out on a 4096 byte advertised size, it only counts against the 4096 counter.
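
The four rules above amount to "a timeout counts against the advertised
size and every larger size". As a rough illustration only (this is not
the named source, and the names SIZES and record_timeout are invented
here), the bookkeeping could be sketched like this:

```python
# Advertised sizes in the same order as the timeout counters in the dump.
SIZES = [4096, 1432, 1232, 512]

def record_timeout(counters, advertised):
    """Count one timeout seen while advertising `advertised` bytes.

    `counters` maps each size in SIZES to its timeout count. The timeout
    is charged to the advertised size and to every larger size.
    Assumes `advertised` is one of the four sizes in SIZES.
    """
    for size in SIZES:
        if size >= advertised:
            counters[size] += 1
    return counters

counters = {size: 0 for size in SIZES}
record_timeout(counters, 1232)  # charged to 4096, 1432 and 1232
record_timeout(counters, 4096)  # charged to 4096 only
```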

All counters shift right (divide by 2) when the upper bit is set on one
of them, which makes them self-correcting.  The timeout counts decide
which EDNS udpsize is advertised after the first query, or whether
to shift to plain DNS.  The known UDP response size is a minimum,
even if there are enough timeouts to otherwise go to a smaller size.
Plain DNS timeouts clear the EDNS timeouts, as they may then be false
positives.
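
To make the shift-right aging concrete, here is a minimal sketch. The
8-bit counter width and the name age_counters are assumptions made for
the example, not something stated above; only the "halve everything
once any counter has its upper bit set" behaviour comes from the
description.

```python
def age_counters(counters, width_bits=8):
    """Halve every counter once any one of them has its top bit set.

    `counters` is a list of non-negative ints; `width_bits` is the
    assumed counter width (8 here is a guess for illustration).
    Returns a new list and leaves the input untouched.
    """
    top_bit = 1 << (width_bits - 1)
    if any(c & top_bit for c in counters):
        # Shifting all counters together keeps their ratios roughly
        # intact while letting old history fade.
        return [c >> 1 for c in counters]
    return list(counters)

aged = age_counters([128, 4, 1, 0, 0])       # top bit set on the first counter
unchanged = age_counters([50, 0, 0, 0, 0])   # no top bit set, nothing shifts
```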

> though I'm not clear what udpsize is really reflecting here since it has 
> many different values (not just 512, 1232, 1432 & 4096, as I would 
> expect from the ARM).
> 
> We see a freshly restarted (validating) 9.10 resolver need to make many 
> TCP connections before returning its first answer, but things settle 
> after it's got comfortable using larger edns sizes with the root & tld 
> servers.
> 
> Graham
-- 
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742                 INTERNET: marka at isc.org

