BIND srtt algorithm not working as expected
paul at callevanetworks.com
Thu May 17 08:32:16 UTC 2018
After doing some more packet captures, it looks like a lot of the queries are Sophos live protection DNS lookups (lots of queries for sophosxl.net), and many of them never get resolved. We see multiple queries for the same name, and the resolver seems to retransmit to each forwarder, including the non-local ones, when it doesn't get a response. So the behaviour may be exacerbated by these non-resolvable queries. After about 10 seconds, the forwarder gives up trying to get a response from the Sophos name servers and replies with SERVFAIL.
So now I am not sure the rtt algorithm is completely at fault here, as BIND is simply trying additional forwarders in an attempt to resolve the name.
I have seen this live protection stuff going on in quite a few corporates now, and each time we have had to raise the recursive-client limit. I don't think it's just Sophos that does this; I'm pretty sure I saw the same with McAfee a couple of years ago too. They seem to use DNS to transmit file name hashes so they can do a reputation lookup, but in the Sophos case the name servers only reply if some kind of action is required. There must be many corporates out there experiencing issues with the way this works, i.e. all of a sudden their resolvers stop recursing because the recursive client limit is hit.
On one account I am working on, the resolvers regularly hit 20,000+ recursive clients when they kick off a scheduled virus scan. I wish the anti-virus vendors would consider the impact they are having on corporate DNS environments and rethink how they implement their reputation lookups; it must be the cause of some pretty serious outages. :-(
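To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It is only an estimate: it assumes a steady query rate and that every unanswerable lookup holds a recursive-client slot for the full ~10 seconds until the SERVFAIL comes back. Neither assumption is something I have measured, and the function names are just made up for illustration.

```python
# Rough estimate only. Assumptions (not measured):
#  - unanswerable queries arrive at a steady rate
#  - each one occupies a recursive-client slot until SERVFAIL (~10 s)

def outstanding_clients(unanswered_qps, hold_seconds):
    """Little's law: average number in flight = arrival rate * time held."""
    return unanswered_qps * hold_seconds


def implied_qps(clients, hold_seconds):
    """Inverse of the above: the arrival rate needed to sustain `clients`."""
    return clients / hold_seconds


if __name__ == "__main__":
    # Rate of unanswered lookups needed to sustain 20,000 recursive clients
    print(implied_qps(20_000, 10))
    # Slots tied up by a modest 100 unanswered lookups per second
    print(outstanding_clients(100, 10))
```

Under those assumptions, 20,000 outstanding clients corresponds to roughly 2,000 unanswered lookups per second, and even 100 per second ties up about 1,000 slots, so it does not take a very large scan to reach whatever limit is configured.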