BIND and UDP tuning

Mon Oct 1 16:51:46 UTC 2018

Hi,

On Mon, Oct 1, 2018 at 9:58 AM Blake Hudson <blake at ispn.net> wrote:
>
> Alex wrote on 9/30/2018 7:27 PM:
> > Hi,
> >
> > On Sun, Sep 30, 2018 at 1:19 PM @lbutlr <kremels at kreme.com> wrote:
> >> On 30 Sep 2018, at 09:59, Alex <mysqlstudent at gmail.com> wrote:
> >>> It also tends to happen in bulk - there may be 25 SERVFAILs within the
> >>> same second, then nothing for another few minutes.
> >> That really makes it seem like either your modem or your ISP is interfering somehow, or is simply not able to keep up.
> > I'm leaning towards that, too. The problem persists even when using
> > the provider's DNS servers. I thought for sure I'd see some verifiable
> > info from other people having problems with cable, such as from
> > dslreports, etc, but there really hasn't been anything. The comment
> > made about DOCSIS earlier in this thread was helpful.
> >
> > Do you believe it could be impacting all data, not just bind/DNS/UDP?
> >
> > Do people not generally use cable as even a fallback link for
> > secondary services? I figured it was because there's no SLA, not
> > because it doesn't work well with many protocols. I'd imagine the
> > reason services like Netflix and YouTube don't have problems is that
> > 1) they don't require a lot of DNS traffic, 2) HTTP is a really simple
> > protocol, and 3) the link is probably engineered to be used for that?
> >
>
> Overall it probably depends on volume and application. Cable works well
> as a transport, but is not the same as DSL, ethernet, or GPON. If you
> have the need to send 500+ pps, then Cable may not meet your needs.

I believe I said as many as 500 qps, but I think that's wrong. It's
more like a sustained 200 qps.

> If you are running a high volume mail server you probably do need to run
> a local resolver to query services like SpamHaus, SORBs, and others due
> to the terms of service of these services and the rate limiting that

Yes, doing all of that. That's why I'm posting to the bind-users list.

For RBLs, I'm using invaluement (amazing), spamhaus, spamcop, sorbs,
senderscore and barracuda.

> they apply which would prevent you from using your upstream provider's
> DNS servers or a public DNS service like Google/Quad9/1.1.1.1. I would,
> however, recommend that you ensure your system has at least 2 resolvers
> configured in /etc/resolv.conf. If the first (local resolver) fails to
> resolve a query, then your system should retry the second server before

That turned out to be a key factor in this.

I managed to get rid of most of the SERVFAIL bind errors after
tunneling them through socat temporarily, but there were still a few
others. I thought by using just one entry in /etc/resolv.conf, it
would force all to go through there, but apparently some were
dropped(?). It wasn't until I added another resolver on a local
network (also on that cable connection) that the 'Name service error'
postfix errors really stopped.
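
To illustrate the fallback idea, here is a minimal Python sketch of trying
a list of resolvers in order and moving on when one times out or returns
SERVFAIL. It is a simplified illustration, not how the system stub
resolver is actually implemented. The function names, the fixed query ID,
and the addresses in the usage note are all hypothetical; only the DNS
wire format (RFC 1035) is standard.

```python
import socket
import struct


def build_query(name, query_id):
    """Build a minimal DNS query packet for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def query_with_fallback(name, servers, timeout=2.0, query_id=0x1234):
    """Try each resolver in order over UDP.

    Returns (server, rcode) for the first resolver that answers with
    something other than SERVFAIL, or None if every resolver fails.
    """
    packet = build_query(name, query_id)
    for server in servers:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                sock.sendto(packet, (server, 53))
                reply, _ = sock.recvfrom(512)
            except OSError:
                # Timeout or network error: move on to the next resolver.
                continue
        # Accept only a reply that carries our query ID.
        if len(reply) >= 12 and reply[:2] == packet[:2]:
            rcode = reply[3] & 0x0F
            if rcode != 2:  # 2 = SERVFAIL
                return server, rcode
    return None
```

Something like query_with_fallback("example.com", ["127.0.0.1", "192.0.2.53"])
would try the local resolver first and the second one only if the first
fails; both addresses are placeholders.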

> The occasional timeout might delay email, but should not prevent SMTP
> from functioning because A) DNS timeouts are considered to be a
> temporary error, and B) the default behavior of SMTP is to queue and

It doesn't prevent the email from being delivered, but the RBL queries
time out and consequently don't get consulted, perhaps allowing email
to pass that otherwise shouldn't have.
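
For anyone following along, each RBL check is one DNS lookup per
connecting IP per list, which is where the query volume comes from. A
minimal sketch of how the lookup name is formed (IPv4 only), assuming the
usual reversed-octet convention; the function name is hypothetical and
zen.spamhaus.org is just an example zone:

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Return the DNS name queried to check an IPv4 address against a DNSBL zone."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    if addr.version != 4:
        raise ValueError("this sketch handles IPv4 only")
    # Reverse the octets and append the list's zone, e.g. 192.0.2.99 -> 99.2.0.192.<zone>
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return reversed_octets + "." + zone.strip(".")
```

With six lists configured, a single incoming connection turns into six
such names to resolve, and a timeout on any of them means that list is
simply skipped for that message.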

> retry if there is a timeout or temporary failure. Another angle to look
> at the problem from is if you believe the network can't handle more than
> X query volume, reduce your query volume below X to see if this resolves
> your issue. I operate dozens of email servers and they do not generate
> the query volume you describe. Perhaps you are querying too many RBLs

I've also experimented with QoS, prioritizing interactive traffic like
DNS, and it appears to help, but I don't believe it's a bandwidth
issue. The errors also sometimes happen when processing only a few
emails.

For a while I thought it couldn't be a bandwidth issue because it's a
165/35 Mbit link, and we have 10 Mbit ethernet links where it doesn't
ever happen with otherwise very similar configurations, but now I know
(or am pretty sure) it's because of how the cable (DOCSIS?) link is
designed...