Bind 9 Limiting factors

Jim Reid jim at rfc1035.com
Fri Jun 28 19:38:44 UTC 2002


>>>>> "Steven" == Steven B Parsons <sparsons at columbus.rr.com> writes:

    Steven> I have 7 bind 8 servers doing forward first to this bind 9
    Steven> server.

Yuk! Forwarding is rarely a good idea.
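
For anyone following along, a "forward first" setup on those BIND 8
servers presumably looks something like this in named.conf (the
forwarder address below is a made-up example):

    options {
        // Hand every recursive query to the central BIND 9 box first;
        // "first" means fall back to normal iterative resolution if
        // the forwarder fails to answer.
        forward first;
        forwarders { 192.0.2.1; };   // the BIND 9 server (example address)
    };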

    Steven> What I really wish to accomplish is to have this server
    Steven> cache as much data as possible so that when the other
    Steven> servers request info from it the bind 9 server will
    Steven> already have the needed info and not have to go out to
    Steven> the internet and look it up (obviously unless it expired).

It's a nice theory but it's ultimately pointless. Forwarding setups
like that might have made sense in the days when a whole campus lived
behind a 64 or 56 kbit/s link. But not now. Back then, the goal was
really to save bandwidth, with a possible side-effect of speeding up
resolution. Sure,
the forwarding target builds up a slightly bigger cache and the other
servers benefit from that. But not so you'd notice.

Suppose one of your forwarding servers needs the MX records for
rfc1035.com. It sends the request to your forwarder target, which
luckily already has the answer. [How often do forwarding targets
actually get cache hits?] It sends the answer back to the forwarding
server, which caches that result. Further lookups of my MX records get
what's in your forwarding server's cache from then on: until it
expires, of course. What your forwarding server has saved is one
external resolution of rfc1035.com, which might take as much as 100
milliseconds if it had resolved the query by itself. That massive
saving in time lasts for a week, the TTL on those records. It's a
micro-optimisation that's almost undetectable. 100ms or so is likely
to be less than the time taken for the 3-way handshake when setting up
the SMTP connection to my mail server, so the "saving" barely helps
your mail system's latency or throughput.
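
If you want to see that number for yourself, point dig at the
forwarding server and look at the query time. The run below is made
up purely for illustration, trimmed to the relevant lines, with the
arrows added as annotations:

    % dig @your-forwarding-server rfc1035.com mx
    ;; Query time: 104 msec        <- resolved externally
    % dig @your-forwarding-server rfc1035.com mx
    ;; Query time: 1 msec          <- answered from cache until the TTL expires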

Meanwhile, your forwarding target becomes a massive single point of
failure waiting to happen. If it dies or there's a connectivity
problem, your other servers will still forward to it, wait for those
queries to time out and only then try to resolve the queries for
themselves. Why not just let them do that from the outset? What's the
point of having all that round-trip-time-based server selection logic
in the server and not using it? Especially when you make the server
dumb by blindly forwarding every query to the one place. Make your
caching servers autonomous and independent. It's the Right Thing To
Do.
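
A caching-only named.conf needs very little: roughly something like
the sketch below, with no forward or forwarders statements at all (the
directory and hints file names are just the usual conventions):

    options {
        directory "/var/named";
        recursion yes;        // the default anyway: resolve for local clients
    };

    // Root hints so each server can walk the DNS tree by itself.
    zone "." {
        type hint;
        file "named.root";
    };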

Suppose you always took the Brooklyn Bridge to get to Manhattan. You
hear the bridge is closed. Do you
	(a) make for the bridge regardless and ignore diversion signs?
	(b) find another bridge across the river?
	(c) choose an alternate means of travel?
	(d) give up?

Forwarding name server configurations are at best like (a) or at worst
like (d). Not very smart, eh?

    Steven> I'm also trying to determine if I need to throw any more HW
    Steven> at this box to get better performance & more cached data &
    Steven> lastly if there are any other settings I can change to do
    Steven> that.

Your server is already caching all the data your applications and
users need. Unless you can predict what new stuff they might look up,
there's no point in caching more data: what could it cache? As for
performance, 500 queries/second is a lot, but not excessive. It's well
within what BIND9 can do on reasonable hardware. You should only be
worrying about that server's throughput if the forwarding servers are
telling you it's dropping their queries. In which case, you should
stop forwarding and make the forwarding target yet another caching
server. So now you'd have 8 caching-only servers, each able to resolve
a few hundred queries/second, instead of one (SPoF) target doing all
the work. i.e. Fix the real problem at source instead of tinkering at
the margins. Now do you see why I said forwarding was rarely a good
idea?

    Steven> I just find it weird that the box is only using 640 megs
    Steven> while I still show more recursions than successes in the
    Steven> logs but it's not caching any more data even though the
    Steven> memory is available.

Add successes and failures: the total should be pretty much equal to
the number of recursive requests. The remaining difference can be
explained by lame or dead servers encountered during resolution. And
you can't assume that every recursive request will succeed. There are
lots of broken name servers in the world and lots of broken resolvers
that ask for non-existent names, RR types and classes. Enable query
logging for a while: I guarantee you'll be amazed at the crap flung at
your servers.
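
BIND 9 lets you toggle query logging at runtime with "rndc querylog",
or you can set up a dedicated channel along these lines (the log file
name and rollover settings are just examples):

    logging {
        channel query_log {
            file "/var/log/named-queries.log" versions 3 size 10m;
            severity info;
            print-time yes;
        };
        category queries { query_log; };
    };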

    Steven> I've had some bind 8 servers grow to 900 megs before on the
    Steven> named process with the same HW & OS version.

But not with exactly the same query traffic, obviously. Unless you're
generating the same load, you can't expect meaningful results to
compare and contrast. Also, BIND8 could track more statistics than
BIND9 does, which could explain the extra RAM it used, all other
things being equal. And BIND9 has fewer memory leaks than BIND8: check
the change logs.

