monitoring BIND

Dave Knight dave at knig.ht
Wed Jul 13 16:53:42 UTC 2011


Sorry for contributing another non-answer, just wanted to comment that I have done something very similar once upon a time...

The case was an anycast node of an authoritative DNS service, laid out as:

2 Internet Facing Routers -- 2 Load Balancing Switches -- Big Stack of Servers

We had seen degraded performance reported by RIPE NCC's DNSMON, but we weren't sure whether the problem was in Internet routing or inside our nodes and, if it was inside our nodes, whether it was the servers, the load balancers, or something else.

We set up traffic capture with tcpdump at strategic points within the node, i.e. between the routers and the load balancers, between the load balancers and the servers, and on each server. With a good sample of the traffic, say an hour or so, we could pull the DNSMON raw data for the same time period and match the queries it sent to us (the DNSMON raw data contains the query id) against what we saw inside our node. That let us verify that we saw each query, answered it, and that the answer made it back out into the Internet. We could also see what path the query and answer took through the node and where any delays might be.
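
For anyone wanting to try the same matching, here is a rough Python sketch of the idea, not the tooling we actually used. It assumes the captures have already been decoded into simple records and that the clocks at the capture points are synchronised. The Packet type, the capture point labels in POINTS, and the choice to match on query name as well as query id (a 16-bit id alone will collide in an hour of traffic) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    """One decoded DNS packet from a capture (hypothetical record layout)."""
    point: str         # label of the capture point where it was seen
    ts: float          # capture timestamp in seconds
    qid: int           # DNS query id
    qname: str         # query name
    is_response: bool  # False for the query, True for the answer


# Hypothetical capture point labels, ordered from the Internet side inwards.
POINTS = ["router-lb", "lb-server", "server"]

Step = Tuple[str, str]  # (capture point, "query" or "response")


def trace(probe_qid: int, probe_qname: str,
          packets: List[Packet]) -> Dict[Step, float]:
    """Return the earliest sighting of one probe query at each capture point."""
    seen: Dict[Step, float] = {}
    for p in packets:
        # Match on id and (case-insensitive) name to reduce id collisions.
        if p.qid != probe_qid or p.qname.lower() != probe_qname.lower():
            continue
        key = (p.point, "response" if p.is_response else "query")
        if key not in seen or p.ts < seen[key]:
            seen[key] = p.ts
    return seen


def hop_delays(seen: Dict[Step, float]) -> List[Tuple[Step, Optional[float]]]:
    """Delay from the previous sighting at each step; None if never seen."""
    # Expected path: query travels inwards, response travels back outwards.
    path = [(pt, "query") for pt in POINTS] + \
           [(pt, "response") for pt in reversed(POINTS)]
    delays: List[Tuple[Step, Optional[float]]] = []
    prev: Optional[Step] = None
    for step in path:
        if step not in seen:
            delays.append((step, None))
            continue
        delays.append((step, 0.0 if prev is None else seen[step] - seen[prev]))
        prev = step
    return delays
```

Running trace() for each query listed in the DNSMON raw data and then hop_delays() on the result shows where a query or answer went missing (None) and which hop added the time. In practice you would probably also restrict matches to the probe's source address and a short time window around when it was sent.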

This very quickly led us to the load balancers as the cause of the delays and we were able to fix them.

We never felt the need to run this on an ongoing basis. Once our servers looked green in DNSMON again we were happy that all was well in our world. We used it for diagnosis rather than for detection, which is what it sounds like you want.

dave


On 2011-07-13, at 11:27 AM, Karl Auer wrote:

> More info to my question:
> 
> dig and Nagios have been suggested as possible solutions.
> 
> dig (and I suspect Nagios, which someone else mentioned) can only test
> resolution times from one point in the network, or maybe several, and
> using a very small number of tests.
> 
> Our current system watches ALL queries and responses to and from the
> nameservers and summarises ALL the response times, regardless of where
> the queries came from. For every second of the day we can say what the
> average, minimum, maximum, etc response times were.
> 
> We're looking for something that can do that, or something similar...
> 
> Regards, K.
> 
> -- 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Karl Auer (kauer at biplane.com.au)                   +61-2-64957160 (h)
> http://www.biplane.com.au/kauer/                   +61-428-957160 (mob)
> 
> GPG fingerprint: DA41 51B1 1481 16E1 F7E2 B2E9 3007 14ED 5736 F687
> Old fingerprint: B386 7819 B227 2961 8301 C5A9 2EBC 754B CD97 0156



