BIND 10 #833: [b10-resolver] Nameservers unreachable but really are

Tue Apr 5 15:37:14 UTC 2011

#833: [b10-resolver] Nameservers unreachable but really are
--------------------------------------+---------------------------
                 Reporter:  jreed     |                Owner:
                     Type:  defect    |               Status:  new
                 Priority:  critical  |            Milestone:
                Component:  resolver  |           Resolution:
                 Keywords:            |            Sensitive:  0
Estimated Number of Hours:  0.0       |  Add Hours to Ticket:  0
                Billable?:  1         |          Total Hours:  0
                Internal?:  0         |
--------------------------------------+---------------------------

Comment (by jelte):

 I'm not a hundred percent sure of this, but I've been adding a bit of
 extra debugging to a private branch here, and it looks like one of the
 main problems is that it's doing a lot of unnecessary work, so much in
 fact that the internal timeouts kick in and the NSAS marks zones as
 unreachable.

 For instance, every lookup that is a cache miss starts a new recursion
 from the root right now (it doesn't search for the lowest known delegation
 at this moment). Another thing is that there is no 'front'-demuxer; if we
 ask it to resolve the same name/type twice (the second one before the
 first one finishes), it'll start a second RunningQuery that does all the
 same work. Combined, these two add up to a lot of extra work.
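
 A minimal sketch of what such a front demuxer could look like: callers
 asking for a name/type that is already being resolved are attached to the
 running query instead of starting a new one. This is illustrative only,
 not resolver code; the QueryDemuxer class and the start_query hook are
 hypothetical names.

```python
class QueryDemuxer:
    """Hypothetical front demuxer: at most one running query per (name, type)."""

    def __init__(self, start_query):
        # start_query(name, rrtype, on_done) is an assumed hook that launches
        # the real (expensive) recursion and calls on_done(answer) when done.
        self._start_query = start_query
        self._waiting = {}  # (name, rrtype) -> list of caller callbacks

    def resolve(self, name, rrtype, callback):
        """Return True if a new query was started, False if we piggybacked."""
        key = (name.lower(), rrtype)  # DNS names compare case-insensitively
        if key in self._waiting:
            # Same question is already in flight: just wait for its answer.
            self._waiting[key].append(callback)
            return False
        # Register before starting, in case the hook completes synchronously.
        self._waiting[key] = [callback]
        self._start_query(name, rrtype,
                          lambda answer: self._finish(key, answer))
        return True

    def _finish(self, key, answer):
        # Hand the single answer to every caller that asked for this key.
        for cb in self._waiting.pop(key, []):
            cb(answer)
```

 With this shape, a second resolve() for the same name/type before the
 first one finishes returns False and costs only a list append.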

 To fix the first one we have partial support in the cache, but we need to
 add some code to the resolver to make use of it (that is not a one-line
 addition, though I don't think it is much work). The second one requires
 a bit of design first.
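
 The first fix amounts to walking up the query name until a zone with
 cached NS records is found, and starting the recursion there instead of
 at the root. A rough sketch, assuming the cache can be viewed as a plain
 mapping from zone name to nameserver names (the real cache interface is
 different; deepest_known_delegation and ns_cache are hypothetical names):

```python
def deepest_known_delegation(qname, ns_cache):
    """Return (zone, nameservers) for the closest enclosing zone in ns_cache.

    ns_cache is an assumed dict: absolute lowercase zone name -> list of NS
    names. Returns None if nothing is cached, not even the root.
    """
    labels = [l for l in qname.rstrip(".").lower().split(".") if l]
    # Try the full name first, then strip one leading label at a time,
    # ending with the root zone ".".
    for i in range(len(labels) + 1):
        zone = ".".join(labels[i:]) + "." if i < len(labels) else "."
        if zone in ns_cache:
            return zone, ns_cache[zone]
    return None
```

 A caller would fall back to the root hints when this returns None, which
 is the behaviour every cache miss gets today.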

 I can do that and make tickets for it, and I think we should make this
 ticket 'depend' on those, and when they are done we can probably see
 better if this is a problem in itself or really just a side-effect of all
 that extra work.

 (Now that I've typed this, perhaps it would also help to make the
 client_timeout higher for 'internal' queries, i.e. queries we initiated
 ourselves. I shall try that.)
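
 The timeout idea could be as small as the following sketch. The base
 value and the multiplier are made-up numbers for illustration, not the
 resolver's actual configuration, and effective_client_timeout is a
 hypothetical helper.

```python
# Assumed values, in milliseconds; not taken from the real configuration.
CLIENT_TIMEOUT_MS = 4000
INTERNAL_TIMEOUT_FACTOR = 3  # hypothetical multiplier for our own queries


def effective_client_timeout(is_internal,
                             base_ms=CLIENT_TIMEOUT_MS,
                             factor=INTERNAL_TIMEOUT_FACTOR):
    """Give queries we initiated ourselves more time than client queries."""
    return base_ms * factor if is_internal else base_ms
```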

-- 
Ticket URL: <http://bind10.isc.org/ticket/833#comment:1>
BIND 10 Development <http://bind10.isc.org>