BIND 10 #249: review: auth server can return SERVFAIL with an answer

Fri Jun 25 19:32:59 UTC 2010

#249: review: auth server can return SERVFAIL with an answer
-------------------------+--------------------------------------------------
 Reporter:  jinmei       |        Owner:  jinmei                                        
     Type:  defect       |       Status:  reviewing                                     
 Priority:  major        |    Milestone:  05. 3rd Incremental Release: Serious Secondary
Component:  data source  |   Resolution:                                                
 Keywords:               |    Sensitive:  0                                             
-------------------------+--------------------------------------------------

Comment(by jinmei):

 Replying to [comment:3 each]:

 > Rather than throwing exceptions from the data source, I would have had
 b10-auth look for a SERVFAIL Rcode in the reply from doQuery(), and either
 clear the partial answers out of the Message or construct a new Message
 with no answer.  But, if we're going to throw exceptions, then I agree
 with your speculation that we should do so in every SERVFAIL case, not
 just the two that you've already done.

 Okay, but let's leave the other cases to a separate ticket anyway.

 > > This observation leads to broader discussions.  First, I suspect other
 internal SERVFAIL cases should actully be changed to exceptions.  But I
 was not so sure if they are due to something like a broken database or an
 unsual, but not necessarily broken error.  So I didn't touch them.  We
 should revisit these cases, too, and, whatever the conclusion is, we
 should avoid SERVFAIL+answer for these cases.
 >
 > AFAIK, SERVFAIL always means either that the database is insane, or
 we've been told to do something unnatural (for example, a glue query found
 on the task queue, which should never happen).  Both seem like natural
 circumstances for an exception to me.

 So, do you mean you now agree exceptions are (more) natural?

 Exception vs error code is often a controversial issue, resulting in a
 matter of preference.  So, I'm not necessarily insisting exceptions be the
 only choice.  But, as I already explained and you seem to have agreeded,
 at least for these two cases it's clearly due to broken data bases, which
 should normally be rejected by a tool that builds the data source, I
 personally think exceptions are better.

 > My only question about it is, I recall having a conversation early in
 the BIND 10 project about someone (maybe you?) running benchmarks that
 showed exceptions caused a serious performance hit.  If that's true, then
 maybe it would be preferable to continue signaling the problem via the
 Rcode, and simply handle it better on the calling end?  But I'm okay
 either way.

 Overall, I don't think this is an issue in this case.

 If exceptions are actually triggered that's normally an expensive
 operation.  But these two cases should actually be "exceptional" which
 should never happen with a proper tool.  Of course, it couldn't be
 triggered via a remote packet as long as the data source is valid.  So, in
 these normal cases we can ignore the cost of exception.  If someone uses a
 broken tool that makes a broken data source and suffers from performace
 penalty, I think we can say it's a cost of using a broken tool (and the
 solution is to fix it).

 That's one reason why I didn't touch other SERVFAIL cases.  If some of
 them can be triggered via a remote query even if the underlying data
 source is valid, that could be a source of DoS and we might want to treat
 it differently.  (although this concern applies to broken queries in
 general...it would be a subject of a different discussion).

 > > So, in a longer term, I think we need to define a border in the top
 level data source, under which we cannot assume the validity of the data
 and the module must perform validation.
 >
 > What do you mean by "the module"?

 The top level data source (or, in this context, the query processing logic
 that interfaces to the underlying data source).  For example, when we do
 {{{
         return (ds->findRRset(task.qname, task.qclass, task.qtype,
                               target, task.flags, zonename));
 }}}
 we trust the data source that it builds valid RRsets in target, but even
 this may not always be guaranteed.  A broken data source implementation
 may include an empty RRset (an RRset with no RDATA), for example.

 Saying "that's not our problem" is one solution.  But I persontally think
 it's a bit naive in an open environment like BIND 10, and even if we adopt
 this solution we should at least be sure a broken input doesn't cause a
 catastrophic consequence within (e.g.) data_source.cc.  It won't be an
 easy task anyway.

 But again, it's beyond the scope of this ticket.  If we want to continue
 this discussion, let's move it to the list or create a separate ticket.

-- 
Ticket URL: <https://bind10.isc.org/ticket/249#comment:5>
BIND 10 Development <http://bind10.isc.org>
BIND 10 Development