[bind10-dev] Robustness in BIND 10, was whether/when to use exceptions

Shane Kerr shane at isc.org
Thu Oct 15 13:50:05 UTC 2009


On Wed, 2009-10-14 at 11:59 -0700, JINMEI Tatuya / 神明達哉 wrote:
> At Wed, 14 Oct 2009 13:30:15 +0200,
> Shane Kerr <shane at isc.org> wrote:
> 
> > An important area where we can do much better if we have exceptions
> > is coder errors. By this I basically mean assert() failures. With
> > exceptions these become an exception rather than an abnormal exit, and
> > some action may be taken.
> > 
> > Of course, it is difficult to write code that handles coder errors.
> > Rather than having a blanket policy, one must review each piece of
> > software in context and decide what the best thing to do is.
> > 
> > For example, if we get an update message that puts the system in an
> > unexpected state, we may be able to simply drop the update and continue
> > on. However, if we discover the problem after we have started the
> > update, then we may have to restart the process doing the update. In
> > extreme cases we may have corrupted our database - although using an SQL
> > database that is ACID can make this very unlikely.
> 
> Personally I'm not so optimistic about this advantage of exceptions.
> As you noted yourself, it would be generally very difficult to tell
> whether an unexpected code state is due to critical damage in some
> core part of the system (in which case there's not much we can do
> except exiting) or due to a minor recoverable error.  So, to provide
> safer behavior I suspect we'll still end up exiting in most of such
> cases in practice, whether or not it's exception-based.
> 
> (This is not a discussion about whether to use exceptions per se) IMO
> what we should do in BIND10 comparing it with BIND9 is to provide a
> quicker and automatic restart mechanism when we encounter a code error
> and have to exit.

Hm... I am a bit more hopeful, but I guess we shall see.

I'm going to blog about this, but I'll chat a bit here.

One of the goals for BIND 10 is to increase the robustness of the
software. The basic strategy for this is fault isolation. For us this
means making errors affect as small a part of the system as possible,
and then recovering from them as quickly as possible.

I think two things that will help robustness in BIND 10 are:

     1. Multiple processes
     2. Exceptions

Both are part of the general strategy of fault-isolation.

Multiple processes help because a problem that affects one component
(say dynamic updates) does not necessarily stop other components (like
query processing). Also, restarting the dynamic update component alone
is likely much quicker than restarting the entire system.
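
To make the restart idea concrete, here is a minimal sketch of a
supervisor that restarts one component when it exits abnormally. It is
not BIND 10 code: the name supervise() and the max_restarts limit are
made up for illustration, and a real supervisor would presumably also
want back-off and logging.

```python
import subprocess
import sys


def supervise(cmd, max_restarts=3):
    """Run one component; restart it each time it exits abnormally.

    Hypothetical helper, for illustration only. Returns a tuple of
    (number of restarts performed, last exit status).
    """
    restarts = 0
    while True:
        # Each component is its own process, so a crash here cannot
        # take down any other component.
        returncode = subprocess.call(cmd)
        if returncode == 0 or restarts == max_restarts:
            return restarts, returncode
        restarts += 1


if __name__ == "__main__":
    # Stand-in for a component that always fails with exit status 3.
    crashing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(supervise(crashing, max_restarts=2))
```

Only the failing component is restarted; something like query
processing would run under its own supervise() call and never notice.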

Exceptions help (at least in my mind, although nobody else seems to
agree) because they allow you to handle a code error at the closest
location possible. My reasoning:

        I looked at the security advisories for BIND 9, and found that
        there have been 13. Of these, 1 was actually in a library that
        BIND 9 itself does not use. Of the remaining 12, 8 were caused
        by assertion failures. This means that 2/3 of all "security"
        problems were actually DoS caused by coding errors.
        
        Coding errors happen. I *like* that we didn't simply continue
        normal processing when we discovered things were not as
        expected. However what we should not do is abort the entire
        system, if possible.
        
        Running multiple processes and allowing fast restart is part of
        the solution. However, I think we can do better. For example,
        almost all processing consists of a few basic steps:
        
              * Reading operation
              * Parsing & checking operation (ACLs and the like)
              * Lookup and/or update of some data store
              * Building a reply
              * Sending the reply
        
        If we discover a code error in any of these, in many cases we
        do not have to restart - we can dump the operation and move on.
        Sometimes we *do* have to simply exit, but *sometimes* we should
        be able to use an exception handler and take some evasive
        action.
        
        Hey, if NASA can do it, so can we!
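
A rough sketch of what "dump the operation and move on" could look
like, under some loud assumptions: CodeError, require(), handle() and
serve() are all names invented here, the data store is a plain dict,
and the "code error" is simulated by a record that should never have
been stored. It only shows the shape of the idea, not how BIND 10
would actually do it.

```python
class CodeError(Exception):
    """Hypothetical exception raised where C code would call assert()."""


def require(condition, message):
    # Exception-raising stand-in for assert(): the caller decides what
    # happens next, instead of the whole process aborting.
    if not condition:
        raise CodeError(message)


def handle(raw, store):
    name = raw.strip().lower()          # parse and check the operation
    if name not in store:
        return name + " NXDOMAIN"
    address = store[name]               # look up the data store
    require(isinstance(address, str), "bad record for " + name)
    return name + " A " + address       # build the reply


def serve(requests, store):
    replies, dropped = [], []
    for raw in requests:
        try:
            replies.append(handle(raw, store))
        except CodeError as err:
            # Dump this one operation and move on to the next.
            dropped.append(str(err))
    return replies, dropped


if __name__ == "__main__":
    store = {"example.org": "192.0.2.1", "broken.example": None}
    print(serve(["example.org", "broken.example"], store))
```

The handler sits at the per-operation boundary, so one bad operation
costs one reply. Anything that is not a CodeError still propagates,
which is where exiting and restarting would take over.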

I could be wrong about exceptions helping with robustness. But I remain
hopeful!

--
Shane



