[kea-dev] thoughts on #3780

Fri Oct 23 13:15:25 UTC 2015

On 22/10/2015 16:49, Thomas Markwalder wrote:

> One of the issues here is that subsequent MySQL api calls made on a
> connection which has been lost can core, so once we detect an issue we
> can't use that connection for anything other than looking at error
> values.  Currently the MySqlLeaseMgr has no real defense against that.
> 
> At first blush, the second option seems the proper thing to do. 
> However, we have done a pretty thorough job of making sure database
> related exceptions, and even unexpected std::exceptions within our
> libraries, do not bring the server down.  This means that propagating
> this new exception all the way out to a server's ::run() method, would
> require adding catch-rethrows to quite a few try-catch blocks.  These
> blocks are in lots of places from allocation engine to
> TimeMgr::handleReadySocket().  Basically any place that can end up
> accessing the database.  This seems like a pretty invasive thing to do.
> 
> Unless we think there might be other conditions, unrelated to the
> database, that we wish to treat as "fatal" in the sense that we want the
> server to shut down, but in a more orderly fashion that calling exit().
>   If that's true, having explicit catches for a generic class of
> isc::FatalException might be warranted.

I dislike calling exit() in the middle of the code, although this might
be a good short-term solution for Kea 1.0 - providing we revisit it
before Kea 1.1. Ultimately though, I think that creating an
isc::FatalException class is the way to go.

Ideally, all exceptions generated by library code should be derived from
std::exception.  If this were the case, we could change all "catch
(...)" calls to "catch (const std::exception&)" to allow
isc::FatalExceptions (if not derived from the base class) to propagate
all the way to the top level.  Unfortunately, Boost defines an exception
hierarchy that is not derived from std::exception, so if we were do
that, Boost exceptions could also leak to the top level.

If we were compiling with C++11, we could call the current_exception()
construct in the "catch (...)" clause to determine the thrown exception
type and take action accordingly.  But we aren't, so we can't.

A possible alternative to calling exit() is to replace all "catch (...)"
clauses with:

catch (const isc::FatalException&) {
    throw;
}
catch (...) {
   :
}

I count 61 instances of "catch (...)" in Kea (74 if you include their
appearance in the unit tests), so this is a viable solution, albeit more
work than calling exit().

Ultimately I think that the best solution (and one requiring most
effort, so not suitable to do before the next release) is to revisit all
places where we catch exceptions and check what we really want to do. In
many cases we should probably just catch isc::Exception as these
exceptions will be from "throw" statements we have put inside the code
and so can anticipate.

Frequently though, we catch std::exception (or "..."). Exceptions
derived from this class include bad_alloc (can't allocate memory),
out_of_range (e.g. accessing outside the bounds of a vector) etc. Should
we be catching those?  For example:

* bad_alloc means that Kea can't allocate memory.  Theoretically we
could free up memory but in practice we don't and memory allocation
failure should be treated as fatal error condition.  Attempting to
continue is likely to lead to a failure in an unrelated part of the code.

* out_of_range may be a valid exception to catch if we are using a
user-supplied value to access some data, although I would prefer that
the value was explicitly checked for validity before being used.  If
some Kea code generates this through, it probably means that there is a
logic error somewhere.  (Although if this is the design of the code, the
code should be specifically checking for std::out_of_range, not
std::exception.) Rather than try to continue with the current packet, we
should probably kill the program or effectively re-initialize - drop the
current packet and process the next one.

In many cases, catching only exceptions derived from isc::Exception will
be the right thing to do and other exceptions should be left to
propagate up through the stack.

Of course, there will be cases where all exceptions should be caught.
For example, the Hooks feature causes user libraries to be loaded.  Kea
should protect itself from errors generated in that code, so the current
behaviour of logging user library-generated exceptions and dismissing
them is probably correct.  But these cases are likely to be few and far
between.

> I am inclined to implement option #1 for 1.0.  It might not be pretty
> but it is far less invasive than trying to thread the exception all the
> way through.  In the end, what would we really gain by doing that? 
> Granted, we would not all of our destructors but is this really a big
> issue?   We could attempt to register some sort atexit() function but
> I'm not sure if it's worth it.

exit() will be fine.  The operating system's process cleanup should
leave things in a consistent state, although if a destructor does
something like deleting a marker file, that of course won't happen.

Stephen