[kea-dev] Fwd: thoughts on #3780

Thu Oct 22 15:55:13 UTC 2015

The subject is correct, but the text of the email states the wrong
ticket.  The correct ticket is trac #3780.

-------- Forwarded Message --------
Subject: 	thoughts on #3780
Date: 	Thu, 22 Oct 2015 11:49:34 -0400
From: 	Thomas Markwalder <tmark at isc.org>
To: 	Kea Dev List <kea-dev at lists.isc.org>

Trac 3969 calls for the servers to exit if they lose connectivity with
MySQL.
I've been looking into this.

First, the MySqlLeaseMgr::checkError() method needs to be modified to
distinguish between an unrecoverable DB error and things like statement
errors.  That's pretty straightforward.
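
To illustrate the kind of distinction meant here, a minimal sketch
follows.  It is not the actual checkError() code: the helper name, the
enum, and the choice of which codes count as unrecoverable are all
assumptions.  The numeric values are the client error codes defined in
MySQL's errmsg.h (CR_CONNECTION_ERROR, CR_SERVER_GONE_ERROR,
CR_OUT_OF_MEMORY, CR_SERVER_LOST), copied by value so the sketch builds
without the MySQL headers.

```cpp
namespace sketch {

// Values of the MySQL client error codes from errmsg.h, renamed so they
// cannot clash with the real CR_* macros if those headers are included.
constexpr unsigned int kCrConnectionError = 2002;  // CR_CONNECTION_ERROR
constexpr unsigned int kCrServerGoneError = 2006;  // CR_SERVER_GONE_ERROR
constexpr unsigned int kCrOutOfMemory     = 2008;  // CR_OUT_OF_MEMORY
constexpr unsigned int kCrServerLost      = 2013;  // CR_SERVER_LOST

enum class DbErrorKind { None, Statement, Fatal };

// Hypothetical helper: classify the value returned by mysql_errno().
// Which codes are treated as fatal is an assumption for illustration.
inline DbErrorKind classifyDbError(unsigned int mysql_error_code) {
    switch (mysql_error_code) {
    case 0:
        return DbErrorKind::None;
    case kCrConnectionError:
    case kCrServerGoneError:
    case kCrOutOfMemory:
    case kCrServerLost:
        return DbErrorKind::Fatal;
    default:
        // Everything else is treated as an ordinary statement error.
        return DbErrorKind::Statement;
    }
}

} // namespace sketch
```

The caller would pass in mysql_errno() for the connection and only take
the fatal path when DbErrorKind::Fatal comes back.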

But then we have at least the following alternatives:

1.  If a fatal error is detected,  log it and call exit(-1).

2.  Create a new exception, perhaps DbFatalError, and throw it. 
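
As a rough sketch of what the second alternative could look like, with
the first noted in a comment: DbFatalError is only the name floated
above, and checkFatal() and its connection_lost parameter are
hypothetical stand-ins for whatever test checkError() ends up applying.

```cpp
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical exception type for option 2.
class DbFatalError : public std::runtime_error {
public:
    explicit DbFatalError(const std::string& what)
        : std::runtime_error(what) {}
};

// Hypothetical check: 'connection_lost' stands in for the result of
// examining the MySQL error value.
inline void checkFatal(bool connection_lost, const std::string& what) {
    if (connection_lost) {
        // Option 1 would instead log 'what' here and call exit(-1).
        throw DbFatalError(what);
    }
}

} // namespace sketch
```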

One of the issues here is that subsequent MySQL API calls made on a
connection which has been lost can dump core, so once we detect an issue
we can't use that connection for anything other than looking at error
values.  Currently the MySqlLeaseMgr has no real defense against that.

At first blush, the second option seems the proper thing to do. 
However, we have done a pretty thorough job of making sure
database-related exceptions, and even unexpected std::exceptions within
our libraries, do not bring the server down.  This means that
propagating this new exception all the way out to a server's ::run()
method would require adding catch-rethrows to quite a few try-catch
blocks.  These blocks are in lots of places, from the allocation engine
to TimeMgr::handleReadySocket(): basically any place that can end up
accessing the database.  This seems like a pretty invasive thing to do.
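
To make the cost concrete, here is a sketch of the catch-rethrow that
each such block would need.  guardedCall() is a hypothetical stand-in
for any one of the existing try-catch blocks, and DbFatalError is the
hypothetical exception from option 2; neither is existing code.

```cpp
#include <exception>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical exception type from option 2.
class DbFatalError : public std::runtime_error {
public:
    explicit DbFatalError(const std::string& what)
        : std::runtime_error(what) {}
};

// Stand-in for one of the many existing blocks that swallow exceptions
// so that a single failure does not bring the server down.
inline bool guardedCall(const std::function<void()>& db_access) {
    try {
        db_access();
        return true;
    } catch (const DbFatalError&) {
        // The new clause: let the fatal error continue toward run().
        throw;
    } catch (const std::exception& ex) {
        // Existing behavior: report the error and carry on.
        std::cerr << "non-fatal error: " << ex.what() << std::endl;
        return false;
    }
}

} // namespace sketch
```

The extra catch clause has to come before the std::exception handler,
and it has to be repeated in every block on the path between the
database call and run().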

Unless we think there might be other conditions, unrelated to the
database, that we wish to treat as "fatal" in the sense that we want the
server to shut down, but in a more orderly fashion than calling exit().
If that's true, having explicit catches for a generic class of
isc::FatalException might be warranted.
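
A sketch of how such a generic class might be used, under the
assumption that it is a common base from which DbFatalError derives.
The sketch namespace, the class layout, and the run() signature are all
made up for illustration; no such classes exist today.

```cpp
#include <cstdlib>
#include <functional>
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical generic base for conditions that should stop the server.
class FatalException : public std::runtime_error {
public:
    explicit FatalException(const std::string& what)
        : std::runtime_error(what) {}
};

// Hypothetical database-specific case.
class DbFatalError : public FatalException {
public:
    explicit DbFatalError(const std::string& what)
        : FatalException(what) {}
};

// Stand-in for a server's run() method.  'step' represents one pass of
// the main loop and returns false to request a normal stop.
inline int run(const std::function<bool()>& step) {
    try {
        while (step()) {
        }
    } catch (const FatalException&) {
        // Returning instead of calling exit() unwinds the stack, so
        // the caller's objects are still destroyed normally.
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

} // namespace sketch
```

The intermediate try-catch blocks would then rethrow FatalException
rather than a database-specific type, so later fatal conditions would
not need another pass through all of them.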

Keep in mind that we have a post-1.0 ticket, under which we should
attempt to reconnect to the database.  This has its own challenges but
that's for another day.  This might render such intrusive code
throw-away.

I am inclined to implement option #1 for 1.0.  It might not be pretty
but it is far less invasive than trying to thread the exception all the
way through.  In the end, what would we really gain by doing that?
Granted, we would not run all of our destructors, but is this really a
big issue?  We could attempt to register some sort of atexit() function
but I'm not sure if it's worth it.
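
For reference, registering such a function is a small amount of code.
The sketch below uses the standard std::atexit(); the handler name and
what it does are placeholders.  Note that exit() runs atexit handlers
and the destructors of static objects, but not the destructors of
objects on the stack, which is the gap mentioned above.

```cpp
#include <cstdio>
#include <cstdlib>

namespace sketch {

// Hypothetical last-chance cleanup.  It runs on every call to exit(),
// not only after a database failure, and it must not touch a
// connection that may already be lost.
void lastChanceCleanup() {
    std::fputs("server exiting; running last-chance cleanup\n", stderr);
}

// Registers the handler; std::atexit() returns 0 on success.
bool installExitHandler() {
    return std::atexit(lastChanceCleanup) == 0;
}

} // namespace sketch
```

The server would call installExitHandler() once during startup, before
any code path that might call exit(-1).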

Thoughts?

Thomas
