[kea-dev] backend db redundancy
marcin at isc.org
Fri Nov 28 10:50:37 UTC 2014
Currently the connection to the lease database is opened only when the
server configuration (from the config file) is parsed, that is, when the
server starts up or when it is reconfigured (with the SIGHUP signal, or
with the keactrl script, which itself sends SIGHUP). If the connection is
later lost because the database is temporarily unavailable, every
database operation the server attempts will fail. The server will
continue to run, but the DHCP service will be broken from the clients'
perspective.
When the lease database is back on-line, the DHCP server will not
reconnect until it is restarted or reconfigured. Note: reconfiguration
doesn't shut down the server - it triggers the server to re-parse the
configuration file and load a new configuration. This happens even
when nothing has changed in the configuration file. As a result, the
server should re-connect to the lease database.
I believe that this behavior should be improved, although I wouldn't say
it is a top priority, mostly because there is a viable workaround,
described above: reconfigure the server manually.
Let me know how important you think automatic re-connect is compared
with the manual workaround.
I think that at some point we should implement automatic re-connect
for when the lease database comes back on-line.
Stephen investigated the MySQL options, and it seems that there are at
least two possible solutions for this:
- The MySQL lease backend examines the error codes received from the
database and, if they indicate a closed connection, re-opens the connection.
- Rely on MySQL's auto-reconnect mode, which should re-establish the
connection when the database is back on-line.
Each of them has its pros and cons. We tend to think that the latter is
better (perhaps easier to implement). However, it also has issues. The
major one is that when MySQL re-connects, the prepared statements and
the autocommit option (maybe some other options too) are lost. The lease
database backend uses prepared statements, so it will not work until
the DHCP server detects that a re-connect has happened and sets up the
prepared statements and other options again. There are means for the
DHCP server to detect this, so it should not be terribly complex to
implement.
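
To illustrate the detection step, here is a minimal sketch, in Python for brevity rather than Kea's C++. It assumes the client library exposes a connection id that changes after a re-connect (with the MySQL C API, comparing mysql_thread_id() before and after mysql_ping() is, as far as I know, the usual way to notice this). FakeConnection, LeaseBackend and the statement names are hypothetical stand-ins, not Kea code.

```python
class FakeConnection:
    """Hypothetical stand-in for a client handle with auto-reconnect enabled."""

    def __init__(self):
        self.alive = True
        self.thread_id = 1       # server-side connection id
        self.prepared = set()    # statements prepared in the current session
        self.autocommit = False

    def ping(self):
        """If the link is down, reconnect silently; session state is lost."""
        if not self.alive:
            self.alive = True
            self.thread_id += 1  # a new session gets a new id
            self.prepared.clear()
            self.autocommit = False


class LeaseBackend:
    """Hypothetical backend that restores session state after a re-connect."""

    STATEMENTS = ("get_lease4", "insert_lease4", "update_lease4")

    def __init__(self, conn):
        self.conn = conn
        self.restores = 0
        self._restore_session()

    def _restore_session(self):
        # Everything that does not survive a re-connect is set up here.
        self.conn.autocommit = True
        self.conn.prepared.update(self.STATEMENTS)
        self.restores += 1

    def execute(self, name):
        before = self.conn.thread_id
        self.conn.ping()
        # A changed connection id means a re-connect happened behind our back.
        if self.conn.thread_id != before:
            self._restore_session()
        if name not in self.conn.prepared:
            raise RuntimeError("statement not prepared: " + name)
        return name + " ok"
```

The point of the sketch is that all per-session setup lives in one function, so it can be re-run whenever the connection id changes.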
I am not sure if other SQL databases have similar functionality. But if
not, solution #1 is also viable.
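
Solution #1 could look roughly like the sketch below (again Python, for illustration only). The two error codes are the MySQL client codes for "server has gone away" and "lost connection during query"; DbError, run_with_reconnect and the retry limit are hypothetical names and choices, not an existing Kea or MySQL API.

```python
CR_SERVER_GONE_ERROR = 2006  # MySQL client error: server has gone away
CR_SERVER_LOST = 2013        # MySQL client error: lost connection during query
CONNECTION_LOST_CODES = {CR_SERVER_GONE_ERROR, CR_SERVER_LOST}


class DbError(Exception):
    """Hypothetical exception carrying the database error code."""

    def __init__(self, code):
        super().__init__("database error %d" % code)
        self.code = code


def run_with_reconnect(operation, reconnect, max_attempts=2):
    """Run operation(); on a connection-lost error, re-open and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except DbError as err:
            # Unrelated errors, or running out of attempts, go to the caller.
            if err.code not in CONNECTION_LOST_CODES or attempt == max_attempts:
                raise
            reconnect()
```

Here reconnect() would have to re-open the connection and re-create the prepared statements, since a fresh connection starts with none.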
I submitted a new ticket http://kea.isc.org/ticket/3639 to address it
for MySQL. Currently it is in the Kea-proposed queue, which means we
don't know when we could fit this work in. You mentioned that you'd be
interested in helping to address it. You're very welcome to. It also
probably means that it would get done sooner.
If you're willing to work on this, please send a quick overview of how
you think you'd implement it in Kea, so that we can review it on our end
before you write any code.
Please also note that it may not be as trivial as it seems at first
glance, as we require unit tests covering all new code. The unit tests
would need to simulate database downtime and allow for the automatic
re-connect. Since two processes are involved, there may be some race
conditions.
Finally, many thanks for bringing this up. This use case sounds like a
possible operational problem that people may hit.
On 11/18/14 03:05, alexis wrote:
> Guys, question, to avoid double work if you're already on this.
> Generally MSOs in Central and Latin America (and believe me, I'll be
> deploying Kea heavily on them) use non-clustered backends; this means
> there's an HA solution like corosync+pacemaker maintaining a floating IP
> for the DB service.
> If I have, as an example, 2 Kea DHCP servers using a database with IP
> 18.104.22.168, which is a floating one, and there's an external problem that
> forces that address to switch to a different DB server (causing all
> established connections to be dropped), will the DHCP servers recover
> from there? (Active transactions will be lost, that's OK.) The question
> is: from then on, will they try to connect again and keep going without
> the need to restart the DHCP server?
> If it's not covered, I believe I'll be testing and working on this topic.