[kea-dev] Kea's database schema versioning in practice

Fri Aug 26 14:54:33 UTC 2016

Hi All,

I am soliciting feedback about the issue raised during review of Kea
ticket http://kea.isc.org/ticket/4562.

Kea is using switchable lease/host database backends. The database
schemas constantly evolve as we add new features which use those
backends. As an example, in Kea 1.1 release we're significantly
expanding support for Host Reservations. We have added support for
reserving DHCP options per host, reserving DHCPv4 message fields like
next server, boot file name (for PXE boot) etc. In PostgreSQL we have
created host reservations from scratch since Kea 1.0.

All those new features require adding new columns to database tables,
new indexes, changing types of some columns etc.

To assure that Kea users can identify what version of database they are
using and if this version is compatible with the current Kea version we
have introduced schema versioning. For example Kea 1.0 was released with
MySQL schema version 4.1 and PostgreSQL schema version 2.0.

Kea 1.1 will be released with MySQL schema version 5.0 and PostgreSQL
schema version 3.0.

We also provide the tool, called kea-admin, which can detect the
currently used schema version and upgrade as needed.

My question, though, is how to version the schema being used between
releases of Kea?

During the development of Kea 1.1 we made extensive changes to the
schemas but these changes didn't all appear at once. They were added as
we were implemented new tickets. Though, we had to update the scripts
which create the database schema for both unit testing and system
testing. Because the scripts use schema versioning, we marked all
changes appearing on top of the 4.1 and 2.0 schemas as 4.2 and 3.0
respectively, for MySQL and Postgres. So, technically, if someone
elected to compile Kea server from version control system after release
of Kea 1.0 he would see that it is using MySQL schema 4.2 and Postgres
schema 3.0. If he took some later revision of Kea, he could observe that
the schema number didn't change but the schemas were actually updated,
and so on.

What happens for the Kea 1.1 release is that it contains major updates
for PostgreSQL schema but the schema version is still 3.0. For MySQL,
the number of updates appeared to be so high that I also decided to call
it 5.0 (rather than 4.2). At the same time, I don't provide any upgrade
script from schema 4.1 to 4.2 to 5.0. I have the upgrade script which
makes it straight from 4.1 to 5.0, because schema version 4.2 was never
really released and it was simply renamed to 5.0.

Thomas makes a point that this approach may cause trouble for people who
used intermediate versions of Kea (between release of Kea 1.0 and Kea
1.1), as they could upgrade to intermediate versions of schema, called
3.0 and 4.2. The released version of 3.0 differs from the one they have
for Postgres and the 4.2 is not really upgradable to 5.0, because there
is no 4.2 to 5.0 upgrade script etc.

Thomas is thinking that we should rather bump up version numbers for the
release to 4.3 and 3.1, as it would allow people using intermediate
versions to upgrade to the schema used by released version. Also, Thomas
thinks that 4.3 is better than 5.0 as because there is no sufficiently
large change in MySQL schema to warrant major version change.

Personally, I disagree with Thomas'es points for a couple of reasons:
- because schemas evolved gradually between 1.0 and 1.1 release, and the
version numbers remained 4.2 and 3.0 for the entire time, there is no
way to tell which version of Kea people checked out from git and
therefore their database versions may not be upgradable to the most
recent version with a single script, because the needed updates depend
on what they have checked out.

- ISC (AFAIK) doesn't support the unreleased versions of our code, so
there is no requirement for us to make schemas upgradable between all
intermediate versions and released versions

- If we wanted to give a version number to each schema version committed
to master branch we'd need to account for the situations that someone
made a mistake in the schema and wants to back off the changes. It means
further update which will bloat our schema creation scripts.

- Finally, I don't think (although I may be wrong) that users use
unreleased code in production, in which case there is no strong desire
to maintain data integrity in the database. If someone is using the test
environment with intermediate version of Kea, he can probably toss the
database and simply re-create it (with kea-admin) easily, and with no
(important) data loss.

But, people may disagree with my opinions, in which case it would be
nice to hear some ideas how to maintain compatibility between the
released and unreleased code, and if it is desired.

I also think we need some policy for this, as we're going to be adding
new features using databases, e.g. configuration of the server held in
the database.

So the policies required seem to be:

- Should we bump up schema versions and update schema upgrade scripts
every time we commit the change to master? Or we rather make it on
release-to-release basis, similarly to what we did for 1.1?
- What does it mean a major change to a schema? In other words, in what
cases we need to bump up major version number vs minor version number?

Marcin Siodelski
ISC