[kea-dev] Kea's database schema versioning in practice

Tue Sep 6 13:32:18 UTC 2016

On 9/5/16 12:33 PM, Marcin Siodelski wrote:
> On 05.09.2016 13:04, Ola Thoresen wrote:
>>> Oddly enough, nobody has any opinions. Come on, stating that we have
>>> been doing that right is also a good feedback.
>>>
>>
>>
>> I don't have any strong opinions, and I have not read the code, so I
>> might be wrong, but I have a few questions which might lead to others
>> having opinions.
>>
>> a) Are you limited to just two parts in the schema version numbering?
>> - To me it makes sense to use a two part ("two digit") version for the
>> releases, and use another number for intermediate versions.  So kea 1.0
>> is released with schema x.y and all updates to the schema until the next
>> release is numbered x.y.z
>> This will not bloat the schema version too much, as all the minor
>> updates don't count as soon as a new version of kea is released.
>>
> Ola,
>
> Currently we use major and minor version number of the schema. So, only
> x.y. It is not technically hard to add revision "z" to this if there is
> sufficient demand for versioning between releases. One problem I see
> with this approach, though, is that you have to tag each change to the
> SQL scripts with a version number (namely, value of "z"). Upgrade
> scripts are incremental, which means that if you make a change like
> adding new column into an SQL table and later you decide to rename it,
> you have to keep both statements in your upgrade scripts, one that adds
> the column, second one that renames it. If you didn't version the schema
> with "z" granularity, you could simply have one statement (between two
> releases) that adds a column with the target name.
>
> This is obviously only an example, but there will be similar issues as
> we extend the databases to hold much more data than they hold today. And
> this will certainly bloat the upgrade scripts. The level of "bloating"
> is debatable. The major question is, again, whether people feel that
> being able to upgrade between intermediate verions of schema is a must.
>
>> b) Is there an effort going on to synchronize the schema versions
>> between the different db-platforms?  The difference between IE pgsql and
>> mysql should not be THAT big - and that PG is using schema version 3 and
>> MySQL is already in version 5 looks a bit strange to me.
>> The schema version should reflect the actual contents of the schema
>> (what columns and indexes and so on that they contain) and if they are
>> more or less "on pair" between the DBs the version should also be more
>> or less the same.
>> - Of course, it can be that PG is lagging behind MySQL (I have not read
>> the code) but this sounds like an issue that should be prioritized, as
>> working with code that needs to behave vastly different based on what
>> DB-engine is running in the background sounds like a big problem in the
>> long run.
>>
> Yeah, I see your point. I think this would be more informative if we had
> the same or nearly the same version number for the two schemas that
> contain the same data set and support the same features.
>
> The reason for the discrepancy today is that, historically, PostgreSQL
> backend implementation was lagging behind MySQL. We had many more
> revisions of MySQL schema and finally we updated Postgres within a
> single release.
>
> Our general feeling is that we should make sure that MySQL and Postgres
> have the same feature set and the schema version numbers should mostly
> match. But, I am not sure we can guarantee that this will always be the
> case. Maybe we should try that and see how it goes.
>
> Marcin Siodelski
> ISC
> _______________________________________________
> kea-dev mailing list
> kea-dev at lists.isc.org
> https://lists.isc.org/mailman/listinfo/kea-dev

Was on PTO, else I would have responded earlier. 

I think I'd suggest this:

1. If during development a cycle, we issue an engineering drop or we
deliberately distribute or request user(s) to pull and test, and there
are schema changes, we should bump the schema number provide an update path.

2. I agree with Ola, in that PostgreSQL should be renumbered as 5.0, and
we should going forward match schema numbers based on content.

I think the primary reason there's much discussion about interim updates
is due to the pain involved in our current DB schema creation/update
architecture.   If we restructure our schema creation/modification so we
have a base script and a set of deltas then creating a new DB, is matter
of running the base and all deltas, updating a DB is matter of finding
the appropriate starting point and then running all subsequent deltas. 

For MySQL we would have had

    1.0-base-script, 2.0-delta, 3.0-delta, 4.0-delta, 4.1-delta,
4.2-delta, 5.0-delta

Similarly, for PostgreSQL:

    1.0-base-script, 2.0-delta, 3.0-delta, 3.1-delta

Using this approach simplifies our process as one only writes the delta
thus eliminate having to manually extend the creation script and does
away with the upgrade scripts altogether. I'd like to propose we do this
as part of 1.2.

Thomas