[kea-dev] update on this ....FRe: thoughts on 3601

Thu Oct 29 11:03:04 UTC 2015

On 10/29/15 6:48 AM, Marcin Siodelski wrote:
> On 28.10.2015 21:30, Thomas Markwalder wrote:
>> On 10/28/15 10:13 AM, Thomas Markwalder wrote:
>>> About #3601 - Update script for Memfile
>>>
>>> Currently, CSV files that do not have all of the defined columns fail
>>> internal
>>> validation when we attempt to load them.  Basically, they need to
>>> upgraded to
>>> the current CSV "schema".  We could do this with a script, as we do for
>>> MySQL
>>> and PostgreSQL.  The script would need to either take the Kea
>> configuration
>>> file as a command line argument or an explicit list of files to
>> convert. The
>>> former being a bit more problematic than the latter.  It would then
>> need to
>>> examine the header row, determine what columns need to be added and
>> use that
>>> information to do the conversion.
>>>
>>> We could, I think, also build the conversion into the code, with
>> about as
>>> much effort as doing the script.
>>>
>>> Background on current code:
>>> ---------------------------
>>>
>>> - We use the utility class CSVFile to derive our v4 and v6 lease file
>>> classes,
>>> CSVLeaseFiler4 and CSVLeaseFile6
>>>
>>> - CSVFile reads and writes its header and rows based on a list of named
>>> columns
>>> defined after instantion via calls to
>> CSVFile::addColumn(std::string& name).
>>> All columns need to be defined before using a file.
>>>
>>> - The first row of a file is presumed to be the header row. All
>>> subsequent rows
>>> are presumed to be data.
>>>
>>> - CSVFile::validateHeader()  - validates the header row by checking that
>>> values in
>>> the row match the list of column names, in order.
>>>
>>> - CSVFile::validate() - validates reading a row, primarily be expecting
>>> there to
>>> be the same number of data values as there are defined columns.
>>>
>>>
>>> What if we?
>>> -----------
>>>
>>> What if we derive a new class, VersionedCSVFile,  whose column
>> definitions
>>> have a name, a version number, and a default value?  Defining the
>>> columns for
>>> v6 now might look like this:
>>>
>>> void
>>> CSVLeaseFile6::initColumns() {
>>>     addColumn("address", "0.9.2", "");
>>>     addColumn("duid", "0.9.2", "");
>>>     addColumn("valid_lifetime");
>>>     addColumn("expire", 0.9.2", "");
>>>     addColumn("subnet_id", 0.9.2", "");
>>>     addColumn("pref_lifetime", 0.9.2", "");
>>>     addColumn("lease_type", 0.9.2", "");
>>>     addColumn("iaid", 0.9.2", "");
>>>     addColumn("prefix_len", 0.9.2", "");
>>>     addColumn("fqdn_fwd", 0.9.2", "");
>>>     addColumn("fqdn_rev", 0.9.2", "");
>>>     addColumn("hostname", 0.9.2", "");
>>>     addColumn("hwaddr", 1.0.0", "");
>>>     addColumn("state", 2.0.0", "");
>>> }
>>>
>>> VersionedCSVFile::validateHeader() would read the header row, column by
>>> column
>>> until it either hits a mismatch or runs out of columns.  A mismatch
>> we treat
>>> as an incompatibility error.  Too few columns, means we have an earlier
>>> version
>>> that that needs updating.  An exact match, and we're good to go. The
>>> version of
>>> the last column found gives us the "detected" version.
>>>
>>> VersionedCSVFile::next() would read up to the "detected" version and
>>> supply the
>>> remaining values from the column defaults.
>>>
>>> VersionedCSVFile::validate() would only enforce up to the "detected"
>> version
>>> We could also add logic to VersionedCSVFile() to allow it to
>> read/write a
>>> preamble that includes version information. The preamble could be
>> something
>>> like 1 or more lines at the beginning of the file that start with
>>> "Preamble:"
>>> or perhaps a comment marker such as "#".
>>>
>>> Caveats:
>>>    A. We only add new columns to the end of file.
>>>    B. We never rename or delete a column
>>>
>>>     If we were to ever add columns whose value must be based on the
>>> values in
>>>     other columns, rather than having default values for each column, we
>>> could
>>>     have a default supplier method.
>>>
>>>     If we expanded the column meta-data we could actually cope with
>> columns
>>>     that were deleted or renamed.
>>>
>>> Summary:
>>> -----------
>>>
>>> This approach has the following advantages:
>>>
>>> 1. Conversion is done seamlessly for the user upon start up, unless they
>>> change the lease file location or name.
>>>
>>> 2. Schema content is defined in one place.  In other words, the
>> lease file
>>> class for v4 and v6 are the only places where we have to keep track
>> of the
>>> column meta data.  We don't have to update both these classes and
>> upgrade
>>> scripts, etc.
>>>
>>> 3. We do not have to try to locate lease files based on parsing Kea
>> config
>>> or rely on the admin to know about ".completed" files
>>>
>>> The primary downside is that it requires the server to retain code for
>>> conversion that won't be used often. Though I think the additional code
>>> would be pretty minimal.
>>>
>>>
>>> Thoughts?
>>>
>>> _______________________________________________
>>> kea-dev mailing list
>>> kea-dev at lists.isc.org
>>> https://lists.isc.org/mailman/listinfo/kea-dev
>> FYI:  I have the basics of the VersionedCSVFile working.  It was pretty
>> simple actually.  This is minus any "preamble".
>> Integrating into the v4 and v6 servers should be pretty
>> straightforward too.
>>
Thanks for editing my text.  Hopefully it still made sense ;)

> This approach seems fine to me. I wouldn't like to see the preamble,
> because I like the lease file being purely a CSV file, that can be
> browsed as a spreadsheet. I also think that having a preamble would
> affect our existing unit tests and would basically complicate them.
>
> But, if I understand your suggestion, the preamble is not needed as long
> as we can figure out the version number from the columns that are
> present there and the specification which column belongs to which Kea
> version held in the CSVLeaseFileX classes.
You're understanding is correct. Each CSVLeaseFileX class will contain
meta-data for each column.  That data includes the version the column
first appears and a default value should it need to be added.   This
means preamble isn't strictly necessary.  There was mention in related
tickets about trying to include version information etc.   We should be
able to rely on the header row to recognize schema version.   If we
were to introduce a schema change that broke this scheme we can
address it at that point. 

This approach would address this same need for any future CSVFile we
might eventually create.  It is conceivable that Memfile's schema might
some day expand to require additional files.

> I like the fact that the user doesn't have to run any additional
> tools/commands to update the files and that it is done seamlessly.
> However, the log could probably contain some WARN message informing
> about such seamless upgrade.
It simplifies things quite a bit and I think Memfile users might
actually find the
need for manual intervention to upgrade their files as a shortcoming. 

Yes, I plan to include log messages so it won't happen without some
record of it
occurring.

> Marcin
>
>