Fwd: 64-bit time_t transition for 32-bit archs: a proposal
Julien ÉLIE
julien at trigofacile.com
Tue Jun 27 18:17:49 UTC 2023
About the news spool:
> @02nnaabbccddyyyy00000000000000000000@
> "aabbccdd" is the arrival time in hexadecimal, and "yyyy" a sequence
> number.
With timehash, articles are stored in files named:
<patharticles>/time-nn/bb/cc/yyyy-aadd
(Time read as 0xaabbccdd in 32-bit.)
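
To make the current layout concrete, here is a minimal sketch of how
such a path could be derived from a 32-bit arrival time. The function
name timehash_path and its parameters are hypothetical, and printing
every field as lowercase hexadecimal is an assumption; this is not
copied from the INN sources.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format the spool-relative timehash path
 * time-nn/bb/cc/yyyy-aadd for an arrival time read as 0xaabbccdd.
 * class_nn is the storage class, seqnum the "yyyy" sequence number.
 * Assumption: all fields are printed as lowercase hexadecimal. */
static int
timehash_path(char *buf, size_t len, unsigned int class_nn,
              uint32_t arrived, uint16_t seqnum)
{
    unsigned int aa = (arrived >> 24) & 0xff;
    unsigned int bb = (arrived >> 16) & 0xff;
    unsigned int cc = (arrived >> 8) & 0xff;
    unsigned int dd = arrived & 0xff;

    return snprintf(buf, len, "time-%02x/%02x/%02x/%04x-%02x%02x",
                    class_nn & 0xff, bb, cc, (unsigned int) seqnum,
                    aa, dd);
}
```

With class 2, arrival time 0x649b2a3d and sequence number 1, this
yields time-02/9b/2a/0001-643d.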
Supposing 0xeeffgghhaabbccdd in 64-bit, the token could become
a/ @02nneeffgghhaabbccddyyyy000000000000@
(full rewrite)
or
b/ @02nnaabbccddyyyyeeffgghh000000000000@
And the file names could become:
a/ <patharticles>/time-nn/bb/cc/yyyy-aadd-eeffgghh
with fixed length (articles will be named yyyy-aadd-00000000 until 2038).
b/ <patharticles>/time-nn/bb/cc/yyyy-aadd[-hh][gg][ff][ee]
with a bit of complexity: articles will be named yyyy-aadd until 2038
(no change from the current naming), and then yyyy-aadd-hh until epoch
reaches 2^40, and yyyy-aadd-hhgg until epoch reaches 2^48, and so on.
It will mean extra complexity when parsing filenames and tokens. It
seems manageable, though.
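
As an illustration of option b/ for the file names, the variable-length
suffix could be produced roughly as below. This is only a sketch under
the naming described above; timehash_name_b is a hypothetical name and
the hexadecimal formatting is an assumption.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper for option b/: yyyy-aadd[-hh][gg][ff][ee].
 * arrived is read as 0xeeffgghhaabbccdd.  Returns the length of the
 * name, or -1 if buf is too small. */
static int
timehash_name_b(char *buf, size_t len, uint64_t arrived, uint16_t seqnum)
{
    unsigned int aa = (unsigned int) (arrived >> 24) & 0xff;
    unsigned int dd = (unsigned int) arrived & 0xff;
    uint64_t high = arrived >> 32; /* 0xeeffgghh */
    int n;

    n = snprintf(buf, len, "%04x-%02x%02x", (unsigned int) seqnum, aa, dd);
    if (n < 0 || (size_t) n >= len)
        return -1;

    /* Append hh, then gg, ff and ee, stopping as soon as all the
     * remaining upper bytes are zero.  Only the first byte gets "-". */
    for (const char *sep = "-"; high != 0; high >>= 8, sep = "") {
        int m = snprintf(buf + n, len - (size_t) n, "%s%02x", sep,
                         (unsigned int) (high & 0xff));
        if (m < 0 || (size_t) m >= len - (size_t) n)
            return -1;
        n += m;
    }
    return n;
}
```

A parser for these names would have to accept between zero and four
extra byte pairs after the second dash-separated field.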
Of course, any other combination could be discussed, like yyyy-aaddhh or
yyyy-hhaadd.
But well, is it really worth doing something right now to handle these
future names?
0xaabbccdd will overflow when epoch reaches 2^32, which is in year 2106...
It gives us time to implement that new parsing, if these storage methods
still exist in the 2100s. No need for a history or overview rebuild.
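
The 2038 versus 2106 difference can be checked with a small
calculation that does not depend on the width of time_t. This is a
sketch assuming the 32-bit field is interpreted as unsigned; the
function name is hypothetical, and the day-to-year conversion is the
usual proleptic Gregorian one.

```c
#include <stdint.h>

/* Hypothetical helper: UTC year containing the given number of
 * seconds since 1970-01-01 00:00:00, computed without time_t. */
static int
year_of_epoch_seconds(uint64_t secs)
{
    uint64_t z = secs / 86400 + 719468;  /* days since 0000-03-01 */
    uint64_t era = z / 146097;           /* 400-year cycles */
    uint64_t doe = z - era * 146097;     /* day of era, 0..146096 */
    uint64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
    uint64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    uint64_t mp = (5 * doy + 2) / 153;   /* month index, March = 0 */

    /* Years start in March here, so January and February (mp 10 and
     * 11) belong to the following calendar year. */
    return (int) (yoe + era * 400 + (mp >= 10 ? 1 : 0));
}
```

A signed 32-bit value runs out at 2^31 seconds, for which this returns
2038, whereas an unsigned reading lasts until 2^32 seconds, for which
it returns 2106.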
We could maybe just fix the parsing for a 64-bit time_t, if there's
really something to fix, and leave the names and tokens unchanged?
> The storage tokens in history should also be changed accordingly.
> They contain the same "aabbcc":
> @04nn00aabbccyyyyxxxx0000000000000000@
With timecaf, articles are stored in files named:
<patharticles>/timecaf-nn/bb/aacc.CF
Changes similar to timehash could be done for it.
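
For comparison with timehash, a sketch of the timecaf path derivation
follows. It assumes that "aabbcc" are the three upper bytes of the
32-bit arrival time 0xaabbccdd and that fields are printed in
lowercase hexadecimal; timecaf_path is a hypothetical name, not the
function used in INN.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format the spool-relative timecaf path
 * timecaf-nn/bb/aacc.CF.  The low byte "dd" of the arrival time is
 * not used in the name. */
static int
timecaf_path(char *buf, size_t len, unsigned int class_nn,
             uint32_t arrived)
{
    unsigned int aa = (arrived >> 24) & 0xff;
    unsigned int bb = (arrived >> 16) & 0xff;
    unsigned int cc = (arrived >> 8) & 0xff;

    return snprintf(buf, len, "timecaf-%02x/%02x/%02x%02x.CF",
                    class_nn & 0xff, bb, aa, cc);
}
```

Under that assumption, all articles arriving within the same span of
256 seconds share one CAF file, since only "dd" differs between them.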
The remaining issue is for the CAF file header containing a time_t
LastCleaned field.
Hmm, as time_t may already be 32-bit or 64-bit depending on the system,
couldn't it be switched to uint64_t LastCleaned?
And we'll have to ship a tool which converts the CAF files appropriately.
Shouldn't we also change the size_t and off_t variables in the CAF file
header to uint64_t at the same time?
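
To illustrate what the conversion tool would have to do, here is a
deliberately simplified sketch. Both structures and their field names
are hypothetical stand-ins, not the actual CAF header layout; the
point is only that the on-disk size stops depending on the platform
once every field is a uint64_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical header using native types: its size and layout vary
 * with the widths of off_t, size_t and time_t on the build system. */
struct old_header {
    off_t free_bytes;
    size_t block_size;
    time_t last_cleaned;
};

/* Hypothetical fixed-width replacement. */
struct new_header {
    uint64_t free_bytes;
    uint64_t block_size;
    uint64_t last_cleaned;
};

/* Three 8-byte fields, whatever the platform. */
_Static_assert(sizeof(struct new_header) == 24,
               "fixed-width header must be 24 bytes");

/* Widen each field.  Assumes the old values are non-negative, as
 * negative off_t or time_t values would wrap when cast. */
static struct new_header
convert_header(const struct old_header *old)
{
    struct new_header h;

    h.free_bytes = (uint64_t) old->free_bytes;
    h.block_size = (uint64_t) old->block_size;
    h.last_cleaned = (uint64_t) old->last_cleaned;
    return h;
}
```

A real tool would also need to settle the byte order of the stored
fields, which this sketch leaves aside.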
Space is allotted for 262144 (2^18) articles in current CAF files, and
from time to time people complain about that limit (when reinjecting
articles at high speed into their news spool, more than 262144 can
arrive in a time frame of 4 minutes, which leads to storage failures).
At the same time as we make the change to uint64_t LastCleaned, couldn't
we increase the allotted space to handle, for instance, 4,194,304 (2^22)
articles?
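
To put numbers on that limit, the sustained arrival rate a CAF file
can absorb is simply its slot count divided by the time span it
covers. The sketch below assumes a span of 256 seconds per file (the
"4 minutes" above, if the low byte of the arrival time is what gets
dropped); the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: highest average number of articles per second
 * that fits in one file with the given number of slots, when each
 * file covers window_seconds of arrival time. */
static uint64_t
max_articles_per_second(uint64_t slots, uint64_t window_seconds)
{
    return slots / window_seconds;
}
```

With 2^18 slots and a 256-second span this gives 1024 articles per
second, and 16384 per second with 2^22 slots.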
Handling 64-bit time_t would then be an opportunity to improve timecaf :)
Other ideas welcome of course.
--
Julien ÉLIE
« – Ouvre l'œil, et le bon !
– L'autre, je peux pas encore l'ouvrir, je risque pas de me
tromper ! » (Astérix)