storage tokens duplications

Julien ÉLIE julien at trigofacile.com
Fri Aug 28 18:59:46 UTC 2009


Hi Kamil,

> Hmm, strange:
> kjonca at alfa:/var/spool/news/articles%grep -a -i "^Message-ID: .*" timecaf-01/94/4ae8.CF|wc -l
>  69316
> kjonca at alfa:/var/spool/news/articles%grep -a "^Xref: .*" timecaf-01/94/4ae8.CF|wc -l
>  69552
> kjonca at alfa:~%grep "@0401004A94E8" /var/lib/news/history|wc -l
>  69523
>
> So it's about 69000 articles but I don't know why these numbers are so
> different.

I do not know :-/



> you can see effect of
> grep "@0401004A94E8" /var/lib/news/history |gzip - > ~/tmp/history.0401004A94E8.gz

In your history file:

[95E82401B8C8114BE57A418B4A60F932] 1251272950~-~904626650 @0401004A94E8FFFB00000000000000000000@
[25047DC0007561FD7FC31A848E73506A] 1251272950~-~904626716 @0401004A94E8FFFC00000000000000000000@
[DCCD2720FA8B23F008012A8E32D3B8E0] 1251272950~-~904626670 @0401004A94E8FFFD00000000000000000000@
[AE96B94BEE9FB204EECFAF1350EF8B72] 1251272950~-~904626705 @0401004A94E8FFFE00000000000000000000@
[E9AC4549BD6D5EA41F9F97EF1C1C0EF5] 1251272950~-~904626739 @0401004A94E8FFFF00000000000000000000@
[0F7531A987577505E5AEC35FF78E1C7D] 1251272950~-~904626766 @0401004A94E8000000000000000000000000@
[F30AB9C9738124D1F2F0F4531E0BD258] 1251272950~-~904601521 @0401004A94E8000100000000000000000000@
[77D195106C0F8DB6C2E866561790B5BE] 1251272950~-~904626747 @0401004A94E8000200000000000000000000@
[59DCB6D13A3FAB6EAAB0B20494C243DE] 1251272950~-~904626803 @0401004A94E8000300000000000000000000@
[47C90B71A26C7C180F0A32D113A9ABD6] 1251272950~-~904608874 @0401004A94E8000400000000000000000000@
[8C032C2AC9AD778B5CFCB6222E69C099] 1251272950~-~904608864 @0401004A94E8000500000000000000000000@
[9C7341D6C46CB15746DAB75997DED4A5] 1251272950~-~904626869 @0401004A94E8000600000000000000000000@
[4EA5C22A01385516DC836CCD5E4D6D13] 1251272950~-~904626841 @0401004A94E8000700000000000000000000@

We clearly see that the sequence number wrapped around.  After 0xFFFF, the next value
was in fact 0x010000, but its high-order part was not written into the token, so the
token shows 0x0000 again...
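
To illustrate the wrap, here is a minimal sketch in C.  It assumes, as the tokens above
suggest, that only the low 16 bits of the sequence number end up in the token.  The names
stored_seqnum and seqnums_collide are hypothetical helpers of mine, not functions from
timecaf.c.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low 16 bits of a sequence number,
 * which is what happens when it is stored in a 16-bit field. */
static uint16_t stored_seqnum(uint32_t seqnum)
{
    return (uint16_t) (seqnum & 0xFFFFu);
}

/* Two distinct sequence numbers collide when their stored values match,
 * which would yield two history entries carrying the same token. */
static int seqnums_collide(uint32_t a, uint32_t b)
{
    return a != b && stored_seqnum(a) == stored_seqnum(b);
}
```

With these helpers, seqnums_collide(0, 0x10000) is true, which matches the duplicate
@0401004A94E80000...@ token seen once the counter passes 0xFFFF.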

As a fix, I suggest using 2 additional octets for the sequence number.  Your history file would then be:

[95E82401B8C8114BE57A418B4A60F932] 1251272950~-~904626650 @0401004A94E8FFFB00000000000000000000@
[25047DC0007561FD7FC31A848E73506A] 1251272950~-~904626716 @0401004A94E8FFFC00000000000000000000@
[DCCD2720FA8B23F008012A8E32D3B8E0] 1251272950~-~904626670 @0401004A94E8FFFD00000000000000000000@
[AE96B94BEE9FB204EECFAF1350EF8B72] 1251272950~-~904626705 @0401004A94E8FFFE00000000000000000000@
[E9AC4549BD6D5EA41F9F97EF1C1C0EF5] 1251272950~-~904626739 @0401004A94E8FFFF00000000000000000000@
[0F7531A987577505E5AEC35FF78E1C7D] 1251272950~-~904626766 @0401004A94E8000000010000000000000000@
[F30AB9C9738124D1F2F0F4531E0BD258] 1251272950~-~904601521 @0401004A94E8000100010000000000000000@
[77D195106C0F8DB6C2E866561790B5BE] 1251272950~-~904626747 @0401004A94E8000200010000000000000000@
[59DCB6D13A3FAB6EAAB0B20494C243DE] 1251272950~-~904626803 @0401004A94E8000300010000000000000000@
[47C90B71A26C7C180F0A32D113A9ABD6] 1251272950~-~904608874 @0401004A94E8000400010000000000000000@
[8C032C2AC9AD778B5CFCB6222E69C099] 1251272950~-~904608864 @0401004A94E8000500010000000000000000@
[9C7341D6C46CB15746DAB75997DED4A5] 1251272950~-~904626869 @0401004A94E8000600010000000000000000@
[4EA5C22A01385516DC836CCD5E4D6D13] 1251272950~-~904626841 @0401004A94E8000700010000000000000000@


As far as I can tell, it seems easy to do.
caf.c already uses ARTNUM (32 bits) for all numbers.
Only two functions in timecaf.c need to be patched: the one which generates a token and the
one which breaks a token apart.  They should use ARTNUM instead of short ints (16 bits).
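
The proposed layout can be sketched as follows.  This is only an illustration under my
own assumptions, not the actual timecaf.c code: I assume the 16 octets after the type and
class octets start with a 4-octet time field, followed by the low 16 bits of the sequence
number and then the 2 additional octets, all big-endian as in the history lines above.
The names pack_seqnum, unpack_seqnum and token_to_text are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

#define TOKEN_DATA_LEN 16   /* octets after the type and class octets */

/* Store a 32-bit sequence number: low 16 bits in octets 4-5 (where the
 * 16-bit value already is), high 16 bits in octets 6-7 (the 2 additional
 * octets).  Each half is written big-endian. */
static void pack_seqnum(unsigned char data[TOKEN_DATA_LEN], uint32_t seqnum)
{
    data[4] = (unsigned char) ((seqnum >> 8) & 0xFF);
    data[5] = (unsigned char) (seqnum & 0xFF);
    data[6] = (unsigned char) ((seqnum >> 24) & 0xFF);
    data[7] = (unsigned char) ((seqnum >> 16) & 0xFF);
}

/* Inverse of pack_seqnum: rebuild the 32-bit sequence number. */
static uint32_t unpack_seqnum(const unsigned char data[TOKEN_DATA_LEN])
{
    uint32_t low = ((uint32_t) data[4] << 8) | data[5];
    uint32_t high = ((uint32_t) data[6] << 8) | data[7];

    return (high << 16) | low;
}

/* Render '@', type, class, the 16 data octets and '@' in uppercase hex.
 * out must hold at least 39 bytes (38 characters plus the final NUL). */
static void token_to_text(unsigned char type, unsigned char class_,
                          const unsigned char data[TOKEN_DATA_LEN], char *out)
{
    int i;

    out += sprintf(out, "@%02X%02X", type, class_);
    for (i = 0; i < TOKEN_DATA_LEN; i++)
        out += sprintf(out, "%02X", data[i]);
    sprintf(out, "@");
}
```

Packing 0x10000 into a token whose time field is 004A94E8 gives
@0401004A94E8000000010000000000000000@, whereas sequence numbers up to 0xFFFF keep
their current textual form, so existing history entries would stay valid.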

If I send you a patch, will you be able to test it?  (That is to say, stop INN, apply the patch,
compile and update INN, modify in your history file @0401004A94E8000000000000000000000000@ to
@0401004A94E8000000010000000000000000@, restart INN and request a retrieval of the article
whose message-ID is the one for @0401004A94E8000000010000000000000000@.)
Normally, that request would give you a wrong article before the patch, and the right article
after the patch.
To retrieve an article:
    sm '@0401004A94E8000000000000000000000000@'




To INN maintainers:  do you see a better thing to do in order to fix that issue?


Incidentally, in CAF, the header is about 300 bytes, leaving 212 bytes for the free zone
index and therefore 212*512*8 = 868 KB for the main free zone table.  Consequently,
the size of a CAF file cannot exceed 868 KB * 8 * 512 = 3.5 GB.

On 64-bit architectures, the free zone index will be smaller because
the header will be larger...
We're limited by the block size (512 bytes) and the fact that index = blocksize-sizeof(header).
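
The arithmetic above can be written down as a small C sketch.  It reads the figures as
follows, which is my interpretation of the numbers rather than something taken from caf.c:
each bit of the index selects one 512-byte block of the main free zone table, and each bit
of that table tracks one 512-byte block of the CAF file.  The function names are
hypothetical, and the 300-byte header is the approximate figure given above.

```c
#include <stdint.h>

#define CAF_BLOCK_SIZE 512u   /* block size stated above */

/* Octets left in the first block for the free zone index.
 * header_bytes is assumed to be smaller than CAF_BLOCK_SIZE. */
static uint64_t caf_index_bytes(uint64_t header_bytes)
{
    return CAF_BLOCK_SIZE - header_bytes;
}

/* Size of the main free zone table: one block per index bit. */
static uint64_t caf_table_bytes(uint64_t header_bytes)
{
    return caf_index_bytes(header_bytes) * 8 * CAF_BLOCK_SIZE;
}

/* Largest CAF file the table can describe: one block per table bit. */
static uint64_t caf_max_file_bytes(uint64_t header_bytes)
{
    return caf_table_bytes(header_bytes) * 8 * CAF_BLOCK_SIZE;
}
```

For a 300-byte header this gives 212 octets of index, 868352 octets of table and
3556769792 octets (about 3.5 GB) of file.  Every extra octet of header removes
8 * 512 * 8 * 512 = 16 MB from that maximum.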

The TOC size is constant and can contain up to 262 144 articles.


I do not know what happens when we exceed these numbers for a CAF file...
(> 3.5 GB or 262 144 articles)

-- 
Julien ÉLIE

« Quinze ans de légion et être bleu ! » (Astérix) 



