storage tokens duplications

Julien ÉLIE julien at trigofacile.com
Wed Aug 26 18:15:20 UTC 2009


Hi Kamil,

Thanks for your report!

> My *guess* is that there are only four hex digits describing the position
> of an article in a *.CAF file.
> Since ~200 000 articles were posted into a single .CF file, a "number
> overflow" occurred, and numbers were reused. But it's only my guess.

@040200470388000200000000000000000000@
is in timecaf-02/03/4788.CF

04 = TIMECAF
02 = the storage.conf class
00470388 = path 03/4788
0002 = article number in the file
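
A minimal sketch of that breakdown, assuming the textual token always
starts with '@' followed by at least 16 hex digits laid out as above.
The struct and the parse_token() helper are hypothetical names used for
illustration; they are not INN's API.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields listed above. */
struct timecaf_token {
    unsigned type;          /* 0x04 = TIMECAF */
    unsigned storage_class; /* the storage.conf class */
    uint32_t timestamp;     /* 4 bytes from which the path is derived */
    unsigned seqnum;        /* article number in the file */
};

/* Parse "@TTCCXXXXXXXXSSSS...@".  Returns 0 on success, -1 otherwise. */
static int
parse_token(const char *text, struct timecaf_token *out)
{
    unsigned type, storage_class, seqnum;
    unsigned long timestamp;

    if (text == NULL || out == NULL)
        return -1;
    if (text[0] != '@' || strlen(text) < 17)
        return -1;
    /* Field widths: 2 + 2 + 8 + 4 hex digits. */
    if (sscanf(text + 1, "%2x%2x%8lx%4x",
               &type, &storage_class, &timestamp, &seqnum) != 4)
        return -1;

    out->type = type;
    out->storage_class = storage_class;
    out->timestamp = (uint32_t) timestamp;
    out->seqnum = seqnum;
    return 0;
}
```

Calling parse_token() on the token above fills in type 0x04, class 0x02,
timestamp 0x00470388 and article number 0x0002.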

The path is generated at the arrival time.

%convdate -n "Wed, 3 Oct 2007 14:17:36 +0200"
1191413856

in binary: 01000111 00000011 10001000 01100000

Dropping the low 8 bits (>> 8) gives 0x470388, the value stored in the
token.  Taking its bytes with >> 8, >> 16 and >> 0 then gives the path:
00000011/01000111|10001000.CF, that is 03/4788.CF

which means that all articles arriving within the same 256-second (2^8)
window are put into the same file.  And a file can hold at most 65535
(2^16-1) articles, if I understand correctly how that works.
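
The path derivation can be sketched as follows.  This only reproduces
the arithmetic of the example above; timecaf_path() is a hypothetical
helper, and the lowercase hex formatting is an assumption.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the relative path of the CAF file for an
 * article of the given storage class arriving at time `arrival`.
 */
static void
timecaf_path(uint32_t arrival, unsigned storage_class, char *buf, size_t len)
{
    /* Drop the low 8 bits: one file per 256-second window. */
    uint32_t ts = arrival >> 8;

    /* Assumption: two-digit lowercase hex for every component. */
    snprintf(buf, len, "timecaf-%02x/%02x/%02x%02x.CF",
             storage_class,
             (unsigned) ((ts >> 8) & 0xff),   /* directory */
             (unsigned) ((ts >> 16) & 0xff),  /* first half of file name */
             (unsigned) (ts & 0xff));         /* second half of file name */
}
```

With arrival = 1191413856 and class 2, the buffer receives
timecaf-02/03/4788.CF, and any arrival time up to 159 seconds later
(until the low byte wraps from 0xff to 0x00) maps to the same file.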


Article number in the file is given by:
    s = htons(seqnum);     // uint16_t htons(uint16_t hostshort)
    memcpy(&token.token[4], &s + (sizeof(s) - 2), 2);
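
To illustrate the overflow hypothesis: if a sequence number larger than
65535 ever reached this code, only its low 16 bits would survive in the
token.  The sketch below shows that truncation; token_seqnum() and
put_seqnum() are hypothetical names, and whether a CAF file can really
receive that many articles is not established here.

```c
#include <stdint.h>

/* Hypothetical helper: the 16-bit value that would end up in the token. */
static uint16_t
token_seqnum(unsigned long seqnum)
{
    return (uint16_t) (seqnum & 0xffff);
}

/*
 * Hypothetical helper: write the truncated number as two bytes in
 * network (big-endian) order, which is what htons() + memcpy() achieve.
 */
static void
put_seqnum(unsigned char *dst, unsigned long seqnum)
{
    uint16_t s = token_seqnum(seqnum);

    dst[0] = (unsigned char) (s >> 8);
    dst[1] = (unsigned char) (s & 0xff);
}
```

Under that assumption, article 65538 would get the same two bytes as
article 2, and article 200000 would be recorded as 3392.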



> %head -n 1000000 /var/lib/news/history|cut -f 3 |grep "@04" |sort |
> uniq -c|sort -nr |head -n 10
>      2 @0401004A94E8057000000000000000000000@

Can you look at the corresponding file (timecaf-01/94/4AE8.CF) and
see how many articles it contains?
(A count of 'Message-Id: ' occurrences would give a rough figure.)



Another possibility would be a race condition, as Ray once suggested.
I read in the source:

/*
** variables for keeping track of currently pending write.
** FIXME: assumes only one article open for writing at a time.
*/

Maybe your count will tell us.
Thanks,

-- 
Julien ÉLIE

« Nam et ipsa scientia potestas est. » (Francis Bacon) 



