INN commit: branches/2.5/storage/timecaf (4 files)

Mon Sep 7 08:27:03 UTC 2009

    Date: Monday, September 7, 2009 @ 01:27:03
  Author: iulius
Revision: 8622

Fix a documentation error about timecaf:  it does not work
per newsgroup (though it used to).

FreeZoneIndexBytes does not exist; it is FreeZoneIndexSize.

Also remove trailing spaces.

Modified:
  branches/2.5/storage/timecaf/README.CAF
  branches/2.5/storage/timecaf/caf.c
  branches/2.5/storage/timecaf/caf.h
  branches/2.5/storage/timecaf/timecaf.c

------------+
 README.CAF |   48 ++++++++++++++++++++++----------------------
 caf.c      |    8 +++----
 caf.h      |   48 +++++++++++++++++++++++---------------------
 timecaf.c  |   64 +++++++++++++++++++++++++++++------------------------------
 4 files changed, 86 insertions(+), 82 deletions(-)

Modified: README.CAF
===================================================================

--- README.CAF	2009-09-07 08:25:38 UTC (rev 8621)
+++ README.CAF	2009-09-07 08:27:03 UTC (rev 8622)
@@ -1,17 +1,17 @@
 The timecaf storage manager is like the timehash storage manager, except that
 it stores multiple articles in one file.  The file format is called CAF
 (for "crunched article file", putting multiple articles together into one big
-file), and uses a library 'caf.c' dating back from the pre-storage manager 
+file), and uses a library 'caf.c' dating back from the pre-storage manager
 days when I made a locally-hacked version of INN1.5 that used this
 code in order to boost performance on my system.  Originally I had planned to
 do one big file per newsgroup, but it turns out that a time-based file layout
 rather than newsgroup-name-based is a. more efficient and b. much easier to
-fit into the current storage manager interface paradigm.  Anyway, the 
+fit into the current storage manager interface paradigm.  Anyway, the
 pathnames for the files are of the form
 	<patharticles>/timecaf-nn/bb/aacc.CF
 where 'nn' is the numeric storage class (same as in 'timehash') and the
-file contains all articles written during the interval from 
-(time_t) 0xaabbcc00 to 0xaabbccFF.  
+file contains all articles written during the interval from
+(time_t) 0xaabbcc00 to 0xaabbccFF.
 
   The way expiration works on the 'timecaf' storage manager is a bit
 complicated.  When articles are expired or cancelled (via SMcancel())
@@ -26,7 +26,7 @@
 newsgroups with differing expiration lengths put in the same timecaf
 storage class, everything will work ok but your expire runs will spend
 some extra time copying files about.  In my experience this hasn't been too
-much of a problem.  If you find that it is a problem, you may wish to 
+much of a problem.  If you find that it is a problem, you may wish to
 consider dividing up your spool layout so each storage class gets newsgroups
 that expire at more-or-less the same time, or putting *.binaries in their own
 storage class.
@@ -39,21 +39,21 @@
 artwrite speed).  This is presumably due to improved locality of reference and
 not having to open/close article files all the time but only every 4 minutes or
 so.  Artcancel speed, on the other hand, is not much different, because
-cancel requests have terrible locality of reference.   Expire times seem
-to be generally somewhat faster than timehash as well, even given the 
+cancel requests have terrible locality of reference.  Expire times seem
+to be generally somewhat faster than timehash as well, even given the
 extra copying overhead mentioned above.
 
   Timecaf is probably slower than CNFS, but I haven't had a chance
 to do any comparison tests.  Timecaf does share the feature with timehash
-that you can get much more fine-tuned control of your expire times (on a 
-group-by-group basis, if needed) than you can with CNFS.  
+that you can get much more fine-tuned control of your expire times (on a
+group-by-group basis, if needed) than you can with CNFS.
 
 Down below is an old README telling more about the implementation details
 of the CAF file format.  Most people won't care about this, but if you're
 curious, read on; it also tells some of the historical developments that
 went on in this code.  I've been running some version of this code off and
 on for the past two years, and have been running it as a storage manager
-module for the past few months, so I'm pretty sure of it's stability.
+module for the past few months, so I'm pretty sure of its stability.
 
 			Richard Todd
 	(rmtodd at mailhost.ecn.ou.edu/rmtodd at servalan.servalan.com)
@@ -61,12 +61,12 @@
 
 Implementation details (format of a CAF file) and some design rationale:
 
- Look at include/caf.h for the details, but basically, the layout is
+  Look at caf.h for the details, but basically, the layout is
 something like this.  Each CAF file has a blocksize associated with it
 (usually 512 bytes, but it can vary).  The layout of a CAF file is as
 follows:
   1.	Header (~52 bytes) containing information like low and high
-article numbers, amount of free space, blocksize.  
+article numbers, amount of free space, blocksize.
   2.	Free space bitmap (size given by the FreeZoneTabSize field of the
 header).
   3.	CAFTOCENTs (CAF Table of Contents Entries), 1/article storable
@@ -75,7 +75,7 @@
 for 64K CAFTOCENTs, even if the # of articles in the CAF file is
 nowhere near that amount.  The unused CAFTOCENTs are all zeros, and
 this means CAF files are almost always sparse.
-  4.	Articles, always stored starting at blocksize boundaries. 
+  4.	Articles, always stored starting at blocksize boundaries.
 
 When fastrm is told to remove an article, the article is not actually
 removed as such, it is merely marked as non-existent (the CAFTOCENT is
@@ -86,27 +86,27 @@
 the article into those blocks and marks those blocks as being in use.
 If there is no suitable free space chunk in the CAF file, then innd
 merely appends the article to the end of the CAF file and records the
-article's position in the TOC. [Given the way the CAF code is currently
+article's position in the TOC.  [Given the way the CAF code is currently
 used by the timecaf storage manager, it's almost always the case that we're
-appending to the end of the file.] 
+appending to the end of the file.]
 
-   A note on the free bitmap portion of the CAF file: it's not just a simple
-bitmap (each bit of the bitmap tells whether a data block is in use or free.)
+  A note on the free bitmap portion of the CAF file:  it's not just a simple
+bitmap (each bit of the bitmap tells whether a data block is in use or free).
 First there is an 'index' bitmap which tells which blocks of the 'main' bitmap
 have free blocks listed in them, and then a 'main' bitmap which tells whether
-the data blocks are in use or free.  This setup means that we can have 
+the data blocks are in use or free.  This setup means that we can have
 bitmaps for CAF files as large as 8GB, while still being able to find free
 space by only reading the 'index' bitmap and one block of the 'main' bitmap.
-(Previous versions of the CAF code had just a 'main' bitmap and scaled the 
+(Previous versions of the CAF code had just a 'main' bitmap and scaled the
 blocksize up when CAF files got large; this became rather, um, non-optimal
 when control.cancel started to hit hundreds of thousands of articles and 8K
 blocksizes.)  In practice, CAF files over 2GB or 4GB may be a problem because
-of unsigned/signed long problems, and ones over 4G are probably impossible 
+of unsigned/signed long problems, and ones over 4GB are probably impossible
 on anything besides an Alpha unless you track down all the places in innd
-where they assume off_t is a long and fix it to work with long longs.  
+where they assume off_t is a long and fix it to work with long longs.
 
   At some point I'd also like to try some other, more efficient
-directory layout for the CAF files, as opposed to the old 
+directory layout for the CAF files, as opposed to the old
 /var/spool/news/newsgroup/name/component/ scheme.  At the time I
 started implementing this, it seemed like it'd be too much of a hassle
 to change this in INN as it stands.  I'm hoping that changing this
@@ -119,7 +119,7 @@
 alt.tv.babylon-5 will now be /var/spool/news/alt/tv/babylon-5.CF -- note the
 final . instead of a /.  This pretty much bypasses the need for the 'terminal'
 layer of directories to be read, and means that these directory blocks will not
-be fighting with other blocks for the limited space available in the buffer 
-cache.   This provides more of an improvement than you might think; thruput on 
+be fighting with other blocks for the limited space available in the buffer
+cache.   This provides more of an improvement than you might think; throughput on
 news.ecn.uoknor.edu went from 160,000 articles/day to >200,000 articles/day
 with this patch, and this is on an aging 32M 486/66.]
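
Note on the path layout described in README.CAF above: the mapping from an
arrival time 0xaabbccdd to <patharticles>/timecaf-nn/bb/aacc.CF can be
sketched as below.  This is only an illustration, not the MakePath() code
from timecaf.c; the names sketch_caf_path, sketch_interval_start and
sketch_interval_end are hypothetical, and the storage class is printed in
hexadecimal on the assumption that the comment in timecaf.c ("nn" is the
hexadecimal value of the storage class) is the one to follow.

```c
#include <assert.h>
#include <stdio.h>

/*
** Hypothetical helper (not INN's MakePath): format the CAF file path for
** an article that arrived at time 0xaabbccdd.  The low byte "dd" is not
** part of the path, so one file covers a 256-second window.
*/
static int
sketch_caf_path(char *buf, size_t len, const char *patharticles,
                unsigned int storage_class, unsigned long arrived)
{
    unsigned int aa = (unsigned int) ((arrived >> 24) & 0xff);
    unsigned int bb = (unsigned int) ((arrived >> 16) & 0xff);
    unsigned int cc = (unsigned int) ((arrived >> 8) & 0xff);

    assert(buf != NULL && patharticles != NULL);
    /* Directory is "bb"; the file name is "aa" followed by "cc". */
    return snprintf(buf, len, "%s/timecaf-%02x/%02x/%02x%02x.CF",
                    patharticles, storage_class, bb, aa, cc);
}

/* First and last arrival times stored in the same CAF file. */
static unsigned long
sketch_interval_start(unsigned long arrived)
{
    return arrived & ~0xffUL;
}

static unsigned long
sketch_interval_end(unsigned long arrived)
{
    return arrived | 0xffUL;
}
```

Because the low byte is dropped, a new file is started every 256 seconds,
which matches the "every 4 minutes or so" remark in the README.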

Modified: caf.c
===================================================================
--- caf.c	2009-09-07 08:25:38 UTC (rev 8621)
+++ caf.c	2009-09-07 08:27:03 UTC (rev 8622)
@@ -164,8 +164,8 @@
 }
 
 /*
-** Fetch the TOC entry for a  given article.  As usual -1 for error, 0 success */
-
+** Fetch the TOC entry for a given article.  As usual -1 for error, 0 success.
+*/
 static int
 CAFGetTOCEnt(int fd, CAFHEADER *head, ARTNUM art, CAFTOCENT *tocp)
 {
@@ -480,7 +480,7 @@
 ** failure, offset of starting block if successful.
 ** XXX does not attempt to find chunks that span BMB boundaries.  This is 
 ** messy to fix.
-** (Actually I think this case  works, as does the case when it tries to find
+** (Actually I think this case works, as does the case when it tries to find
 ** a block bigger than BytesPerBMB.  Testing reveals that it does seem to work, 
 ** though not optimally (some BMBs will get scanned several times).  
 */
@@ -630,7 +630,7 @@
 ** file so that we don't "lose" free space and not be able to reuse it.
 ** (Currently only returns CAF_DEFAULT_BLOCKSIZE, as with the new 2-level
 ** bitmaps, the FreeZoneTabSize that results from a 512-byte blocksize can 
-** handle any newsgroup with <7.3G of data.  Yow!)
+** handle any file with <7.3G of data.  Yow!)
 */
 
 static unsigned int

Modified: caf.h
===================================================================
--- caf.h	2009-09-07 08:25:38 UTC (rev 8621)
+++ caf.h	2009-09-07 08:27:03 UTC (rev 8622)
@@ -1,4 +1,5 @@
-/* $Revision$
+/* $Id$
+**
 ** Declarations needed for handling CAF (Crunched Article Files)
 ** Written by Richard Todd (rmtodd at mailhost.ecn.uoknor.edu) 3/24/96
 */
@@ -28,31 +29,31 @@
 #define CAF_DEFAULT_BLOCKSIZE 512
 
 /*
-** then the table of free blocks.  The table is FreeZoneTabSize bytes
+** Then the table of free blocks.  The table is FreeZoneTabSize bytes
 ** long.  First comes a "first-level" or "index" bitmap, taking up the
 ** space from the end of the CAFHEADER to the end of the first
-** block, i.e. FreeZoneIndexBytes. The rest of the table is a big  bitmap
+** block, i.e. FreeZoneIndexSize.  The rest of the table is a big bitmap
 ** listing free blocks in the 'data' portion of the CAF file.
 **
-** In the "index" bitmap: LSB of bitmap byte 0 is 1 if there are any 1s 
+** In the "index" bitmap:  LSB of bitmap byte 0 is 1 if there are any 1s
 ** (free blocks) listed in the first block of the big bitmap, and 0 if there
-** are no 1s in that block.  The remaining bits of the index bitmap 
+** are no 1s in that block.  The remaining bits of the index bitmap
 ** correspond to the remaining blocks of the big bitmap accordingly.
-** The idea is that from the index bitmap one can tell which part of the 
-** main bitmap is likely to have free blocks w/o having to read the entire 
+** The idea is that from the index bitmap one can tell which part of the
+** main bitmap is likely to have free blocks w/o having to read the entire
 ** main bitmap.
 **
 ** As for the main bitmap, each bit is 1 if the corresponding data
 ** block (BlockSize bytes) is free.  LSB of bitmap byte 0 corresponds
 ** to the block @ offset StartDataBlock, and all the rest follow on
-** accordingly.  
+** accordingly.
 **
-** Note that the main part of the bitmap is *always* FreeZoneIndexByte*8
+** Note that the main part of the bitmap is *always* FreeZoneIndexSize*8
 ** blocks long, no matter how big the CAF file is.  The table of free blocks
 ** is almost always sparse.  Also note that blocks past EOF in the CAF file
-** are *not* considered free.  If the CAF article write routines fail to 
-** find free space in the fre block bitmaps, they will always attempt to 
-** extend the CAF file instead. 
+** are *not* considered free.  If the CAF article write routines fail to
+** find free space in the free block bitmaps, they will always attempt to
+** extend the CAF file instead.
 */
 
 #define CAF_DEFAULT_FZSIZE (512-sizeof(CAFHEADER))
@@ -86,9 +87,9 @@
     char *BMBBits;
 } CAFBMB;
 
-/* 
+/*
 ** Next in the file are the TOC (Table of Contents) entries.  Each TOC
-** entry describes an article. 
+** entry describes an article.
 */
 
 typedef struct _CAFTOCENT {
@@ -98,19 +99,19 @@
 } CAFTOCENT;
 
 /*
-** and then after the NumSlots TOC Entries, the actual articles, one after
-** another, always starting at offsets == 0 mod BlockSize
+** And then after the NumSlots TOC Entries, the actual articles, one after
+** another, always starting at offsets == 0 mod BlockSize.
 */
 
 /*
-** Number of slots to put in TOC by default.  Can be raised if we ever get 
-** more than 256K articles in a newsgroup (frightening thought).
+** Number of slots to put in TOC by default.  Can be raised if we ever get
+** more than 256K articles in a file (frightening thought).
 */
 
 #define CAF_DEFAULT_TOC_SIZE (256 * 1024)
 
 /*
-** Default name for CAF file in the news spool dir for a given newsgroup.
+** Default extension name for CAF file in the news spool dir.
 */
 #define CAF_NAME "CF"
 
@@ -126,15 +127,18 @@
 extern int CAFStatArticle(char *path, ARTNUM art, struct stat *st);
 
 #ifdef CAF_INNARDS
-/* functions used internally by caf.c, and by the cleaner program, and cafls
-   but probably aren't useful/desirable to be used by others. */
+/*
+** Functions used internally by caf.c, and by the cleaner program, and cafls
+** but probably aren't useful/desirable to be used by others.
+*/
 extern int CAFOpenReadTOC(char *cfpath, CAFHEADER *ch, CAFTOCENT **tocpp);
 extern int CAFReadHeader(int fd, CAFHEADER *h);
 extern off_t CAFRoundOffsetUp(off_t offt, unsigned int bsize);
 extern CAFBITMAP * CAFReadFreeBM(int fd, CAFHEADER *h);
 extern void CAFDisposeBitmap(CAFBITMAP *cbm);
+
 /*
-** note! CAFIsBlockFree needs the fd, since blocks of the free bitmap may 
+** Note:  CAFIsBlockFree needs the fd, since blocks of the free bitmap may
 ** need to be fetched from disk.
 */
 extern int CAFIsBlockFree(CAFBITMAP *bm, int fd, off_t block);
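
Note on the two-level free bitmap described in the caf.h comments above: a
minimal in-memory sketch of the lookup follows.  It assumes the whole main
bitmap is already in memory and that each main-bitmap block is 512 bytes;
the real code fetches main-bitmap blocks from disk on demand (which is why
CAFIsBlockFree needs the fd).  SKETCH_BMB_BYTES, sketch_first_set_bit and
sketch_find_free_block are hypothetical names and do not exist in caf.c.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BMB_BYTES 512    /* assumed size of one main-bitmap block */

/*
** Position of the lowest set bit in buf[0..len), counting from the LSB of
** byte 0, or -1 if every bit is clear.
*/
static long
sketch_first_set_bit(const unsigned char *buf, size_t len)
{
    size_t i;
    int bit;

    for (i = 0; i < len; i++) {
        if (buf[i] == 0)
            continue;
        for (bit = 0; bit < 8; bit++)
            if (buf[i] & (1u << bit))
                return (long) (i * 8 + (size_t) bit);
    }
    return -1;
}

/*
** Find the first free data block.  The index bitmap says which block of
** the main bitmap contains at least one 1 (free block), so only that one
** block of the main bitmap has to be scanned.  The caller must supply a
** main bitmap of index_len * 8 blocks.  Returns the data block number,
** or -1 if nothing is free.
*/
static long
sketch_find_free_block(const unsigned char *index, size_t index_len,
                       const unsigned char *main_bitmap)
{
    long bmb, bit;

    assert(index != NULL && main_bitmap != NULL);
    bmb = sketch_first_set_bit(index, index_len);
    if (bmb < 0)
        return -1;
    bit = sketch_first_set_bit(main_bitmap + (size_t) bmb * SKETCH_BMB_BYTES,
                               SKETCH_BMB_BYTES);
    if (bit < 0)
        return -1;              /* index bit set but block is full */
    return bmb * (SKETCH_BMB_BYTES * 8L) + bit;
}
```

Going by the comment above, data block n would then start at file offset
StartDataBlock + n * BlockSize.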

Modified: timecaf.c
===================================================================
--- timecaf.c	2009-09-07 08:25:38 UTC (rev 8621)
+++ timecaf.c	2009-09-07 08:27:03 UTC (rev 8622)
@@ -35,9 +35,9 @@
     DIR	       		*sec; /* open handle on the 2nd level directory */
     DIR 		*ter; /* open handle on 3rd level dir. */
     struct dirent	*topde; /* last entry we got from top */
-    struct dirent	*secde; /* last entry we got from sec */ 
-    struct dirent	*terde; /* last entry we got from sec */ 
-    CAFTOCENT		*curtoc; 
+    struct dirent	*secde; /* last entry we got from sec */
+    struct dirent	*terde; /* last entry we got from ter */
+    CAFTOCENT		*curtoc;
     ARTNUM		curartnum;
     CAFHEADER		curheader;
 } PRIV_TIMECAF;
@@ -56,8 +56,8 @@
 typedef enum {FIND_DIR, FIND_CAF, FIND_TOPDIR} FINDTYPE;
 
 /*
-** Structures for the cache for stat information (to make expireover etc. 
-** faster. 
+** Structures for the cache for stat information (to make expireover etc.
+** faster).
 **
 ** The first structure contains the TOC info for a single CAF file.  The 2nd
 ** one has pointers to the info for up to 256 CAF files, indexed
@@ -97,7 +97,7 @@
 static CAFTOCL3CACHE *TOCCache[256]; /* indexed by storage class! */
 static int TOCCacheHits, TOCCacheMisses;
 
-    
+
 static TOKEN MakeToken(time_t now, ARTNUM seqnum, STORAGECLASS class, TOKEN *oldtoken) {
     TOKEN               token;
     uint32_t            i;
@@ -110,7 +110,7 @@
      * "xxxxyyyy" the hexadecimal sequence number seqnum. */
     if (oldtoken == (TOKEN *)NULL)
 	memset(&token, '\0', sizeof(token));
-    else 
+    else
 	memcpy(&token, oldtoken, sizeof(token));
     token.type = TOKEN_TIMECAF;
     token.class = class;
@@ -136,14 +136,14 @@
     *seqnum = (ARTNUM)((ntohs(s2) << 16) + ntohs(s1));
 }
 
-/* 
+/*
 ** Note: the time here is really "time>>8", i.e. a timestamp that's been
 ** shifted right by 8 bits.
 */
 static char *MakePath(time_t now, const STORAGECLASS class) {
     char *path;
     size_t length;
-    
+
     /* innconf->patharticles + '/timecaf-nn/bb/aacc.CF'
      * where "nn" is the hexadecimal value of the storage class,
      * "aabbccdd" the arrival time in hexadecimal (dd is unused). */
@@ -196,7 +196,7 @@
 ** Routines for managing the 'TOC cache' (cache of TOCs of various CAF files)
 **
 ** Attempt to look up a given TOC entry in the cache.  Takes the timestamp
-** as arguments. 
+** as arguments.
 */
 
 static CAFTOCCACHEENT *
@@ -267,7 +267,7 @@
 }
 
 /*
-** Do stating of an article, going thru the TOC cache if possible. 
+** Do stating of an article, going thru the TOC cache if possible.
 */
 
 static ARTHANDLE *
@@ -296,13 +296,13 @@
 	cent = AddTOCCache(timestamp, toc, head, tokenclass);
 	free(path);
     }
-    
+
     /* check current TOC for the given artnum. */
     if (artnum < cent->header.Low || artnum > cent->header.High) {
 	SMseterror(SMERR_NOENT, NULL);
 	return NULL;
     }
-    
+
     tocentry = &(cent->toc[artnum - cent->header.Low]);
     if (tocentry->Size == 0) {
 	/* no article with that article number present */
@@ -379,7 +379,7 @@
 			token.type = TOKEN_EMPTY;
 			return token;
 		    }
-		} 
+		}
 	    } else {
                 warn("timecaf: could not OpenArtWrite %s/%ld: %s", path, art,
                      CAFErrorStr());
@@ -417,14 +417,14 @@
 	CloseOpenFile(&WritingFile);
 	return token;
     }
-    if (CAFFinishArtWrite(fd) < 0) { 
+    if (CAFFinishArtWrite(fd) < 0) {
 	SMseterror(SMERR_UNDEFINED, NULL);
         warn("timecaf: error writing %s: %s", path, CAFErrorStr());
 	token.type = TOKEN_EMPTY;
 	CloseOpenFile(&WritingFile);
 	return token;
     }
-    
+
     return MakeToken(timestamp, art, class, article.token);
 }
 
@@ -512,13 +512,13 @@
     private->topde = NULL;
     private->secde = NULL;
     private->terde = NULL;
-    
+
     if (amount == RETR_ALL) {
 	art->data = private->artdata;
 	art->len = private->artlen;
 	return art;
     }
-    
+
     if ((p = wire_findbody(private->artdata, private->artlen)) == NULL) {
 	SMseterror(SMERR_NOBODY, NULL);
 	if (innconf->articlemmap)
@@ -560,7 +560,7 @@
     ARTHANDLE           *art;
     static TOKEN	ret_token;
     time_t		now;
-    
+
     if (token.type != TOKEN_TIMECAF) {
 	SMseterror(SMERR_INTERNAL, NULL);
 	return NULL;
@@ -569,15 +569,15 @@
     BreakToken(token, &timestamp, &artnum);
 
     /*
-    ** Do a possible shortcut on RETR_STAT requests, going thru the "TOC cache"
-    ** we mentioned above.  We only try to go thru the TOC Cache under these
+    ** Do a possible shortcut on RETR_STAT requests, going through the "TOC cache"
+    ** we mentioned above.  We only try to go through the TOC Cache under these
     ** conditions:
-    **   1) SMpreopen is true (so we're "preopening" the TOCs.)
+    **   1) SMpreopen is true (so we're "preopening" the TOCs).
     **   2) the timestamp is older than the timestamp corresponding to current
-    ** time. Any timestamp that matches current time (to within 256 secondsf
-    ** would be in a CAF file that innd is actively 
+    ** time.  Any timestamp that matches current time (to within 256 seconds)
+    ** would be in a CAF file that innd is actively
     ** writing, in which case we would not want to cache the TOC for that
-    ** CAF file. 
+    ** CAF file.
     */
 
     if (SMpreopen && amount == RETR_STAT) {
@@ -590,7 +590,7 @@
     path = MakePath(timestamp, token.class);
     if ((art = OpenArticle(path, artnum, amount)) != (ARTHANDLE *)NULL) {
 	art->arrived = timestamp<<8; /* XXX not quite accurate arrival time,
-				     ** but getting a more accurate one would 
+				     ** but getting a more accurate one would
 				     ** require more fiddling with CAF innards.
 				     */
 	ret_token = token;
@@ -605,7 +605,7 @@
 
     if (!article)
 	return;
-    
+
     if (article->private) {
 	private = (PRIV_TIMECAF *)article->private;
 	if (innconf->articlemmap)
@@ -618,7 +618,7 @@
 	    closedir(private->sec);
 	if (private->ter)
 	    closedir(private->ter);
-	if (private->curtoc) 
+	if (private->curtoc)
 	    free(private->curtoc);
 	free(private);
     }
@@ -631,7 +631,7 @@
 DoCancels(void) {
     if (DeletePath != NULL) {
 	if (NumDeleteArtnums != 0) {
-	    /* 
+	    /*
 	    ** Murgle. If we are trying to cancel something out of the
 	    ** currently open-for-writing file, we need to close it before
 	    ** doing CAFRemove...
@@ -649,7 +649,7 @@
 	DeletePath = NULL;
     }
 }
-	    
+
 bool timecaf_cancel(TOKEN token) {
     time_t              now;
     ARTNUM              seqnum;
@@ -682,7 +682,7 @@
 
 static struct dirent *FindDir(DIR *dir, FINDTYPE type) {
     struct dirent       *de;
-    
+
     while ((de = readdir(dir)) != NULL) {
         if (type == FIND_TOPDIR)
 	    if ((strlen(de->d_name) == 10) &&
@@ -832,7 +832,7 @@
     newpriv->curheader = priv.curheader;
     newpriv->curtoc = priv.curtoc;
     newpriv->curartnum = priv.curartnum;
-    
+
     snprintf(path, length, "%s/%s/%s", priv.topde->d_name, priv.secde->d_name, priv.terde->d_name);
     art->token = PathNumToToken(path, priv.curartnum);
     art->arrived = priv.curtoc[priv.curartnum - priv.curheader.Low].ModTime;
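
Note on the TOC indexing used in the hunks above (toc[artnum - header.Low],
with Size == 0 meaning "no such article"): a small stand-alone sketch of
that lookup follows.  SKETCH_HEADER, SKETCH_TOCENT and sketch_toc_lookup
are hypothetical stand-ins; only the field names Low, High and Size are
taken from the diff, and the real CAFHEADER and CAFTOCENT carry more
fields than shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced versions of CAFHEADER and CAFTOCENT. */
typedef struct {
    unsigned long Low;          /* lowest article number in the file */
    unsigned long High;         /* highest article number in the file */
} SKETCH_HEADER;

typedef struct {
    size_t Size;                /* 0 means the slot holds no article */
} SKETCH_TOCENT;

/*
** Return the TOC entry for artnum, or NULL if the number is outside
** [Low, High] or the slot is empty.  The TOC is indexed relative to Low.
*/
static const SKETCH_TOCENT *
sketch_toc_lookup(const SKETCH_HEADER *head, const SKETCH_TOCENT *toc,
                  unsigned long artnum)
{
    const SKETCH_TOCENT *ent;

    assert(head != NULL && toc != NULL);
    if (artnum < head->Low || artnum > head->High)
        return NULL;
    ent = &toc[artnum - head->Low];
    if (ent->Size == 0)
        return NULL;
    return ent;
}
```

The range check has to come first, since an article number below Low would
otherwise wrap around in the unsigned subtraction.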