buffindexed: could not open overview

Katsuhiro Kondou Katsuhiro_Kondou at isc.org
Sun Dec 1 05:43:29 UTC 2002


In article <20021130220750.GA381 at cdp.kigam.re.kr>,
	Sang-yong Suh <sysuh at kigam.re.kr> wrote;

} And do I have to read/understand all the codes in buffindexed.c?

I'm writing an explanation of buffindexed.  It is still in
progress and may not be well described, but here it is
anyway for your understanding.
-- 
Katsuhiro Kondou


0. why was buffindexed written?

  When I ran ov3, which was written by Clayton O'Neill in 1999, it could
never keep up with a full feed, which was my environment.  There are
several reasons for this.  First, innd has to open two overview files (data
and index) whenever overview data is stored.  There is an option,
'overcachesize', to keep those files open, but in a full feed environment
it doesn't help much, since newsgroups are accessed almost randomly.  The
most painful part of storing overview data with ov3 is the overhead of
opening two files.  Furthermore, many nnrpd processes try to open overview
files almost randomly and simultaneously.  This leads to a tremendous disk
I/O bottleneck.  So I decided to write a new overview method to resolve
these problems.

1. data files for overview data and index

  First, it's necessary to avoid an additional open() when overview data is
stored by the storage method.  To achieve this, buffindexed opens large
buffer files which hold both overview data and index.  The opened files
look like the cycbuffs used by cnfs.  Here is the layout of such a file,
which is called a buffer hereafter.

+----------+--------------------+------+------+------+------+------+
|OVBUFFHEAD|allocation bit field|BLOCK1|BLOCK2|BLOCK3|......|BLOCKn|
+----------+--------------------+------+------+------+------+------+

OVBUFFHEAD includes attributes of the buffer and some other information.
ABF (allocation bit field) is a bitmap with one bit per BLOCK indicating
whether it's in use or not.  BLOCK is the data area which can be used for
overview data and index.  The BLOCK size is 8192 bytes, which can be useful
over NFS.  However, this does not mean that buffindexed can be used over
NFS.  The major difference from cnfs is that the buffer is NOT cyclical.
Namely, it's necessary to allocate a BLOCK before its use and to free it
when it is no longer needed.
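
As a rough illustration of this layout, here is a minimal C sketch of how a
BLOCK's file offset and its ABF bit could be computed.  All names
(buf_layout, block_offset, abf_set, ...) are hypothetical and not taken
from buffindexed.c.  It assumes one ABF bit per BLOCK, 0-based BLOCK
numbers, and BLOCKs placed directly after the ABF with no padding; the real
code may align or number things differently.

```c
#include <assert.h>
#include <stddef.h>

#define OV_BLOCKSIZE 8192   /* BLOCK size stated in the text */

/* Hypothetical description of one buffer file; NOT the real OVBUFFHEAD. */
struct buf_layout {
    size_t head_size;     /* bytes occupied by OVBUFFHEAD (assumed value) */
    size_t total_blocks;  /* number of BLOCKs in the buffer */
};

/* Bytes needed for the allocation bit field: one bit per BLOCK. */
static size_t abf_bytes(const struct buf_layout *l)
{
    return (l->total_blocks + 7) / 8;
}

/* File offset of BLOCK n (0-based), assuming BLOCKs follow the ABF directly. */
static size_t block_offset(const struct buf_layout *l, size_t n)
{
    assert(n < l->total_blocks);
    return l->head_size + abf_bytes(l) + n * OV_BLOCKSIZE;
}

/* Return 1 if BLOCK n is marked in use, 0 otherwise. */
static int abf_in_use(const unsigned char *abf, size_t n)
{
    return (abf[n / 8] >> (n % 8)) & 1;
}

/* Mark BLOCK n as allocated. */
static void abf_set(unsigned char *abf, size_t n)
{
    abf[n / 8] |= (unsigned char)(1u << (n % 8));
}

/* Mark BLOCK n as free again (the buffer is not cyclical). */
static void abf_clear(unsigned char *abf, size_t n)
{
    abf[n / 8] &= (unsigned char)~(1u << (n % 8));
}
```

Because the buffer is not cyclical, abf_set() and abf_clear() stand in for
the explicit allocate/free step that cnfs does not need.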

  The basic concept of buffindexed is to keep overview data and index in
BLOCKs which are allocated for each newsgroup.  This reduces the cost of
open() when storing/retrieving overview data.

  $pathetc/buffindexed.conf is used if buffindexed is selected as the
overview method.  buffindexed.conf holds three pieces of information about
each buffer file.  First, the index, which is used internally to identify
the buffer file.  Second, the file name of the buffer file.  The last one
is the size of the buffer file.  When a program which uses buffindexed as
its overview method starts, buffindexed.conf is read and OVBUFF is created.
OVBUFF includes information such as the index just explained, the file
name, the file descriptor of the buffer file, the number of total/free
BLOCKs, a pointer to the ABF, the next free BLOCK to be allocated, a chunk
number used as a hint for the next free BLOCK, etc.
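
To make the three fields concrete, here is a minimal C sketch that parses
one configuration line.  It assumes a colon-separated line of the form
"index:filename:size" with the size in kilobytes; the text above only names
the three fields, so treat the separator and the unit as assumptions and
check the buffindexed.conf manual page for the actual syntax.  The struct
and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define OV_BLOCKSIZE 8192   /* BLOCK size stated in the text */

/* Hypothetical in-memory form of one buffindexed.conf line. */
struct ovbuff_conf {
    int index;              /* identifies the buffer internally */
    char path[256];         /* buffer file name */
    unsigned long size_kb;  /* buffer size; kilobytes is an assumption */
};

/* Parse "index:filename:size".  Returns 1 on success, 0 if malformed. */
static int parse_conf_line(const char *line, struct ovbuff_conf *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "%d:%255[^:\n]:%lu",
                  &out->index, out->path, &out->size_kb) == 3;
}

/* Upper bound on the number of BLOCKs; ignores OVBUFFHEAD and ABF space. */
static unsigned long conf_max_blocks(const struct ovbuff_conf *c)
{
    return c->size_kb / (OV_BLOCKSIZE / 1024);
}
```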

2. index to each newsgroup

  There is an index file used to access each newsgroup: $pathdb/group.index.
Here is the layout of this file.

+-----------+-------------+-------------+-------------+---------------+
|GROUPHEADER|GROUPENTRY[0]|GROUPENTRY[1]|.............|GROUPENTRY[n-1]|
+-----------+-------------+-------------+-------------+---------------+

GROUPHEADER includes attributes of the file and the list header of unused
GROUPENTRYs.  GROUPENTRY includes information about each newsgroup:
hi/lo mark, hi/lo mark within overview data, 1st BLOCK of overview index,
last BLOCK of overview index, offset to next unused index space in last
BLOCK of overview index, last BLOCK of overview data, offset to next unused
space in last BLOCK of overview data, etc.
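
The "list header of unused GROUPENTRYs" suggests a free list threaded
through the entry array.  The following C sketch shows that idea under the
assumption that each unused entry stores the array index of the next unused
one; the field and function names are hypothetical, and the real GROUPENTRY
carries many more fields than shown here.

```c
#include <assert.h>

#define NO_ENTRY (-1)   /* hypothetical end-of-list marker */

/* Trimmed-down, hypothetical GROUPENTRY: only what the free list needs. */
struct group_entry {
    long high, low;   /* hi/lo mark of the newsgroup */
    int next;         /* next unused entry while this one is on the free list */
};

/* Hypothetical GROUPHEADER holding the head of the unused-entry list. */
struct group_header {
    int freelist;     /* index of first unused GROUPENTRY, or NO_ENTRY */
};

/* Take one unused entry off the list; returns its index or NO_ENTRY. */
static int entry_alloc(struct group_header *h, struct group_entry *e)
{
    int i = h->freelist;

    if (i == NO_ENTRY)
        return NO_ENTRY;
    h->freelist = e[i].next;
    e[i].next = NO_ENTRY;
    return i;
}

/* Put entry i back at the front of the unused list. */
static void entry_free(struct group_header *h, struct group_entry *e, int i)
{
    assert(i >= 0);
    e[i].next = h->freelist;
    h->freelist = i;
}
```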

3. overview index 

  An overview index BLOCK includes two types of data.  One is OVINDEXHEAD,
which indicates the next overview index BLOCK of the newsgroup and the
highest and lowest article numbers in this overview index BLOCK.  The other
is OVINDEX, which is an array of overview index entries.  Each entry holds
the information for one piece of overview data: article number, buffer
index (equivalent to the index in buffindexed.conf), which BLOCK in the
buffer, offset of the overview data from the beginning of the BLOCK, length
of the overview data, etc.
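
A minimal C sketch of these two structures and of a lookup within one index
BLOCK might look as follows.  The field names, field widths and the linear
scan are assumptions made for illustration; they are not copied from
buffindexed.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical OVINDEXHEAD: link to the next index BLOCK plus its range. */
struct ov_index_head {
    long low, high;   /* lowest/highest article number in this index BLOCK */
    int next_block;   /* next overview index BLOCK of the newsgroup */
};

/* Hypothetical OVINDEX entry: where one article's overview data lives. */
struct ov_index {
    long artnum;             /* article number */
    int buf_index;           /* index of the buffer, as in buffindexed.conf */
    unsigned blocknum;       /* which BLOCK in that buffer */
    unsigned short offset;   /* offset from the beginning of the BLOCK */
    unsigned short len;      /* length of the overview data */
};

/*
 * Find an article in one index BLOCK.  Returns NULL when the article is
 * outside this BLOCK's range or not present; a caller would then follow
 * next_block to continue the search.
 */
static const struct ov_index *
find_article(const struct ov_index_head *h, const struct ov_index *v,
             size_t n, long artnum)
{
    size_t i;

    assert(h != NULL && v != NULL);
    if (artnum < h->low || artnum > h->high)
        return NULL;
    for (i = 0; i < n; i++)
        if (v[i].artnum == artnum)
            return &v[i];
    return NULL;
}
```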

4. overview data

  An overview data BLOCK includes only overview data.  All information
which indicates where overview data begins and its length is kept in the
overview index.  As described before, the size of a BLOCK is 8192 bytes,
which means the maximum size of overview data is limited to 8192 bytes.  If
the size exceeds this limit, storing the overview data fails.
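
The size limit can be expressed as a small placement decision.  This C
sketch is only an illustration with hypothetical names; it assumes that
overview data never spans two BLOCKs, which is what the 8192-byte limit
implies.

```c
#include <assert.h>
#include <stddef.h>

#define OV_BLOCKSIZE 8192   /* BLOCK size stated in the text */

/* Hypothetical outcome of trying to place one piece of overview data. */
enum place {
    PLACE_TOO_BIG,     /* larger than a BLOCK: storing fails */
    PLACE_NEW_BLOCK,   /* does not fit in the space left: allocate a BLOCK */
    PLACE_CURRENT      /* fits in the current BLOCK */
};

/*
 * used: offset to the next unused space in the current data BLOCK.
 * len:  length of the overview data to store.
 */
static enum place place_record(size_t used, size_t len)
{
    assert(used <= OV_BLOCKSIZE);
    if (len > OV_BLOCKSIZE)
        return PLACE_TOO_BIG;
    if (len > OV_BLOCKSIZE - used)
        return PLACE_NEW_BLOCK;
    return PLACE_CURRENT;
}
```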

5. storing overview data

  Overview data is stored when new overview data arrives, when overview
data is rebuilt from the spool, or during expiry.  Here is how overview
data is stored.

  a. find GROUPENTRY for the newsgroup
  b. lock GROUPENTRY by inn_lock_range() to prevent other processes from
     updating overview data for this newsgroup
  c. allocate a new BLOCK if there is none, or if the space left in the
     current BLOCK is too small to store the data
  d. store overview data first by mmapwrite() or pwrite()
  e. then store overview index by mmapwrite() or pwrite()
  f. update the index for the newsgroup (GROUPENTRY) if necessary
  g. unlock GROUPENTRY by inn_lock_range() so that other processes can
     update overview data for this newsgroup
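
Steps b, d, e and g can be sketched in C as follows.  This is not the INN
code: it uses plain POSIX fcntl() byte-range locks as a stand-in for
inn_lock_range(), uses only pwrite() (not mmapwrite()), and leaves out
BLOCK allocation and the GROUPENTRY update.  The function names and
parameters are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Lock (F_WRLCK) or unlock (F_UNLCK) a byte range, waiting if necessary. */
static int lock_range(int fd, int type, off_t start, off_t len)
{
    struct flock fl = { 0 };

    fl.l_type = (short)type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLKW, &fl);
}

/*
 * Store one record: lock the group's entry range, write the overview data
 * first, then the index entry that points at it, then unlock.
 * Returns 0 on success, -1 on failure.
 */
static int store_record(int group_fd, off_t entry_off, off_t entry_len,
                        int buf_fd,
                        off_t data_off, const void *data, size_t data_len,
                        off_t index_off, const void *idx, size_t idx_len)
{
    int ok;

    assert(group_fd >= 0 && buf_fd >= 0);
    if (lock_range(group_fd, F_WRLCK, entry_off, entry_len) < 0)
        return -1;
    /* Data goes first, so an index entry never refers to unwritten data. */
    ok = pwrite(buf_fd, data, data_len, data_off) == (ssize_t)data_len
      && pwrite(buf_fd, idx, idx_len, index_off) == (ssize_t)idx_len;
    if (lock_range(group_fd, F_UNLCK, entry_off, entry_len) < 0)
        return -1;
    return ok ? 0 : -1;
}
```

Writing the data before the index means that an interrupted store leaves
unreferenced data behind rather than an index entry pointing at garbage.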

6. BLOCK allocation

  BLOCK allocation is done when there is no space left for storing overview
data and index.  The current allocation policy is as follows:

  a. each allocation is done round robin across the buffer files
     within the same process
     e.g. if there are three buffer files, A, B and C,
          the 1st allocation will be from A, the 2nd from B, the 3rd
          from C, and the 4th from A again
     the reason for this is to distribute disk I/O
  b. OVBUFFHEAD includes the next available (free) BLOCK number.  If this
     equals the total number of BLOCKs, no available BLOCK is left in this
     buffer, and other buffers are searched until an available BLOCK is
     found.
  c. 

