Theory questions about ovdb

Heath Kehoe heath.kehoe at
Fri Jul 7 17:42:10 UTC 2000

>[Posted to the list, but I want Heath to see and respond here,
>since I haven't seen this discussed on list.]
>Two short questions, and then the important one.
>How often is delete_old_stuff() called, and for how many
>newsgroups does it call delete_all_records()?

OK, here's how it's supposed to work:
  1) In the expireover run, expireover calls ovdb_expiregroup()
     (via OVexpiregroup) for every group in the active file.
     In addition to its normal expiration activity,
     ovdb_expiregroup() updates a timestamp in the groupstats
     record for that group.

  2) After all groups in active are processed, expireover calls
     OVexpiregroup with group set to NULL.  When ovdb_expiregroup
     is called in this way, it calls delete_old_stuff().  Note
     that expireover only does OVexpiregroup(NULL,NULL) if all
     of the groups in the active file have been processed during
     its run.

  3) delete_old_stuff looks at each groupstats record, and sees
     if its timestamp is newer than the start of the expireover
     run.  If it is not, it assumes that the group is no longer
     in the active file, and calls delete_all_records() to remove
     all entries for that group.
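
The three steps above amount to a mark-and-sweep over the groupstats
records. Here is a minimal sketch in plain C, with no Berkeley DB
involved. The struct, its field names, and both function names are
hypothetical stand-ins rather than the actual ovdb code, and treating a
stamp equal to the run start as fresh is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a groupstats record; the real ovdb
   record has more fields. */
struct group_stat {
    const char *name;
    time_t      expired;  /* last time the expire pass touched this group */
    int         deleted;  /* set when the sweep removes the group */
};

/* Step 1: the per-group expire pass stamps the record. */
static void mark_group(struct group_stat *g, time_t now)
{
    g->expired = now;
}

/* Step 3: visit every record; any group not stamped since run_start is
   assumed to be gone from the active file.  Returns the number of
   groups removed. */
static size_t sweep_groups(struct group_stat *stats, size_t n,
                           time_t run_start)
{
    size_t removed = 0;

    assert(stats != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!stats[i].deleted && stats[i].expired < run_start) {
            stats[i].deleted = 1;  /* stands in for delete_all_records() */
            removed++;
        }
    }
    return removed;
}
```

The sweep is only safe if every group in the active file was stamped
first, which is why step 2 runs it only after a complete pass.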

While looking at the code, I realized that there is a bug:  the
variable that delete_old_stuff compares the timestamp against
only gets initialized if groupbaseexpiry is on.  So if
groupbaseexpiry is off, that variable will be 0, and
delete_all_records will never be called.  I'll get a patch out
today to fix that.
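
To see why a zero threshold disables the cleanup, consider the
comparison on its own. This is a sketch only: group_is_stale() is a
hypothetical helper, not a function in ovdb, and it assumes the check
is a plain "stamp older than threshold" test.

```c
#include <assert.h>
#include <time.h>

/* Nonzero when a group's stamp predates the threshold, meaning the
   group was not seen during the run and its records should go. */
static int group_is_stale(time_t stamp, time_t threshold)
{
    assert(stamp >= 0);
    return stamp < threshold;
}
```

With the threshold left at 0, no real timestamp is smaller than it, so
no group is ever considered stale.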

>In other words, what does the Berkeley DB do internally
>for a:
>   key.data = &dk;
>   key.size = sizeof dk;
>   val.flags = DB_DBT_PARTIAL;
>   ret = dbcursor->c_get(dbcursor, &key, &val, DB_SET_RANGE);
>   loop
>     ret = dbcursor->c_get(dbcursor, &key, &val, DB_NEXT);
>Is it O(1)?  (I hope!)
>Does it make sense to add a retry limit and failure with log
>(such as in ovdb_groupadd() and elsewhere)?  How often
>in real life circumstances does a retry get needed?  Right
>now it looks like it could infinite loop.

I guess I don't understand your question... what do you mean
by retry?  The c_get(... , DB_SET_RANGE) puts the cursor at the
beginning of the keys for the group to be deleted; the loop
does c_get(... , DB_NEXT) to return the next consecutive key
(remember, it's a B-tree), and the loop ends when a key is
reached for a different group or if the c_get returns an error.
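
The seek-then-step pattern can be illustrated without Berkeley DB by
using a sorted array in place of the B-tree. This is a sketch under
stated assumptions: struct rec_key, seek_range() and
count_group_records() are hypothetical names, the keys are assumed to
sort by group first and article second, and nothing here says how the
library implements its cursors internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical key layout: records sort by group, then by article. */
struct rec_key {
    unsigned group;
    unsigned art;
};

/* Analogue of c_get(..., DB_SET_RANGE): index of the first key whose
   group is >= the requested group, or n if there is none. */
static size_t seek_range(const struct rec_key *keys, size_t n,
                         unsigned group)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (keys[mid].group < group)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Analogue of the DB_NEXT loop: step forward one key at a time and
   stop at the first key of another group or at the end of the data. */
static size_t count_group_records(const struct rec_key *keys, size_t n,
                                  unsigned group)
{
    size_t count = 0;

    assert(keys != NULL || n == 0);
    for (size_t i = seek_range(keys, n, group);
         i < n && keys[i].group == group; i++)
        count++;
    return count;
}
```

Every iteration moves forward by one key, so in this model the loop
ends after at most as many steps as there are records for the group,
plus one.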

>Now the big question.
>The other overview schemes which have been used in INN
>have (more or less) well known read/write/seek characteristics.
>How do the internals of Berkeley DB work for things like:
>   - groupstats->put(), and db->put()  
>       (done for each call to ovdb_add())
>   - and c_get()
>       (done in ovdb_search())
>I guess the main questions for comparison are:
>     Are records with the same key clustered on disk?
>     Is there any magic that new allocations ( ->put() )
>     can write to sequential locations on disk?  
>I don't see how either of these is possible, unless
>writing into a fresh database, or one that was swept
>and packed, or maybe some other magic.  
>If there is none of these, there is a per-article seek
>penalty (2 for writing, 1 for reading.)
>For reading, that sounds a lot like uniover to me.
>   (uniover is slow at reading, fast at writing.)
>For writing, that sounds a lot like tradindexed.
>   (tradindexed is slow at writing, and fast at reading.)

I don't really know what is going on behind the scenes.
I do know that I do not have a per-article seek, or my
disks would be thrashing a lot more than they are.
Keep in mind that the DB library does extensive caching.

My own experience is that ovdb is faster than both
*indexed in all respects.  In terms of physical I/O,
on my server, the overview disks are much less busy
than the history/active disk.

