Natterings about history files

Tue Jan 16 22:42:12 UTC 2001

Forrest J Cavalier <mibsoft at epix.net> writes:

> Is there an iterator other than HISexpire()?  It seems like a perfectly
> good one, but just asking.

I was thinking about that and there probably should be, yeah.  Something
like:

    bool HISwalk(struct history *, void *cookie,
                 bool (*callback)(void *cookie, time_t, time_t, time_t,
                                  const TOKEN *));

> I think the HISexpire callback should be 

> bool (*exists)(void *cookie, time_t arrived, time_t posted, time_t
> expires, const TOKEN *token);

> (Having a cookie in any callback seems like a good idea, but the other
> time_t items should go along because they might be needed anyway in the
> callback, which should not have to do a HISlookup())

Yup, good point.  And what would the cookie be?  Something passed into
HISexpire that it passes back into the callback?

> I don't suppose you would consider implementing a "cheap lookup" which
> would only do something quick, and never hit the disk.

Actually, yeah, I was thinking about that with respect to some history
designs that I was dreaming up and I couldn't decide if it was useful or
not.  On one hand, I could see doing something like only checking the last
day's history on CHECK and only doing the full check on a TAKETHIS, but
then on the other hand if you say "I want this" to the CHECK, you're going
to get a TAKETHIS, so it isn't clear to me that this actually gains
anything in the long run.

> Any chance of asynchronous API?  I know, it makes your head spin.

This is probably in the "to be added later" department; INN really isn't
in any shape to take advantage of anything asynchronous yet.  I'm not sure
if I'd want to do asynchronous or if I'd want to fork of a history thread,
too... or even if they're mutually exclusive.

> Imagine opening N fds to the history text file (which may happen anyway
> because of multiple text files) and setting them non-block.

Hm.  Doing several lookups in parallel?  I wonder if that would actually
be any faster.

> Tell us more about HISreplace()

That should rewrite the data "in place".  Our current history format
doesn't support it (well, it does kind of in some circumstances, but it
wasn't designed for it), so it may just always return false for some types
of history and the caller will have to cope.  But for a binary history
format with fixed-size records, we should be able to update article
information in place, which would allow us to solve several useful
problems such as duplicating the traditional behavior of tradspool expire
by changing which storage API token a given message ID points to when the
article expires out of the first group to which it was posted but hasn't
yet expired out of other groups.

Oh, another interface that we should have:

  bool HISconvert(struct history *, const char *newpath, const char *type);

You could fake this with an iterator, but I think it's probably a better
idea to have it be a regular interface even if it's implemented with an
iterator inside.  And that reminds me; just having a HIS_CREATE flag to
HISopen isn't sufficient since you want to be able to specify your history
type when you create a history file for the first time, so strike that
flag, HISopen should fail if the history file doesn't already exist, and
add

  struct history *HIScreate(const char *path, const char *type);

which will basically be makedbz but for all our history types.

> Is "key" a message-ID or a hash?  I vote for message Id (do we get
> votes?), and please change the prototypes if so.

Message-ID, definitely, and will do.

> What are the goals for this?  Speed?  Clean-ness?  Reducing the size of
> the history file?

Clean-ness is the main goal of the API for me, that and encapsulation of
the history stuff like we've done for storage and overview so that someone
can write a new backend and drop it in place and nothing else in INN will
have to change.

I have a few ideas for a new history format that should be smaller, less
likely to run into large file problems, and probably faster, but that can
wait for the API, and someone else may have an excellent idea and beat me
to it.

Another thing this will let us do is get rid of the current mess between
regular and tagged hash and make converting back and forth possible, as
well as clean up the code and eliminate the need for the configure option.

Incidentally, I think I may know why Barry was seeing a lot more writes
with current history than with his hacked 1.5-based stuff.  One of the
changes that we picked up was to go to three files, having a history.hash
and a history.index.  Those are two tables with parallel structure, one of
them having the hash of each history entry (designed to answer the
question "do we have this article") and the other containing the offsets
into the text file at the same place that the relevant hash is at in the
first.  Before, with the earlier history format, writing to the history
file required two disk writes, one to the end of the history file (with
buffering and locality of writes) and the other to the hash table.
Currently, you need three writes, one to each of .hash and .index, and one
of those isn't even normally memory mapped and is done with pwrite, which
is unbuffered.

This is one of the tradeoffs that I keep running into when trying to
design a better history format; most of the designs that seem like they
might be faster, simpler, or smaller end up requiring either more writes
or losing the locality of writes on the disk (and therefore the ease of
buffering smaller writes into one larger write).

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>