Natterings about history files

Sat Feb 10 10:27:15 UTC 2001

According to Russ Allbery:
> 
> Unfortunately, if I understand how this works correctly, we can't use a
> perfect hash function even for old history files, since the point of a
> perfect hash function is that you'll only ever look up keys that you fed
> to your function generator when you made it.  Since one of the main
> purposes (and the most common queries) for old history entries is whether
> one has an article that one *doesn't* have, this doesn't work very well.
> We'd be looking up a whole bunch of keys that we didn't give to the
> perfect hash generator, and which therefore would likely collide left and
> right in unuseful ways.

Do you have an idea of the rate collision in the current model ?
I assume it is quite high unfortunatly. In the past, Matt Dillon did
the exercise for its Diablo history system but I can't put my hand on
its report.

> Anyway, one thing that I think might be a win if we stay with the nightly
> expire idea for history entries would be separate, daily history files.
> The lookup time would be somewhat slower, but it would save a lot of time
> with expire since there's no need to even look at the history files that
> are younger than /remember/.

I've tried to evaluate the gain in having multiple history files but I'm
not convinced it will be that interesting. The problem is more in
the expire design than in history. We want it to undo very quickly what has
been done in a day. A discrete approach should give better results.
I see two ways to achieve that : a cyclic history (constant size)
or an expired. The later is easier because it require no change to the
model to bring the pure CNFS performances to other spool methods. The
bad point is that it still need to be periodically cleaned.

> The API that I'm currently considering is:
> 
>     struct history *HISopen(const char *path, int flags);
>     bool HISclose(struct history *);
>     bool HISsync(struct history *);
> 
>     void HISsetcache(struct history *, size_t size);

I prefer to hide the cache completly. In INNT, I've merged HISopen and
HISsetcache into HISsetup.

>     bool HISlookup(struct history *, const char *key, time_t *arrived,
>                    time_t *posted, time_t *expires, TOKEN *token);
>     bool HIScheck(struct history *, const char *key);

are your keys HASH or message-id ?

>     bool HISwrite(struct history *, const char *key, time_t arrived,
>                   time_t posted, time_t expires, const TOKEN *token);

used for HISremember too ?
(I've called mine HISadd and HISaddbogus)

>     bool HISreplace(struct history *, const char *key, time_t arrived,
>                     time_t posted, time_t expires, const TOKEN *token);

replace ? you mean have to remember transition such a expired but remembered
and cancelled ?

>     bool HISexpire(struct history *, time_t threshold,
>                    bool (*exists)(TOKEN *));

can you explain this one ?

>     struct histstats *HISstats(struct history *);
> 
>     const char *HISerror(struct history *);
> 
> I want to hide details like the hashing function used inside the history
> mechanism so that we can replace them and fiddle with them without having
> to worry about the interface.  I'd also like to make the history struct
> opaque to all callers.  The HISopen flags would be things like HIS_RDWR,
> HIS_READ, HIS_MMAP, HIS_CREATE, and the like and would use the path as the
> prefix (so, for example, the standard history file would be at
> concat(innconf->pathdb, "/history")).

in that case, you need some more low level functions such as those
called by makedbz.

> I think the best organizational structure to use in terms of code is to
> make a new directory called history and build a library out of the history
> code in there.  libinnhist?  libhistory is too general and would conflict
> with part of readline.

I'll welcome it if it hides dbz completly from the rest of INN.

-- 
Fabien Tassin -+- fta at sofaraway.org