[InterNetNews/inn] 8243c2: expireover: Bloom filter for fast history checks

Kevin Bowling noreply at github.com
Fri May 22 22:37:49 UTC 2026


  Branch: refs/heads/main
  Home:   https://github.com/InterNetNews/inn
  Commit: 8243c2b640bd57e59f0781e306c65c4dd7184dbb
      https://github.com/InterNetNews/inn/commit/8243c2b640bd57e59f0781e306c65c4dd7184dbb
  Author: Kevin Bowling <kevin.bowling at kev009.com>
  Date:   2026-05-22 (Fri, 22 May 2026)

  Changed paths:
    M .gitignore
    M MANIFEST
    M doc/pod/expireover.pod
    M doc/pod/inn.conf.pod
    M doc/pod/libinnhist.pod
    M expire/expireover.c
    M history/his.c
    M history/hisinterface.h
    M history/hisv6/hisv6-private.h
    M history/hisv6/hisv6.c
    M history/hisv6/hisv6.h
    A include/inn/bloom.h
    M include/inn/history.h
    M include/inn/innconf.h
    M include/inn/ov.h
    M lib/Makefile
    A lib/bloom.c
    M lib/innconf.c
    M samples/inn.conf.in
    M storage/expire.c
    M storage/ov.c
    M storage/ovinterface.h
    M support/mkmanifest
    M tests/Makefile
    M tests/TESTS
    A tests/lib/bloom-hiswalk-t.c
    A tests/lib/bloom-t.c

  Log Message:
  -----------
  expireover: Bloom filter for fast history checks

Add a Bloom filter for fast history existence checks.

expireover checks every article in the overview database against the
history file to detect orphaned entries.  This requires a per-article
HISlookup, which does random pread() calls into the DBZ index and the
history text file.  On large spools (1B+ articles), the check takes
months.

The Bloom filter is built from a single sequential HISwalk of the
history file at startup.  It acts as a positive-only cache in
OVhisthasmsgid: a Bloom hit skips the slow HISlookup, while a Bloom
miss falls through to HISlookup for correctness.  False positives are
benign (an orphaned overview entry survives one extra cycle).
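
The hit/miss asymmetry can be sketched in C as a positive-only cache in
front of an exact lookup.  This is a minimal illustration, not the
OVhisthasmsgid or HISlookup code: struct hist_check, hist_has_msgid,
toy_filter, and toy_history are hypothetical names, and the filter and
history are toy stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interface: a probabilistic "maybe present" test backed by a
   filter built from the history walk, plus the exact (slow) lookup. */
typedef bool (*maybe_contains_fn)(const void *filter, const char *msgid);
typedef bool (*exact_lookup_fn)(const char *msgid);

struct hist_check {
    const void *filter;              /* NULL means the filter is disabled */
    maybe_contains_fn maybe_contains;
    exact_lookup_fn exact_lookup;
    unsigned long fast_hits;         /* answered by the filter alone */
    unsigned long slow_lookups;      /* fell through to the exact lookup */
};

/* Positive-only cache: trust a filter hit, never trust a filter miss. */
bool hist_has_msgid(struct hist_check *hc, const char *msgid)
{
    assert(hc != NULL && hc->exact_lookup != NULL);
    if (hc->filter != NULL && hc->maybe_contains(hc->filter, msgid)) {
        hc->fast_hits++;             /* may be a false positive: entry kept */
        return true;
    }
    hc->slow_lookups++;
    return hc->exact_lookup(msgid);
}

/* Toy stand-ins for demonstration only.  The "filter" reports <a@x> (really
   in history) and <fp@x> (a false positive); history holds <a@x> and <b@x>. */
bool toy_filter(const void *filter, const char *msgid)
{
    (void)filter;
    return strcmp(msgid, "<a@x>") == 0 || strcmp(msgid, "<fp@x>") == 0;
}

bool toy_history(const char *msgid)
{
    return strcmp(msgid, "<a@x>") == 0 || strcmp(msgid, "<b@x>") == 0;
}
```

A hit short-circuits the lookup, so the only cost of a false positive is
an orphaned entry that is kept; a miss always costs one exact lookup, so
correctness never depends on the filter.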

The Bloom filter is controlled by the new inn.conf parameter
expirebloomfp, which specifies the false-positive rate as a reciprocal
(the default, 10000, means 1 in 10000, or 0.01%).  Setting it to 0
disables the Bloom filter.  Memory usage at the default rate is
approximately 20 bits per article (48 MB for 20M articles, 2.4 GB for
1B articles).
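
As a cross-check of the memory figures, the textbook optimal sizing for
a Bloom filter with n keys and false-positive rate p gives the bits per
key m/n and the probe count k below.  This assumes lib/bloom.c uses the
standard optimum; the patch may round the bit count or k differently.

```latex
\[
\frac{m}{n} = \frac{-\ln p}{(\ln 2)^2}, \qquad
k = \frac{m}{n}\ln 2 = \log_2 \frac{1}{p}
\]
With $p = 1/10000$:
\[
\frac{m}{n} = \frac{\ln 10000}{(\ln 2)^2} = \frac{9.2103}{0.48045}
  \approx 19.17 \text{ bits}, \qquad k \approx 13.3
\]
\[
2 \times 10^{7} \times 19.17 \text{ bits} \approx 3.83 \times 10^{8} \text{ bits}
  \approx 47.9 \text{ MB}, \qquad
10^{9} \times 19.17 \text{ bits} \approx 1.92 \times 10^{10} \text{ bits}
  \approx 2.40 \text{ GB}
\]
```

Both results are consistent with the quoted 48 MB and 2.4 GB (decimal
units) and with "approximately 20 bits per article".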

Changes:
- Add lib/bloom.c and include/inn/bloom.h (Bloom filter implementation
  using enhanced double hashing, Kirsch & Mitzenmacher 2006)
- Extend HISwalk callback signature to include the message-ID HASH
  (HISwalk has had zero callers since it was added in 2001)
- Set hisv6_walk ignore=true so corrupt lines don't abort the walk
- Add OVTOKENCACHE to OVctl for passing the Bloom filter to
  OVhisthasmsgid
- Add expirebloomfp to innconf
- Add unit tests (lib/bloom-t.c) and integration tests
  (lib/bloom-hiswalk-t.c)
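
Enhanced double hashing derives all k probe positions from two base
hashes h1 and h2, in its usual formulation as
g_i = h1 + i*h2 + (i^3 - i)/6 (mod m), computed with additions only.
The sketch below is a minimal C illustration of that technique, not the
code in lib/bloom.c: struct digest16, struct bloom_sketch, and the
bloom_sketch_* functions are hypothetical.  It also assumes (suggested
by, but not confirmed in, the HISwalk change) that keys arrive already
hashed, so h1 and h2 are simply the two 64-bit halves of a 16-byte
digest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an already-hashed message-ID (16-byte digest). */
struct digest16 {
    unsigned char bytes[16];
};

/* Hypothetical filter layout: m bits in a byte array, k probes per key. */
struct bloom_sketch {
    uint64_t nbits;  /* m */
    unsigned k;      /* probes per key */
    uint8_t *bits;
};

bool bloom_sketch_init(struct bloom_sketch *b, uint64_t nbits, unsigned k)
{
    /* Keep m well below 2^64 so that x + y in bloom_probe cannot overflow. */
    assert(nbits > 0 && nbits <= (UINT64_C(1) << 62) && k > 0);
    b->nbits = nbits;
    b->k = k;
    b->bits = calloc((size_t)((nbits + 7) / 8), 1);
    return b->bits != NULL;
}

void bloom_sketch_free(struct bloom_sketch *b)
{
    free(b->bits);
    b->bits = NULL;
}

/* Visit the k probe positions g_i = h1 + i*h2 + (i^3 - i)/6 (mod m).
   The cubic term is accumulated incrementally: y grows by i+1 after
   probe i.  With set == true the bits are set; otherwise the function
   reports whether all k bits are already set. */
static bool bloom_probe(struct bloom_sketch *b, const struct digest16 *d,
                        bool set)
{
    uint64_t h1, h2;

    /* The key is already a hash, so split it instead of rehashing. */
    memcpy(&h1, d->bytes, sizeof h1);
    memcpy(&h2, d->bytes + 8, sizeof h2);

    uint64_t x = h1 % b->nbits;
    uint64_t y = h2 % b->nbits;
    for (unsigned i = 0; i < b->k; i++) {
        uint8_t mask = (uint8_t)(1u << (x % 8));
        if (set)
            b->bits[x / 8] |= mask;
        else if ((b->bits[x / 8] & mask) == 0)
            return false;            /* definitely never added */
        x = (x + y) % b->nbits;
        y = (y + i + 1) % b->nbits;
    }
    return true;                     /* added, or a false positive */
}

void bloom_sketch_add(struct bloom_sketch *b, const struct digest16 *d)
{
    (void)bloom_probe(b, d, true);
}

bool bloom_sketch_maybe_contains(struct bloom_sketch *b,
                                 const struct digest16 *d)
{
    return bloom_probe(b, d, false);
}
```

Bits are only ever set, never cleared, so every added key tests as
present (no false negatives); only keys that were never added can come
back as false positives.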

close #339





More information about the inn-committers mailing list