2.5 wish list (was Re: pre-2.4 (was Re: buffindexed jumbo patch) )

Russ Allbery rra at stanford.edu
Tue Dec 24 06:34:00 UTC 2002


Sang-yong Suh <sysuh at kigam.re.kr> writes:

> I wish spam filtering (Perl/Python filtering) could run in parallel with
> article reception.  This would benefit an SMP machine, which has
> enough idle time on other CPUs.  Would it be possible?  My server is
> getting slower and slower these days.  It spends a lot of time on Perl
> filtering...

I'm not sure how to do this without threading INN.

The main thing preventing us from threading INN is the storage and
overview interfaces, which right now need serious work to clean up.  The
majority of the problem is that the data structures used internally are
far too simple and inconvenient, which creates a ton of work on either
side of the API gluing things together and taking them apart.

For example, the API for writing overview data is:

OVADDRESULT
OVadd(TOKEN token, char *data, int len, time_t arrived, time_t expires);

where the data is the already-formatted overview information.  Note that
there is no *key* given for the overview API, only the data.  To see what
newsgroups and article numbers the data should be stored for, the data is
parsed, the Xref header found and parsed, and the formatted article
numbers are converted back to real article numbers.
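
To make that round trip concrete, here is a minimal sketch of the kind of
parsing the receiving side has to do.  It is not INN's actual code:
struct xref_entry and parse_xref() are hypothetical names, and the sketch
assumes a NUL-terminated, tab-separated overview line whose Xref field
looks like "Xref: host group:number group:number".

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one parsed Xref entry; not an INN type. */
struct xref_entry {
    char group[128];
    unsigned long number;
};

/*
 * Find the Xref field in a NUL-terminated overview line and parse its
 * "group:number" pairs.  Returns the number of entries written to out,
 * which is at most max.  Parsing stops at the first malformed pair.
 */
size_t
parse_xref(const char *data, struct xref_entry *out, size_t max)
{
    static const char marker[] = "\tXref: ";
    const char *p = strstr(data, marker);
    size_t count = 0;

    if (p == NULL)
        return 0;
    p += strlen(marker);

    /* Skip the host name that leads the Xref value. */
    p += strcspn(p, " \t\r\n");

    while (count < max) {
        const char *colon;
        size_t toklen, glen;
        char *end;

        p += strspn(p, " ");
        toklen = strcspn(p, " \t\r\n");
        if (toklen == 0)
            break;                      /* end of the Xref field */
        colon = memchr(p, ':', toklen);
        if (colon == NULL)
            break;                      /* no "group:number" separator */
        glen = (size_t) (colon - p);
        if (glen == 0 || glen >= sizeof(out[count].group))
            break;                      /* empty or oversized group name */
        memcpy(out[count].group, p, glen);
        out[count].group[glen] = '\0';

        /* Convert the formatted article number back to an integer. */
        out[count].number = strtoul(colon + 1, &end, 10);
        if (end == colon + 1)
            break;                      /* no digits after the colon */
        count++;
        p += toklen;
    }
    return count;
}
```

The caller already knew every one of these group/number pairs when it
built the Xref header, so all of this work only recovers information that
was thrown away at the API boundary.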

If we instead had something like:

OVADDRESULT
OVadd(struct overview *, struct newsgroups *, struct overdata *);

where struct overview is an opaque data structure corresponding to an open
overview database and the others are something like:

    struct artnum {
        char *newsgroup;
        ARTNUM number;
    };

    struct newsgroups {
        size_t count;
        struct artnum *groups;
    };

    struct overdata {
        char *data;
        size_t length;
        TOKEN token;
        time_t arrived;
        time_t expires;
    };

it would be easier to write code on both ends of the interface.
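
A rough sketch of what the receiving side could look like with keys
passed in explicitly.  Everything beyond the three structs above is an
assumption made so the example compiles on its own: the TOKEN, ARTNUM,
and OVADDRESULT definitions are stand-ins rather than INN's real ones,
the contents of struct overview are a toy in-memory record, and
ovadd_sketch() is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Stand-ins so the sketch is self-contained; INN's definitions differ. */
typedef unsigned long ARTNUM;
typedef struct { unsigned char bytes[18]; } TOKEN;
typedef enum { OVADDCOMPLETED, OVADDFAILED } OVADDRESULT;

struct artnum {
    char *newsgroup;
    ARTNUM number;
};

struct newsgroups {
    size_t count;
    struct artnum *groups;
};

struct overdata {
    char *data;
    size_t length;
    TOKEN token;
    time_t arrived;
    time_t expires;
};

/* Toy handle: all state lives here instead of in static variables. */
struct overview {
    size_t stored;          /* number of (group, number) records written */
    ARTNUM last_number;     /* key of the most recent record */
    char last_group[128];
};

OVADDRESULT
ovadd_sketch(struct overview *ov, const struct newsgroups *ng,
             const struct overdata *od)
{
    size_t i;

    if (ov == NULL || ng == NULL || od == NULL || od->data == NULL)
        return OVADDFAILED;

    /* The keys arrive ready to use; nothing is parsed out of od->data. */
    for (i = 0; i < ng->count; i++) {
        const struct artnum *key = &ng->groups[i];

        if (key->newsgroup == NULL
            || strlen(key->newsgroup) >= sizeof(ov->last_group))
            return OVADDFAILED;

        /* A real backend would write od->data under this key here. */
        strcpy(ov->last_group, key->newsgroup);
        ov->last_number = key->number;
        ov->stored++;
    }
    return OVADDCOMPLETED;
}
```

The loop walks the group/number pairs directly, so the backend never has
to look inside the formatted overview line to find out where to store it.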

Just having the opaque struct overview * handle would make threading
actually conceivable, which it isn't really right now.

Similar issues apply to the storage API, although it's not quite as
complex since the type of data that it stores is actually much simpler.
The main thing that needs to be done is an overhaul to avoid static
variables all over the article storage paths, which will require passing
around more pointers.
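
As a generic illustration of that kind of overhaul (not INN's storage
API; struct storage_ctx, storage_ctx_init(), and storage_store() are
hypothetical names), state that would otherwise sit in file-scope static
variables moves into a struct that every call receives by pointer:

```c
#include <stddef.h>

/*
 * Hypothetical per-instance state.  In the static-variable style these
 * fields would be file-scope statics shared by every caller, which is
 * what makes concurrent use unsafe.
 */
struct storage_ctx {
    unsigned long articles_stored;
    int last_error;             /* 0 means the last call succeeded */
};

void
storage_ctx_init(struct storage_ctx *ctx)
{
    ctx->articles_stored = 0;
    ctx->last_error = 0;
}

/* Returns 0 on success and -1 on failure; touches only *ctx. */
int
storage_store(struct storage_ctx *ctx, const char *article, size_t len)
{
    if (ctx == NULL)
        return -1;
    if (article == NULL || len == 0) {
        ctx->last_error = 1;
        return -1;
    }
    /* A real implementation would write the article out here. */
    ctx->articles_stored++;
    ctx->last_error = 0;
    return 0;
}
```

Two contexts set up this way do not share any state, so separate threads
could each own one without locking around hidden globals.  The cost is
the extra pointer argument on every call in the path.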

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>

    Please send questions to the list rather than mailing me directly.
     <http://www.eyrie.org/~eagle/faqs/questions.html> explains why.


More information about the inn-workers mailing list