INNT overview

Sun Feb 25 00:21:48 UTC 2001

Fabien Tassin <fta at sofaraway.org> writes:

> As time passes, it seems like reader support will not be impossible and
> that it can be used as a basis for the next generation of INN. My
> implementation is far from complete but I want to explain as early as
> possible what I'm trying to do. *IF* INNT is used in INN in the future,
> should we try to re-use the code of INN at all costs?

No.  But I do think it's a good idea to try to use it as a starting point
whenever possible.  In a lot of cases, I think the data structures are
reasonable, even if the code isn't particularly good.

There are parts of INN that I'd recommend reusing:  the new chunks of
libinn (error, xmalloc, concat, the various replacement functions for
missing or substandard functionality, and the new hash table code) in
particular, the build system where possible, and as much of storage
(particularly CNFS) as you can salvage.  The latter is because the code
has been banged on a lot and has some useful supporting architecture like
cnfsstat that you don't really want to reinvent.

Much of INN really needs to be rewritten completely, though; it's grown
organically over the years and it was never written with threading in mind
so it's a lot messier than it should be.

I'm hoping that we can keep Alex's new history system, and I'm also still
hoping to kill all the extra configuration file parsers with something
general (the hash tables were the first step towards that).

> New code is also a problem for maintenance. CURRENT and STABLE are quite
> similar and a bug fixed in one can often be ported to the other quite
> easily.  This will not be true for INNT, which raises the question:
> should I/we limit the design and/or the code of INNT to fit INN?

I'd really like to put INNT into the same source tree as INN so that we
can put it into CURRENT and it can get more exposure that way, but that
doesn't mean it needs to share anything with INN other than the
configuration and build system (and I can help merge that in).  I do think
using libinn (and making the parts of libinn that you need thread-safe) is
a good thing to do and I'd like to help with that.

I have a bunch of fairly sweeping mechanical code changes that I'd like to
make so that new projects can more easily use INN's architecture; one of
those is to rewrite both paths.h and nntp.h into headers that are more
consistent and cleaner about namespace.  I also want to integrate building
Perl modules into the INN build system.

> Use of macros is also questionable. Russ did a lot of work on the macro
> and compatibility systems in INN. I will probably use the compatibility
> code (strerror, strcasecmp, snprintf, etc.) but I'm not fond of macros
> such as NEW/DISPOSE/SIZEOF if they add nothing. I do see the value of
> macros for systematic error checking and I will probably use some for
> that in the near future.

Dump NEW/DISPOSE/SIZEOF/etc.  I'd recommend dumping all of macros.h if you
can.  I don't like anything in that file except maybe EQ and friends.

I've been sorely tempted to remove NEW/DISPOSE/etc. from everywhere in INN
for a while now.  The only reasons why I haven't are that it's a sweeping
mechanical code change and I don't want to do that while there are major
patches outstanding (first Katsuhiro's redesign of article buffers and
then Alex's history changes that are currently pending) since those sorts
of changes make merging a real pain, and because there *is* some argument
for NEW's interface being superior to that of malloc.  It makes it
somewhat harder to allocate the wrong amount of memory.  But I don't think
that's a good enough argument; I think we can assume that people working
on INN will know C, and you can't use C without knowing how to call
malloc.
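
To make that trade-off concrete, here is a rough sketch of the two styles
side by side.  ALLOC_ARRAY, checked_alloc, and struct peer are made-up
names for illustration only; this is not INN's NEW macro or its xmalloc.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not INN's actual macro: allocate COUNT objects of
   type T.  The element size is derived from the type, so the caller
   cannot pass a mismatched sizeof. */
#define ALLOC_ARRAY(T, count) ((T *) checked_alloc(sizeof(T), (count)))

/* Allocate nmemb * size bytes, aborting on overflow or on failure. */
static void *
checked_alloc(size_t size, size_t nmemb)
{
    void *p;

    assert(size > 0 && nmemb > 0);
    if (nmemb > (size_t) -1 / size)
        abort();
    p = malloc(size * nmemb);
    if (p == NULL)
        abort();
    return p;
}

/* Illustrative structure only. */
struct peer {
    int fd;
    long accepted;
};

/* Plain malloc: correct only if the sizeof expression matches the
   pointer type, which the compiler does not check. */
static struct peer *
peers_malloc(size_t n)
{
    return malloc(n * sizeof(struct peer));
}

/* Macro form: the type is named once and drives both size and cast. */
static struct peer *
peers_macro(size_t n)
{
    return ALLOC_ARRAY(struct peer, n);
}
```

Both forms return the same amount of memory here; the macro only removes
one opportunity for a typo, which is the whole of the argument in its
favor.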

> Now the design.

> - multi-threaded daemon using POSIX Threads (pthreads)
> - the main thread is a loop waiting for incoming NNTP connections and doing
>   the periodic tasks such as statistic creation. [done]
> - once an incoming connection is detected, a new thread is launched. [done]
>   Some checks are made: max connection not reached (both global and per peer)
>   and peer is allowed (by name or address). [almost done but I still need
>   an incoming conf file to describe this]

This sounds good.
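
For the connection checks, a minimal sketch of the global and per-peer
limits might look like the following.  Everything here is assumed for
illustration: the conn_limits structure, the fixed MAX_PEERS table, and
the idea that a peer has already been resolved to a small integer index.
The mutex is there because the accepting thread and the exiting
connection threads would touch the same counters.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_PEERS 8 /* hypothetical table size */

/* Hypothetical bookkeeping for global and per-peer connection limits. */
struct conn_limits {
    pthread_mutex_t lock;
    int global_max;
    int global_count;
    int peer_max[MAX_PEERS];
    int peer_count[MAX_PEERS];
};

/* Set the global limit and give every peer the same per-peer limit. */
static void
conn_limits_init(struct conn_limits *cl, int global_max, int per_peer_max)
{
    int i;

    pthread_mutex_init(&cl->lock, NULL);
    cl->global_max = global_max;
    cl->global_count = 0;
    for (i = 0; i < MAX_PEERS; i++) {
        cl->peer_max[i] = per_peer_max;
        cl->peer_count[i] = 0;
    }
}

/* Return true and count the connection if both limits allow it. */
static bool
conn_admit(struct conn_limits *cl, int peer)
{
    bool ok = false;

    assert(peer >= 0 && peer < MAX_PEERS);
    pthread_mutex_lock(&cl->lock);
    if (cl->global_count < cl->global_max
        && cl->peer_count[peer] < cl->peer_max[peer]) {
        cl->global_count++;
        cl->peer_count[peer]++;
        ok = true;
    }
    pthread_mutex_unlock(&cl->lock);
    return ok;
}

/* Called when a connection thread for PEER exits. */
static void
conn_release(struct conn_limits *cl, int peer)
{
    assert(peer >= 0 && peer < MAX_PEERS);
    pthread_mutex_lock(&cl->lock);
    cl->global_count--;
    cl->peer_count[peer]--;
    pthread_mutex_unlock(&cl->lock);
}
```

The main loop would call conn_admit() before launching a thread and the
thread would call conn_release() on its way out.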

> - only a few NNTP commands are supported, including streaming:

> 200 news InterNetNews Transit NNTP Server INNT version 0.01 ready
> help
> 100 Legal commands
>  help
>  list
>  mode stream
>  ihave <msg-id>
>  check <msg-id>
>  takethis <msg-id>
> Report problems to <usenet at sofaraway.org>
>   [all of these are implemented]

I can't think of anything else you really need.

> - no active file supported. "list" returns a fake list with only control
> and junk. It should not be a problem to have an active file by using an
> external process (or even a dedicated thread) fed with headers and
> control messages.  If reader support is wanted, overviews can also be
> generated that way.

This sounds fine.

> - configurable timeout on read.
> - configurable incoming streaming queue size.

This also sounds good.

> - central (locked) history checks (including a built-in precommit cache)
>   Should we remove unneeded fields in history? I'm currently using stock
>   dbz.c and my own API while waiting for something better. I plan to use
>   expire for a while too.

I'd like to see if you can use Alex's new history system, and whether the
new history API can support history files with fewer fields cleanly.  I
think it can; basically, for transit, all you need is the interface to
insert an entry without a token.

>   Should I check message-id syntax?

Yes, please.  But you may not want to use the code in INN; instead,
something closer to the current MESSFOR and USEFOR stuff would be better.
I can take a look at this at some point....
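
In the meantime, a deliberately simplified plausibility check could look
something like this.  It is only a sketch and is not the grammar from any
draft: msgid_plausible is a made-up name, the 250-octet cap and the
single-'@' rule are assumptions, and a real implementation would follow
the full message-id syntax.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed upper bound on message-id length, in octets. */
#define MSGID_MAX 250

/* Rough plausibility check for a message-id of the form <left@right>.
   Rejects whitespace, control characters, non-ASCII octets, nested
   angle brackets, and anything without exactly one '@'.  This is an
   approximation, not a full grammar. */
static bool
msgid_plausible(const char *id)
{
    size_t len, i, at = 0;

    assert(id != NULL);
    len = strlen(id);
    if (len < 5 || len > MSGID_MAX) /* "<a@b>" is the shortest form */
        return false;
    if (id[0] != '<' || id[len - 1] != '>')
        return false;
    for (i = 1; i < len - 1; i++) {
        unsigned char c = (unsigned char) id[i];

        if (c <= 32 || c >= 127)
            return false;
        if (c == '<' || c == '>')
            return false;
        if (c == '@') {
            if (at != 0)
                return false;
            at = i;
        }
    }
    /* Require a non-empty part on each side of the '@'. */
    return at > 1 && at < len - 2;
}
```

A check like this is cheap enough to run before the history lookup, so
obviously broken IDs never reach the locked section.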

> - on the fly article preparation:
>    - "\r\n"
>    - Path updated
>    - Newsgroups and Distribution headers split
>    - Xref removed if present
>   no other fields are checked/changed.

Good.

Don't forget dot-stuffing for lines beginning with a period.  Oh, and
while you're designing the article propagation path, try to make it
completely binary clean (including nuls) if you can.  There's no real
reason not to, and we can always reject such articles anyway.  It's hard
to retrofit that to an existing design.
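
As a sketch of what I mean, here is dot-stuffing written against a
pointer and a length rather than a nul-terminated string, so embedded
nuls pass straight through.  The name dot_stuff and the convention that
the caller supplies an output buffer of at least twice the input length
are assumptions for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Copy LEN bytes from IN to OUT, doubling any '.' that starts a line.
   Lines are assumed to end in "\r\n".  The data is treated as raw
   bytes, so nul characters are copied like anything else.  OUT must
   have room for 2 * LEN bytes (the worst case).  Returns the number of
   bytes written. */
static size_t
dot_stuff(const char *in, size_t len, char *out)
{
    size_t i, o = 0;
    bool bol = true; /* at beginning of line */

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        if (bol && in[i] == '.')
            out[o++] = '.';
        out[o++] = in[i];
        bol = (in[i] == '\n' && i > 0 && in[i - 1] == '\r');
    }
    return o;
}
```

Only the first dot on a line is doubled; the rest of the line is copied
untouched.  The same length-based style carries over to header parsing
and storage, which is where nul-cleanliness usually gets lost.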

> - articles are stored dated and ref counted (against outgoing wishes) in
>   a central queue (memory)

Excellent.  I think that's a good design.

> - a filter can be called on each article, resetting its ref count if
> rejected.  It is not clear whether the filter should only prevent an
> article from being stored locally or also block its propagation to *all*
> other peers. It can be desirable to have both filtered and full outgoing
> peers.

Probably something to make configurable after we have some experience with
what people want.  Avoiding disk writes is a *big* performance gain,
though.

> - each outgoing peer is a thread. This thread monitors the central queue
>   for articles and tries to offer them to the peer based upon user
>   preferences (number of connections, max checks, etc.). Ref count is
>   decremented when an article has been successfully sent.

Sounds good.

> - if a ref count becomes zero, the article can be removed from the central
>   queue. I don't know whether, on a transit-only server, these articles
>   should be stored on disk or not. Perhaps it can be configurable.

There's no real need to and it's a *huge* performance win not to.
Configurable may be nice for tracking down problems... but I would expect
most people to be running a transit server front-ending a reader server,
and the article will then be on the reader server.
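
The reference counting itself can stay very small.  The following is
only a sketch under assumptions: struct article, article_new, and
article_release are hypothetical names, the count is set to the number
of outgoing peers that want the article, and the queue linkage and the
dating are left out entirely.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory article with a count of peers still wanting it. */
struct article {
    pthread_mutex_t lock;
    int refs;
    size_t len;
    char *data;
};

/* Copy LEN bytes of DATA into a new article wanted by WANTED peers.
   Returns NULL if memory cannot be allocated. */
static struct article *
article_new(const char *data, size_t len, int wanted)
{
    struct article *art;

    assert(data != NULL && wanted > 0);
    art = malloc(sizeof(*art));
    if (art == NULL)
        return NULL;
    art->data = malloc(len > 0 ? len : 1);
    if (art->data == NULL) {
        free(art);
        return NULL;
    }
    memcpy(art->data, data, len);
    art->len = len;
    art->refs = wanted;
    pthread_mutex_init(&art->lock, NULL);
    return art;
}

/* Drop one reference after a successful send.  Frees the article and
   returns true when the last reference goes away. */
static bool
article_release(struct article *art)
{
    int remaining;

    assert(art != NULL);
    pthread_mutex_lock(&art->lock);
    remaining = --art->refs;
    pthread_mutex_unlock(&art->lock);
    if (remaining > 0)
        return false;
    pthread_mutex_destroy(&art->lock);
    free(art->data);
    free(art);
    return true;
}
```

Each outgoing peer thread would call article_release() once it has sent
the article, and whichever thread drops the last reference frees the
memory, with no disk write anywhere on that path.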

> - if an article stays in the central queue for too long or if the
>   queue is full, it is stored on disk and added to the backlog queue of
>   each peer that still needs it. Should we store them sooner and replace
>   the memory copy with an mmapped copy?

I think memory usage is the driving concern here; if memory usage gets too
high, that's the time to start spilling things to disk.

> - outgoing wishes (given in an equivalent of newsfeeds) can be:
>   - groups including poison and groups count
>   - distributions
>   - pathhosts or path size
>   - full article vs headers vs path only
>   - size or size range
>   See my recent syntax proposal for "newsfeeds.conf" merging incoming.conf,
>   newsfeeds and innfeed.conf.

Right.  I'll get to that.  (I do really want the full capabilities of the
current newsfeeds file if at all possible.)

> - storage uses the INN API, mainly for CNFS. As the article structure is
>   different, the code must be changed (and cleaned).

Right.

> - inn.conf equivalent for all non-peer related parameters (innt.conf?).
>   As for newsfeeds.conf, reloads must be as smart as possible.

inn.conf is on my list to completely rework in INN, which may make this
easier.

> - no plan for a ctlinntd yet.

I recommend it; it's one of INN's really nice features.  It's a great way
to be able to peek at the running state on demand without having to wait
for periodic logs.

> - autoconf
> - IPv6
> - innpeers-stat as log analyzer.

Excellent.

> Those are the essentials. I'd like to hear your comments. If someone is
> interested in working on the code before I release it, contact me.

I'm happy to merge even an early version into the INN CVS tree and we can
probably get you CVS write access to make that easier if you want.

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>