pure transit server

Fri Feb 2 02:22:34 UTC 2001

According to Russ Allbery:
> 
> > Filtering is not excluded but is not in my current todo list.
> 
> I ask because it impacts the threading model.

Sure. But I'm starting to think that having a filtering mechanism available
in inntd would give it the same set of functionality as innd. So there
would be no point in doing inntd. Hmm..

>  Perl can't cope with being
> run from more than one thread, and from what I've heard Python copes
> badly, or can cope badly.

Right. I'm no longer following Perl development but I thought this was
a point that needed to be worked on in 5.7 and the future 6.x.

>  Plus, you really don't want to make filter
> authors deal with locking of data structures if you can avoid it.
>  So the
> most straightforward approach of running the filter inside each NNTP
> listener probably isn't the best idea.

No, I wanted to move it to the end of the line, just before the storage
API. This will give us the opportunity to have both filtered and unfiltered
peers.

> That's why I had the mental design of another queue that things would feed
> into for the filter, or probably actually a separate queue for each
> listener that is pulled from in a round-robin fashion by the filter to
> avoid some small lock contention.
> It does mean a single channel to send articles through, though, which is
> annoying.

yes. I need to think more about this.
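
To make the "one queue per listener, pulled round-robin by the filter" idea
concrete, here is a minimal sketch. It is not inntd or INN code: the names
(lqueue, lq_push, lq_pop, filter_next) and the sizes are hypothetical, and
locking is left out. In a threaded server each queue would need its own lock
or a single-producer/single-consumer discipline.

```c
#include <stddef.h>

#define NLISTENERS 3   /* hypothetical number of NNTP listener threads */
#define QCAP 8         /* hypothetical per-listener queue capacity */

/* One queue per listener; only that listener pushes, only the filter pops. */
struct lqueue {
    const char *items[QCAP];  /* article identifiers waiting for the filter */
    size_t head;              /* index of the oldest queued item */
    size_t count;             /* number of queued items */
};

/* Listener side: returns 0 on success, -1 if the queue is full. */
static int lq_push(struct lqueue *q, const char *art)
{
    if (q->count == QCAP)
        return -1;
    q->items[(q->head + q->count) % QCAP] = art;
    q->count++;
    return 0;
}

/* Filter side: returns the oldest item, or NULL if the queue is empty. */
static const char *lq_pop(struct lqueue *q)
{
    const char *art;

    if (q->count == 0)
        return NULL;
    art = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return art;
}

/*
 * Filter side: visit the queues in turn, starting at *cursor, and take one
 * article from the first non-empty queue.  The cursor then moves past that
 * queue so a busy listener cannot starve the others.
 */
static const char *filter_next(struct lqueue qs[], size_t n, size_t *cursor)
{
    size_t i;

    for (i = 0; i < n; i++) {
        size_t idx = (*cursor + i) % n;
        const char *art = lq_pop(&qs[idx]);

        if (art != NULL) {
            *cursor = (idx + 1) % n;
            return art;
        }
    }
    return NULL;  /* every queue is empty */
}
```

The single filter thread would call filter_next() in a loop and hand each
accepted article on to the storage step, which is the single channel
mentioned above.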

> > exactly what's already done except for the final queue (and of course,
> > history which is actually only a kind of precommit cache).
> 
> History should be a thread-safe cache front-ending a thread-safe history
> mechanism vaguely like what we have already, I think.

What about WIP then? Looks like it becomes useless in this design..

> > I don't know if this single thread handling the queue is needed.  I was
> > thinking of way to address directly a CNFS buffer from an NNTP thread.
> 
> Like I mentioned, it depends a lot on where you put the filter.

exactly there. Just before SMstore().

> > I was thinking of doing so even without having the article in memory but
> > I'm starting to think it is not such a great idea.
> 
> I think you can actually keep a reasonable number of articles in memory
> without worrying about that part.  Figure a meg per binary article to be
> on the safe side, and my transit server only has 150 simultaneous
> connections tops... if you're carrying a full feed including a full binary
> feed, 300MB of memory isn't that much.  And if you're carrying less of a
> feed than that or have fewer peers, the memory consumption drops fairly
> quickly.
> 
> And of course most of your pending articles won't be anywhere near a meg
> in size.
> 
> > this requires keeping a lot of articles in memory which is something I
> > want to avoid, if possible. My current implementation using a kind of
> > traditional spool limits my memory footprint to 600KB (per thread but
> > it is difficult to know for the whole) while still being very fast.
> 
> With or without the history indexes?

without. Well, now, it tends to be more than that. It's about 1MB at the
beginning then increases to 2MB per thread for no visible reason.
I need to find out why before continuing.

> > you mean rewriting innfeed from scratch ?
> 
> Yeah, I actually semi-started on that at one point, but never really got
> anywhere.

hmm. Can it be resurrected?

> > this is something I have supposed inevitable.. Here is what I have in
> > mind. Let inntd try to send articles itself as soon as they are received
> > and delegate them to innfeed when a peer is not ready/alive or is
> > overloaded.
> 
> If you have a threaded innfeed, you can just pull it into INN and then you
> don't have to do the same thing in two sets of code.

if it needs to be rewritten to be threadable, I see no point in
making it a separate process, it can be part of inntd.

> Right, the model for innfeed... keep a central list of pending articles
> that's reference-counted, and insert incoming articles into that list.
> Then add the article to the outgoing queue of each site sender by locking
> its queue and walking it to find the least full block.  Queues are divided
> into blocks, each one with a size equal to the number of CHECKs you're
> willing to send before a TAKETHIS.  Each time the sender finishes with a
> block, it grabs a new block and replaces it with the old, empty block, so
> it basically pulls N articles off its pending queue at a time rather than
> just one.

Looks like the Diablo dnewsfeed mechanism..
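
A rough sketch of the block-structured site queue described above, to make
the discussion easier to follow. Everything here is an assumption for
illustration: the names (block, site_queue, sq_add, sq_take), the sizes, and
in particular the choice to hand the sender the fullest block, which the
description does not specify. Locking and reference counting are omitted.

```c
#include <stddef.h>

#define BLOCK_SIZE 4   /* hypothetical: CHECKs sent before a TAKETHIS */
#define NBLOCKS 3      /* hypothetical number of blocks per site queue */

struct block {
    const char *ids[BLOCK_SIZE];  /* message IDs queued in this block */
    size_t used;                  /* number of filled slots */
};

/* Per-site outgoing queue; real code would guard it with a lock. */
struct site_queue {
    struct block *blocks[NBLOCKS];
};

/*
 * Feeder side: walk the queue and put the article into the least full
 * block.  Returns 0 on success, or -1 if every block is full (the caller
 * would then fall back to a backlog or spill to disk).
 */
static int sq_add(struct site_queue *q, const char *id)
{
    struct block *best = NULL;
    size_t i;

    for (i = 0; i < NBLOCKS; i++) {
        struct block *b = q->blocks[i];

        if (b->used < BLOCK_SIZE && (best == NULL || b->used < best->used))
            best = b;
    }
    if (best == NULL)
        return -1;
    best->ids[best->used++] = id;
    return 0;
}

/*
 * Sender side: take a whole block and leave the sender's drained block in
 * its place, so the sender pulls up to BLOCK_SIZE articles at once.
 * Assumption: the fullest block is taken.  Returns NULL if nothing is
 * pending.
 */
static struct block *sq_take(struct site_queue *q, struct block *drained)
{
    struct block *full;
    size_t i, pick = 0;

    for (i = 1; i < NBLOCKS; i++)
        if (q->blocks[i]->used > q->blocks[pick]->used)
            pick = i;
    if (q->blocks[pick]->used == 0)
        return NULL;
    full = q->blocks[pick];
    drained->used = 0;
    q->blocks[pick] = drained;
    return full;
}
```

With this layout the queue lock is held once per block on the sender side
rather than once per article.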

> Throw in some additional blocks for backlog and for deferrals; I shouldn't
> take the time to try to write up all the details right now, but it's a
> logical extension of the idea.

please, do. We are in inn-*workers* here, it's the place :)

> When it's done with articles, stick them in a disposal queue that gets
> reaped periodically to reduce the reference counts and free the memory
> allocated for the article if all of the sites have sent it.  Throw in
> something to start spilling articles to disk when the queues fill up.
> 
> It seemed like a fairly simple model conceptually; I mostly got bogged
> down in the details of trying to make something work threaded, and then
> never got back to it.
> 
> > My (purely mental) design for that is not very clear at the moment.  I
> > was thinking of another set of NNTP threads (outgoing this time) each
> > with a list of ref counted mmapped articles (directly into CNFS
> > buffers). These lists will be in-memory bounded FIFOs, i.e. articles enter
> > the queue and they will have three ways to leave it: a) being sent, b)
> > being pushed out and then delegated to innfeed if the queue is
> > overloaded or c) being expired (and then delegated to innfeed) if they
> > are in the queue longer than an expiration time.
> 
> Yeah, that's where I started too, but then I was worried about how stuff
> kept wanting to lock that queue

as it is a per-thread queue, it will never be locked.

> and how to deal with deferred articles and
> a few other things,

can't see the problem. Deferred articles can be requeued once or kept in a
deferred queue (but I don't like this idea). Do you remember these
other things?

> and dividing the pending queue up into blocks that
> could be grabbed as a unit seemed cleaner.

Perhaps, but I just see it as more queues.
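
For comparison, here is a minimal sketch of the bounded per-thread FIFO with
its three exits (sent, pushed out on overflow, expired). This is only an
illustration: the names (out_fifo, fifo_put, fifo_take, fifo_expire_one),
the capacity and the use of plain message-ID strings instead of ref-counted
mmapped articles are all assumptions. The caller is expected to delegate any
pushed-out or expired ID to innfeed.

```c
#include <stddef.h>
#include <time.h>

#define FIFO_CAP 4  /* hypothetical bound on queued articles per peer */

struct entry {
    const char *id;     /* message ID (stand-in for a ref-counted article) */
    time_t queued_at;   /* when the article entered the queue */
};

struct out_fifo {
    struct entry slots[FIFO_CAP];
    size_t head;        /* index of the oldest entry */
    size_t count;       /* number of queued entries */
    time_t max_age;     /* entries older than this many seconds expire */
};

/* Remove and return the oldest entry's ID; the queue must not be empty. */
static const char *fifo_remove_oldest(struct out_fifo *f)
{
    const char *id = f->slots[f->head].id;

    f->head = (f->head + 1) % FIFO_CAP;
    f->count--;
    return id;
}

/*
 * Exit b): enqueue an article.  If the queue is full, the oldest entry is
 * pushed out and its ID returned so the caller can delegate it; otherwise
 * NULL is returned.
 */
static const char *fifo_put(struct out_fifo *f, const char *id, time_t now)
{
    const char *pushed_out = NULL;
    size_t tail;

    if (f->count == FIFO_CAP)
        pushed_out = fifo_remove_oldest(f);
    tail = (f->head + f->count) % FIFO_CAP;
    f->slots[tail].id = id;
    f->slots[tail].queued_at = now;
    f->count++;
    return pushed_out;
}

/* Exit a): take the oldest article for sending, or NULL if none. */
static const char *fifo_take(struct out_fifo *f)
{
    return f->count == 0 ? NULL : fifo_remove_oldest(f);
}

/*
 * Exit c): if the oldest entry has been queued longer than max_age, remove
 * it and return its ID; otherwise return NULL.  Call in a loop to reap all
 * expired entries.
 */
static const char *fifo_expire_one(struct out_fifo *f, time_t now)
{
    if (f->count == 0 || now - f->slots[f->head].queued_at <= f->max_age)
        return NULL;
    return fifo_remove_oldest(f);
}
```

Since only the owning sender thread touches its out_fifo, none of these
functions takes a lock; the hand-off into the queue from the receiving side
is where synchronization would still be needed.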

> When integrated into INN, I think what you really want to do is hand
> innfeed the live article in memory to start working with and try to start
> writing it to storage at the same time, and then let the storage system
> replace the allocated memory with mmap'd memory when it finishes writing
> to disk.

Having 2 processes makes things difficult but my point is to feed articles
ASAP. There are some limits I'm not ready to cross yet but if there were a
clean way to cross them, I'd do it. For example, start offering articles to
peers as soon as we've acked the offer (238 and 335) from a peer. Are some
servers already doing this? (cyclone, nntprelay?)

> >> Yes.  The storage API is really itching for a rewrite.
> 
> > I don't think I need the storage API. I only want CNFS.
> 
> If you can get it, you really want the storage API, I think.  This is one
> of the things that's kept INN going against, say, Diablo; if someone down
> the road comes up with a new, cool way of storing articles that's faster
> than CNFS, you want to be able to drop it in.  Plus, the storage API is a
> good layer at which to handle at least some of the locking issues, I
> think.

you're probably right but this forces me to maintain all methods when
I want to extend/change the API.

> > I wonder if libinn should not be split into 3 or 4 libs.  All INN
> > binaries are linked with the big kitchen sink, even when they only need
> > xmalloc() or die().
> 
> Part of this is actually on the TODO list for things that I want to do if
> we ever start over from a fresh repository and completely reorganize the
> code base.

just create one :) We already have STABLE and CURRENT, we can have DEVEL
or PROJECT or FUTURE or whatever you want.

>  I can see three libraries at least (portable replacements for
> OS functionality, including wrappers like xwrite and so forth; libnntp for
> various basic NNTP functionality; and something to pick up all the various
> utility stuff like error and xmalloc and daemonize and setfdlimit).  Maybe
> a fourth library for configuration file parsing.

I agree except for the NNTP lib for which I see no point.. and no
easy way to do this either. It is interesting for an NNTP client but the
server side is much more complicated.

> > As I will not have time to do everything alone, I wish to distribute my
> > code ASAP.
> 
> I can test on Solaris, which gives you a completely separate pthreads
> implementation and the other major Unix platform.

I'll try to polish inntd (at least the incoming part) and I will submit it
to you for the brand new repository.

-- 
Fabien Tassin -+- fta at sofaraway.org