INN multi-server thoughts.

Dan Merillat harik at chaos.ao.net
Thu Feb 15 19:39:31 UTC 2001


Alex Kiernan writes:
> 
> Dan Merillat <harik at chaos.ao.net> writes:
> 
> > I'm looking into making a multi-server addon to INN.  Basically, the
> > ideal setup would be transit server -> storage server -> cluster of
> > readers.
> > 
> 
> We're proposing exactly the same thing.

It's very common, and since stock INN already covers a hobby
installation, the most pressing need is large-site scalability.

There are a number of people who've gone ahead and done this for their
sites... which generally means a code fork, making it very difficult for
them to update to the latest code.  Some of these forks go back to 1.4.1.

There's a lot of wasted effort with everyone writing the same code. I'd 
like to get at least a framework for multi-server tools into standard INN,
so that sites don't have to fork so badly.  It would be great to see some
of the various patches updated and merged back into the standard tree.

> More or less what we intend doing, I've an extra piece which we're
> proposing - a numbering server... the transit server will push a feed
> to a numbering cluster (we intend changing the active file management
> so we can cluster two innds on separate boxes for resilience), that
> then distributes articles to each storage server (distributed somehow,
> probably not just message-id hash, more likely by moving average of
> article and size).

My thought is that distribution by message-ID hash is going to be basically
random.  Consider a binary "flood": very similar message-IDs and sizes,
but the hash is going to distribute them pretty evenly.  Keep it simple...
if it turns out there's a serious load-balancing problem, spend the extra
time coding then.

Besides, take the last byte of the hash and you've got 256 shares, so you
can dynamically balance load by moving shares from one machine to
another. "ctlinnd load '1:128 2:64 3:64'", as an example.
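
A minimal sketch of that share table (names and layout are my own, not
existing INN code; assumes the per-server weights always sum to 256 and
the table gets rebuilt whenever the load command changes them):

    #include <stddef.h>

    #define NSHARES 256

    static int share_table[NSHARES];    /* share -> server number */

    /* Rebuild the table from per-server weights; weights must sum to
     * NSHARES (e.g. 128/64/64 for "1:128 2:64 3:64"). */
    static void
    rebuild_shares(const int *weights, int nservers)
    {
        int server, i, n = 0;

        for (server = 0; server < nservers; server++)
            for (i = 0; i < weights[server]; i++)
                share_table[n++] = server;
    }

    /* The last byte of the message-ID hash picks the share. */
    static int
    server_for(const unsigned char *hash, size_t hashlen)
    {
        return share_table[hash[hashlen - 1]];
    }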

> I was proposing to implement an NNTP storage manager which would query
> the appropriate backend storage server. The Xref header in each
> message would indicate which storage server an article was located on.

Won't work, since there's then no way to look up a message by ID.  You'd
have to scan the entire overview database.  That was my first idea as well.

Still, unless you have over 256 storage servers, adding a server number
to your SM token won't be a big deal.  I'm sure there's plenty of room
there now.
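
Something like this is all I mean (a hedged sketch, not INN's actual
TOKEN layout; the point is just that one spare byte in the token body is
enough to name up to 256 storage servers):

    #define TOKEN_BODY_LEN 15

    struct sm_token {
        unsigned char class;                /* storage method/class */
        unsigned char server;               /* storage server holding it */
        unsigned char body[TOKEN_BODY_LEN]; /* method-private data */
    };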

> > But who should generate the overview information?  Should the transit do
> > it?  Or should the storage servers do it and each feed all readers?
> > 
> 
> We're planning on implementing a header only feed on the storage
> servers which feeds the readers, each reader then generates a unified
> overview database. In addition I've been toying with ways of pushing
> articles for small, text only, frequently read groups to the reader
> servers.
 
I was going to use a variant of the timecaf code to cache articles from
the storage server, modified to expire the files based on access time
rather than modification time (and to expire them if their SM token has
expired on the main server).

Of course, this means keeping a separate database of access times for
nnrpd to use.  It may be simpler to just throw a CNFS-type store on.
Since the same 9 gigs of articles get read every day, a 20 gig IDE drive
on the reader will probably do quite well, even though the cache is
FIFO.
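
The expire test itself is trivial; a sketch (assumptions, not timecaf
code: nnrpd would log reads into that access-time database, and expire
would walk it with something like this):

    #include <time.h>

    static int
    cache_expired(time_t last_access, int still_on_main,
                  time_t now, time_t max_idle)
    {
        if (now - last_access > max_idle)
            return 1;               /* not read recently enough */
        return !still_on_main;      /* SM token expired on the main server */
    }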

Of course, this means, again, keeping a second history file, or perhaps
extending the history API (and moving to a binary history!)  For reader
boxes it's not as expensive, since there's no negative message-ID
information, only positive history entries.
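
A positive-only binary history entry could be as small as this (field
sizes are guesses on my part, not INN's dbz format):

    struct reader_history_entry {
        unsigned char hash[16];     /* message-ID hash */
        unsigned long arrived;      /* arrival time */
        unsigned char token[18];    /* opaque storage token */
    };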

Of course, the way things are handled now, this whole idea would be one
big kludge.  What's needed is a way to feed both positive and negative
article information to the reader servers.  I.e., every time I overwrite
an article in CNFS, I send out a kill message on the sync stream,
followed by the overview information for the article that overwrote it.
If we stored the message-ID hash right before each article in CNFS, it
would be very easy to read it back, look it up, and kill it.  (Or do we
already?  There's some binary data there.)
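
A hypothetical wire format for that sync stream, just to make the idea
concrete (nothing here exists yet): SYNC_KILL drops the overwritten
article's history and overview entries, SYNC_OVER carries overview data
for the article that replaced it.

    #define SYNC_KILL 1
    #define SYNC_OVER 2

    struct sync_record {
        unsigned char op;           /* SYNC_KILL or SYNC_OVER */
        unsigned char hash[16];     /* message-ID hash of the article */
        unsigned long overlen;      /* overview bytes following; 0 for kills */
    };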

I'd like to hear from people who've already done this work, even on an
old fork.  What problems did you run into?  What bottlenecks are there?
How do you handle synchronization issues?

--Dan



