Scaling

Scott Gifford sgifford at tir.com
Wed Dec 6 18:48:32 UTC 2000


>   1. Implement a proxy into nnrpd which makes a request to
>     storage servers.  The storage servers would split the
>     load by only storing a subset of a full feed:
>         - certain newsgroups
>         - certain subsets of message IDs (subset of hash space?)
>         - or certain dates (most recent 1GB?)
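
For what it's worth, the message-ID hashing idea could be as simple
as the sketch below (Python; the server names and the choice of MD5
are placeholders I made up, not anything from an existing
implementation):

    import hashlib

    # Hypothetical storage servers, each owning an equal slice of
    # the Message-ID hash space.
    SERVERS = ["store1.example.com", "store2.example.com",
               "store3.example.com", "store4.example.com"]

    def server_for(message_id):
        # Map a Message-ID like "<abc@site>" to the storage server
        # responsible for it.  Any stable hash would do; MD5 is
        # just convenient and evenly distributed.
        digest = hashlib.md5(message_id.encode("ascii")).digest()
        return SERVERS[digest[0] % len(SERVERS)]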

I don't know if this is exactly what you're talking about, but at the
ISP where I used to work, we used a tree of caching NNTP servers
(running nntpcache), with an outsourced news provider at the root.
Our future plans were to parse the logs to find what groups readers
read frequently, and retrieve all articles in the most popular groups
during off-peak hours (when bandwidth and CPU time are essentially
free), using newshound (which comes with nntpcache).
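
The log-crunching part would have been simple; roughly something like
this (a sketch -- it assumes each group selection is logged as a line
containing "group <name>", which is the shape of the thing from
memory, not the literal nnrpd log format):

    import re
    import sys
    from collections import Counter

    # Tally how often readers select each newsgroup.
    counts = Counter()
    group_re = re.compile(r"\bgroup (\S+)")
    for line in sys.stdin:
        m = group_re.search(line)
        if m:
            counts[m.group(1)] += 1

    # Emit the 50 most popular groups, one per line, as a fetch
    # list to hand to a prefetcher like newshound.
    for group, n in counts.most_common(50):
        print(group)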

In theory, it was a best-of-both-worlds situation -- it used the least
possible bandwidth, CPU, and disk, since only articles that were going
to be read (or were likely to be read, with newshound) were
downloaded, and they were downloaded only once; it was easy to manage;
and it was fairly inexpensive, since we had an agreement with our
upstream USENET provider to pay them based on the bandwidth we used.

In practice, it still worked pretty well, but reliability problems
with both the upstream USENET provider and the nntpcache software
caused fairly frequent outages and slow access.  If those two problems
had been fixed, I think it would have worked perfectly.

> 
>     I think HighWind already has this (called "chaining")
>     in Typhoon.  I don't think it caught on, because of
>     the historical attitudes of 
>          "we can't let outsiders use our NNTP resources"
> 
>          "we can't allow ourselves to be dependent on
>           someone else's choices about filtering and retention"
> 
>          "we need local newsgroups"
> 
>          "what about posting policies"
> 
>     None of those reasons are good enough to reject the
>     model outright.  But I agree that they need to be
>     addressed, and this can be accommodated with only
>     slight complexity to get authentication/restriction
>     and redundancy.

The caching model we used gave the ISP some control over local
newsgroups and posting policies: the caching software could (in
theory) reject articles before they were posted, and nntpcache could
be configured to hit different servers for different groups, so local
groups were pretty easy to set up.
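
The routing logic itself is conceptually just a first-match list of
wildcard patterns mapping groups to servers.  Something like this
sketch (Python, with made-up names -- not nntpcache's actual
configuration syntax):

    from fnmatch import fnmatch

    # First match wins: local groups stay on our own server,
    # binaries go to one upstream, everything else to another.
    ROUTES = [
        ("tir.*",          "localnews.example.com"),
        ("alt.binaries.*", "binaries.upstream.example.com"),
        ("*",              "news.upstream.example.com"),
    ]

    def server_for_group(group):
        for pattern, server in ROUTES:
            if fnmatch(group, pattern):
                return server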

The caching model did nothing to address the first two objections,
although in practice we had few problems there.


>   2. An NNTP equivalent to HTTP redirects.
>     The news client would get something like an HTTP 3xx redirect
>     response to a GROUP (which included authentication credentials)
>     and then would connect to another server.
> 
>   3. To do article-by-article redirects, a connectionless NNTP
>     might be of some help, but it is probably better to
>     piggyback on HTTP for this.  (Standardize a MIME type and URI
>     scheme for a newsgroup article, etc.) (Yes I know there already
>     is a news: scheme, which isn't going to work here.  We need
>     something that is stateless.)
> 
>     I think article-by-article redirects are not going to be as big
>     a win as redirecting newsgroups, but they can be part of a mix.
>     (Allowing an injecting server to be an eternal source of articles.)

Both of these are interesting, and would be easier to provide in the
NNRP server software than in each individual user's news client.
Combined with caching, these could give good performance, too.  But it
would mean that a number of remote servers would need to be up for a
user to read all of their newsgroups; one of the advantages of
USENET's current setup is that even if your ISP's Internet connection
is down, you can still read news until it comes back up.  :)
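
To make the redirect idea concrete, a client (or an NNRP proxy acting
on its behalf) would have to do something like the sketch below.  The
"302" response is pure invention -- RFC 977 defines no redirect
response -- so read it as a strawman for what the protocol extension
might look like:

    import socket

    def read_line(sock):
        # Read one CRLF-terminated NNTP response line.
        data = b""
        while not data.endswith(b"\r\n"):
            data += sock.recv(1)
        return data.decode("ascii").rstrip()

    def select_group(host, group, port=119):
        # Send GROUP, following hypothetical redirects of the form
        # "302 <host> <port>".
        sock = socket.create_connection((host, port))
        read_line(sock)                      # discard server greeting
        sock.sendall(("GROUP %s\r\n" % group).encode("ascii"))
        reply = read_line(sock)
        if reply.startswith("302 "):
            _, new_host, new_port = reply.split()[:3]
            sock.close()
            return select_group(new_host, group, int(new_port))
        return sock, reply                   # normal "211 ..." reply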

I'm not sure that these ideas do a whole lot to solve the USENET
scaling problem, anyway.  The last time I ran a news server
(admittedly, about 2 years ago), the whole thing could easily have
been handled with relatively modest disk, CPU, and memory, if it
weren't for the damn binaries groups.  And unfortunately, the binaries
groups were the only ones many of our customers were interested in.
For redirects to help with the binaries groups, somebody would need to
be willing to store a master copy of them somewhere, and use up huge
amounts of bandwidth serving them to the entire Internet.

Having users connect directly to different servers for different
groups also removes a lot of the anonymity of USENET, and makes it
possible for whoever runs the "master server" for a group to deny
users posting and reading rights by blocking certain IP address
ranges.  I'm not sure whether that's an overall win or a loss...

-----ScottG.


