Scaling

Olaf Titz olaf at bigred.inka.de
Wed Dec 6 12:41:47 UTC 2000


> studies and servers needed to have 100 simultaneous
> readers (equivalent to a service population of 10,000)
> to break-even bandwidth.  (The bandwidth of nnrpd
> requests would equal 2GB/day.)  Most universities and
> small ISPs don't have that kind of service population.

Small ISPs, OK, but universities?

Another point which is frequently a source of complaints about _big_
ISPs is quality of service. People want a server with little downtime
and few missing articles; many also want a server that carries
binaries. The quality of the news server can be an important factor in
choosing an ISP. Those who have put a lot of work and money into
building a good service themselves are not eager to ruin it by relying
on any kind of outsourcing.

>   1. Implement a proxy into nnrpd which makes a request to
>     storage servers.  The storage servers would split the
>     load by only storing a subset of a full feed:
>         - certain newsgroups
>         - certain subsets of message IDs (subset of hash space?)
>         - or certain dates (most recent 1GB?)

Doesn't Diablo do something like this already?
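
Just to illustrate the hash-space variant (this is only a sketch of my
own, not a description of what Diablo or anything else actually does):
the front end could pick the storage server purely from a hash of the
message ID, so no shared lookup table is needed, e.g. in Python:

  # Illustration only: route an article to one of N storage back ends
  # by hashing its message ID.  The host names are made up.
  import hashlib

  STORAGE_SERVERS = ["spool1.example.com", "spool2.example.com",
                     "spool3.example.com", "spool4.example.com"]

  def storage_server_for(message_id):
      # The same message ID always maps to the same back end.
      digest = hashlib.md5(message_id.encode("utf-8")).digest()
      return STORAGE_SERVERS[digest[0] % len(STORAGE_SERVERS)]

  print(storage_server_for("<abcdef@isp.example.com>"))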

>   2. A NNTP equivalent to HTTP redirects.
>     The news client would get something like a HTTP 304 response to
>     a GROUP, (which included authentication credentials) and then
>     would connect to another server.

This, like other means of distributing/outsourcing reader load, does
not sufficiently take into account the problem of access. I presume the
percentage of Internet-_application_ users who don't have a direct
connection to the _Internet_ is growing, not shrinking. The point where
most users sit in RFC1918 address space or behind firewalls may soon be
reached. A fundamental requirement on any protocol extension must be
that it be proxy-friendly from the start. Don't repeat the mistake that
was made with HTTP.

>   3. To do article-by-article redirects, a connectionless NNTP
>     might be of some help, but it is probably better to
>     piggyback on HTTP for this.  (Standardize a MIME type and URI
>     scheme for a newsgroup article, etc.) (Yes I know there already
>     is a news: scheme, which isn't going to work here.  We need
>     something that is stateless.)

I see no problem protocol-wise, only with the client and server software.

  >GET news:<abcdef@isp.example.com> HTTP/1.0
  >
  <HTTP/1.0 200 OK
  <Content-Type: message/rfc822
  <
  <Path: podunk!isp!example.com!not-for-mail
  <Newsgroups: misc.test
  <(etc)

This can even be pipelined with HTTP/1.1. We could also implement
XOVER or LISTGROUP etc. this way:

  >GET news:misc.test/xover/12345-12456 HTTP/1.0
  >
  <HTTP/1.0 200 OK
  <Content-Type: text/plain
  <
  <12345 test   Joe Blow <jblow@example.edu>    ...
  <(etc)

Using HTTP for this kind of query has the advantage that it removes
the dependence on a particular NNTP server, and it handles proxying
transparently. (Note that I always use a proxy-style absolute URL
here.) It can also use HTTP authentication, do POST, etc.

To implement this, new ultra-lightweight server-side software would be
the key. Considering the years of experience that went into bringing
Apache's performance to its current level, I don't think anyone can
come up with a good solution any time soon.
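
The mapping itself is simple enough, though. Here is a naive sketch
(the back-end host, the port and the assumption that clients send
proxy-style "GET news:<msgid>" requests as in the example above are
all mine) of a gateway answering such a GET from a back-end NNTP
server; performance is exactly the part it ignores:

  # Naive illustration only, not a proposal for real software.
  from http.server import BaseHTTPRequestHandler, HTTPServer
  from nntplib import NNTP
  from urllib.parse import unquote

  NNTP_HOST = "localhost"          # back-end news server (assumption)

  class NewsGateway(BaseHTTPRequestHandler):
      def do_GET(self):
          # Expect a proxy-style request line: GET news:<msgid> HTTP/1.0
          if not self.path.startswith("news:"):
              self.send_error(404)
              return
          msgid = unquote(self.path[len("news:"):])
          try:
              with NNTP(NNTP_HOST) as nntp:
                  _, info = nntp.article(msgid)
          except Exception:
              self.send_error(404)
              return
          body = b"\r\n".join(info.lines) + b"\r\n"
          self.send_response(200)
          self.send_header("Content-Type", "message/rfc822")
          self.send_header("Content-Length", str(len(body)))
          self.end_headers()
          self.wfile.write(body)

  HTTPServer(("", 8080), NewsGateway).serve_forever()

A real implementation would of course need persistent back-end
connections, caching and the kind of tuning Apache has had years of.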

The beauty of a scheme like this is that it allows all kinds of
distribution/proxying/outsourcing/balancing issues to be solved on the
server side. All the clients need to do is configure one address as
their news server, connect only to that server (which is in their
RFC1918 address space) and let the server handle all the rest. Look at
how IRC works for another example.

One big problem you didn't even mention: there has to be some way to
keep article numbering consistent across servers. This too is a
server-side issue; clients will just see whatever the server presents
to them.

> To make any successful there has to be some consensus
> to use new methods of cooperation. What's the best way to
> foster and reach that consensus?  What modes of cooperation
> and load sharing are most likely?

A solution which takes into account the psychological issues of
running and using a Usenet service as it is now - and that means
taking into account the many reasons not to rely on anything external,
even though redundancy is technically no longer as important as it was
in the UUCP days. Anything which reeks of centralization or of global
setting of operational standards is bound to fail, not because it is
technologically inferior but because people will reject it.

> Others probably already have given thought to some of those
> ideas.  What issues do admins need to have addressed before
> they start opening up their servers and cooperating more?

Opening up the servers isn't the most important issue, IMHO. I know
many people whose only objections to opening up would be the load on
their server and spam. Any proposal which _distributes_ load better
would be welcome, and the problems with abusive postings are best
solved by requiring user authentication and taking action against
abusers.

Spam is yet another problem - it has to be controlled at its immediate
source. I think an absolute requirement on any new news software would
be that it have some form of rate limiting and EMP (excessive
multi-posting) detection built in at the POST command.
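
By way of example only (the limits and the one-hour window below are
arbitrary numbers of my own), such a check at POST time could start
out as simple as a per-user counter plus a count of identical bodies:

  # Sketch only: per-user rate limit plus a crude EMP check that
  # counts how often the same body has been posted.
  import hashlib, time
  from collections import defaultdict, deque

  MAX_POSTS_PER_HOUR = 20           # arbitrary
  MAX_COPIES_PER_BODY = 3           # arbitrary

  post_times = defaultdict(deque)   # user -> times of recent posts
  body_counts = defaultdict(int)    # hash of body -> copies seen

  def allow_post(user, body):
      now = time.time()
      times = post_times[user]
      while times and now - times[0] > 3600:
          times.popleft()           # forget posts older than an hour
      if len(times) >= MAX_POSTS_PER_HOUR:
          return False              # user is posting too fast
      digest = hashlib.sha1(body.encode("utf-8")).hexdigest()
      body_counts[digest] += 1
      if body_counts[digest] > MAX_COPIES_PER_BODY:
          return False              # same body posted too often: EMP
      times.append(now)
      return True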

Olaf


