Scaling

Joe St Sauver JOE at OREGON.UOREGON.EDU
Fri Dec 8 18:01:56 UTC 2000


>Date: Thu, 07 Dec 2000 23:37:02 -0500 (EST)
>From: "Forrest J. Cavalier III" <mibsoft at epix.net>
>Subject: Re: Scaling
>To: inn-workers at isc.org
>Cc: forrest at mibsoftware.com
>Message-id: <200012080437.eB84b1N13543 at bean.epix.net>
>
>I think some have missed my key point: 
>
>The point is not how sites are providing service today.  The point
>is how full sites will continue to do a good job in the long term.
>Full feed growth is outstripping Moore's law, and that is a problem.

I'm not sure it actually matters that feed growth is exceeding the
growth in CPU speed or disk capacity, given that Usenet is a
partitionable problem.

Put differently, if Usenet gets too busy to be handled by a single
server, the path to happiness is rather clear (at least to me): you
run multiple parallel boxes, and divide the load across the boxes.

This is the same approach that has become commonplace for web servers
(rack of 1U's plus a load balancing switch), for compute servers
(e.g., Beowulf boxes running PVM or MPI), and for a host of other
functions. The same approach can work for Usenet. 
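
As a minimal sketch of what "divide the load across the boxes" could
look like, the Python below assigns each newsgroup to one box by
hashing its name. Hashing by group name is only one possible split,
and it is an assumption here, not something specified above; the
host names and the pick_server/partition helpers are hypothetical.
Crossposted articles would still touch more than one box under this
scheme.

```python
import hashlib


def pick_server(group, servers):
    """Map a newsgroup name to one server, deterministically."""
    digest = hashlib.sha1(group.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer bucket selector.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def partition(groups, servers):
    """Return {server: [groups...]} for a list of newsgroup names."""
    plan = {name: [] for name in servers}
    for group in groups:
        plan[pick_server(group, servers)].append(group)
    return plan


if __name__ == "__main__":
    boxes = ["news1", "news2", "news3"]  # hypothetical host names
    groups = ["comp.lang.c", "rec.arts.books", "alt.binaries.misc", "sci.math"]
    for box, assigned in partition(groups, boxes).items():
        print(box, assigned)
```

Note that a plain modulo split reshuffles most groups whenever a box
is added or removed; a consistent-hashing scheme would reduce that,
at the cost of more code.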

>Is everyone just planning to subset Usenet?  No one carries a
>full feed?  Even that doesn't scale as well as you think.

I think lots of folks will still *feed* a full feed; finding folks
who carry a fullish feed for reading purposes, with a reasonable
retention period, will become less common. But hey, as far as I can
tell, no publicly available server carries all ~140K or so groups
or near-groups that are or have been in circulation. My lists from
a while back are at:

http://www-vms.uoregon.edu/~joe/active.       (note the trailing dot)
http://www-vms.uoregon.edu/~joe/groups.nolog  (what we don't carry)

Usenet is already a fragmented mess if your feed box requires 
an active file...

>The current transport mechanism (flood-fill) isn't designed
>to work well in a fragmented system.  Peering arrangements
>and the protocol are too static and coarse to allow selective
>group feeds.  

I believe that would be an important area to address. Just like
Burger King, you should be able to get your FeedWhopper (tm) made
just the way you like it, on a group-by-group basis if that's 
what's desired. "Extra ketchup, hold the lettuce, fine grained 
feed specs don't upset us..."

For example, speaking normatively, there should be a "feed sync"
function which queries a downstream peer, and then populates a 
local table on the feeder with group-by-group information about
what the downstream peer wants to see. The feeder's innfeed (or
other feed tool) or the innd outbound queueing code should check
the downstream peer's feed table entries, and process the received
articles accordingly, on a *group-by-group* basis.
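
A rough sketch of the feeder side of such a "feed sync", in Python.
It assumes the downstream peer's wishes have already been fetched
somehow as an ordered list of patterns; the query step itself is
left out, since no existing protocol command for it is being claimed
here. build_feed_table and should_offer are hypothetical names, and
Python's fnmatch is only a stand-in for INN's wildmat matching,
which differs in detail.

```python
from fnmatch import fnmatchcase


def build_feed_table(wanted_patterns, local_groups):
    """Expand a peer's pattern list into a per-group yes/no table.

    Patterns are applied in order and the last match wins; a leading
    "!" marks a pattern as an exclusion. (Assumed convention.)
    """
    table = {}
    for group in local_groups:
        wanted = False
        for pattern in wanted_patterns:
            negate = pattern.startswith("!")
            glob = pattern[1:] if negate else pattern
            if fnmatchcase(group, glob):
                wanted = not negate
        table[group] = wanted
    return table


def should_offer(article_groups, table):
    """Offer an article if the peer wants at least one of its groups."""
    return any(table.get(group, False) for group in article_groups)


if __name__ == "__main__":
    local = ["comp.lang.c", "comp.binaries.misc", "rec.arts.books", "alt.test"]
    peer_wants = ["comp.*", "!comp.binaries.*", "rec.arts.books"]
    table = build_feed_table(peer_wants, local)
    print(table)
    print(should_offer(["alt.test", "comp.lang.c"], table))
```

Building the table once per sync, rather than matching patterns for
every article, keeps the per-article check down to a dictionary
lookup per group.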

Alternatively, I guess, pull feeds will become more commonplace,
even though I really dislike a polling model for moving articles. 

>How do you find a peer willing to exchange a feed of 
>      alt.binaries.not.carried.in.too.many.places
>and assure that all sites carrying that newsgroup remain
>"connected?"

Lots of places carry and feed * (or spam filtered *), but finding 
a site that has -reader service- for group foo is a trickier issue. 
However, that is more a matter of people being unwilling to share
their active files (and their anti-active files, for lack of
a better term -- what you DON'T carry is as important as what you
DO carry). 
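
For illustration, an "anti-active" list can be derived mechanically
once you have some reference list of known groups. The Python sketch
below assumes active-file style lines with the group name as the
first whitespace-separated field; the reference list and the helper
names read_active and anti_active are hypothetical.

```python
def read_active(lines):
    """Return the set of group names from active-file style lines."""
    groups = set()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            groups.add(fields[0])
    return groups


def anti_active(known_groups, active_groups):
    """Return the known groups that the local active file lacks, sorted."""
    return sorted(set(known_groups) - set(active_groups))


if __name__ == "__main__":
    sample_active = [
        "comp.lang.c 0000012345 0000012000 y",
        "rec.arts.books 0000000100 0000000001 m",
    ]
    known = ["comp.lang.c", "rec.arts.books", "alt.test"]
    for group in anti_active(known, read_active(sample_active)):
        print(group)
```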

I would also hypothesize that sites which carry many obscure groups
are more likely to attract corner-case customers, where corner-case
may sometimes be a synonym for "problematic."

>Everyone is so used to having a very well connected mesh
>of Usenet sites.  You dump an article to your peers,
>and you can assume it reaches everywhere.

Usenet is definitely well connected for Big 7 groups; it is
mostly well connected for regionals, foreign language, and
specialty groups; and it is somewhat well connected for
what we might call "mainstream binaries." However, I would
assert that Usenet is thinly connected when it comes to the
fairly large raft of what's left beyond that (at least beyond
the *-moving or spam-filtered-*-moving core).

>When most sites are carrying and propagating only subsets
>of Usenet, that assumption is invalid. How can that be fixed?

Remember, for core propagation purposes, I'm not sure there's
really an issue. For reading, there's definitely an issue,
but that's more a function of trying to keep active files
congruent across tens of thousands of sites.

Regards,

Joe


