Suggestion for innd feature desired by Cornell Univ
Todd Olson
tco2 at cornell.edu
Wed Dec 11 15:32:18 UTC 2002
Hi
Following a decision by the Cornell Univ board of trustees,
I must now change the way I operate our news server.
Below I describe the problem
then I describe how I would like to see INN address it
then I describe what Cornell has done that necessitates this.
I'd be interested to hear whether others feel this might be useful,
or whether there is already a straightforward way to solve this problem.
The problem:
How do I configure a server that takes a full feed
so that it sends a peer a feed that is determined by article *content*?
Discussion:
In INN 2.2.2 we are able to shape an outgoing feed in simple ways
via the newsfeeds config file:
  select based on news group
  select based on size
  select based on hop count, cross posts, etc (other simple things)
As far as I can tell we can't use newsfeeds to select on content.
Further the current model is that the *receiving* host should
run cleanfeed and just throw away any articles it does not want.
The drawback is that you use the network bandwidth anyway,
because the article has to be sent to you before you can examine it.
Note: the problem is to clean an outgoing feed ... not an incoming feed
The suggestion:
A tiny modification to innd that permits a program like cleanfeed
to hand to innd a custom tag (a 32-bit word maybe?) and add to
the newsfeeds mechanism a new flag that lets us specify a test
against this new custom tag.
One possible syntax would be
  +mask   Accept article if (cleanfeed_tag & mask) <> 0
          ie at least one of the set mask bits must be set in tag
  -mask   Reject article if ((cleanfeed_tag & mask) Xor mask) = 0
          ie all of set mask bits must be set in tag to reject
The tag would have no meaning to innd
The tag would not be computed by innd (it would be computed by cleanfeed)
The values for the tag would be defined by the site
Example:
Via cleanfeed the site defines
  bit0 of tag = 1 if "binary" posting, 0 otherwise
  bit1 of tag = 1 if "yenc" posting, 0 otherwise
  bit2 of tag = 1 if "sex" posting, 0 otherwise
then a newsfeeds flag field including
  +4      accepts posting only if the sex bit is set
  +6      accepts posting only if sex or yenc is set
  +4,+2   accepts posting only if both sex and yenc are set
  -4      rejects posting if the sex bit is set
  -6      rejects posting only if both sex and yenc are set
  -4,-2   rejects posting if either sex or yenc is set
Note that the logic of "+" and "-" has to be different because
newsfeeds defines the ',' to be a logical AND of the conditions.
That is, -6 is the opposite of +4,+2 ... etc ...
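
To make the proposed semantics concrete, here is a minimal sketch in Python.
It is only an illustration of the rules above: nothing like this exists in
INN, and the names term_passes and feed_accepts are hypothetical. It assumes
masks are written as decimal integers and that terms in one flag field are
ANDed together, as described for newsfeeds above.

```python
# Hypothetical sketch of the proposed tag test; this is not an INN feature.
BINARY = 1 << 0  # bit0: "binary" posting
YENC = 1 << 1    # bit1: "yenc" posting
SEX = 1 << 2     # bit2: "sex" posting


def term_passes(term: str, tag: int) -> bool:
    """Evaluate one '+mask' or '-mask' term against a site-defined tag."""
    sign, mask = term[0], int(term[1:])
    if sign == "+":
        # Accept if at least one of the mask bits is set in the tag.
        return (tag & mask) != 0
    if sign == "-":
        # Reject only if every mask bit is set in the tag;
        # otherwise the article passes this term.
        return ((tag & mask) ^ mask) != 0
    raise ValueError(f"bad term: {term!r}")


def feed_accepts(flags: str, tag: int) -> bool:
    """Comma-separated terms are ANDed: every term must pass."""
    return all(term_passes(t.strip(), tag) for t in flags.split(","))


if __name__ == "__main__":
    # Show which of the eight possible 3-bit tags each example lets through.
    for flags in ("+4", "+6", "+4,+2", "-4", "-6", "-4,-2"):
        sent = [tag for tag in range(8) if feed_accepts(flags, tag)]
        print(f"{flags:6} sends tags {sent}")
```

Because the terms are ANDed, "+4,+2" needs both bits while "-4,-2" rejects
on either one, which is why "-6" ends up as the exact opposite of "+4,+2".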
NOTE: an advantage of this approach over an approach that does
all the analysis on the outgoing feed, after innd, is that
the articles don't have to be fetched a second time
and if several sites want cleaned feeds but have different overlapping
requirements, the analysis does not have to be done twice,
nor are complex feed structures required to avoid double analysis
(Boy ... isn't this obscurely phrased ...)
The motivation:
Cornell has decided that each IP address at Cornell must pay
for the bytes it sends and receives over the WAN.
The rate is roughly $3 per gigabyte.
Thus our modest incoming newsfeed of 45G/day costs about $4000/month,
and our bean counters have had heart failure.
So working with our peers I have managed to get them to stop
sending us the obvious large volume groups, which gets us down
to about 4G/day ... except that on holidays a lot of non-obvious
binary groups start getting binaries and the feed can double.
That unpredictability also worries the bean counters.
So the next obvious thing to try is to have our peers consume
CPU power on our behalf and try to analyse the postings on their
end and not send us the "binary" postings.
If I could do this it looks like I could reduce our feed to maybe
2.5G/day, or about $225/month, or about $2700/year for incoming
... which the bean counters might be able to swallow ...
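
For anyone who wants to check the arithmetic, a small Python sketch follows.
It assumes a flat $3 per gigabyte, a 30-day month and a 12-month year; those
round numbers are assumptions for illustration, not Cornell's actual billing
rules.

```python
# Rough cost check for the feed volumes mentioned in this message.
RATE_PER_GB = 3.0    # dollars per gigabyte ("roughly $3/1G")
DAYS_PER_MONTH = 30  # assumption: a round 30-day month


def monthly_cost(gb_per_day: float) -> float:
    """Dollars per month for a feed of the given daily volume."""
    return gb_per_day * RATE_PER_GB * DAYS_PER_MONTH


if __name__ == "__main__":
    for label, volume in (("full feed", 45), ("trimmed feed", 4), ("cleaned feed", 2.5)):
        per_month = monthly_cost(volume)
        print(f"{label:13} {volume:>4} G/day  ${per_month:7.0f}/month  ${per_month * 12:8.0f}/year")
```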
And with this feature I can then control what we send out,
so if some disgruntled community member posts a lot of binaries
to drive our costs up, they just won't get sent out to our peers.
Unfortunately, as far as I can tell, INN does not have a good way
of cleansing an outgoing feed.
So if I can't do with INN what I want to ask my peers to do for me,
I end up feeling rather awkward.
So it would be useful from our view here if INN had this feature added.
(What does everyone else think?)
I am sufficiently unknowledgeable about the internal bits of INN
that it will take me a long time to figure out how to add such a feature.
Regards,
Todd Olson