Suggestion for innd feature desired by Cornell Univ

Todd Olson tco2 at
Wed Dec 11 15:32:18 UTC 2002


Following a decision by the Cornell Univ board of trustees,
I must now change the way I operate our news server.
Below I describe the problem
then I describe how I would like to see INN address it
then I describe what Cornell has done that necessitates this.
I'd be interested to hear whether others feel this might be useful
   or whether there is already a straightforward way to solve this problem

The problem:
   How to configure a server that takes a full feed
   and have it send to a peer a feed that is determined by article *content*

      In INN 2.2.2 we are able to shape an outgoing feed in simple ways
      via the newsfeeds config file
         select based on news group
         select based on size
         select based on hop count, cross posts, etc (other simple things)
      As far as I can tell we can't use newsfeeds to select on content

      Further the current model is that the *receiving* host should
      run cleanfeed and just throw away any articles it does not want.
      The drawback is that you use the network bandwidth anyway,
      because the article has to be sent to you before you can examine it

      Note: the problem is to clean an outgoing feed ... not an incoming feed

The suggestion:
      A tiny modification to innd that permits a program like cleanfeed
      to hand to innd a custom tag (a 32bit word maybe?) and add to
      the newsfeeds mechanism a new flag that lets us specify a test
      against this new custom tag.  

      One possible syntax would be

            +mask      Accept article if (cleanfeed_tag & mask) <> 0
                         ie at least one of the set mask bits must be set in tag

            -mask      Reject article if (cleanfeed_tag & mask) Xor mask = 0
                         ie all of the set mask bits must be set in tag to reject

      The tag would have no meaning to innd
      The tag would not be computed by innd (it would be computed by cleanfeed)
      The values for the tag would be defined by the site

         via cleanfeed site defines
             bit0 of tag = 1 if "binary" posting, 0 otherwise
             bit1 of tag = 1 if "yenc"   posting, 0 otherwise
             bit2 of tag = 1 if "sex"    posting, 0 otherwise

         then a newsfeeds flag field including 
            +4        accepts posting only if sex bit is set
            +6        accepts posting only if sex or yenc are set
            +4,+2     accepts posting only if both sex and yenc are set

            -4        rejects posting if sex bit is set
            -6        rejects posting only if both sex and yenc are set
            -4,-2     rejects posting if either sex or yenc are set

         Note that the logic of "+" and "-" has to be different because
         newsfeeds defines the ',' to be a logical AND of the conditions.
         That is -6 is the opposite of +4,+2  ... etc ...
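
To make the proposed +mask / -mask logic concrete, here is a minimal sketch
in Python. Nothing here exists in INN or cleanfeed: the function names and
the flag parsing are hypothetical, and the bit values are just the example
site defines above. It only demonstrates the truth table, including that
-6 is the opposite of +4,+2.

```python
# Hypothetical bit assignments, copied from the example site defines.
BINARY = 1 << 0   # 1
YENC = 1 << 1     # 2
SEX = 1 << 2      # 4


def plus_passes(tag, mask):
    """+mask: passes if at least one of the mask bits is set in the tag."""
    return (tag & mask) != 0


def minus_passes(tag, mask):
    """-mask: fails (rejects) only if all of the mask bits are set in the tag."""
    return ((tag & mask) ^ mask) != 0


def feed_accepts(tag, flags):
    """Evaluate a comma-separated flag string such as '+4,-2'.

    The comma is a logical AND, as in newsfeeds: every condition must pass.
    """
    for flag in flags.split(","):
        flag = flag.strip()
        sign, mask = flag[0], int(flag[1:])
        if sign == "+":
            ok = plus_passes(tag, mask)
        elif sign == "-":
            ok = minus_passes(tag, mask)
        else:
            raise ValueError("flag must start with + or -: %r" % flag)
        if not ok:
            return False
    return True
```

For example, feed_accepts(SEX | YENC, "-6") is False (both bits set, so the
article is rejected), while feed_accepts(SEX, "-6") is True.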

      NOTE: an advantage of this approach over an approach that does
         all the analysis on the outgoing feed, after innd, is that
         the articles don't have to be fetched a second time
         and if several sites want cleaned feeds but have different overlapping
         requirements, the analysis does not have to be done twice,
         nor are complex feed structures required to avoid double analysis
         (Boy ... isn't this obscurely phrased ...)
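
Perhaps a sketch says it better. The point is that the tag is computed once
per article and then each peer's flags are checked against that same tag.
This is only an illustration in Python: the peer names, the flag strings
and the functions are all made up, and real innd would do this inside its
newsfeeds matching rather than in a separate routine.

```python
def passes(tag, flags):
    """Check one tag against comma-separated +mask/-mask flags (ANDed)."""
    for flag in flags.split(","):
        sign, mask = flag.strip()[0], int(flag.strip()[1:])
        hit = tag & mask
        if sign == "+":
            if hit == 0:          # none of the mask bits set: condition fails
                return False
        elif sign == "-":
            if hit == mask:       # all of the mask bits set: reject
                return False
        else:
            raise ValueError("flag must start with + or -: %r" % flag)
    return True


# Hypothetical peers with different, overlapping requirements.
# Bits as in the example defines: 1 = binary, 2 = yenc, 4 = sex.
PEERS = {
    "peer-a.example": "-1",       # no binaries
    "peer-b.example": "-1,-2",    # no binaries and no yenc
    "peer-c.example": "+4",       # only wants the "sex" postings
}


def route(tag, peers=PEERS):
    """Return the peers that should receive an article carrying this tag.

    The tag is assumed to have been computed once, upstream, by the filter.
    """
    return sorted(name for name, flags in peers.items() if passes(tag, flags))
```

So an article tagged 2 (yenc only) goes to peer-a.example alone, and no
peer needs its own second pass over the article body.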

The motivation:
      Cornell has decided that each IP address at Cornell must pay
      for the bytes it sends and receives over the WAN.
      The rate is roughly $3/1G

      Thus our modest incoming newsfeed of 45G/day costs about $4000/month

      And our bean counters have had heart failure.

      So working with our peers I have managed to get them to stop 
      sending us the obvious large volume groups, which gets us down
      to about 4G/day ... except that on holidays a lot of non obvious
      binary groups start getting binaries and the feed can double.
      That unpredictability also worries the bean counters.

      So the next obvious thing to try is to have our peers consume
      CPU power on our behalf and try to analyse the postings on their
      end and not send us the "binary" postings.  

      If I could do this it looks like I could reduce our feed to maybe
      2.5G/day or about $225/month or about $2700/year for incoming
      ... which the bean counters might be able to swallow ...

      And with this feature I can then control what we send out
      so if some disgruntled community member posts a lot of binaries
      to drive our costs up, they just won't get sent out to our peers.

      Unfortunately, as far as I can tell, INN does not have a good way
      of cleansing an outgoing feed.

      So if I can't do with INN what I want to ask my peers to do for me
      I end up feeling rather awkward.

      So it would be useful from our view here if INN had this feature added.
      (what does everyone else think?)

      I am sufficiently unknowledgeable about the internal bits of INN
      that it will take me a long time to figure out how to add such a feature.

Todd Olson

