Cancel distribution network ideas
Olaf Titz
olaf at bigred.inka.de
Sun Apr 16 23:14:41 UTC 2000
> >This is basically like NoCeM with all the formatting removed, and
> >without indication of the target newsgroups. (Should we keep the
> >target newsgroups?
> Yes, because sites which don't get a full feed want cancels, but
> will not want to waste bandwidth for cancels in groups they don't carry.
How about an optional additional message element giving (the closure
of) the groups the cancels apply to? The cancelnet daemon could
use this in a newsfeeds-like configuration to decide what to send
where. The inability to discriminate here is one of the main
weaknesses of NoCeM as it stands now.
Proposal: element "D" (distribution), content: list of the longest
prefixes of all of the newsgroups affected.
Ex.: contained messages are in alt.binaries.erotica.male,
alt.binaries.erotica.female, twenty other alt.binaries.erotica.*,
de.alt.dateien.mannsbilder, de.alt.dateien.weibsbilder => the D
element would contain "alt.binaries.erotica,de.alt.dateien". I don't
think wildmatching is of any benefit here, we can fall back to
simple string prefixing.
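A minimal sketch of how a daemon might use such a "D" element to decide
whether a peer wants a given batch; the function name and the shape of the
peer configuration are assumptions, only the comma-separated prefix list and
plain string prefixing come from the proposal:

```python
def wants_cancels(d_element, peer_groups):
    """Decide whether to forward a cancel batch to a peer.

    d_element:   content of the proposed "D" element, e.g.
                 "alt.binaries.erotica,de.alt.dateien"
    peer_groups: group prefixes the peer carries, taken from a
                 newsfeeds-like configuration (hypothetical).

    Plain string prefixing in both directions, no wildmat matching:
    the peer may carry a broader prefix than the D entry, or a
    narrower one.
    """
    prefixes = d_element.split(",")
    return any(p.startswith(g) or g.startswith(p)
               for p in prefixes for g in peer_groups)
```

With the example above, a peer configured for "de.alt" would get the batch,
while a comp.*-only peer would not.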
My first thought was "forget about that, as the target audience for my
protocol is mainly servers who have all.all or a significant portion
thereof".
> >- Receive messages on a UDP port,
> >- Accept connections from configured peers on a TCP port,
> What about UUCP systems?
This idea (originally credited to Joe Greco by Russ Allberry) is about
an additional measure to distribute cancels fast within a backbone,
not necessarily the whole of Usenet.
When the cancels propagate faster than the original articles plus the
time for the UUCP batchers (unlikely to be less than an hour), UUCP
sites won't ever see both of them. I'm not sure it is sensible at
all to keep a cancel method (either Control: cancel or NoCeM) for mass
cancellations which reaches out into the UUCP networks. The reasoning
is that the Internet part of Usenet is so much faster that it should
always be possible to cancel 99% of spam before it hits the batchers,
so let's do just that and keep the whole trash away from the expensive
phone lines in the first place. The part of spam which originates on
the Internet is very close to 100% anyway.
> >Message loop detection:
> >- Each message contains a hop counter which is incremented on the way.
> > When it reaches a configured maximum, the message is discarded.
> >- Messages carry a timestamp, they are discarded when too old.
> This is very ugly because a lot of cancels will be sent more than one
> time to the same peer.
With my protocol, a single cancel is not much more in bytes than an
NNTP IHAVE plus 435 response...
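The two quoted loop-detection rules can be sketched like this (the field
names, the accept() helper, and the concrete limits are assumptions for
illustration, not part of the proposal):

```python
import time

MAX_HOPS = 16    # configured maximum hop count (assumed value)
MAX_AGE = 3600   # seconds before a message counts as too old (assumed value)

def accept(msg, now=None):
    """Apply the two loop-detection rules to an incoming message.

    msg is a dict with assumed fields "hops" (int) and "timestamp"
    (Unix time). Returns False if the message must be discarded;
    otherwise increments the hop counter for forwarding and returns True.
    """
    if now is None:
        now = time.time()
    if msg["hops"] >= MAX_HOPS:       # hop counter reached the maximum
        return False
    if now - msg["timestamp"] > MAX_AGE:  # message too old
        return False
    msg["hops"] += 1                  # incremented on the way
    return True
```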
> You are stripping a lot to reduce size, but you
> have not showed why you think current cancels are too big.
Any news admin who still runs traditional spool can certainly show you
a directory listing as proof for the latter 8-). More on that below.
> >- It can be assumed that one issuer does not issue more than one
> > message with the same timestamp (i.e. more than one per second).
> You can't assume this! Most cancelbots generates more than one cancel
> per second.
Even with batching notices into one message per second? With a 4k
message size limit and 64 bytes per quoted Message-ID, this would
allow for ~60 cancelled IDs per message. I don't know how fast the
actual spam cancellers are nowadays, but I sometimes hear figures
that INN on modern machines can digest 20-100 messages per second. Not
all of them are spam. (Only about 1/3 is spam, and another 1/3 are
spam cancels. This figure alone is proof enough to me that we need
another protocol for spam cancels, one which does not load the article
filing/distribution mechanism.)
Okay, we can of course introduce serial numbers (16 bit should be
enough then).
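With a 16-bit serial number added, the (issuer, timestamp, serial) triple
becomes a unique message key, so an issuer may emit many messages per
second. A small sketch of duplicate suppression on that key; names and
the in-memory set are assumptions:

```python
# Messages already seen, keyed by (issuer, timestamp, serial).
# A real daemon would expire old entries along with MAX_AGE.
seen = set()

def is_duplicate(issuer, timestamp, serial):
    """Return True if this message key was already processed,
    otherwise record it and return False. The serial is masked
    to 16 bits, as proposed."""
    key = (issuer, timestamp, serial & 0xFFFF)
    if key in seen:
        return True
    seen.add(key)
    return False
```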
> Why reinventing the wheel? I think cancels should be carried with the
> same protocol of normal articles, because that will solve all problems
> except authentication. I think we should consider developing (another)
> extension for authenticating traditional cancels or maybe a new kind of
> control message.
This concern is semi-valid, but the basic idea is a network to
distribute bulk cancels - for spam, etc. - _as fast as possible_.
In an ideal world without spam, or with proactive filtering everywhere,
we wouldn't need that, but you can see how far we are from that.
I call it only semi-valid because the "Control: cancel" protocol is
_horribly_ inefficient for the purpose of bulk cancelling (that's the
only thing we're talking about; I'm all for keeping that protocol,
with Cancel-Lock added, for the occasional cancel of an article by its
author or his administrator). In the extreme case of spam postings
consisting only of a URL, the cancel is just as big as the spam
posting itself.
Olaf
More information about the inn-workers mailing list