after-the-fact filtering?

christian mock cm at tahina.priv.at
Fri May 24 09:18:29 UTC 2002


> > Request. If small post here (this is hardly a high volume list), or you

> Seconded. This would be a really useful tool.

OK, then. Just don't blame me when you get a headache from reading 
the code.

It reads your .newsrc, scans a specific group for crossposts to 
"forbidden groups" and for subjects matching a regexp, and marks 
the matching posts as read.
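
A minimal sketch of that per-article decision, runnable without a news server. The helper name should_mark and the cut-down subject pattern are assumptions made for illustration; the script below builds a much larger pattern from its word lists and marks articles through News::Newsrc rather than returning a reason.

```perl
use strict;
use warnings;

# Groups whose presence in the Newsgroups header marks an article as junk
# (same two names as in the script below).
my @badgroups = ("alt.dads-rights.unmoderated", "alt.usenet.kooks");
my %bad = map { $_ => 1 } @badgroups;

# Illustrative, cut-down subject pattern (an assumption): only a few of
# the alternatives from the script's word lists are included here.
my $subject_re =
    qr/^(?:\[WARNING\] |\[INFO\] )?(?:Frog |Dizum )(?:asks |needs )(?:MIX keys|PGP code)$/;

# Hypothetical helper: returns "group" or "subject" when the article
# should be marked as read, and "" when it should be left alone.
sub should_mark {
    my ($newsgroups, $subject) = @_;
    for my $g (split /\s*,\s*/, $newsgroups) {
        return "group" if $bad{$g};
    }
    return "subject" if $subject =~ $subject_re;
    return "";
}

print should_mark("alt.privacy.anon-server,alt.usenet.kooks", "hello"), "\n";
```

The crosspost test uses exact group names from a hash here, whereas the script matches each group against a joined regexp.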

It collapses runs of consecutive article numbers into ranges to 
reduce the number of NNTP XHDR requests. To do what the original 
poster wanted, it basically just needs the cancel_nntp sub from 
nocem-perl transplanted into it.
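
The range collapsing can be sketched on its own as follows. The sub name collapse_ranges is hypothetical, and this version emits a plain number for a lone article and a [first, last] pair for a run, which is the message-spec shape that Net::NNTP's xhdr is documented to accept.

```perl
use strict;
use warnings;

# Collapse a sorted list of article numbers into message specs:
# a plain number for a lone article, [first, last] for a consecutive run.
sub collapse_ranges {
    my @arts = @_;
    return () unless @arts;
    my @list;
    my $first = shift @arts;
    my $last  = $first;
    for my $n (@arts) {
        if ($n == $last + 1) {      # still inside the current run
            $last = $n;
            next;
        }
        push @list, $first == $last ? $first : [$first, $last];
        $first = $last = $n;        # start a new run
    }
    push @list, $first == $last ? $first : [$first, $last];
    return @list;
}

my @specs = collapse_ranges(3, 4, 5, 9, 11, 12);
print join(" ", map { ref $_ ? "$_->[0]-$_->[1]" : $_ } @specs), "\n";
# prints "3-5 9 11-12"
```

Six unread articles turn into three XHDR requests per header instead of six.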

ciao,

cm.

--- cut ---
#!/usr/bin/perl
# -*- perl -*-
#
# cancelbot against the flood of nonsense appearing in alt.privacy.anon-server
#
@badgroups = ("alt.dads-rights.unmoderated", "alt.usenet.kooks");

# quotemeta keeps the dots in the group names from matching any character
$group_re = join("|", map { quotemeta } @badgroups);


@p1 = ("", '\[WARNING\] ', "Tr: ", '\[STATS\] ', '\[INFO\] ', '\[ANNOUNCE\] ');

@p2 = ("Austria ", "Uppnorth ", "Rot13 ", "Frog ", "Shinn ", "Septic ",
       "Randseed ", "Rascal ", "Mix ", "Stealth ", "Noisebox ", "Gretchen ",
       "Passthru ", "Tuttle ", "Cmeclax ", "Squirrel ", "Riot ", "Swiss ",
       "Cracow ", "Farout ", "Senshi ", "Cracow2 ", "Xganon ", "Green ",
       "Cracker ", "Exonet ", "Arick ", "Dizum ", "Winter ", "Bruble2 ",
       "LefArris ", "Lcs ", "Licious ",
       "His cat ", "The dog ", "TLA ", "Gore ", "Bush ", "The bishop ",
       "Gates ", "Arafat ", "His cat ", "FBI ", "CIA ", "NSA ",
       "His in-law ", "The neigbour ", "My brother ", "My sister ",
       "Clinton ");

@p3 = ("", "definitely ", "absolutely ", "probably ", "sure ", "certainly ",
       "fucking ");

@p4 = ("asks ", "wants ", "loves ", "needs ", "used ", "uses ",
       "would love ", "requires ");

@p5 = ("", "to write ", "to infect ", "to fist-fuck ", "to read ",
       "to encode ", "to burn ", "to fuck ", "to code ", "to decypher ",
       "to print");

@p5a = ("", "remaining of ", "most of ", "more of ", "all ", "plenty of ");

@p6 = ("", "these ", "those ", "some ");

@p7 = ("", "priapic ", "nice ", "crunchy ", "smelly ", "tasteful ", "spotty ",
       "politically correct ", "obnoxious ", "tasty ", "gentle ", "sick ",
       "terrible ");

@p8 = ("MIX keys", "FBI", "perl scripts", "democrats", "CIA", "TLA", "NSA",
       "republicans", "jews", "mexicans", "pommies", "VB code", "onions",
       "radishes", "carrots", 'C\+\+ code', "faggots", "potatoes",
       "PGP code", "algorithm", "toilet paper");

@p9 = ('', ' ?\?', ' *!!!!', ' ?\.', ' ?\.\.\.', ' ?!', ' ?\?\?\?');

$re = "(" . join("|", @p1) . ")(" . join("|", @p2) . ")(" . join("|", @p3) .
    ")(" . join("|", @p4) . ")(" . join("|", @p5) . ")(" . join("|", @p5a) .
    ")(" . join("|", @p6) .
    ")(" . join("|", @p7) . ")(" . join("|", @p8) . ")(" . join("|", @p9) . ")";

## print "$re\n";

## $re =~ s/\s/./g;




$apas = "alt.privacy.anon-server";


use Net::NNTP;
use News::Newsrc;
use POSIX;

$newsrc = new News::Newsrc;

$newsrc->load || die "load .newsrc";




$nntp = Net::NNTP->new("news") || die "cannot connect to NNTP server \"news\"";

($narts, $fart, $lart, $group) = $nntp->group("alt.privacy.anon-server");

print "group: $narts $fart $lart $group\n";

if($group ne "alt.privacy.anon-server") {
    die "NNTP group returns wrong group name \"$group\"";
}

@arts = $newsrc->unmarked_articles($apas, $fart, $lart);

print "unmarked: " . ($#arts + 1) . "\n";

#print join("\n", @arts), "\n";

$first = shift @arts;
$cur = $first+1;

foreach $artno (@arts) {
    if($artno > $cur) {
	if($first == --$cur) {
	    push(@list, $first);
	} else {
	    push(@list, [$first, $cur]);
	}
	$first = $artno;
    }
    $cur = $artno+1;
}

push(@list, [$first, $cur-1]);

foreach $e (@list) {
    %groups = %{$nntp->xhdr("Newsgroups", $e)};
    foreach $art (sort {$a <=> $b} keys %groups) {
	$arts++;
##	print "$groups{$art}\n";
	foreach (split(",", $groups{$art})) {
	    if(/$group_re/o) {
		$gr_match++;
		$newsrc->mark($apas, $art);
		last;
	    }
	}
    }
}
## print "\nTOTAL MATCHES (group/regex/total): $gr_match / $re_match / $arts\n";

foreach $e (@list) {
    %subs = %{$nntp->xhdr("Subject", $e)};
    
##    print "\%subs: " . scalar(%subs) . "\n";

    foreach $art (sort {$a <=> $b} keys %subs) {
	next if $newsrc->marked($apas, $art);
	$arts++;
	if($subs{$art} =~ /^$re$/o) {
##	    print "$subs{$art}\n";
	    $re_match++;
	    $newsrc->mark($apas, $art);
	} else {
##	    $subs{$art} !~ /^Re:/ && print "$subs{$art}\n";
	}
    }
}

## $newsrc->save_as($ENV{'HOME'} . "/apasnewsrc");
$newsrc->save;

print "\nTOTAL MATCHES (group/regex/total): $gr_match / $re_match / $arts\n";

-- 
** christian mock in vienna, austria -- http://www.tahina.priv.at/~cm/
** VIBE!AT http://www.vibe.at/ ** we are not here for fun.
Magic broken. You go now. And stay away. -- Bram

-- 
Did somebody get bad vibes in their global peace-and-pancakes
aura because they were told to please stop constantly crossposting
full quotes to 10 groups at once?
 Albert Koellner in at.usenet


