INN commit: trunk (4 files)
INN Commit
rra at isc.org
Mon Aug 25 17:13:12 UTC 2014
Date: Monday, August 25, 2014 @ 10:13:12
Author: iulius
Revision: 9657
pullnews: new -a flag (hashfeed ability)
Add a new feature to pullnews: hashfeed to split feeds. It uses MD5
and is Diablo-compatible.
Thanks to Geraint Edwards for the patch.
Modified:
trunk/doc/pod/news.pod
trunk/doc/pod/newsfeeds.pod
trunk/doc/pod/pullnews.pod
trunk/frontends/pullnews.in
-----------------------+
 doc/pod/news.pod      |    3 +-
 doc/pod/newsfeeds.pod |    3 +-
 doc/pod/pullnews.pod  |   40 +++++++++++++++++++++++++++++++++++-
 frontends/pullnews.in |   53 ++++++++++++++++++++++++++++++++++++++++++------
 4 files changed, 90 insertions(+), 9 deletions(-)
Modified: doc/pod/news.pod
===================================================================
--- doc/pod/news.pod 2014-08-24 13:25:28 UTC (rev 9656)
+++ doc/pod/news.pod 2014-08-25 17:13:12 UTC (rev 9657)
@@ -186,7 +186,8 @@
=item *
Several improvements have been contributed to B<pullnews> by Geraint
-Edwards: the new B<-B> flag triggers header-only feeding, the B<-m>
+Edwards: the new B<-a> flag adds the Diablo-compatible hashfeed
+ability, the new B<-B> flag triggers header-only feeding, the B<-m>
flag now permits to remove headers matching (or not) a given regexp,
and B<rnews> reporting is improved.
Modified: doc/pod/newsfeeds.pod
===================================================================
--- doc/pod/newsfeeds.pod 2014-08-24 13:25:28 UTC (rev 9656)
+++ doc/pod/newsfeeds.pod 2014-08-25 17:13:12 UTC (rev 9657)
@@ -440,7 +440,8 @@
Therefore, it allows generating a second level of deterministic
distribution. Indeed, if a news server is fed C<Q1/2>, it can go on
-splitting thanks to C<Q1-3/9_4> for instance.
+splitting thanks to C<Q1-3/9_4> for instance. Up to four levels of
+deterministic distribution can be used.
The algorithm is compatible with the one used by S<Diablo 5.1> and up.
If you want to use the legacy quickhashing method used by Diablo
Modified: doc/pod/pullnews.pod
===================================================================
--- doc/pod/pullnews.pod 2014-08-24 13:25:28 UTC (rev 9656)
+++ doc/pod/pullnews.pod 2014-08-25 17:13:12 UTC (rev 9657)
@@ -4,7 +4,8 @@
=head1 SYNOPSIS
-B<pullnews> [B<-BhnOqRx>] [B<-b> I<fraction>] [B<-c> I<config>] [B<-C> I<width>]
+B<pullnews> [B<-BhnOqRx>] [B<-a> I<hashfeed>] [B<-b> I<fraction>]
+[B<-c> I<config>] [B<-C> I<width>]
[B<-d> I<level>] [B<-f> I<fraction>] [B<-F> I<fakehop>] [B<-g> I<groups>]
[B<-G> I<newsgroups>] [B<-H> I<headers>] [B<-k> I<checkpt>] [B<-l> I<logfile>]
[B<-m> I<header_pats>] [B<-M> I<num>] [B<-N> I<timeout>] [B<-p> I<port>]
@@ -41,6 +42,43 @@
=over 4
+=item B<-a> I<hashfeed>
+
+This option is a deterministic way to control the flow of articles and to
+split a feed. The I<hashfeed> parameter must be in the form C<value/mod>
+or C<start-end/mod>. The Message-ID of each article is hashed using MD5,
+which results in a 128-bit hash. The lowest S<32 bits> are then taken
+by default as the hashfeed value (which is an integer). If the hashfeed
+value modulus C<mod> plus one equals C<value> or is between C<start>
+and C<end>, B<pullnews> will feed the article. All these numbers must
+be integers.
+
+For instance:
+
+ pullnews -a 1/2 Feeds about 50% of all articles.
+ pullnews -a 2/2 Feeds the other 50% of all articles.
+
+Another example:
+
+ pullnews -a 1-3/10 Feeds about 30% of all articles.
+ pullnews -a 4-5/10 Feeds about 20% of all articles.
+ pullnews -a 6-10/10 Feeds about 50% of all articles.
+
+You can use an extended syntax of the form C<value/mod:offset> or
+C<start-end/mod:offset> (using an underscore C<_> instead of a colon
+C<:> is also recognized). As MD5 generates a 128-bit return value,
+it is possible to specify from which byte-offset the 32-bit integer
+used by hashfeed starts. The default value for C<offset> is C<:0> and
+thirteen overlapping values from C<:0> to C<:12> can be used. Only up to
+four totally independent values exist: C<:0>, C<:4>, C<:8> and C<:12>.
+
+Therefore, it allows generating a second level of deterministic
+distribution. Indeed, if B<pullnews> feeds C<1/2>, it can go on
+splitting thanks to C<1-3/9:4> for instance. Up to four levels of
+deterministic distribution can be used.
+
+The algorithm is compatible with the one used by S<Diablo 5.1> and up.
+
=item B<-b> I<fraction>
Backtrack on server numbering reset. Specify the proportion (C<0.0> to C<1.0>)
Modified: frontends/pullnews.in
===================================================================
--- frontends/pullnews.in 2014-08-24 13:25:28 UTC (rev 9656)
+++ frontends/pullnews.in 2014-08-25 17:13:12 UTC (rev 9657)
@@ -13,6 +13,7 @@
# INN project. Major changes are:
#
# January 2010: Geraint A. Edwards added header-only feeding (-B);
+# added ability to hashfeed (-a) - uses MD5 - Diablo-compatible;
# enabled -m to remove headers matching (or not) a given regexp;
# minor bug fix to rnews when -O; improved rnews reporting.
#
@@ -121,13 +122,19 @@
}
$usage =~ s!.*/!!;
-$usage .= " [ -BhnOqRx -b fraction -c config -C width -d level
+$usage .= " [ -BhnOqRx -a hashfeed -b fraction -c config -C width -d level
-f fraction -F fakehop -g groups -G newsgroups -H headers
-k checkpt -l logfile -m header_pats -M num -N num
-p port -P hop_limit -Q level -r file -s host[:port] -S num
-t retries -T seconds -w num -z num -Z num ]
[ upstream_host ... ]
+ -a hashfeed only feed article if the MD5 hash of the Message-ID
+ matches hashfeed (where hashfeed is of the form value/mod,
+ value/mod:offset, start-end/mod, or start-end/mod:offset).
+ The algorithm used is compatible with the one used by Diablo;
+ see the pullnews man page for more details.
+
-b fraction backtrack on server numbering reset. The proportion
(0.0 to 1.0) of a group's articles to pull when the
server's article number is less than our high for that
@@ -231,11 +238,11 @@
";
-use vars qw($opt_b $opt_B $opt_c $opt_C $opt_d $opt_f $opt_F $opt_g $opt_G
- $opt_h $opt_H $opt_k $opt_l $opt_m $opt_M $opt_n
+use vars qw($opt_a $opt_b $opt_B $opt_c $opt_C $opt_d $opt_f $opt_F
+ $opt_g $opt_G $opt_h $opt_H $opt_k $opt_l $opt_m $opt_M $opt_n
$opt_N $opt_O $opt_p $opt_P $opt_q $opt_Q $opt_r $opt_R $opt_s
$opt_S $opt_t $opt_T $opt_w $opt_x $opt_z $opt_Z);
-getopts("b:Bc:C:d:f:F:g:G:hH:k:l:m:M:nN:Op:P:qQ:r:Rs:S:t:T:w:xz:Z:") || die $usage;
+getopts("a:b:Bc:C:d:f:F:g:G:hH:k:l:m:M:nN:Op:P:qQ:r:Rs:S:t:T:w:xz:Z:") || die $usage;
die $usage if $opt_h;
@@ -246,6 +253,7 @@
my $localServer = $opt_s || $defaultHost;
my $localPort = $opt_p || $defaultPort;
my $quiet = $opt_q;
+my $hashfeed = $opt_a || '';
my $header_only = $opt_B;
my $watermark = $opt_w;
my $retries = $opt_t || $defaultRetries;
@@ -288,6 +296,26 @@
die "``-z'' value not an integer: $opt_z\n" if defined $opt_z and $opt_z !~ /^\d+$/;
die "``-Z'' value not an integer: $opt_Z\n" if defined $opt_Z and $opt_Z !~ /^\d+$/;
+if ($hashfeed ne '') {
+ my $a_err = "``-a'' value not in format ``start[-end]/mod[:offset]'': $opt_a\n";
+ die $a_err if $opt_a !~ m!^(\d+)(?:-(\d+))?/(\d+)(?:[:_](\d+))?$!;
+ $hashfeed = {
+ 'low' => $1,
+ 'high' => $2 || $1,
+ 'modulus' => $3,
+ 'offset' => $4 || 0,
+ };
+ die $a_err if $hashfeed->{'low'} > $hashfeed->{'high'}
+ or $hashfeed->{'modulus'} == 0
+ or $hashfeed->{'offset'} > 12;
+ if ($hashfeed->{'low'} == 1 and $hashfeed->{'high'} == $hashfeed->{'modulus'}) {
+ $hashfeed = '';
+ } else {
+ require Digest::MD5;
+ Digest::MD5->import(qw/md5/);
+ }
+}
+
$quiet = 1 if $quietness > 1;
my %NNTP_Args = ();
$NNTP_Args{'Timeout'} = $opt_N if defined $opt_N;
@@ -409,7 +437,7 @@
print LOG " ``+'' is an article the downstream server accepted\n";
print LOG " ``x'' is an article the upstream server couldn't ";
print LOG "give out\n";
- print LOG " ``m'' is an article skipped due to headers (-m or -P)\n";
+ print LOG " ``m'' is an article skipped due to headers (-a, -m or -P)\n";
print LOG "\n";
print LOG "Writing to rnews-format output: $rnews\n\n" if $rnews;
}
@@ -743,7 +771,7 @@
my $tx_len = 0; # Transmitted article length (bytes) (for rnews, Bytes:).
my @header_nums_to_go = ();
my $match_all_hdrs = 1; # Assume no headers to match.
- my $skip_due_to_hdrs = 0;
+ my $skip_due_to_hdrs = 0; # Set to 1 if triggered by -P, 2 if by -m, 3 if by -a.
my %m_found_hdrs = ();
my $curr_hdr = '';
@@ -894,9 +922,22 @@
}
}
+ if (not $skip_due_to_hdrs and ref $hashfeed) {
+ my $hash_val = unpack('N', substr(md5($msgid), 12-$hashfeed->{'offset'}, 4)) % $hashfeed->{'modulus'} + 1;
+ $skip_due_to_hdrs = 3 if $hash_val < $hashfeed->{'low'} or $hash_val > $hashfeed->{'high'};
+ }
+
$pulled->{$server}->{$group}++;
if ($skip_due_to_hdrs) {
+ if ($debug >= 2) {
+ print LOG "\tDEBUGGING $i\tskip_art: " .
+ ($skip_due_to_hdrs == 1 ? 'hopsPath'
+ : ($skip_due_to_hdrs == 2 ? 'hdr'
+ : ($skip_due_to_hdrs == 3 ? 'hashfeed'
+ : 'unknown'))) .
+ "\n";
+ }
print LOG "m" unless $quiet;
} elsif ($rnews) {
printf RNEWS "#! rnews %d\n", $tx_len;
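
For readers who want to try the selection rule outside pullnews, the following is a minimal standalone sketch of the hashfeed test described in the pullnews.pod hunk above. It mirrors the regexp, the sanity checks and the unpack() expression visible in the diff, but the function names (parse_hashfeed, hashfeed_value, hashfeed_matches) and the sample Message-IDs are hypothetical and are not part of pullnews itself.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Parse a hashfeed specification such as "1/2", "1-3/10" or "1-3/9:4".
# Returns a hash reference, or nothing if the specification is rejected.
# The checks mirror the ones visible in the diff; nothing more is validated.
sub parse_hashfeed {
    my ($spec) = @_;
    return unless $spec =~ m!^(\d+)(?:-(\d+))?/(\d+)(?:[:_](\d+))?$!;
    my %feed = (
        low     => $1,
        high    => defined $2 ? $2 : $1,
        modulus => $3,
        offset  => defined $4 ? $4 : 0,
    );
    return
        if $feed{low} > $feed{high}
        or $feed{modulus} == 0
        or $feed{offset} > 12;
    return \%feed;
}

# Compute the hashfeed value of a Message-ID, in the range 1..modulus.
# md5() returns the 16 raw digest bytes; four of them are read as a
# big-endian 32-bit integer, starting at byte 12 minus the offset.
sub hashfeed_value {
    my ($msgid, $feed) = @_;
    my $word = unpack('N', substr(md5($msgid), 12 - $feed->{offset}, 4));
    return $word % $feed->{modulus} + 1;
}

# True if the article with this Message-ID belongs to the given hashfeed.
sub hashfeed_matches {
    my ($msgid, $feed) = @_;
    my $value = hashfeed_value($msgid, $feed);
    return $value >= $feed->{low} && $value <= $feed->{high};
}

# Hypothetical Message-IDs, used only to exercise the functions.
my @msgids = ('<a1@example.org>', '<b2@example.org>', '<c3@example.org>');
my $first_half = parse_hashfeed('1/2');
for my $msgid (@msgids) {
    my $side = hashfeed_matches($msgid, $first_half) ? '1/2' : '2/2';
    print "$msgid goes to the $side feed\n";
}
```

Because the value depends only on the Message-ID, every article falls into exactly one of the 1/2 and 2/2 feeds, no matter which server computes the hash.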