INN commit: branches/2.5 (4 files)

INN Commit rra at isc.org
Wed Jul 20 18:46:27 UTC 2011


    Date: Wednesday, July 20, 2011 @ 11:46:27
  Author: iulius
Revision: 9285

improve scripts to send Path: statistics

Add two flags to sendinpaths:  -k and -r control the interval of days
for processing dump files.
This allows proper generation of daily statistics.
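
The option parser in this commit rejects any -k or -r argument that
contains a non-digit character.  A minimal sketch of that check as a
standalone function follows; the name is_nonneg_int is hypothetical and
not part of the commit, and the extra rejection of an empty argument is
an assumption added here (the inline check in the patch lets an empty
string through).

```shell
#!/bin/sh
# Hypothetical helper, not part of the commit: the patch inlines this
# test inside its "case" on -k and -r.
# Returns 0 when $1 consists only of digits, 1 otherwise.
is_nonneg_int() {
  case "$1" in
    # Assumption: also reject an empty argument, which the pattern used
    # in the patch (*[^0-9]*) would accept.
    ''|*[!0-9]*)
      return 1
      ;;
    *)
      return 0
      ;;
  esac
}
```

With such a check in place, an invocation like "sendinpaths -n -r 1"
would be accepted and "sendinpaths -k two" would be refused, which
matches the behaviour described for the new flags.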

Also fix an issue where statistics could be missing for a couple of
days when reports were sent monthly.
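
The dump selection changes from "-mtime -30" to a test that also matches
files exactly $REPORT days old, with a default of 32 days; this is
presumably what closes the gap for monthly runs.  A minimal sketch of
that selection, wrapped in a hypothetical select_dumps function that is
not part of the commit:

```shell
#!/bin/sh
# Hypothetical wrapper around the find expression used by the patched
# script.  $1 is the dump directory, $2 is the number of report days.
select_dumps() {
  # find counts age in whole 24-hour periods, discarding the remainder.
  # "-mtime -N" matches ages below N and "-mtime N" matches age N, so
  # the pair selects every non-empty dump whose age is at most N.
  find "$1" -name 'inpaths.*' ! -size 0 \
    \( -mtime -"$2" -o -mtime "$2" \) -print
}
```

Called as "select_dumps /some/dir 1", this should list non-empty dumps
less than two full days old, which is the window a daily run with
"-r 1" would use.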

Improve our documentation.

Modified:
  branches/2.5/backends/sendinpaths.in
  branches/2.5/doc/pod/news.pod
  branches/2.5/doc/pod/ninpaths.pod
  branches/2.5/doc/pod/sendinpaths.pod

-------------------------+
 backends/sendinpaths.in |   93 ++++++++++++++++++++++++++++++++++++----------
 doc/pod/news.pod        |    9 ++++
 doc/pod/ninpaths.pod    |   50 +++++++++++++++---------
 doc/pod/sendinpaths.pod |   28 ++++++++++---
 4 files changed, 136 insertions(+), 44 deletions(-)

Modified: backends/sendinpaths.in
===================================================================
--- backends/sendinpaths.in	2011-07-18 22:41:23 UTC (rev 9284)
+++ backends/sendinpaths.in	2011-07-20 18:46:27 UTC (rev 9285)
@@ -1,44 +1,97 @@
 #!/bin/sh
 # fixscript will replace this line with code to load innshellvars
 #
-# Submit path statistics based on ninpaths
+# Submit path statistics based on ninpaths.
 # $Id$
 
-# Assuming the ninpaths dump files are in ${MOST_LOGS}/path/inpaths.%d
+# Assuming the ninpaths dump files are in ${MOST_LOGS}/path/inpaths.%d files.
 
 cd ${MOST_LOGS}/path
 ME=`${NEWSBIN}/innconfval pathhost`
-report=30
-keep=14
-TMP=""
-defaddr="pathsurvey at top1000.org top1000 at anthologeek.net"
 
+USAGE="Usage: sendinpaths [-n] [-k keep-days] [-r report-days] [address [address ...]]"
+NOMAIL=false
+MAILTO=""
+DEFAULTMAILTO="pathsurvey at top1000.org top1000 at anthologeek.net"
+
+# Default to report up to 32 days (ideal for monthly statistics).  It works fine
+# for daily stats too because already processed dump files are deleted by default
+# (0 day of kept articles).
+REPORT=32
+KEEP=0
+NINPATHS_ARGS=""
+
+# Parse command-line arguments.
+while [ $# -gt 0 ]
+do
+  case "$1" in
+  -k)
+    case "$2" in
+    *[^0-9]*)
+      echo "Argument to -k flag must be an integer."
+      exit 1
+      ;;
+    esac
+    KEEP=$2
+    shift
+    ;;
+  -n)
+    NOMAIL=true
+    ;;
+  -r)
+    case "$2" in
+    *[^0-9]*)
+      echo "Argument to -r flag must be an integer."
+      exit 1 
+      ;;
+    esac
+    REPORT=$2
+    shift
+    ;;
+  -*)
+    echo $USAGE
+    exit 1
+    ;;
+  *)
+    MAILTO="${MAILTO} $1"
+    ;;
+  esac
+  shift
+done
+
 # Renice to give other processes priority, since this isn't too important.
 renice 20 -p $$ > /dev/null 2>&1
 
-# Make report from (up to) $report days of dumps
-LOGS=`find . -name 'inpaths.*' ! -size 0 -mtime -$report -print`
+# Make report from (up to) $REPORT days of dumps.
+LOGS=`find . -name 'inpaths.*' ! -size 0 \( -mtime -${REPORT} -o -mtime ${REPORT} \) -print`
 if [ -z "$LOGS" ] ; then
-  echo "No data has been collected this month!"
+  echo "No data has been collected since the last run of this script!"
   exit 1
 fi
 
-# for check dumps
+# Process dumps.
 for i in $LOGS
 do
- ninpaths -u $i -r $ME > /dev/null 2>&1
- if test $? -eq 0; then :
-  TMP="$TMP -u $i"
- fi
+  ninpaths -u ${i} -r ${ME} > /dev/null 2>&1
+  if test $? -eq 0 ; then
+    NINPATHS_ARGS="${NINPATHS_ARGS} -u ${i}"
+  else
+    echo "Skipping unrecognized inpaths file ${i}"
+  fi
 done
 
-if [ "$1" = "-n" ] ; then
-  ninpaths $TMP -r $ME
+if [ -z "${NINPATHS_ARGS}" ] ; then
+  echo "No valid data has been collected since the last run of this script!"
+  exit 1
+fi
+
+if [ "${NOMAIL}" = "true" ] ; then
+  ninpaths ${NINPATHS_ARGS} -r ${ME}
 else
-  ninpaths $TMP -r $ME |\
-   $MAILCMD -s "inpaths $ME" ${1:-$defaddr}
-  # remove dumps older than $keep days
-  find . -name 'inpaths.*' -mtime +$keep -exec rm '{}' \;
+  ninpaths ${NINPATHS_ARGS} -r ${ME} |\
+    ${MAILCMD} -s "inpaths ${ME}" ${MAILTO:-$DEFAULTMAILTO}
+  # Remove dumps older than $KEEP days.
+  find . -name 'inpaths.*' \( -mtime +${KEEP} -o -mtime ${KEEP} \) -exec rm '{}' \;
 fi
 
 exit 0

Modified: doc/pod/news.pod
===================================================================
--- doc/pod/news.pod	2011-07-18 22:41:23 UTC (rev 9284)
+++ doc/pod/news.pod	2011-07-20 18:46:27 UTC (rev 9285)
@@ -48,6 +48,15 @@
 
 =item *
 
+It is now possible to properly generate daily statistics with B<sendinpaths>
+thanks to the new B<-k> and B<-r> flags that permit to control the interval
+of days for processing dump files.
+
+Also fixed an issue with statistics that could be missing for a couple of
+days when monthly sent.
+
+=item *
+
 B<cnfsheadconf> now properly recognizes continuation lines in
 F<cycbuff.conf>, that is to say lines ending with a backslash (C<\>).
 Thanks to John S<F. Morse> for the bug report.

Modified: doc/pod/ninpaths.pod
===================================================================
--- doc/pod/ninpaths.pod	2011-07-18 22:41:23 UTC (rev 9284)
+++ doc/pod/ninpaths.pod	2011-07-20 18:46:27 UTC (rev 9285)
@@ -19,11 +19,11 @@
 into the report.  The purpose of the final report is to summarize the
 frequency of occurrence of sites in the Path: headers of articles.
 
-Some central sites accumulate the Path: data from many news servers running
-this program or one like it, and then report statistics on the most
-frequently seen news servers in Usenet article Path: lines.  The
-B<sendinpaths> shell script can be run once a month to mail the
-accumulated statistics to such a site and remove the old dump files.
+Some central sites accumulate the Path: data from many news servers
+running this program or one like it, and then report statistics on
+the most frequently seen news servers in Usenet article Path: lines.
+The B<sendinpaths> shell script can be run daily to mail the accumulated
+statistics to such a site and remove the old dump files.
 
 You can get a working setup by doing the following:
 
@@ -33,31 +33,45 @@
 
 Create a directory at I<pathlog>/path (replacing I<pathlog> here and in
 all steps that follow with the full path to your INN log directory).
+Do not change the name of the C<path> subdirectory because it is used
+by B<sendinpaths>.
 
 =item 2.
 
 Set up a channel feed using an entry like:
 
-    inpaths!:*:Tc,WP:ninpaths -p -d <pathlog>/path/inpaths.%d
+    inpaths!:*:Tc,WP:<pathbin>/ninpaths -p -d <pathlog>/path/inpaths.%d
 
 if your version of INN supports C<WP> (2.0 and later all do).  Replace
+<pathbin> with the full path to your INN binaries directory, and
 <pathlog> with the full path to your INN log directory.
 
 =item 3.
 
-Enter into your news user crontab something like:
+Run the following command to start logging these statistics:
 
-    6 6 * * *   ctlinnd flush inpaths!
+    ctlinnd reload newsfeeds 'inpaths feed setup'
 
+=item 4.
+
+Enter into your news user crontab these two lines:
+
+    6   6 * * *   <pathbin>/ctlinnd flush inpaths!
+    10  6 * * *   <pathbin>/sendinpaths
+
 (the actual time doesn't matter).  This will force B<ninpaths> to generate
-a dump file once a day.
+a dump file once a day.  Then, a few minutes later, B<sendinpaths> collects
+the dumps, makes a report, sends the collected statistics, and deletes
+the old dumps.
 
-=item 4.
+Note that you can manually generate a report without mailing it, and
+without deleting processed dump files, with C<sendinpaths -n>.
 
-Once per month, run the B<sendinpaths> script, which collects the dumps,
-makes a report, and then deletes the old dumps.  (You can generate a
-report without mailing it and without deleting it with C<sendinpaths -n>.)
+=item 5.
 
+In a couple of days, check that your daily statistics properly appear in
+L<http://www.top1000.org/>.
+
 =back
 
 =head1 OPTIONS
@@ -88,8 +102,8 @@
 
 =item B<-v> I<level>
 
-Set the verbosity level of the report.  Valid values for I<level> are 0,
-1, and 2, with 2 being the default.
+Set the verbosity level of the report.  Valid values for I<level> are C<0>,
+C<1>, and C<2>, with C<2> being the default.
 
 =back
 
@@ -139,14 +153,14 @@
 If your INN doesn't have the C<WP> feed flag (1.5 does not, 1.6 and 1.7 do,
 2.0 and later all do), use the following F<newsfeeds> entry:
 
-   inpaths!:*:Tc,WH:ginpaths
+   inpaths!:*:Tc,WH:<pathbin>/ginpaths
 
 where B<ginpaths> is the following script:
 
     #!/bin/sh
-    exec egrep '^Path: ' | ninpaths -p -d <pathlog>/path/inpaths.%d
+    exec egrep '^Path: ' | <pathbin>/ninpaths -p -d <pathlog>/path/inpaths.%d
 
-replacing <pathlog> as above.
+replacing <pathbin> and <pathlog> as above.
 
 =head1 HISTORY
 

Modified: doc/pod/sendinpaths.pod
===================================================================
--- doc/pod/sendinpaths.pod	2011-07-18 22:41:23 UTC (rev 9284)
+++ doc/pod/sendinpaths.pod	2011-07-20 18:46:27 UTC (rev 9285)
@@ -4,12 +4,13 @@
 
 =head1 SYNOPSIS
 
-B<sendinpaths> [B<-n> | "I<address> [I<address> ...]"]
+B<sendinpaths> [B<-n>] [B<-k> I<keep-days>] [B<-r> I<report-days>]
+[I<address> [I<address> ...]]
 
 =head1 DESCRIPTION
 
 B<sendinpaths> checks I<pathlog>/path for B<ninpaths> dump files, finds
-dump files generated in the past 30 days, makes sure they are valid
+dump files generated in the past I<report-days> days, makes sure they are valid
 by running B<ninpaths> on each one and making sure the exit status is
 zero, and passes them to B<ninpaths> to generate a cumulative report.
 By default, that report is mailed to the e-mail addresses configured at
@@ -18,7 +19,7 @@
 useful statistics:  see L<http://www.top1000.org/> for more information.
 
 When finished, B<sendinpaths> deletes all dump files in I<pathlog>/path
-that are older than 14 days (configurable at the beginning of the script).
+that are older than I<keep-days> days.
 
 For more information on how to set up B<ninpaths>, see ninpaths(8).
 
@@ -26,21 +27,36 @@
 
 =over 4
 
+=item B<-k> I<keep-days>
+
+After having processed dump files, B<sendinpaths> removes those that are
+older than I<keep-days> days.  The default is C<0>, that is to say to
+remove all dump files.
+
+Setting I<keep-days> to another value can be useful for debugging purpose
+because it permits to keep a few dump files.
+
 =item B<-n>
 
 Don't e-mail the report; instead, just print it to standard output.  Don't
 delete old dump files.
 
+=item B<-r> I<report-days>
+
+Process dump files generated during the last I<report-days> days.
+The default is C<32>, that is to say to process all the dump files that
+have been generated during the last 32 days (if, of course, they have
+not been deleted yet by a previous run of B<sendinpaths> according to the
+value set by the B<-k> flag).
+
 =item I<address> ...
 
 E-mail the report to the mentioned addresses instead of the default ones.
 Several addresses can be used, separated by whitespace.  For instance,
 for two adresses:
 
-    sendinpaths "pathsurvey at top1000.org top1000 at anthologeek.net"
+    sendinpaths pathsurvey at top1000.org top1000 at anthologeek.net
 
-The quotes can be omitted when only an address is specified.
-
 =back
 
 =head1 HISTORY



