The infamous to-do list
Russ Allbery
rra at stanford.edu
Sat Jun 24 10:09:18 UTC 2000
I've just checked this into CURRENT. There are plenty of unclaimed
projects; volunteers are always welcome. Send mail to inn-workers if you
want to claim anything, and feel free to offer any additions to this list.
This is a rough and informal list of suggested improvements to INN, parts
of INN that need work, and other tasks yet undone. Some of these may be
in progress, in which case the person working on them will be noted in
square brackets and should be contacted if you want to help. Otherwise,
let inn-workers at isc.org know if you'd like to work on any item listed
below.
The list is divided into higher priority changes that will hopefully be
done in the near future, small or medium-scale projects for the future,
and long-term, large-scale problems.
High Priority Projects:
* Rewrite the configuration file parsing code, starting with the parser
for inn.conf and eventually extending to all of the configuration file
parsers in INN. [Russ is actively working on this.]
* Add man pages for the default authenticators, as well as documentation
on the readers.conf external program interface.
* Document how to add and remove CNFS cycbuffs.
* Allow the standardized forms of the XOVER, XPAT, etc. NNTP commands
according to the current draft NNTP standard. They're just synonyms for
the existing commands, so this should be easy to add.
* Complete rewrite of history. History should have a single API and the
same interface for both innd and nnrpd, the current WIP cache and
history cache should be integrated into that interface, things like
message ID hashing should become a selectable property of the history
file, and the history API should support multiple backend storage
formats and automatically select the right one for an existing history
file based on stored metainformation. [Russ is working on this.]
* Better documentation of UUCP feeds. [Russ has some saved articles on
this that could form the basis of documentation.]
* INN shouldn't flush all feeds (particularly all program feeds) on
newgroup or rmgroup. Currently it reloads newsfeeds to reparse all of
the wildmat patterns and rebuild the peer lists associated with the
active file on group changes, and this forces a flush of all feeds.
The best fix is probably to stash the wildmat pattern (and flags) for
each peer when newsfeeds is read and then just use the stashed copy on
newgroup or rmgroup, since otherwise the newsfeeds loading code would
need significant modification. But in general, innd is too
reload-happy; it should be better at making incremental changes without
reloading everything.
* Add authenticated Path support, based on the current USEFOR draft or the
behavior of some other servers (such as Diablo). [Andrew Gierth wrote a
patch for part of this a while back, which Russ has.]
* Various parts of INN are using write or writev; they should all use
xwrite or xwritev instead. Even for writes that are unlikely ever to be
partial, system calls aren't restartable on some systems, and xwrite and
xwritev properly handle EINTR returns.
* Some installed binaries and configuration files don't have man pages.
In particular, man pages are needed for actsync.conf, actsync.ign,
innreport.conf, and news2mail.cf.
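To illustrate the write/xwrite item above, here is a minimal sketch of
the kind of loop such a wrapper needs: retry on EINTR and keep going
after a partial write. The name write_all_sketch is made up for this
example; this is not INN's actual xwrite, just the general technique.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not INN's actual xwrite: write exactly n bytes,
 * retrying on EINTR and continuing after partial writes.  Returns n on
 * success or -1 on any other error (errno is left as set by write).
 */
ssize_t
write_all_sketch(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    size_t left = n;

    assert(buf != NULL || n == 0);
    while (left > 0) {
        ssize_t w = write(fd, p, left);

        if (w < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            return -1;
        }
        p += w;                 /* partial write: advance and go again */
        left -= (size_t) w;
    }
    return (ssize_t) n;
}
```

A caller would use it wherever a bare write appears now and treat a -1
return as a hard failure.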
Future Projects:
* Rewrite configure, breaking all of the tests out into separate files.
This really wants autoconf 2.50 and its additional support for including
tests from separate files. That release will also introduce a new set
of basic primitives and incorporate a lot of the macros we're already
using, so it's a good point at which to do this. At the same time,
configure.in and Makefile.global.in should be fixed to use the same
names as each other for various parameters. [Russ plans to work on this
after autoconf 2.50 is released.]
* Checkgroups processing should also update the newsgroups file, and it
should be possible to synchronize the newsgroups file against external
sources when that's done with the active file. Likewise, when a
newgroup is received for a group the server already carries, the
newsgroups line should be updated if necessary. See
<ftp://ftp.tin.org/pub/news/servers/utils/docheckgroups> for a partial
implementation.
* contrib needs to be reviewed, the best ideas integrated into INN, and
some sort of general overview documentation written for everything else,
at least sufficient to explain what things are for.
* frontends/pullnews and contrib/backupfeed solve the same problem; the
best ideas of both should be unified into one script.
* It may be better for INN on SysV-derived systems to use poll rather than
select. The semantics are better, and on some systems (such as Solaris)
select is limited to 1024 file descriptors whereas poll can handle any
number. Unfortunately, the API is drastically different between the
two and poll isn't portable, so supporting both cleanly would require a
bit of thought.
* Currently only innd and innfeed increase their file descriptor limits.
Other parts of INN, notably makehistory, may benefit from doing the same
thing (and it should really be a library function).
* Consider implementing the HEADERS command as discussed rather
extensively in news.software.nntp. [Greg Andruk has a preliminary
patch.]
* Document the internal formats of the various overview methods, CNFS,
timehash, and timecaf. A lot of this documentation already exists in
various forms, but it needs to be cleaned up and collected in one place,
preferably as a man page.
* Look through <http://www.visi.com/~barr/INN.html> for stuff that should
be included in INN (in particular, flowstats may be interesting).
* One person wanted to allow access to the news server only to people who
are members of a specific Unix group. ckpasswd could do this as an
option and it would probably be easy enough to add.
* rnews currently rejects articles with lines ending in CRLF, according to
one report. This should be checked, and if true, it should be more
flexible about line endings.
* The Tcl filtering support code has undergone serious bitrot and needs
some work to fix it and make it work with modern versions of Tcl and the
current version of INN. It also lacks a lot of the functionality of the
Perl and Python filters, if anyone cares.
* There have been a few requests for the ability to programmatically set
the subject of the report generated by news.daily, with escapes that are
filled in by the various pieces of information that might be useful.
* A PAM-based authenticator for the readers.conf external authentication
support.
* Currently, if open returns a file descriptor higher than select can
handle (such as on a Solaris system where the maximum file descriptor
limit has been increased above 1024 in /etc/system), INN will crash in a
fairly nasty fashion. It may be possible to check this by comparing
with FD_SETSIZE.
* If backoffdb is set in inn.conf and that directory doesn't exist, nnrpd
refuses to start. Either the directory should be created by the install
process or nnrpd should just create it if it can.
* A bulk cancel command using the MODE CANCEL interface. Possibly through
ctlinnd, although it may be a bit afield of what ctlinnd is currently
for.
* Sven Paulus's patch for nnrpd volume reports should be integrated. See
<ftp://ftp.tin.org/pub/news/servers/inn/unofficial-patches/patch-inn-2.2.x-artstat+list+overstat>.
* LIST NEWSGROUPS should probably only list newsgroups that are marked in
the active file as valid groups.
* Lots of people encrypt X-Trace in various ways. Should that be offered
as a standard option?
* Validity checks on the poster's address. (Although this could also be
handled by the nnrpd posting filter.)
* There are a whole bunch of places in the INN source that apply wildmats,
possibly comma-separated, to comma-separated lists of newsgroups,
sometimes including poison support and sometimes not. All this code
should be centralized into libinn with the basic wildmat code.
* Get rid of GetTimeInfo and TIMEINFO. The struct is just a struct
timeval plus time zone information. All of the parts of INN that deal
with time zone information should be isolated in lib/date.c; the only
thing remaining to move there is the parsing of dates given to NEWNEWS
and NEWGROUPS. The rest of INN uses GetTimeInfo where a plain call to
time would often work fine, or at most gettimeofday, and there's no
reason to compute the time zone everywhere. Plus, it makes the code
more readable to use standard functions and data types.
* Revisit support for aliased groups and what nnrpd does with them.
Should posts to the alias automatically be redirected to the real group?
Also, the new overview API, for at least some of the overview methods,
truncated the group status at one character and lost the name of the
group to which a group is aliased; that needs to be fixed.
* Add documentation for slave servers. [Russ has articles from
inn-workers that can be used as a beginning.]
* sendbatch and send-uucp do the same thing. So do send-nntp and
nntpsend, mostly. It might be a good idea to unify each pair into a
single program, which would also be easier to maintain.
* More details as to *why* a message ID is bad would be useful to return
to the user, particularly for rnews, inews, etc. rnews also reportedly
rejects message IDs with trailing spaces, which can be hard to check.
* Support putting the active file and history file in different
directories without hand-editing a bunch of files.
* ctlinnd flushlogs currently renames all of the log files. It would be
nice to support the method of log rotation that most other daemons
support, namely to move the logs aside and then tell innd to reopen its
log files. Ideally, that behavior would be triggered with a SIGHUP.
scanlogs would have to be modified to handle this.
* Replace all of the temporary file creation code in INN with something
built on a safe temporary file function like mkstemp (or a local
replacement, if the system doesn't have it). [Matus Uhlar was working
on this.]
* innfeed breaks Xref slaving if it ever goes to a backlog, since it then
starts sending articles out of order and most of the overview methods
can't deal with this. There should be a configuration option that would
cause it to spool any new articles if there's a backlog and always
process the backlog in order. [Sven Paulus had a preliminary patch for
this, which Russ has.]
* Several people have Perl interfaces to pieces of INN that should ideally
be part of the INN source tree in some fashion. Greg Andruk has a bunch
of stuff at <http://members.xoom.com/meowing/cssri/>, for example.
* INN's startup script should be sure to clean out old lock files and PID
files for innfeed.
* nnrpd should have support for fixing broken Date headers supplied by
clients, although now that most clients have been fixed for Y2K this may
be less of a problem.
* It's been reported that innd doesn't deal well with syntax violations in
incoming.conf and doesn't correctly report the problems (and inncheck
doesn't catch them). Some of this may be fixed with a new configuration
parsing infrastructure.
* Various things may break when trying to use data written while compiled
with large file support using a server that wasn't so compiled (and vice
versa). The main one is the history file, but also reportedly affected
is the buffindexed (and probably the tradindexed) overview method.
* makedbz should be more robust in the presence of malformed history
lines, discarding them or otherwise dealing with them.
* CNFS, if the cycbuff is larger than 2GB and it doesn't have large file
support, reports a mysterious file not found error because it assumes
all errors from stat are the result of the cycbuff not being found.
* nnrpd's NNTP command parsing interacts poorly with AUTHINFO and
passwords containing spaces. The correct solution isn't clear; check
with the current NNTP RFC draft?
* Some servers reject some IHAVE, TAKETHIS, or CHECK commands with 500
syntax errors (particularly for long message IDs), and innfeed doesn't
handle this particularly well at the moment. It really should have an
error handler for this case. [Sven Paulus has a preliminary patch that
needs testing.]
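For the item above about file descriptors too high for select, the
FD_SETSIZE comparison could look something like the following. This is
only a sketch; fd_fits_select and fd_set_checked are hypothetical names
and do not exist in INN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/select.h>

/*
 * Hypothetical helper (not part of INN): true if fd can safely be
 * stored in an fd_set.  FD_SET on a descriptor >= FD_SETSIZE writes
 * outside the set, which is undefined behavior.
 */
bool
fd_fits_select(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/*
 * Add fd to set only if it fits; a false return lets the caller close
 * the descriptor and log an error instead of crashing.
 */
bool
fd_set_checked(int fd, fd_set *set)
{
    assert(set != NULL);
    if (!fd_fits_select(fd))
        return false;
    FD_SET(fd, set);
    return true;
}
```

The check would go right after open, accept, or socket returns, before
the descriptor is ever handed to FD_SET.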
Long-Term Projects:
* Completely rewrite the header parsing and turn it into a library so that
all the various parts of INN that have to parse headers (innd, nnrpd,
inews, rnews, controlchan, etc.) can all use the same code. This will
probably require a dynamic string library. [Russ has the beginnings of
a suitable dynamic string library, but it needs more work and a test
suite.]
* The interface to embedded filters needs to be reworked. The information
about which filters are enabled should be isolated in the filtering API,
and there should be standard API calls for filtering message IDs, remote
posts, and local posts. As part of this revision, all of the Perl
callbacks should be defined before any of the user code is loaded, and
the Perl loading code needs considerable cleanup. [Russ is planning on
working on this at some point.]
* Add authentication via SASL to nnrpd. This is a boatload of additional
issues, particularly if we want to add authentication methods like
Kerberos that require their own separate libraries. Best to start with
just the basic framework and the required authentication type and then
see what other people contribute.
* When articles expire out of a storage method with self-expire
functionality, the overview and history entries for those articles
should also be expired immediately. Otherwise, things like the GROUP
command don't give the correct results. This will likely require a
callback that can be passed to CNFS that is called to do the overview
and history cleanup for each article overwritten. It will also require
the new history API.
* Feed control, namely allowing your peers to set policy on what articles
you feed them (not just newsgroups but max article size and perhaps even
filter properties like "non-binary"). Every site does this a bit
differently. Some people have web interfaces, some people use GUP, some
people roll their own alternate things. It would really be nice to have
some good way of doing this as part of INN. It's worth considering an
NNTP extension for this purpose, although the first step is to build a
generic interface that an NNTP extension, a web page, etc. could all
use. (An alternate way of doing this would be to extend IHAVE to pass
the list of newsgroups as part of the command, although this doesn't
seem as generally useful.)
* Traffic classification as an extension of filtering. The filter should
be able to label traffic as binary (e.g.) without rejecting it, and
newsfeeds should be extended to allow feeding only non-binary articles
(e.g.) to a peer.
* The interface between nnrpd and the external authenticators really
should be wrapped into a library with a standard API for simplicity of
writing authenticators.
* External authenticators should also be able to do things like return a
list of groups that a person is allowed to read or post to. Currently,
maintaining a set of users and a set of groups, each of which some
subset of the users is allowed to access, is far too difficult. For a
good starting list of additional functionality that should be made
available, look at everything the Perl authentication hooks can do.
* Allow nnrpd to spawn long-running helper processes. Not only would this
be useful for handling authentication (so that the auth hooks could work
without execing a program on every connection), but it may allow for
other architectures for handling requests (such as a pool of helpers
that deal only with overview requests). [Aidan Culley has ideas along
these lines.]
* The tradspool storage method requires assigning a number to every
newsgroup (for use in a token). Currently this is maintained in a
separate tradspool.map file, but it would be much better to keep that
information in the active file where it can't drop out of sync. A code
assigned to each newsgroup would be useful for other things as well,
such as hashing the directories for the tradindexed overview. For use
for that purpose, though, the active file would have to be extended to
include removed groups, since they'd need to be kept in the active file
to reserve their numbers until the last articles expired.
* INN really should be capable of both sending and receiving a
headers-only feed (or even an overview-only feed) similar to Diablo and
using it for the same things that Diablo does, namely clustering,
pull-on-demand for articles, and the like. This should be implementable
as a new backend, although the API may need a few more hooks. Both a
straight headers-only feed that only pulls articles down via NNTP from a
remote server and a caching feed where some articles are pre-fed, some
articles are pulled down at first read, and some articles are never
stored locally should be possible.
* The locking of the active file leaves something to be desired; in
general, the locking in INN (for the active file, the history file,
spool updates, overview updates, and the like) needs a thorough
inspection and some cleanup. A good place to start would be tracing
through the pause and throttle code and writing up a clear description
of what gets locked where and what is safely restarted and what isn't.
* The proliferation of configuration files should be significantly
reduced. For example, cycbuff.conf, buffindexed.conf, and storage.conf
could probably be combined; innfeed.conf, newsfeeds, and incoming.conf
would ideally be combined; and several of the other small auxiliary
configuration files could be rolled into other, more general
configuration files. This probably shouldn't be done until the new
configuration parsing infrastructure is in place.
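As a rough picture of the callback idea in the self-expire item above,
here is a toy ring of article slots that calls a user-supplied function
whenever it overwrites an old entry. Everything here (struct ring,
ring_store, count_expired) is invented for illustration and says nothing
about how CNFS or the new history API would actually be laid out.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; none of them exist in INN. */
#define RING_SLOTS 4

/* Called with the ID of each article about to be overwritten. */
typedef void (*overwrite_cb)(unsigned long article_id, void *cookie);

struct ring {
    unsigned long slot[RING_SLOTS]; /* stored article IDs; 0 means empty */
    size_t next;                    /* next slot to write */
    overwrite_cb on_overwrite;      /* may be NULL */
    void *cookie;                   /* passed through to the callback */
};

/* Store an article, notifying the callback if an old one is displaced. */
void
ring_store(struct ring *r, unsigned long article_id)
{
    assert(r != NULL && article_id != 0);
    if (r->slot[r->next] != 0 && r->on_overwrite != NULL)
        r->on_overwrite(r->slot[r->next], r->cookie);
    r->slot[r->next] = article_id;
    r->next = (r->next + 1) % RING_SLOTS;
}

/* Example callback: count the entries that would need cleanup. */
void
count_expired(unsigned long article_id, void *cookie)
{
    (void) article_id;
    ++*(size_t *) cookie;
}
```

In a real implementation the callback would remove the overview and
history entries for the displaced article rather than just counting it.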
--
Russ Allbery (rra at stanford.edu) <http://www.eyrie.org/~eagle/>