shared file pointer problem with nnrpd in daemon mode

Heiko Schlichting inn-bugs at fu-berlin.de
Mon May 8 16:24:27 UTC 2000


Hi,

For several months we have been using an older INN 2.3 snapshot on our reader
server News.CIS.DFN.DE with CNFS and nnrpd in daemon mode. Before we went live
with our 30,000 registered users, I ran a lot of tests and made some changes
until I no longer noticed any problems.

Right after going into production, I noticed error messages in syslog:
"...could not match article size token..." produced by cnfs_retrieve(). The
number of messages seems to be directly related to the number of clients on
our server. With few (50 or fewer) clients no error messages appear at all,
which might be the reason why my tests never hit this condition. On our
production server with about 1,000 simultaneous clients we see more than
50,000 error messages per day.

One big problem in finding the cause was that the tokens in the error messages
always differ and the errors, which result in 'article not available'
responses to the user, are not reproducible at all. Requesting a specific
article usually succeeded (>99% of the time) but occasionally failed. The
failures appear more often with many active nnrpd processes. They never
appear when I start nnrpd in my debugging environment (SGI CaseVision).

After a huge amount of debugging, I noticed the following:
The sequence of seek and read in cnfs_retrieve()...

    if (CNFSseek(cycbuff->fd, offset, SEEK_SET) < 0) { [...]
    }
    if (read(cycbuff->fd, &cah, sizeof(cah)) != sizeof(cah)) { [...]
    }

...does not work properly in all cases. The read() sometimes gets data from
the wrong position in the correct CNFS buffer. So I put a loop around
the seek+read and retried the seek to the same position whenever
the error condition appeared. Contrary to my expectations this had an
effect: the articles could be read on the second or third try.

As I'm sure that seek() and read() aren't broken on my operating system
(IRIX 6.5), I continued debugging:

If nnrpd is started in daemon mode (and only then) and two or more nnrpd
processes try to access articles in the same CNFS buffer simultaneously,
there are conflicts which cause article loss for the reader. The problem
seems to be the opening of the CNFS buffers, which is done in SetupDaemon()
*before* the daemon forks.

Marc J. Rochkind, "Advanced Unix Programming", 1985:
|
| 5.4 fork SYSTEM CALL
| [...]
| - The child gets copies of the parent's open file descriptors. Each is
|   opened to the same file, and the file pointer has the same value. The
|   file pointer is shared. If the child changes it with lseek, then the
|   parent's next read or write will be at the new location. The file
|   descriptor itself, however, is distinct: If the child closes it, the
|   parent's copy is undisturbed.

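Having shared file pointers for the CNFS buffers of all nnrpd processes
is of course a major problem: between one process's seek and its read,
another process can move the shared pointer, so the read returns data from
the wrong offset. I'm surprised that I have never seen a bug report about
this from anyone else.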

The patch below fixed all problems on our server and if someone can
confirm it, it should be applied before releasing INN 2.3. The patch
is against inn-BETA-20000507 and is very small compared to the debugging
effort which was necessary to create it.

Heiko

Heiko Schlichting        | Freie Universität Berlin
heiko at FU-Berlin.DE       | Zentraleinrichtung für Datenverarbeitung (ZEDAT)
Telefon +49 30 838-54327 | Fabeckstraße 32
Telefax +49 30 838-56721 | 14195 Berlin
---------------------------------------------------------------------------

--- nnrpd/nnrpd.c.org	Sun May  7 12:06:10 2000
+++ nnrpd/nnrpd.c	Mon May  8 16:55:52 2000
@@ -880,7 +880,6 @@
 
 	/* Set signal handle to care for dead children */
 	(void)xsignal(SIGCHLD, WaitChild);
-	SetupDaemon();
  
 	TITLEset("nnrpd: accepting connections");
  	
@@ -895,7 +894,6 @@
 	    for (i = 0; (pid = fork()) < 0; i++) {
 		if (i == MAX_FORKS) {
 		    syslog(L_FATAL, "cant fork %m -- giving up");
-		    OVclose();
 		    exit(1);
 		}
 		syslog(L_NOTICE, "cant fork %m -- waiting");
@@ -912,6 +910,7 @@
 	close(fd);
 	dup2(0, 1);
 	dup2(0, 2);
+	SetupDaemon();
 
 	/* if we are a daemon innd didn't make us nice, so be nice kids */
 	if (innconf->nicekids) {


