Nonblocking I/O and POLL_BUG

Per Hedeland per at erix.ericsson.se
Tue Oct 26 01:45:00 UTC 1999


Russ Allbery wrote:
>
>Per Hedeland <per at erix.ericsson.se> writes:
>> The semantics of O_NDELAY on SysV-heritage systems, that is (broken as
>> usual:-).
>
>Well, that's where O_NDELAY originated, so I think it's safe to say that
>those are the canonical semantics.

OK, I guess BSD may have "fixed" that too.:-)

>  I know that some systems, like Linux,
>have made O_NDELAY synonymous with O_NONBLOCK, though.

As had SunOS4, before there was an O_NONBLOCK:-) - current BSD systems
(well, I only checked FreeBSD) also #define O_NDELAY O_NONBLOCK - makes
sense to me, as no-one could possibly want the original semantics.
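
A minimal sketch of how one might check that aliasing on a given system, assuming a POSIX-style <fcntl.h>. The helper name ndelay_is_nonblock() is made up for illustration and is not part of any library:

```c
#include <fcntl.h>

/* Hypothetical helper: reports whether this platform's headers define
   O_NDELAY with the same value as O_NONBLOCK.
   Returns 1 if they are identical, 0 if they differ,
   and -1 if either macro is missing. */
int ndelay_is_nonblock(void)
{
#if defined(O_NDELAY) && defined(O_NONBLOCK)
    return O_NDELAY == O_NONBLOCK;
#else
    return -1;
#endif
}
```

This only compares the header definitions; it says nothing about how the kernel actually behaves when the flag is set.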

>Saying that it's a bug implies that returning ready for read from select()
>is not only a guarantee that there will be data available when the
>application reads from the socket, but that this guarantee lasts for some
>indefinite period of time into the future.  I think that's pretty strong,
>and I don't know of any standard which actually makes that guarantee.

It doesn't guarantee that there will be data available, it guarantees
that a blocking read will not block (remember EOF) - if it can't give
that guarantee, it is useless. And if a blocking read would not have
blocked, a non-blocking one should not give EAGAIN (or EWOULDBLOCK).

But in the case of data available, that data was sitting in a kernel
buffer when select/poll returned, where could it have gone? And only
temporarily??? I.e. if the communication channel is still open, and the
data has disappeared, we're in *big* trouble. If it's closed, and the
data has disappeared (at least HTTP is in big trouble:-), we should get
EOF. There is simply no justification for ever returning EAGAIN on that
read (unless you did some other manipulation of the descriptor in
between, but of course you didn't).
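
The expectation argued above can be sketched as follows, assuming POSIX select() and a pipe standing in for the socket. The helper name select_then_read() is an assumption made for this example:

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: put one byte into a fresh pipe, wait for select()
   to report the read end readable, then do a non-blocking read.
   Returns the read() result (1 is expected), or -1 with errno set.
   An EAGAIN from that read would be the misbehaviour discussed above. */
int select_then_read(void)
{
    int p[2], flags, saved;
    ssize_t n = -1;
    char c = 'x';
    fd_set rfds;
    struct timeval tv;

    if (pipe(p) < 0)
        return -1;
    flags = fcntl(p[0], F_GETFL, 0);
    if (flags >= 0 && fcntl(p[0], F_SETFL, flags | O_NONBLOCK) >= 0 &&
        write(p[1], &c, 1) == 1) {
        FD_ZERO(&rfds);
        FD_SET(p[0], &rfds);
        tv.tv_sec = 1;		/* don't wait forever if select() misbehaves */
        tv.tv_usec = 0;
        if (select(p[0] + 1, &rfds, NULL, NULL, &tv) == 1)
            n = read(p[0], &c, 1);
    }
    saved = errno;		/* keep the interesting errno across close() */
    close(p[0]);
    close(p[1]);
    errno = saved;
    return (int)n;
}
```

Nothing touches the descriptor between select() and read(), so on a sane system the read has no excuse for failing.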

>But regardless, it doesn't actually work that way on Solaris, where under
>at least versions 2.4 through 2.6 inclusive read() will sometimes return
>EAGAIN even if the socket had previously selected for reading.

Yep, and it's a bug. As are the other spurious EAGAINs it returns. The
question is just whether we need to worry about getting in a tight
poll/EAGAIN loop on some versions.
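
One way to guard against such a tight loop, sketched under the assumption that poll() is available. The helper name read_when_ready(), the max_spurious parameter, and the 10 ms back-off are all illustrative choices, not anything INN does:

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable, then read from it,
   tolerating up to max_spurious EAGAIN results that arrive even though
   poll() reported the descriptor ready.
   Returns the byte count (0 at EOF), or -1 with errno set. */
ssize_t read_when_ready(int fd, void *buf, size_t len, int max_spurious)
{
    struct pollfd pfd;
    ssize_t n;
    int spurious = 0;

    pfd.fd = fd;
    pfd.events = POLLIN;
    for (;;) {
        pfd.revents = 0;
        if (poll(&pfd, 1, -1) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;		/* a real error */
        if (++spurious > max_spurious)
            return -1;		/* give up, errno is still EAGAIN */
        (void) poll(NULL, 0, 10);	/* back off 10 ms instead of spinning */
    }
}
```

The bounded retry count and the short sleep mean a system with the bug costs some latency rather than a pegged CPU.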

>> Likewise if that read returns 0 without a "real" EOF condition existing.
>> Plus of course per above one has to wonder why one on Solaris would
>> *ever* get EAGAIN on read() when using O_NDELAY to set non-blocking
>> mode...
>
>O_NDELAY was defined to return EAGAIN.  Maybe you're thinking of the BSD
>FNDELAY, which isn't necessarily the same thing, and which is defined to
>return EWOULDBLOCK?

>I should clarify that:  O_NDELAY, at least as I understand it, was defined
>to return EAGAIN on writes and 0 on reads.

Exactly. So why would Solaris (which should follow the SysV semantics of
course) ever return EAGAIN on read() when you had requested non-blocking
mode with O_NDELAY? It should return either data or 0, of course. But it
*likes* to return EAGAIN.:-) Actually if you run the test program,
you'll see that current Solaris (I ran it on 2.6 now) still has the same
bizarre O_NDELAY semantics... - but who cares.
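
A rough probe for which of those two behaviours a system shows on read(), using an empty pipe. The helper name probe_ndelay_read() and its return codes are invented for this sketch, and it will hang if O_NDELAY fails to make the descriptor non-blocking at all:

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical probe: set O_NDELAY on the read end of an empty pipe whose
   write end is still open, read once, and classify the outcome.
   Returns 0 if read() returned 0 (the original SysV semantics),
           1 if it failed with EAGAIN or EWOULDBLOCK,
          -1 for anything else, including a missing O_NDELAY. */
int probe_ndelay_read(void)
{
#ifdef O_NDELAY
    int p[2], flags, result = -1;
    ssize_t n;
    char c;

    if (pipe(p) < 0)
        return -1;
    flags = fcntl(p[0], F_GETFL, 0);
    if (flags >= 0 && fcntl(p[0], F_SETFL, flags | O_NDELAY) >= 0) {
        errno = 0;
        n = read(p[0], &c, 1);	/* no data, but not EOF either */
        if (n == 0)
            result = 0;
        else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            result = 1;
    }
    close(p[0]);
    close(p[1]);
    return result;
#else
    return -1;
#endif
}
```

A result of 0 is the awkward case, since the caller cannot tell it apart from a real EOF.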

>Your table says to me that we really want an autoconf test.  Do you still
>have your testing code available?

Yes, but I seriously doubt that it's appropriate for autoconf - for one
thing it may take quite a while to run, as it may have to kill various
blocking operations with alarm(1). Also at least in its current form, it
may in theory leave your tty in non-blocking mode (if it fails to set it
back to blocking), which makes many shells decide that it's time to
exit. Plus note the comment about Ultrix...

But as I guess we're not interested in non-blocking ttys, that test
could be stripped out - which would also shave a bit off the time
required. And if we only need to set sockets/pipes non-blocking, and
not back to blocking, it could be stripped further (not much left:-).
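
What such a stripped-down check might look like for the write side of a pipe, as a sketch only. The helper name nonblock_write_works() and the 5-second alarm are assumptions for this example; the alarm relies on the default SIGALRM action to kill the process if write() blocks anyway:

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stripped-down check: set O_NONBLOCK on the write end of a
   pipe, fill the pipe, and look at how the final write() fails.
   Returns 1 if it failed with EAGAIN or EWOULDBLOCK, 0 if it failed with
   some other errno, and -1 if the pipe or fcntl() setup failed. */
int nonblock_write_works(void)
{
    int p[2], flags, err, ok = -1;
    char fill[4096];

    memset(fill, 'x', sizeof fill);
    if (pipe(p) < 0)
        return -1;
    flags = fcntl(p[1], F_GETFL, 0);
    if (flags >= 0 && fcntl(p[1], F_SETFL, flags | O_NONBLOCK) >= 0) {
        alarm(5);		/* unhandled SIGALRM ends us if write() blocks */
        errno = 0;
        while (write(p[1], fill, sizeof fill) > 0)
            ;
        err = errno;
        alarm(0);
        ok = (err == EAGAIN || err == EWOULDBLOCK);
    }
    close(p[0]);
    close(p[1]);
    return ok;
}
```

It never sets the descriptor back to blocking and never touches the tty, which removes both the slow parts and the risk to the invoking shell.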

My thought was rather that it could be used to verify that the world has
become a saner place though, by running it on a few more current/common
OSes to see that O_NONBLOCK does indeed work. But anyway, it's enclosed
below, feel free to use and mangle it... - I can't promise that it will
actually run on any of those current/common OSes, of course!:-)

--Per


nonblock.c---------------------------------------------------------
/*
   Program to test three methods of setting a file descriptor in
   non-blocking mode (and back to blocking mode), each method is tested
   with a file descriptor referring to a pipe, a socket, and a tty
   (*your* tty!:-). *All* output is written to std*err*.

   If you don't have sockets, compile with -DNO_SOCKETS. If you don't
   have the (POSIX) 'termios' functions, compile with -DNO_TERMIOS; if
   you don't have the (SysV) 'termio' ioctls either, add -DNO_TERMIO -
   it is then assumed that you have the BSD ioctls.

   WARNING: When I ran this program on Ultrix 4.3, the system became
   totally hung, requiring a power-cycle to reboot, when the program
   completed (it managed to produce all the wanted output first
   though:-). No problems on other systems so far...

   Per Hedeland  <per at erix.ericsson.se>  1994-01-13
*/

#include <stdio.h>
#include <stdlib.h>	/* exit() */
#include <string.h>	/* memset() */
#include <setjmp.h>
#include <signal.h>
#include <sys/types.h>
#include <fcntl.h>
#include <sys/ioctl.h>
/* seems it's not a good idea to include both ioctl.h and termio*.h
   on some systems, but we need to... */
#undef NL0
#undef NL1
#undef CR0
#undef CR1
#undef CR2
#undef CR3
#undef TAB0
#undef TAB1
#undef TAB2
#undef XTABS
#undef BS0
#undef BS1
#undef FF0
#undef FF1
#undef ECHO
#undef NOFLSH
#undef TOSTOP
#undef FLUSHO
#undef PENDIN
#undef CEOT
#undef CEOF
#undef CSTART
#undef CSTOP
#ifndef NO_SOCKETS
#   include <sys/socket.h>
#   include <netinet/in.h>
#endif
#include <errno.h>

/* stuff needed to suspend/flush output to a tty */
#ifndef NO_TERMIOS
#   include <unistd.h>
#   include <termios.h>
#   define out_off(fd) tcflow((fd), TCOOFF)
#   define out_flush(fd) tcflush((fd), TCOFLUSH)
#   define out_on(fd) tcflow((fd), TCOON)
#else
#   ifndef NO_TERMIO
#	include <sys/termio.h>
#	ifndef TCOOFF
#	    define TCOOFF 0
#	endif
#	ifndef TCOON
#	    define TCOON 1
#	endif
#	ifndef TCOFLUSH
#	    define TCOFLUSH 1
#	endif
#	define out_off(fd) ioctl((fd), TCXONC, TCOOFF)
#	define out_flush(fd) ioctl((fd), TCFLSH, TCOFLUSH)
#	define out_on(fd) ioctl((fd), TCXONC, TCOON)
#   else
#	include <sys/file.h>
/* named fwrite_flag so it doesn't clash with fwrite() from <stdio.h> */
static int fwrite_flag = FWRITE;
#	define out_off(fd) ioctl((fd), TIOCSTOP, 0)
#	define out_flush(fd) ioctl((fd), TIOCFLUSH, &fwrite_flag)
#	define out_on(fd) ioctl((fd), TIOCSTART, 0)
#   endif
#endif

static jmp_buf buf;
#ifndef NO_SOCKETS
static int pid;
static unsigned short port;
#ifndef INADDR_LOOPBACK
#define INADDR_LOOPBACK (u_long)0x7F000001
#endif
#endif

void timeout(sig)
int sig;
{
    longjmp(buf, 1);
}

#if defined(O_NONBLOCK) || defined(O_NDELAY)
#define fcntl_flag(fd, flag, set) ((flags = fcntl((fd), F_GETFL, 0)) < 0 ? \
				   flags : \
				   fcntl((fd), F_SETFL, (set) ? \
					 (flags | (flag)) : (flags & ~(flag))))
#endif

#ifdef O_NONBLOCK
o_nonblock(fd, set)
int fd, set;
{
    int flags;

    return fcntl_flag(fd, O_NONBLOCK, set);
}
#endif

#ifdef O_NDELAY
o_ndelay(fd, set)
int fd, set;
{
    int flags;

    return fcntl_flag(fd, O_NDELAY, set);
}
#endif

#ifdef FIONBIO
fionbio(fd, set)
int fd, set;
{
    int one = 1, zero = 0;

    return ioctl(fd, FIONBIO, set ? &one : &zero);
}
#endif

main()
{
#ifndef NO_SOCKETS
    if (serve() < 0)
	pid = -1;
#endif

#ifdef O_NONBLOCK
    try("fcntl O_NONBLOCK", o_nonblock);
#endif

#ifdef O_NDELAY
    try("fcntl O_NDELAY", o_ndelay);
#endif

#ifdef FIONBIO
    try("ioctl FIONBIO", fionbio);
#endif

    quit(0);
}

try(name, func)
char *name;
int (*func)();
{
    int p[2], s, ret;

    fprintf(stderr, "Trying %s...\n", name);

    fprintf(stderr, "=== pipe ===\n");
    if (pipe(p) < 0) {
	perror("pipe");
    } else {
	(void) doit(p[0], func, 1);
	(void) doit(p[1], func, 0);
	close(p[0]); close(p[1]);
    }

#ifndef NO_SOCKETS
    fprintf(stderr, "=== socket ===\n");
    if ((s = conn()) >= 0) {
	(void) doit(s, func, 1);
	(void) doit(s, func, 0);
	/* Interactive blocks on the close... */
	signal(SIGALRM, timeout);
	alarm(1);
	if (setjmp(buf) == 0)
	    close(s);
	alarm(0);
    }
#endif

    if (isatty(0)) {
	fprintf(stderr, "=== tty ===\n");
	if ((ret = doit(0, func, 1)) > -100 && isatty(1)) {
	    sleep(1);		/* drain output */
	    ret = doit(1, func, 0);
	}
	if (ret <= -100) {
	    fprintf(stderr, "Giving up - hope your shell doesn't...:-)\n");
	    sleep(5);
	    quit(1);
	}
    }
}

doit(fd, func, in)
int fd, in;
int (*func)();
{
    int ret, err, rr;
    char c, fill[8192];

    if ((ret = (*func)(fd, 1)) < 0) {
	perror("set non-blocking");
	return ret;
    }
    signal(SIGALRM, timeout);
    alarm(1);
    if (setjmp(buf) == 0) {
	if (in) {
	    errno = 0;
	    ret = read(fd, &c, 1);
	} else {
	    if (isatty(fd)) {
		if ((ret = out_off(fd)) < 0) {
		    perror("stop output");
		    return ret;
		}
	    }
	    while (write(fd, fill, 8192) > 0)
		;
	    /* OSF/1 managed to squeeze in another byte after the above, so: */
	    while (write(fd, fill, 1) > 0)
		;
	    errno = 0;
	    ret = write(fd, &c, 1);
	}
	err = errno;
	alarm(0);
	if ((rr = fix_out(in, fd)) < 0)
	    return 100 * rr;
	fprintf(stderr, "%s returned %d%s errno = ",
		in ? "read" : "write", ret, ret < 0 ? "," : "!!! -");
	if (err == EAGAIN) fprintf(stderr, "EAGAIN ");
#ifdef EWOULDBLOCK
	if (err == EWOULDBLOCK) fprintf(stderr, "EWOULDBLOCK ");
#endif
	fprintf(stderr, "(%d", err);
	errno = err;
	perror(")");
    } else {
	if ((rr = fix_out(in, fd)) < 0)
	    return 100 * rr;
	fprintf(stderr, "%s blocked!!!\n", in ? "read" : "write");
    }

    if ((ret = (*func)(fd, 0)) < 0) {
	perror("set blocking");
	return 100 * ret;
    }
    if (!in && isatty(fd))	/* Too much work to verify... */
	return 0;
    signal(SIGALRM, timeout);
    alarm(1);
    if (setjmp(buf) == 0) {
	if (in)
	    (void) read(fd, &c, 1);
	else
	    while(write(fd, fill, 8192) > 0)
		;
	alarm(0);
	fprintf(stderr, "failed to set blocking!!!\n");
	return -100;
    }
    return 0;
}

#ifndef NO_SOCKETS
conn()
{
    int s;
    struct sockaddr_in inaddr;
    int addrlen = sizeof inaddr;

    memset((char *)&inaddr, '\0', addrlen);
    inaddr.sin_family = AF_INET;
    inaddr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    inaddr.sin_port = port;
    if ((s = socket(inaddr.sin_family, SOCK_STREAM, 0)) < 0) {
	perror("socket");
	return -1;
    }
    if (connect(s, (struct sockaddr *)&inaddr, addrlen) < 0) {
	perror("connect");
	close(s);		/* don't leak the socket on failure */
	return -1;
    }
    return s;
}

serve()
{ 
    int s;
    struct sockaddr_in inaddr;
    int addrlen = sizeof inaddr;
   
    memset((char *)&inaddr, '\0', addrlen);
    inaddr.sin_family = AF_INET;
    inaddr.sin_addr.s_addr = htonl(INADDR_ANY);
    if ((s = socket(inaddr.sin_family, SOCK_STREAM, 0)) < 0) {
	perror("socket");
	return -1;
    }
    if (bind(s, (struct sockaddr *)&inaddr, addrlen) < 0) {
	perror("bind");
	return -1;
    }
    if (getsockname(s, (struct sockaddr *)&inaddr, &addrlen) < 0) {
	perror("getsockname");
	return -1;
    }
    port = inaddr.sin_port;
    if ((pid = fork()) < 0) {
	perror("fork");
	return -1;
    }
    if (pid != 0) {		/* parent */
	close(s);
	return 0;
    }
    /* child */
    signal(SIGALRM, SIG_DFL);
    alarm(100);			/* timeout in case parent crashes */
    if (listen(s, 1) < 0) {
	perror("listen");
	exit(1);
    }
    while(1) {
	if (accept(s, (struct sockaddr *)0, (int *)0) < 0)
	    perror("accept");
    }
}
#endif

fix_out(in, fd)
int in, fd;
{
    int ret;
    
    if (!in && isatty(fd)) {
	if ((ret = out_flush(fd)) < 0) {
	    perror("flush output");	/* XXX */
	    return ret;
	}
	if ((ret = out_on(fd)) < 0) {
	    perror("start output");	/* XXX */
	    return ret;
	}
    }
    return 0;
}

quit(c)
int c;
{
#ifndef NO_SOCKETS
    if (pid > 0)
	kill(pid, SIGTERM);
#endif
    exit(c);
}

