Nonblocking I/O and POLL_BUG

Tue Oct 26 02:25:49 UTC 1999

Per Hedeland <per at erix.ericsson.se> writes:

> It doesn't guarantee that there will be data available, it guarantees
> that a blocking read will not block (remember EOF) - if it can't give
> that guarantee, it is useless.

Hm, well, not useless, as we're all still using select in places where
that clearly isn't the guarantee.  :)  But yes, I do understand what
you're saying.

> But in the case of data available, that data was sitting in a kernel
> buffer when select/poll returned, where could it have gone? And only
> temporarily???

I'm wondering if Solaris is doing something like triggering the select
when it gets the first of a sequence of partial packets and then returning
EAGAIN when it hasn't gotten enough to reassemble, or something odd like
that.  Not sure.  Or maybe there's some sort of system resource exhaustion
in the TCP code itself that's setting it off.

read(2) sez:

     EAGAIN     Total amount of system memory available when reading using
                raw I/O is temporarily insufficient.

     EAGAIN     No message is waiting to be read on a stream and O_NDELAY
                or O_NONBLOCK was set.

> Yep, and it's a bug. As are the other spurious EAGAINs it returns. The
> question is just whether we need to worry about getting in a tight
> poll/EAGAIN loop on some versions.

I'm *pretty* sure not, sure enough that I'd like to throw it into INN 2.3
(which is in testing after all) and see if it dies anywhere.  After all,
this code was activated as early as Solaris 2.4, which seems to have the
worst problem, and it was fine there.  If anyone is using a version of
Solaris earlier than 2.4 (other than SunOS), well, I'm very sorry for
them.  :)
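
For what it's worth, the tight poll/EAGAIN loop being worried about could be
guarded against along the following lines.  This is only a sketch, not what
INN does: read_when_ready() and its max_spurious parameter are hypothetical
names invented for illustration.  It goes back to select() when read() fails
with EAGAIN despite select() having reported the descriptor readable, and
gives up after a bounded number of such failures.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable, then read from it.  If
   read() fails with EAGAIN even though select() said the descriptor was
   ready, return to select() instead of treating it as fatal.  max_spurious
   bounds the number of such retries so a broken system cannot make us spin
   forever.  Returns the read() result, or -1 with errno set. */
ssize_t
read_when_ready(int fd, void *buf, size_t len, int max_spurious)
{
    fd_set readfds;
    ssize_t n;
    int spurious = 0;

    for (;;) {
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);
        if (select(fd + 1, &readfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        n = read(fd, buf, len);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;
        if (errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;
        /* Spurious EAGAIN: count it and go around again. */
        if (++spurious > max_spurious) {
            errno = EAGAIN;
            return -1;
        }
    }
}
```

A caller would pass a small limit (say, 3) and treat a final EAGAIN as a
real error worth logging.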

> Actually if you run the test program, you'll see that current Solaris (I
> ran it on 2.6 now) still has the same bizarre O_NDELAY semantics... -
> but who cares.

I'll try that; I whipped up a simpler test program that didn't reveal a
problem, which is interesting.

> Yes, but I seriously doubt that it's appropriate for autoconf - for one
> thing it may take quite a while to run, as it may have to kill various
> blocking operations with alarm(1). Also at least in its current form, it
> may in theory leave your tty in non-blocking mode (if it fails to set it
> back to blocking), which makes many shells decide that it's time to
> exit. Plus note the comment about Ultrix...

> But as I guess we're not interested in non-blocking ttys, that test
> could be stripped out - which would also shave a bit off the time
> required. And if we only need to set sockets/pipes non-blocking, and not
> back to blocking, it could be stripped further (not much left:-).

I think we can strip it, or merge it with mine, and get something that
will work okay for an autoconf test.  Or we can just decide that we don't
care and bail and just use POSIX.  I think our most limiting portability
factor right now is that 2.3 requires fcntl() range locking to compile.

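Here are my results, showing what read() on a non-blocking socket with no
data pending does for each flag (EAGAIN means it returned -1 with errno set
to EAGAIN; 0 means it returned 0):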

                   O_NONBLOCK         O_NDELAY
Solaris 2.6          EAGAIN            EAGAIN
HP-UX 11.00          EAGAIN               0
Digital Unix 4.0B    EAGAIN            EAGAIN
Linux 2.0.x          EAGAIN            EAGAIN
AIX 4.1              EAGAIN            EAGAIN
IRIX 6.5             EAGAIN            EAGAIN

So of a cross-section of fairly recent operating systems, only HP-UX 11.00
still supports the original O_NDELAY semantics.  Here's my test code,
which only tests read, not write.  I think it's fairly suitable for
autoconf as it is, if I remove the non-error output and add an alarm()
call in the parent and child to generate EINTR on the read() if it takes
too long.

None of the SunOS machines we have around still have gcc installed, and we
don't have any Ultrix systems any more.  If anyone out there has one
available and would be willing to run the below program there and let me
know what it does, I'd appreciate it.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <errno.h>

void
die(char *msg)
{
    perror(msg);
    exit(1);
}

int
main(int argc, char *argv[])
{
    int master, data, mode, status, flag;
    socklen_t size;   /* getsockname() and accept() take a socklen_t * */
    struct sockaddr_in sin;
    pid_t child;
    char buffer[] = "D";

    /* Choose the blocking flag to use. */
    if (argv[1] && !strcmp(argv[1], "O_NDELAY")) {
        flag = O_NDELAY;
    } else {
        flag = O_NONBLOCK;
    }

    /* Parent will create the socket first to get the port number. */
    memset(&sin, '\0', sizeof(sin));
    sin.sin_family = AF_INET;
    master = socket(AF_INET, SOCK_STREAM, 0);
    if (master == -1) die("socket");
    if (bind(master, (struct sockaddr *) &sin, sizeof(sin)) < 0)
        die("bind");
    size = sizeof(sin);
    if (getsockname(master, (struct sockaddr *) &sin, &size) < 0)
        die("getsockname");
    if (listen(master, 1) < 0) die("listen");

    /* Fork, child closes the listening socket and then tries to connect,
       parent calls accept() on it (listen() was already called above).
       Parent will then set the socket non-blocking and try to read from it
       to see what happens, then write to the socket and close it,
       triggering the child close and exit. */
    child = fork();
    if (child < 0) {
        die("fork");
    } else if (child != 0) {
        /* Parent. */
        size = sizeof(sin);
        data = accept(master, (struct sockaddr *) &sin, &size);
        close(master);
        if (data < 0) die("accept");
        mode = fcntl(data, F_GETFL, 0);
        if (mode < 0) die("fcntl GETFL");
        if (fcntl(data, F_SETFL, mode | flag) < 0)
            die ("fcntl SETFL");
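        errno = 0;
        status = read(data, buffer, sizeof(buffer));
        /* mode is free for reuse here: save errno in it before write()
           and close() can overwrite it. */
        mode = errno;
        write(data, buffer, sizeof(buffer));
        close(data);
        printf("Return status: %d (%d)\n", status, mode);
        /* read() returns -1 on error; EAGAIN is reported through errno. */
        exit(status < 0 && mode == EAGAIN);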
    } else {
        /* Child. */
        close(master);
        data = socket(AF_INET, SOCK_STREAM, 0);
        if (data == -1) {
            perror("socket child");
            _exit(1);
        }
        if (connect(data, (struct sockaddr *) &sin, sizeof(sin)) < 0) {
            perror("connect");
            _exit(1);
        }
        status = read(data, buffer, sizeof(buffer));
        _exit(status > 0 ? 0 : 1);
    }

    /* NOTREACHED */
    exit(0);
}

-- 
Russ Allbery (rra at stanford.edu)         <URL:http://www.eyrie.org/~eagle/>