Nonblocking I/O and POLL_BUG

Russ Allbery rra at stanford.edu
Mon Oct 25 23:42:15 UTC 1999


Per Hedeland <per at erix.ericsson.se> writes:
> Russ Allbery wrote:

>> **  The semantics of O_NDELAY are that if the read would block, it
>> **  returns 0 instead.

> The semantics of O_NDELAY on SysV-heritage systems, that is (broken as
> usual:-).

Well, that's where O_NDELAY originated, so I think it's safe to say that
those are the canonical semantics.  I know that some systems, like Linux,
have made O_NDELAY synonymous with O_NONBLOCK, though.

> I'm not quite sure I follow you here, but: If select() (or poll())
> returns saying that a descriptor is ready for reading, and you get
> EAGAIN when first trying to read it, that is most definitely a bug.

I don't think I agree.

Saying that it's a bug implies that returning ready for read from select()
is not only a guarantee that there will be data available when the
application reads from the socket, but that this guarantee lasts for some
indefinite period of time into the future.  I think that's pretty strong,
and I don't know of any standard which actually makes that guarantee.

But regardless, it doesn't actually work that way on Solaris.  Under at
least versions 2.4 through 2.6 inclusive, read() will sometimes return
EAGAIN even if the socket had previously selected for reading.
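
As a rough illustration of treating select() readiness as a hint rather
than a guarantee, here is a minimal sketch in C.  It is not INN's code:
the function name read_after_select and the RD_* result codes are made
up for this example, and it assumes the descriptor has already been put
into O_NONBLOCK mode.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not taken from INN. */
enum { RD_EOF = 0, RD_ERROR = -1, RD_NOT_READY = -2 };

/*
 * Wait until select() reports fd readable, then try a single read().
 * Returns the byte count (> 0), RD_EOF on end of file, RD_ERROR on a
 * real error, or RD_NOT_READY when the read reports that no data is
 * available after all.  The caller should simply go back to select()
 * in that last case instead of logging an error or closing.
 */
static ssize_t
read_after_select(int fd, char *buf, size_t len)
{
    fd_set readfds;
    ssize_t n;

    assert(fd >= 0 && fd < FD_SETSIZE);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    if (select(fd + 1, &readfds, NULL, NULL, NULL) < 0)
        return (errno == EINTR) ? RD_NOT_READY : RD_ERROR;

    n = read(fd, buf, len);
    if (n > 0)
        return n;
    if (n == 0)
        return RD_EOF;
    /* Readiness was only a hint: treat "would block" as not ready. */
    if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
        return RD_NOT_READY;
    return RD_ERROR;
}
```

With this shape, an EAGAIN after a successful select() costs one extra
trip through the event loop and nothing else.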

> Likewise if that read returns 0 without a "real" EOF condition existing.
> Plus of course per above one has to wonder why one on Solaris would
> *ever* get EAGAIN on read() when using O_NDELAY to set non-blocking
> mode...

O_NDELAY was defined to return EAGAIN.  Maybe you're thinking of the BSD
FNDELAY, which isn't necessarily the same thing, and which is defined to
return EWOULDBLOCK?

(There are also various places in INN that were checking for EWOULDBLOCK
and not for EAGAIN; I've fixed those to check for both, but haven't
committed those patches yet.)
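
A check for both values might look roughly like the following.  This is
only a sketch, not the uncommitted patch itself; the helper name
would_block is an assumption made for the example.  The preprocessor
guard is there because on many systems EWOULDBLOCK and EAGAIN are the
same number.

```c
#include <errno.h>

/*
 * Return 1 if err (a saved errno value) means "the operation would
 * have blocked", whichever spelling the system uses, and 0 otherwise.
 * would_block is a hypothetical helper name, not an INN function.
 */
static int
would_block(int err)
{
    if (err == EAGAIN)
        return 1;
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
    /* Only needed where the two codes are distinct values. */
    if (err == EWOULDBLOCK)
        return 1;
#endif
    return 0;
}
```

A caller would save errno right after the failed read() or write() and
pass that saved value in, so that intervening calls cannot clobber it.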

After running the new code on my Solaris 2.6 test server for about 12
hours now, I seem to be getting a lot fewer spurious readclose and read
error messages, and there doesn't appear to be a negative impact on
performance.  So I think that at least for Solaris this is the right thing
to do.

Your table says to me that we really want an autoconf test.  Do you still
have your testing code available?
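
For reference, a probe of the kind an autoconf test could compile and
run might look something like this.  It is a guess at the general
shape, not Per's testing code: the names probe_ndelay_read and
enum probe_result are invented here, and it only checks a local
socketpair(), which may not behave the same way as a TCP socket on
every system.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical classification of what an empty read does. */
enum probe_result {
    PROBE_ERROR,        /* could not set up the probe */
    PROBE_ZERO,         /* read() returned 0 */
    PROBE_EAGAIN,       /* read() failed with EAGAIN */
    PROBE_EWOULDBLOCK,  /* read() failed with a distinct EWOULDBLOCK */
    PROBE_OTHER         /* anything else */
};

/*
 * Set O_NDELAY on one end of an empty socket pair and report how
 * read() behaves when there is nothing to read.
 */
static enum probe_result
probe_ndelay_read(void)
{
    int fds[2], flags, saved;
    char c;
    ssize_t n;
    enum probe_result result = PROBE_OTHER;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return PROBE_ERROR;
    flags = fcntl(fds[0], F_GETFL, 0);
    if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NDELAY) < 0) {
        close(fds[0]);
        close(fds[1]);
        return PROBE_ERROR;
    }

    n = read(fds[0], &c, 1);
    saved = errno;              /* save before close() can change it */
    if (n == 0)
        result = PROBE_ZERO;
    else if (n < 0 && saved == EAGAIN)
        result = PROBE_EAGAIN;
    else if (n < 0 && saved == EWOULDBLOCK)
        result = PROBE_EWOULDBLOCK;

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

A configure check would wrap this in a main() that maps the result to
an exit status or prints it for the script to capture.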

-- 
Russ Allbery (rra at stanford.edu)         <URL:http://www.eyrie.org/~eagle/>

