timeout switch for actsync

Florian Schlichting fschlich at CIS.FU-Berlin.DE
Thu Jun 23 19:07:46 UTC 2011


For a long time, we've been having occasional problems with actsync
when syncing from certain servers: instead of reporting an error and
exiting (with "cannot connect to server: Connection timed out" or
alternatively "cannot connect to server: Success"), actsync would hang
until manually killed, sometimes for days.

I've now had the chance to look more closely at the situation with a
certain "undead" server, where the OOM killer was on the loose and
telnet to port 119 would report that a connection had been established
("Escape character is '^]'."), but the remote server would never send
anything, not even the banner. While telnet times out after a while,
actsync apparently doesn't.

gdb shows that actsync is hanging in a call to fgets() in NNTPconnect()
(called from get_active() in actsync.c). While I don't understand why
fgets() doesn't return when the TCP connection times out, looking around
I found that innxmit and innxbatch can use alarm() to set a timeout
(other users of NNTPconnect, such as nntpget, rnews, clientlib.c and
nnrpd/post.c, apparently don't).

When I tried to add a switch to actsync to implement a timeout similar
to the one in innxmit, I found that the letters used there (-t and -T)
are already in use for other things. So what would be a good name for a
timeout switch for actsync? And since fgets() never seems to return in
certain cases, shouldn't the timeout be implemented for all users of
NNTPconnect, perhaps directly in that function?
