malformed overview batches

Todd Olson tco2 at cornell.edu
Thu Jan 18 13:13:58 UTC 2001


Hi

A wild idea born of ignorance ...
What if the last byte/word of every line were a checksum? Then the
receiving end of the pipe could detect these fragmented lines and
either not log them, or do something special with them.
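A minimal sketch of the checksum idea, under a few assumptions that are not part of the proposal: the checksum is a simple 8-bit sum of the line's bytes, and it is written as a space plus two hex digits before the newline rather than as a raw byte, because a raw checksum byte could itself be '\n' and break line framing. The names line_checksum(), format_line() and verify_line() are hypothetical. An 8-bit sum is weak (a corrupted line still passes about 1 time in 256); a real scheme would want something stronger.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical 8-bit additive checksum over the line body. */
static unsigned char line_checksum(const char *body, size_t len)
{
    unsigned char sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += (unsigned char)body[i];
    return sum;
}

static int hexval(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Writer side: produce "<body> <two hex digits>\n" in out.
   Returns the number of bytes written, or -1 if out is too small. */
static int format_line(const char *body, char *out, size_t outsz)
{
    int n = snprintf(out, outsz, "%s %02x\n", body,
                     (unsigned)line_checksum(body, strlen(body)));
    return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}

/* Reader side: line has had its '\n' stripped. Returns 1 only if the
   trailing checksum matches the body, so a line missing its head fails. */
static int verify_line(const char *line)
{
    size_t len = strlen(line);
    int hi, lo;
    if (len < 3 || line[len - 3] != ' ')
        return 0;
    hi = hexval(line[len - 2]);
    lo = hexval(line[len - 1]);
    if (hi < 0 || lo < 0)
        return 0;
    return line_checksum(line, len - 3) == (unsigned char)(hi * 16 + lo);
}
```

For example, format_line("hello", ...) yields "hello 14\n"; the receiver accepts "hello 14" but rejects the fragment "llo 14", whose body no longer sums to 0x14.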

Perhaps easier, and with less overhead: what if every line had to
*start* with a given byte value that did not otherwise occur in the
line? Then, if that byte value was missing, the receiver would know
the line is corrupt and could treat it specially.
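A sketch of the start-marker check, assuming (hypothetically) that byte 0x01 never appears inside an overview line; LINE_MARK and checked_body() are illustrative names. A line is accepted only if it begins with the marker and contains no second marker, so the orphaned second half of a split line, which has no marker at its start, is rejected.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical marker byte; assumes 0x01 never occurs in line data. */
#define LINE_MARK '\001'

/* line has had its '\n' stripped. Returns a pointer to the body after
   the marker, or NULL if the line is corrupt: the marker is missing at
   the start, or a stray marker appears inside the body. */
static const char *checked_body(const char *line)
{
    if (line[0] != LINE_MARK)
        return NULL;
    if (strchr(line + 1, LINE_MARK) != NULL)
        return NULL;
    return line + 1;
}
```

Compared with the checksum, this costs one byte per line and a single scan, but it only catches lines that lost their head, not damage in the middle.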

Maybe there are ideas from the networking world on unreliable
delivery, and detection thereof, that might be applicable and not too
costly.

Regards,
Todd Olson

At 11:15 +0100 2001/01/18, Olaf Titz wrote:
> >   however, the first line of the batch file is always corrupted.  it's
> > missing about the first half of the data:
>
>This is the backlogging channel feed problem I've been talking about
>recently. The cause is roughly this: the channel feed is implemented
>via a pipe or socketpair which has a fixed-size buffer in the kernel
>(usually 8192 or 32768 bytes). When a channel falls behind, the
>pipe buffer fills up. The last few bytes to be squeezed into the
>buffer with write() are most likely the first half of a line innd
>wants to send to the channel. After that, write() first returns a
>short count, then fails with EAGAIN (or select() no longer reports
>the descriptor as writable), and innd switches to spool file mode.
>
>Unfortunately, it now starts out with the second half of the line it
>was trying to write last. Result: the first line which goes into the
>spool file is corrupted. I can remember having this problem with
>overchan under INN 1.x very frequently; back then loss of an overview
>line wasn't catastrophic.
>
>The root cause is the lack of an atomic write() operation on pipes and
>stream sockets under the usual Un*x API. What would be needed here is
>a SEQPACKET socket, but most systems don't provide that. There is no
>easy way around it at the application level; the best that can be done
>is for every channel feed to read as fast as it can and do its own
>buffering, so the kernel buffer won't ever fill up. innfeed does that,
>and the code is rather involved. (c-nocem does it too, in perl even,
>and the code is involved too.)
>
>Perhaps a library function would be in order to do this buffering for
>all standard, and preferably also third-party, channel feeds. However,
>this could get complicated because it doesn't fit the while(!eof)
>{ read_line(); process_line(); } programming model. Basically, every
>channel feed would need its own select loop.
>
>Olaf

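The failure Olaf describes in the quoted message above can be replayed with a toy model. This is not innd's code: a fixed-size array stands in for the kernel pipe buffer (16 bytes instead of 8192 or 32768, only to keep the example small), fake_write() mimics a non-blocking write() that accepts only what fits, and after a short write the unwritten remainder of that line is appended to a second array standing in for the spool file. All names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define PIPE_CAP 16   /* toy stand-in for the kernel pipe buffer size */

struct channel {
    char pipe[PIPE_CAP];
    size_t pipe_used;
    char spool[256];   /* toy stand-in for the spool file */
    size_t spool_used;
    int spooling;      /* set once a write came up short */
};

/* Mimics a non-blocking write(): copies as much as fits, returns count. */
static size_t fake_write(struct channel *ch, const char *buf, size_t len)
{
    size_t room = PIPE_CAP - ch->pipe_used;
    size_t n = len < room ? len : room;
    memcpy(ch->pipe + ch->pipe_used, buf, n);
    ch->pipe_used += n;
    return n;
}

/* The flawed send path: after a short write, only the unwritten tail of
   the line, not the whole line, is appended to the spool. */
static void send_line(struct channel *ch, const char *line)
{
    size_t len = strlen(line);
    size_t done = 0;

    if (!ch->spooling) {
        done = fake_write(ch, line, len);
        if (done < len)
            ch->spooling = 1;   /* pipe is full: switch to spool mode */
    }
    if (ch->spool_used + (len - done) > sizeof ch->spool)
        return;                 /* toy model: silently drop on overflow */
    memcpy(ch->spool + ch->spool_used, line + done, len - done);
    ch->spool_used += len - done;
}
```

Sending "1 aaaaaaaa\n", "2 bbbbbbbb\n" and "3 cccccccc\n" in that order leaves line 1 plus "2 bbb" in the pipe, and the spool then begins with "bbbbb\n", the orphaned tail of line 2: exactly the corrupted first line of the batch file.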

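For contrast, a sketch of the record-preserving behaviour Olaf says SEQPACKET would give. It assumes a system that supports SOCK_SEQPACKET for AF_UNIX socketpairs (Linux does; as Olaf notes, many systems did not), and seqpacket_demo() is a hypothetical helper. Each send() is one record and each recv() returns exactly one whole record, so the reader never sees half a line or two lines glued together.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns how many records arrived intact and one per recv(), or -1 if
   AF_UNIX SOCK_SEQPACKET is unavailable on this system. */
static int seqpacket_demo(const char *const *lines, int nlines)
{
    int sv[2], ok = 0;
    char buf[512];

    if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
        return -1;

    /* Queue every record before reading any, as a lagging feed would. */
    for (int i = 0; i < nlines; i++)
        if (send(sv[0], lines[i], strlen(lines[i]), 0) < 0)
            goto out;

    /* Each recv() yields exactly one record (records longer than buf
       would be truncated, not split across calls). */
    for (int i = 0; i < nlines; i++) {
        ssize_t n = recv(sv[1], buf, sizeof buf, 0);
        if (n == (ssize_t)strlen(lines[i]) &&
            memcmp(buf, lines[i], (size_t)n) == 0)
            ok++;
    }
out:
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```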

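A sketch of the library helper Olaf suggests might look like the following. feed_loop(), struct linebuf and the line_fn callback type are hypothetical names, not an existing INN API. The loop uses select() to prefer reading over processing: whenever the descriptor is readable, data is drained into a user-space buffer, and complete lines go to the callback only while the pipe is idle, keeping the kernel buffer as empty as possible. A trailing fragment without a newline at EOF is dropped rather than passed on, and the buffer is unbounded here; a real feed would need a cap.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback type: receives one line without its '\n'. */
typedef void (*line_fn)(const char *line, size_t len, void *ctx);

/* Bytes read from the feed but not yet handed to the callback. */
struct linebuf {
    char *data;
    size_t len, cap;
    size_t start;   /* offset of the first unprocessed byte */
};

static int lb_append(struct linebuf *lb, const char *p, size_t n)
{
    if (lb->start > 0) {   /* discard bytes already processed */
        memmove(lb->data, lb->data + lb->start, lb->len - lb->start);
        lb->len -= lb->start;
        lb->start = 0;
    }
    if (lb->len + n > lb->cap) {
        size_t cap = lb->cap ? lb->cap : 4096;
        while (cap < lb->len + n)
            cap *= 2;
        char *d = realloc(lb->data, cap);
        if (d == NULL)
            return -1;
        lb->data = d;
        lb->cap = cap;
    }
    memcpy(lb->data + lb->len, p, n);
    lb->len += n;
    return 0;
}

static int lb_has_line(const struct linebuf *lb)
{
    return lb->start < lb->len &&
           memchr(lb->data + lb->start, '\n', lb->len - lb->start) != NULL;
}

static void lb_process_one(struct linebuf *lb, line_fn process, void *ctx)
{
    char *line = lb->data + lb->start;
    char *nl = memchr(line, '\n', lb->len - lb->start);
    if (nl == NULL)
        return;
    process(line, (size_t)(nl - line), ctx);
    lb->start += (size_t)(nl - line) + 1;
}

/* Runs until EOF on fd. Returns 0 on EOF, -1 on error. */
static int feed_loop(int fd, line_fn process, void *ctx)
{
    struct linebuf lb = {NULL, 0, 0, 0};
    char chunk[4096];
    int eof = 0, rc = 0;

    while (!eof || lb_has_line(&lb)) {
        int readable = 0;
        if (!eof) {
            fd_set rfds;
            struct timeval zero = {0, 0};
            FD_ZERO(&rfds);
            FD_SET(fd, &rfds);
            /* Block only when no complete line is waiting. */
            int r = select(fd + 1, &rfds, NULL, NULL,
                           lb_has_line(&lb) ? &zero : NULL);
            if (r < 0) {
                if (errno == EINTR)
                    continue;
                rc = -1;
                break;
            }
            readable = r > 0;
        }
        if (readable) {   /* drain the pipe first */
            ssize_t n = read(fd, chunk, sizeof chunk);
            if (n < 0) {
                if (errno == EINTR || errno == EAGAIN)
                    continue;
                rc = -1;
                break;
            }
            if (n == 0)
                eof = 1;
            else if (lb_append(&lb, chunk, (size_t)n) < 0) {
                rc = -1;
                break;
            }
        } else {          /* pipe idle (or EOF): process one line */
            lb_process_one(&lb, process, ctx);
        }
    }
    free(lb.data);
    return rc;
}

/* Example callback: counts lines and body bytes. */
struct line_stats { int lines; size_t bytes; };
static void count_line(const char *line, size_t len, void *ctx)
{
    struct line_stats *st = ctx;
    (void)line;
    st->lines++;
    st->bytes += len;
}
```

This inverts the while(!eof) model: the callback never reads, and the loop alone owns the descriptor, which is what lets it keep draining the pipe while processing lags.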
