Date parsing

Russ Allbery rra at stanford.edu
Wed Sep 4 23:34:53 UTC 2002


bill davidsen <davidsen at tmr.com> writes:

> I think in this case we want options because we (site admins) WILL want
> to have local policy and MAY want that to be more strict than the feed
> accept policy. As in be generous in what you accept and strict in what
> you generate. In general, if I can read it, I would hope the date parser
> can, too.

I'm curious whether you still feel that way given the results of Andrew's
research.  Rejecting malformed Date headers gets rid of a ton of spam and
job spew.

I think that for feed acceptance, we need to have some exceptions (no time
zone, missing first digits in various numbers, and a few more non-numeric
time zones), but I don't think we need to go nearly as far as parsedate is
going right now.  (Certainly we don't need to go farther into accepting
any human-readable date; that involves things like 2002-09-03 23:23, which
parsedate won't accept.)

I do think that at some point we say that news articles are supposed to
follow RFC 1036 and RFC 2822 and people putting random junk into the Date
header are generating malformed articles, just like people putting random
junk into the Message-ID header.

I guess my personal opinion is that I'd like to require RFC 2822 dates in
nnrpd and use the same parser but with the above-mentioned tweaks in innd
for article acceptance.  Andrew's analysis pretty much convinced me that
any lossage from doing that will be well into the noise.
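
As a rough illustration of that split, here is a minimal Python sketch of
one parser with a strict mode (the nnrpd case) and a lenient mode (the innd
feed case).  It is not INN's parsedate code and only approximates the RFC
2822 grammar.  The names parse_date, lenient and EXTRA_ZONES are
hypothetical, the extra zone names are made-up examples, reading "missing
first digits" as single-digit time fields is an assumption, and so is
treating a missing time zone as UTC.

```python
import re
from datetime import datetime, timedelta, timezone

MONTHS = {name: num for num, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Obsolete zone names that RFC 2822 itself lists, as hour offsets.
RFC2822_ZONES = {"UT": 0, "GMT": 0, "EST": -5, "EDT": -4, "CST": -6,
                 "CDT": -5, "MST": -7, "MDT": -6, "PST": -8, "PDT": -7}

# Hypothetical extra names a lenient feed parser might tolerate.
EXTRA_ZONES = {"UTC": 0, "CET": 1, "CEST": 2}

WEEKDAY = r"^(?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s+)?"
DATE = r"(\d{1,2})\s+([A-Z][a-z]{2})\s+(\d{4})\s+"
ZONE = r"([+-]\d{4}|[A-Z]{2,4})"

# Strict: two-digit time fields and a mandatory zone.
STRICT = re.compile(
    WEEKDAY + DATE + r"(\d{2}):(\d{2})(?::(\d{2}))?\s+" + ZONE + "$")
# Lenient: single-digit time fields allowed, zone optional.
LENIENT = re.compile(
    WEEKDAY + DATE + r"(\d{1,2}):(\d{1,2})(?::(\d{1,2}))?(?:\s+" + ZONE + ")?$")


def parse_date(text, lenient=False):
    """Return an aware datetime, or None if the Date header is rejected."""
    match = (LENIENT if lenient else STRICT).match(text.strip())
    if not match:
        return None
    day, mon, year, hh, mm, ss, zone = match.groups()
    if mon not in MONTHS:
        return None
    zones = {**RFC2822_ZONES, **EXTRA_ZONES} if lenient else RFC2822_ZONES
    if zone is None:
        offset = timedelta(0)  # lenient mode only: assume UTC
    elif zone[0] in "+-":
        sign = -1 if zone[0] == "-" else 1
        offset = sign * timedelta(hours=int(zone[1:3]), minutes=int(zone[3:5]))
    elif zone in zones:
        offset = timedelta(hours=zones[zone])
    else:
        return None
    try:
        # Out-of-range fields or offsets raise ValueError and are rejected.
        return datetime(int(year), MONTHS[mon], int(day), int(hh), int(mm),
                        int(ss or 0), tzinfo=timezone(offset))
    except ValueError:
        return None
```

With this sketch, parse_date("4 Sep 2002 23:34:53") is rejected in strict
mode but accepted with lenient=True, while "2002-09-03 23:23" and long
month names are rejected in both modes.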

But I'm not an ISP; maybe I'm too willing to require that people use
non-broken software.

> Long month names kind of fall under what I meant by "if I can read it"
> option, matter of policy rather than technology.

Well, I don't really agree that this is policy rather than technology.
Compliance with a network protocol standard is technology to me.

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>

    Please send questions to the list rather than mailing me directly.
     <http://www.eyrie.org/~eagle/faqs/questions.html> explains why.

