Date parsing
bill davidsen
davidsen at tmr.com
Sat Sep 7 21:09:16 UTC 2002
In article <yly9ah1g82.fsf at windlord.stanford.edu>,
Russ Allbery <rra at stanford.edu> wrote:
|
| bill davidsen <davidsen at tmr.com> writes:
|
| > I think in this case we want options because we (site admins) WILL want
| > to have local policy and MAY want that to be more strict than the feed
| > accept policy. As in be generous in what you accept and strict in what
| > you generate. In general, if I can read it, I would hope the date parser
| > can, too.
|
| I'm curious if you still feel that way with the results of Andrew's
| research. Rejecting malformed Date headers gets rid of a ton of spam and
| job spew.
What I kind of hoped for was a status flag saying whether the date could
be parsed, so that sites could apply policy to it. One site might drop the
article, another might drop it in feed and rewrite it in post (ignore the
politics, but it would be technically possible). I'm more interested in
extending choices if possible, and letting policy determine which choices
are made.
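
The status-flag idea above can be sketched roughly as follows. This is only an illustration, not INN code: the names `DateStatus`, `check_date`, `apply_policy` and the policy table are hypothetical, and Python's standard `email.utils.parsedate_to_datetime` stands in for whatever parser a site would actually use.

```python
from email.utils import parsedate_to_datetime
from enum import Enum


class DateStatus(Enum):
    """Hypothetical status flag reported by the date parser."""
    OK = "ok"
    UNPARSEABLE = "unparseable"


def check_date(header):
    """Return (status, datetime or None) instead of rejecting outright."""
    try:
        return DateStatus.OK, parsedate_to_datetime(header)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError.
        return DateStatus.UNPARSEABLE, None


def apply_policy(header, source, policy):
    """Let local policy decide what happens to an unparseable date.

    `source` is "feed" or "post"; `policy` maps each source to an
    action name chosen by the site admin (assumed names, not INN options).
    """
    status, _ = check_date(header)
    if status is DateStatus.OK:
        return "accept"
    return policy[source]


# One possible site policy: drop in feed, rewrite in post.
site_policy = {"feed": "drop", "post": "rewrite"}
print(apply_policy("Sat, 07 Sep 2002 21:09:16 +0000", "feed", site_policy))
print(apply_policy("random junk", "feed", site_policy))
print(apply_policy("random junk", "post", site_policy))
```

The parser only reports what it found; the decision to drop, rewrite, or accept lives entirely in the policy table.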
| I think that for feed acceptance, we need to have some exceptions (no time
| zone, missing first digits in various numbers, and a few more non-numeric
| time zones), but I don't think we need to go nearly as far as parsedate is
| going right now. (Certainly we don't need to go farther into accepting
| any human-readable date; that involves things like 2002-09-03 23:23, which
| parsedate won't accept).
I can read it...
| I do think that at some point we say that news articles are supposed to
| follow RFC 1036 and RFC 2822 and people putting random junk into the Date
| header are generating malformed articles, just like people putting random
| junk into the Message-ID header.
To some extent that's apples and oranges: the date is parsed, while the
msgid is atomic (or boolean if you will). It is accepted or not, but not
parsed other than as a string. Date is used as data for expire and such,
although I can make a good case for using posting date if the date
header is questionable.
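
A minimal sketch of that fallback, assuming "posting date" means the time the server received the article. The function name `expire_date`, the `max_skew` threshold, and the rule that a date far in the future counts as questionable are all assumptions made for illustration, not anything INN defines.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def expire_date(header, arrival, max_skew=timedelta(days=1)):
    """Pick the date to expire on: the Date header if usable, else arrival.

    `arrival` must be a timezone-aware datetime. The header is treated as
    questionable if it cannot be parsed or lies more than `max_skew`
    after the arrival time.
    """
    try:
        parsed = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return arrival
    if parsed.tzinfo is None:
        # No usable time zone in the header: assume UTC for comparison.
        parsed = parsed.replace(tzinfo=timezone.utc)
    if parsed > arrival + max_skew:
        return arrival
    return parsed


arrival = datetime(2002, 9, 7, 21, 30, tzinfo=timezone.utc)
print(expire_date("Sat, 07 Sep 2002 21:09:16 +0000", arrival))
print(expire_date("random junk", arrival))
```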
You asked for comments, and those are mine. The more flexible it can be,
the more choices the admins have.
| I guess my personal opinion is that I'd like to require RFC 2822 dates in
| nnrpd and use the same parser but with the above-mentioned tweaks in innd
| for article acceptance. Andrew's analysis pretty much convinced me that
| any lossage from doing that will be well into the noise.
Unfortunately that would lose some mail I really want to see, which I
gate into news for easier reading (better tools).
| But I'm not an ISP; maybe I'm too willing to require people use non-broken
| software.
That's a fine line. I walk it with management all the time, on
various things including spam and off-topic blocking.
| > Long month names kind of fall under what I meant by "if I can read it"
| > option, matter of policy rather than technology.
|
| Well, I don't really agree that this is policy rather than technology.
| Compliance with a network protocol standard is technology to me.
Technology is what's possible, policy is what's permissible.
--
bill davidsen <davidsen at tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.