Date parsing
bill davidsen
davidsen at tmr.com
Wed Sep 4 22:28:08 UTC 2002
In article <yl3csq7vp0.fsf at windlord.stanford.edu>,
Russ Allbery <rra at stanford.edu> wrote:
|
| Jeffrey M Vinocur <jeff at litech.org> writes:
| > On Mon, 2 Sep 2002, Russ Allbery wrote:
|
| >> I think that using the stricter parser in nnrpd for local posts is an
| >> obvious thing to do, in the "be strict in what we generate" department.
|
| > Where does a malformed date rejection sit relative to the nnrpd posting
| > filters? (That is, can someone who has users that Just Won't Upgrade
| > hack around it with a filter.)
|
| It sits after the filters (people wanted that so that they could
| manipulate the results of nnrpd's internal manipulations), so they
| wouldn't help there.
|
| We *could* add a new configuration option that says to drop and regenerate
| invalid dates rather than rejecting them. Or a configuration option to
| allow the loose parsing mode. Dunno. I have some inherent dislike for
| yet more options until we go back and get rid of a few, but that's just my
| bias. :)
I think in this case we do want options, because we (site admins) WILL
want a local policy and MAY want it to be stricter than the policy for
accepting articles from feeds. That is, be generous in what you accept
and strict in what you generate. In general, if I can read a date, I
would hope the date parser can, too.

To avoid people asking for it later, it would also be nice to have the
parser return a status of "date valid but format is ugly," for sites
that want to accept readable but broken Date headers and rewrite them.
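
To make the three-way status concrete, here is a rough sketch in Python.
Everything in it is hypothetical: the names (DateStatus, classify_date,
local_post_action), the rewrite_ugly switch, and both regular expressions
are my own illustration, not INN's parser or any existing inn.conf option.
The "strict" pattern is only an approximation of a well-formed Date
header, and the "loose" pattern adds just the cases mentioned in this
thread: long month names, BST and UTC zones, and a missing time zone.

```python
import re
from enum import Enum


class DateStatus(Enum):
    VALID = "valid"      # passes the strict pattern
    UGLY = "ugly"        # readable, but only by the loose pattern
    INVALID = "invalid"  # neither pattern can read it


MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
_ABBREV = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Strict (assumed grammar): optional weekday, day, three-letter month,
# four-digit year, HH:MM[:SS], then a numeric zone or GMT.
STRICT = re.compile(
    r"^(?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun), )?"
    r"(\d{1,2}) (" + _ABBREV + r") (\d{4}) "
    r"(\d{2}):(\d{2})(?::(\d{2}))? "
    r"([+-]\d{4}|GMT)$"
)

# Loose: same field order, but allows long month names, flexible
# whitespace, the BST and UTC zone names, and no zone at all.
LOOSE = re.compile(
    r"^(?:[A-Za-z]+,?\s+)?"
    r"(\d{1,2})\s+([A-Za-z]{3,9})\s+(\d{4})\s+"
    r"(\d{1,2}):(\d{2})(?::(\d{2}))?"
    r"(?:\s+([+-]\d{4}|GMT|UTC|BST))?$"
)


def _fields_ok(m):
    """Range-check the captured fields (days per month are not checked)."""
    day, hour, minute = int(m.group(1)), int(m.group(4)), int(m.group(5))
    second = int(m.group(6) or 0)
    # The month word must be a prefix of a real month name.
    month_ok = any(name.startswith(m.group(2).lower()) for name in MONTHS)
    return (month_ok and 1 <= day <= 31 and hour < 24
            and minute < 60 and second <= 60)


def classify_date(value):
    """Return VALID, UGLY, or INVALID for a Date header value."""
    value = value.strip()
    m = STRICT.match(value)
    if m and _fields_ok(m):
        return DateStatus.VALID
    m = LOOSE.match(value)
    if m and _fields_ok(m):
        return DateStatus.UGLY
    return DateStatus.INVALID


def local_post_action(value, rewrite_ugly=True):
    """Local policy: accept clean dates, optionally rewrite ugly ones."""
    status = classify_date(value)
    if status is DateStatus.VALID:
        return "accept"
    if status is DateStatus.UGLY and rewrite_ugly:
        return "rewrite"
    return "reject"
```

With this split, the parser only reports what it saw, and the decision
to reject, rewrite, or accept stays a site policy setting. A site that
wants the current strict behavior for local posts would simply call
local_post_action(value, rewrite_ugly=False).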
|
| >> Accepting dates with BST and UTC as time zones and dates with no time
| >> zones would cut the rejected count down to 2,072 articles (0.05%)
|
| > That's still a lot.
|
| I should mention that the only problem I saw with really old articles was
| just this problem with overlong month names and a few problems with
| missing time zones. Other than that, they didn't seem particularly prone
| to having more problems than other articles.
Long month names fall under what I meant by "if I can read it":
accepting them is a matter of policy rather than technology.
--
bill davidsen <davidsen at tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.