Date parsing
Russ Allbery
rra at stanford.edu
Tue Sep 3 18:51:55 UTC 2002
Jeffrey M Vinocur <jeff at litech.org> writes:
> On Mon, 2 Sep 2002, Russ Allbery wrote:
>> I think that using the stricter parser in nnrpd for local posts is an
>> obvious thing to do, in the "be strict in what we generate" department.
> Where does a malformed date rejection sit relative to the nnrpd posting
> filters? (That is, can someone who has users that Just Won't Upgrade
> hack around it with a filter.)
It sits after the filters (people wanted that so that they could
manipulate the results of nnrpd's internal manipulations), so they
wouldn't help there.
We *could* add a new configuration option that says to drop and regenerate
invalid dates rather than rejecting them. Or a configuration option to
allow the loose parsing mode. Dunno. I have some inherent dislike for
yet more options until we go back and get rid of a few, but that's just my
bias. :)
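To make the choices above concrete, here is a minimal sketch of reject versus regenerate handling for a local post. It is not INN code: `handle_date` and the policy names are hypothetical, and Python's `email.utils` parser only stands in for the parser under discussion (it is more lenient than the strict one would be).

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def handle_date(header, policy="reject", now=None):
    """Return (accepted, date_header) for a locally posted article.

    policy is a hypothetical setting, not an existing INN option:
      "reject"     - refuse the post if the Date header does not parse
      "regenerate" - drop the bad header and generate a fresh one
    """
    now = now or datetime.now(timezone.utc)
    try:
        parsedate_to_datetime(header)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError here, newer ones ValueError.
        if policy == "regenerate":
            return True, format_datetime(now)
        return False, None
    # The header parsed, so pass it through untouched.
    return True, header
```

With "regenerate", the user's post goes through with the server's idea of the current time; with "reject", the user gets an error and has to fix their client.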
>> Accepting dates with BST and UTC as time zones and dates with no time
>> zones would cut the rejected count down to 2,072 articles (0.05%)
> That's still a lot.
Well, it seems like a lot until you notice that my server rejected 1,059
articles yesterday alone for having invalid Date headers using the
existing parsedate parser. (The count above, by comparison, covers
everything on my server: between two weeks and a month of articles in
most groups, and years in local groups.) Adding at most another couple
hundred a day by using a stricter parser ends up not looking like that
much of a difference.
> How does makehistory deal with this? (We certainly don't want old
> articles to suddenly vanish one day. But clearly it needs to know the
> date, so if the new parser can't parse...)
makehistory does the right thing; if it can't parse the date, it just uses
the arrival date as the article's posting date. So when switching to a
stricter parser, we don't have to worry about running makehistory. We'd
still have to worry about feeding a spool to another server using innxmit,
but there we already have to worry about Xref headers. I could probably
clean up my fixref script a bit more and toss in a call to one of Perl's
insane date parsing modules to parse bizarre, random dates and turn them
into RFC 2822 dates to work around that problem.
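As a rough illustration of both ideas in this paragraph (fall back to the arrival time when the date won't parse, and rewrite odd dates into RFC 2822 form before feeding them elsewhere), here is a sketch in Python rather than Perl. The function names are hypothetical, `email.utils` stands in for a lenient date module, and treating a missing time zone as UTC is an assumption made only for this example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def posting_time(date_header, arrived):
    """Parse the Date header, falling back to the arrival time on failure."""
    try:
        parsed = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return arrived
    if parsed.tzinfo is None:
        # Assumption for this sketch: a date with no zone is taken as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def rfc2822_date(date_header, arrived):
    """Return a normalized Date header value for re-feeding an article."""
    return format_datetime(posting_time(date_header, arrived))
```

For example, a header of `3 Sep 2002 18:51:55` with no zone would come out as `Tue, 03 Sep 2002 18:51:55 +0000` under that UTC assumption.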
I should mention that the only problems I saw with really old articles
were this one with overlong month names and a few missing time zones.
Other than that, they didn't seem particularly prone to having more
problems than other articles.
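For anyone who wants to sort their own rejects into these buckets, a small sketch along the following lines might do. The pattern is only an approximation written for this example: it accepts nothing but numeric zone offsets, which is an assumption and not the grammar of the actual strict parser, and `classify` is a hypothetical helper.

```python
import re

# Approximate "strict" shape: optional day name, day, three-letter month,
# four-digit year, time, numeric zone offset. An assumption for this sketch.
STRICT_DATE = re.compile(
    r"^(?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun), )?"
    r"\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4} "
    r"\d{2}:\d{2}(?::\d{2})? [+-]\d{4}$"
)

# A month abbreviation followed by more letters, such as "September".
OVERLONG_MONTH = re.compile(r"\b(?:Jan|Feb|Mar|Apr|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]+\b")

# The header ends right after the time, so no zone was given.
NO_ZONE = re.compile(r"\d{2}:\d{2}(?::\d{2})?\s*$")


def classify(header):
    """Bucket a Date header value by the kind of problem it has, if any."""
    if STRICT_DATE.match(header):
        return "ok"
    if OVERLONG_MONTH.search(header):
        return "overlong month"
    if NO_ZONE.search(header):
        return "missing time zone"
    return "other"
```

A date ending in a zone name such as BST lands in "other" here, since this sketch only accepts numeric offsets.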
--
Russ Allbery (rra at stanford.edu) <http://www.eyrie.org/~eagle/>
Please send questions to the list rather than mailing me directly.
<http://www.eyrie.org/~eagle/faqs/questions.html> explains why.