Date parsing

bill davidsen davidsen at tmr.com
Tue Sep 3 11:39:50 UTC 2002


In article <yl1y8bipnz.fsf at windlord.stanford.edu>,
Russ Allbery  <rra at stanford.edu> wrote:
| 
| So today, in-between writing with friends, I played about with something
| that I've been wanting to play with since I first saw parsedate, namely a
| hand-written RFC 2822 date parser.

	[... details snipped ...]

| If the parser were modified to accept one-digit hours, that would cut the
| percentage of articles rejected down to 0.14% (5,607 articles), but I'm
| guessing that there would still be a few noticeable ones in that pile.
| 
| Accepting dates with BST and UTC as time zones and dates with no time
| zones would cut the rejected count down to 2,072 articles (0.05%), and
| actually about 1,200 of those are articles from 1992 through 1995 on my
| server in the slac.* hierarchy that have fully spelled-out weekday names,
| so with those changes the rejections would probably be in the noise.
| 
| Anyone have any thoughts about all this?

Sounds like a good job to me! I think a strict mode and a relaxed mode
would be enough, with relaxed mode accepting virtually any
human-readable format. Unless you care about the day of the week, it
can be ignored however it is spelled; once you have the date you can
compute the weekday if you need it. The common timezone names and GMT+N
show up often enough that relaxed mode should accept them.
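
To make the strict/relaxed idea concrete, here is a rough sketch in
Python. It is not Russ's parser and not anything in INN; the names
parse_date, STRICT_RE, RELAXED_RE and RELAXED_ZONES are made up for
illustration. The zone table is only a sample, treating a missing zone
as UTC is an assumption, and GMT+N is read as N hours east of Greenwich.
Comments and the obsolete RFC 2822 forms are not handled at all.

```python
import re
from datetime import datetime, timedelta, timezone

MONTHS = {name: num for num, name in enumerate(
    "jan feb mar apr may jun jul aug sep oct nov dec".split(), start=1)}

# Hypothetical sample of named zones for relaxed mode (offsets in minutes).
RELAXED_ZONES = {"UT": 0, "UTC": 0, "GMT": 0, "BST": 60,
                 "EST": -300, "EDT": -240, "CST": -360, "CDT": -300,
                 "MST": -420, "MDT": -360, "PST": -480, "PDT": -420}

# Strict: abbreviated weekday, two-digit hour, numeric zone required.
STRICT_RE = re.compile(
    r"^(?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s+)?"
    r"(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4})\s+"
    r"(\d{2}):(\d{2})(?::(\d{2}))?\s+([+-]\d{4})$")

# Relaxed: any weekday spelling (ignored), one-digit hour, optional zone.
RELAXED_RE = re.compile(
    r"^(?:[A-Za-z]+,?\s+)?"
    r"(\d{1,2})\s+([A-Za-z]{3,})\s+(\d{4})\s+"
    r"(\d{1,2}):(\d{2})(?::(\d{2}))?"
    r"(?:\s+(\S+))?$")


def _zone_minutes(zone, relaxed):
    """Return the zone offset in minutes, or None if it is not accepted."""
    if zone is None:
        # Assumption: relaxed mode treats a missing zone as UTC.
        return 0 if relaxed else None
    m = re.fullmatch(r"([+-])(\d{2})(\d{2})", zone)
    if m:
        sign = -1 if m.group(1) == "-" else 1
        return sign * (int(m.group(2)) * 60 + int(m.group(3)))
    if not relaxed:
        return None
    m = re.fullmatch(r"(?:GMT|UTC)([+-])(\d{1,2})", zone, re.IGNORECASE)
    if m:
        sign = -1 if m.group(1) == "-" else 1
        return sign * int(m.group(2)) * 60
    return RELAXED_ZONES.get(zone.upper())


def parse_date(text, relaxed=False):
    """Return an aware datetime, or None if the date is rejected."""
    m = (RELAXED_RE if relaxed else STRICT_RE).match(text.strip())
    if not m:
        return None
    day, mon, year, hour, minute, sec, zone = m.groups()
    month = MONTHS.get(mon[:3].lower())
    offset = _zone_minutes(zone, relaxed)
    if month is None or offset is None:
        return None
    try:
        return datetime(int(year), month, int(day), int(hour), int(minute),
                        int(sec or 0),
                        tzinfo=timezone(timedelta(minutes=offset)))
    except ValueError:
        # Out-of-range field, e.g. day 31 in a 30-day month or hour 25.
        return None
```

With something like this, a date such as "Tuesday, 3 Sep 2002 9:05:00 BST"
is rejected by parse_date(text) but accepted by
parse_date(text, relaxed=True), and the weekday, if anyone wants it, comes
from the result's weekday() method rather than from the header text.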

And of course this could be a resource in its own right; I'd love to
plug a better date parser into a few other applications!
-- 
bill davidsen <davidsen at tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


More information about the inn-workers mailing list