Perl filtering, pathological patterns

Russ Allbery rra at stanford.edu
Mon Mar 19 23:49:22 UTC 2001


Brandon Hume <hume at Den.BOFH.Halifax.NS.Ca> writes:

> I know some people (like Russ) have done some science as to what can
> trigger these events and how to deal with them.  I'm using Perl 5.6.0,
> and the rules which trigger the effect are:

>        $lines  = $hdr_hash{'__BODY__'} =~ tr/\n/\n/;

> ($hdr_hash{'__BODY__'} =~ 
> 	m%(?:^[ \t>]*(?>M[\x20-\x60]{59,60})[ \r]*\n){40}%mo ||
>
>  $hdr_hash{'__BODY__'} =~ 
> 	m%(?:^[ \t>]*[A-Za-z0-9+/]{59,76}[ \r]*\n){40}%mo)) {

> I can probably fix these up, but I thought I'd check with those who
> likely already have first.

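It's the first pattern that's causing problems, and it's the {59,60} that
does it.  It triggers on runs of lines that start with M and have trailing
spaces.  A space after the 59th character can be matched either by
[\x20-\x60]{59,60} (taking 60 characters) or by [ \r]* (after taking 59),
so every such line matches two ways, and when the {40} repetition finally
fails, the regex engine backtracks through a number of combinations that
doubles with each line.  It's pretty rare, but some people post stuff with
lots of capital Ms that are whitespace-filled to column 60.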
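A quick way to see the difference is to count how often each version
finishes matching a line before the whole match gives up.  This sketch
isn't part of the filter: it assumes Perl 5.005 or later (for both (?{ })
and (?>...)), uses a hypothetical $line_matches counter, and shrinks {40}
to {12} so the non-atomic version still finishes quickly.

```perl
use strict;
use warnings;

# Hypothetical demo: {12} stands in for the filter's {40}.
our $line_matches = 0;    # bumped each time one line of the group matches

# Without (?>...): a trailing space can go to {59,60} or to [ \r]*.
my $plain  = qr/(?:^[ \t>]*M[\x20-\x60]{59,60}[ \r]*\n(?{ $line_matches++ })){12}/m;

# With (?>...): once 60 characters are taken, the engine never retries with 59.
my $atomic = qr/(?:^[ \t>]*(?>M[\x20-\x60]{59,60})[ \r]*\n(?{ $line_matches++ })){12}/m;

# 11 "M" lines with a trailing space, then a line that breaks the run, so the
# {12} repetition has to fail after backtracking.
my $m_line = "M" . ("A" x 59) . " \n";
my $body   = ($m_line x 11) . ("x" x 100) . "\n";

# Run one pattern and report (matched?, number of line matches attempted).
sub try_pattern {
    my ($re, $text) = @_;
    local $line_matches = 0;
    my $matched = ($text =~ $re) ? 1 : 0;
    return ($matched, $line_matches);
}

my ($plain_ok,  $plain_n)  = try_pattern($plain,  $body);
my ($atomic_ok, $atomic_n) = try_pattern($atomic, $body);
printf "without (?>...): matched=%d, line matches attempted=%d\n", $plain_ok,  $plain_n;
printf "with (?>...):    matched=%d, line matches attempted=%d\n", $atomic_ok, $atomic_n;
```

Expect the first count to be far larger than the second; each additional
padded line roughly doubles it, which is how {40} gets out of hand.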

I use:

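# How we determine if a post is a binary.  The first pattern looks for 40
# uuencoded lines (M plus 59 or 60 characters); the (?>...) keeps the regex
# engine from backtracking into the {59,60}.  The second looks for 40 lines
# of base64.
sub is_binary {
    ($hdr{__BODY__} =~ m%(?:^[ \t>]*(?>M[\x20-\x60]{59,60})[ \r]*\n){40}%mo
     || $hdr{__BODY__} =~ m%(?:^[ \t>]*[A-Za-z0-9+/]{59,76}[ \r]*\n){40}%mo);
}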

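The (?>...) around M[\x20-\x60]{59,60} is an independent subexpression:
once it has matched, the engine won't backtrack into it to retry with 59
characters, so each line can match only one way.  That construct requires
Perl 5.005 or later.  An alternative is to disallow the 59-character case
(just check for exactly 60 characters after the M, the length of a full
45-byte uuencoded line), which I think catches nearly all uuencoded
material anyway.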
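Here's a sketch comparing the two tests using only core Perl.  It uses
pack("u") to produce 40 full uuencoded lines, plus a hypothetical body of
59-character lines standing in for the case the stricter pattern gives up.

```perl
use strict;
use warnings;

# The {59,60} uuencode test from is_binary (needs Perl 5.005+ for (?>...)).
my $lenient = qr%(?:^[ \t>]*(?>M[\x20-\x60]{59,60})[ \r]*\n){40}%m;

# Alternative: exactly 60 characters after the M.  Each line can then match
# in only one way, so (?>...) isn't needed and older Perls can run it.
my $strict = qr%(?:^[ \t>]*M[\x20-\x60]{60}[ \r]*\n){40}%m;

# 1800 bytes = 40 full 45-byte chunks; pack("u") writes each chunk as "M"
# followed by 60 encoded characters and a newline.
my $data = join '', map { chr($_ % 256) } 0 .. 1799;
my $uu   = pack 'u', $data;

# Hypothetical body: 40 lines with only 59 characters after the M, as might
# happen if a trailing space were stripped along the way.
my $short_line = "M" . ("A" x 59) . "\n";
my $short      = $short_line x 40;

for my $case ([ 'pack("u")' => $uu ], [ '59-char lines' => $short ]) {
    my ($name, $text) = @$case;
    printf "%-14s lenient=%d strict=%d\n", $name,
        ($text =~ $lenient ? 1 : 0), ($text =~ $strict ? 1 : 0);
}
```

Run as is, both patterns should accept the pack("u") output, while only
the {59,60} version accepts the 59-character lines.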

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>


More information about the inn-workers mailing list