2.3 getting caught in tight Perl regex loop?

Wed Oct 25 22:18:50 UTC 2000

Wolfgang Breyha <wbreyha at gmx.net> writes:

> Ok I've decided to send the rest of the article, too (since it is not
> much).  BTW, I'm using perl 5.005_03 with INN 2.2.3.

The binary filtering regexes in Cleanfeed, the last time I looked at it,
could go into exponential backtracking on certain types of articles whose
text looks like uuencoding but has varying numbers of trailing spaces.
I analyzed those regexes at one point and ended up using the following,
which I think is more robust:

# How we determine if a post is a binary.
sub is_binary {
    ($hdr{__BODY__} =~ m%(?:^[ \t>]*(?>M[\x20-\x60]{59,60})[ \r]*\n){40}%mo
     || $hdr{__BODY__} =~ m%(?:^[ \t>]*[A-Za-z0-9+/]{59,76}[ \r]*\n){40}%mo);
}

The key part is the (?>...) bit in the first regex, which prevents the
backtracking that was causing the exponential growth in processing time.
Without it, a space in the 60th position can be matched either by
[\x20-\x60] or by the trailing [ \r]*, and those choices multiply across
the repeated lines.  An alternate way of solving the same problem that
doesn't use 5.005 regex features is to split the regex so that the part
matching uuencoded content doesn't also match a space in the 60th
position:

    m%(?:^[ \t>]*M[\x20-\x60]{59}[\x21-\x60]?[ \r]*\n){40}%mo
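
As a rough sanity check, both forms can be run against a few synthetic
bodies.  This is only a sketch: the %hdr hash here is a stand-in for the
one the filter normally sees, and the three test bodies and the names
$alt_uu, $uu_body, $near_miss, and $text_body are made up for
illustration rather than taken from real articles.

```perl
use strict;
use warnings;

# Stand-in for the filter's %hdr hash; only __BODY__ is consulted here.
my %hdr;

# The two checks from above, unchanged.
sub is_binary {
    ($hdr{__BODY__} =~ m%(?:^[ \t>]*(?>M[\x20-\x60]{59,60})[ \r]*\n){40}%mo
     || $hdr{__BODY__} =~ m%(?:^[ \t>]*[A-Za-z0-9+/]{59,76}[ \r]*\n){40}%mo);
}

# The alternate uuencode pattern that avoids (?>...).
my $alt_uu = qr%(?:^[ \t>]*M[\x20-\x60]{59}[\x21-\x60]?[ \r]*\n){40}%m;

# Synthetic bodies (assumptions for this test only).
my $uu_line   = 'M' . ('A' x 60) . "\n";              # full-length uuencode-style line
my $uu_body   = $uu_line x 45;                        # 45 such lines in a row
my $pad_line  = 'M' . ('A' x 59) . (' ' x 5) . "\n";  # same shape, trailing spaces
my $near_miss = ($pad_line x 39) . "end\n";           # only 39 lines, so no match
my $text_body = "This is an ordinary line of text.\n" x 100;

# Run each body through both forms and record whether it matched.
my %result;
for my $case ([uu => $uu_body], [near_miss => $near_miss], [text => $text_body]) {
    my ($name, $body) = @$case;
    $hdr{__BODY__} = $body;
    my $main = is_binary() ? 1 : 0;
    my $alt  = ($body =~ $alt_uu) ? 1 : 0;
    $result{$name} = [$main, $alt];
    print "$name: is_binary=$main alt=$alt\n";
}
```

The near-miss body is the interesting case: it has the trailing spaces
that trigger the backtracking problem but one line too few to match, so
both forms should reject it and return promptly.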

-- 
Russ Allbery (rra at stanford.edu)             <http://www.eyrie.org/~eagle/>