bug-gnu-utils


From: Julian Büning
Subject: GNU regex does not terminate when matching some regular expressions on any input
Date: Wed, 11 Apr 2018 14:20:22 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2

While we are fully aware that GNU regex has been decommissioned for many
years (last release April 1993), several (in some cases modified)
versions are still in use in a number of actively maintained free and
open source software projects. In some cases, GNU regex functions only as a
fallback, in others it provides the sole implementation of regular
expression matching. For example, GNU libiberty [1], mutt [2] and squid
[3] still utilize versions of GNU regex.

We therefore report this bug not in the expectation that it will be
fixed, but rather to remind developers of software still using GNU regex
that it might be time to cease using GNU regex in their projects. As
suggested in [4], Gnulib provides a modern alternative that should be
able to replace GNU regex with reasonable effort.

Example:
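    #include <regex.h>
    #include <stdio.h>

    int main(void) {
        regex_t preg;
        regmatch_t matched_range;
        /* Compile the problematic pattern as a POSIX extended regex. */
        int err = regcomp(&preg, ".?^*", REG_EXTENDED);

        if (err != 0) {
            char msg[255];
            regerror(err, &preg, msg, sizeof msg);
            printf("%s\n", msg);
            return -1;
        }

        /* With GNU regex, this call never returns. */
        regexec(&preg, "test", 1, &matched_range, 0);
        regfree(&preg);
        return 0;
    }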

Observed behavior:
When GNU regex provides the implementation of regcomp() and regexec(),
the latter never returns, regardless of the input string. This holds
when matching the compiled regular expression ".?^*" against "test",
and likewise against any other input, including the empty string.
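
To observe the hang without blocking the calling process, one option is
to run the match in a child process guarded by alarm(). The following
is only a sketch under POSIX assumptions (fork(), alarm(), waitpid());
the helper name regexec_terminates and its return convention are
hypothetical and not part of GNU regex or any other library.

```c
#include <regex.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not a library function).
 * Returns  1 if regexec() finished within timeout_s seconds,
 *          0 if the child was killed by SIGALRM (presumed hang),
 *         -1 if the pattern did not compile or the check itself failed.
 * timeout_s must be greater than 0, since alarm(0) cancels the alarm. */
int regexec_terminates(const char *pattern, const char *input,
                       unsigned timeout_s)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the default SIGALRM action terminates the process. */
        regex_t preg;
        regmatch_t matched_range;

        alarm(timeout_s);
        if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
            _exit(2);
        regexec(&preg, input, 1, &matched_range, 0);
        regfree(&preg);
        _exit(0);
    }

    /* Parent: wait for the child and classify how it ended. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGALRM)
        return 0;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
        return 1;
    return -1;
}
```

A call such as regexec_terminates(".?^*", "test", 2) would be expected
to return 0 when linked against an affected GNU regex, and -1 with an
implementation that rejects the pattern in regcomp().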

Alternative behavior:
When using a current system-provided glibc implementation, regcomp()
returns an error code and regerror() outputs "Invalid preceding regular
expression" as message.

There are similar regular expressions that exhibit the same behavior,
such as ".*^+" or ".*$*".
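
To see how a given regcomp() implementation treats such patterns at
compile time, a small check like the following may help. It is a
sketch only: the helper name pattern_compiles is hypothetical, and it
merely reports whether regcomp() accepts a pattern under REG_EXTENDED.
It never calls regexec(), so it does not exercise the matching step
where the hang was observed.

```c
#include <regex.h>

/* Hypothetical helper (not a library function).
 * Returns 1 if regcomp() accepts the pattern as a POSIX extended
 * regular expression, 0 if regcomp() reports an error. */
int pattern_compiles(const char *pattern)
{
    regex_t preg;

    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return 0;

    /* Release the compiled pattern; only the result of regcomp() matters. */
    regfree(&preg);
    return 1;
}
```

Calling it on ".?^*", ".*^+" and ".*$*" shows whether the linked
implementation rejects each pattern up front or lets it through to
regexec().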

While we were not able to track down the origin of the bug, it seems
that the output of regcomp() triggers an infinite loop in regexec(),
which keeps pushing and popping failure points.

Note that GNU Emacs [5] also uses a modified version of GNU regex,
which does not seem to exhibit this particular bug.

This bug was found in klee-uclibc (which uses GNU regex by default)
using Symbolic Execution techniques developed in the course of the
SYMBIOSYS research project at COMSYS, RWTH Aachen University. This
research is supported by the European Research Council (ERC) under the
EU's Horizon 2020 Research and Innovation Programme grant agreement n.
647295 (SYMBIOSYS).

[1] https://gcc.gnu.org/onlinedocs/libiberty/
[2] http://www.mutt.org/
[3] http://www.squid-cache.org/
[4] https://www.gnu.org/software/regex/
[5] https://www.gnu.org/software/emacs/

Best regards,
Julian


