From: Arun Giridhar
Subject: [Octave-bug-tracker] [bug #62704] Regexpi() fatal error triggers core dump
Date: Mon, 4 Jul 2022 12:45:58 -0400 (EDT)

Follow-up Comment #4, bug #62704 (project octave):

Localized the error.

By adding

std::cerr << __LINE__ << '\n';

through the regexpi functions, the crash could be localized to this statement
in the octregexp function in regexp.cc:

  const regexp::match_data rx_lst
    = regexp::match (pattern, buffer, options, who);


That statement in turn calls this function in lo-regexp.h:

    static match_data
    match (const std::string& pat, const std::string& buffer,
           const regexp::opts& opt = regexp::opts (),
           const std::string& who = "regexp")
    {
      regexp rx (pat, opt, who);

      return rx.match (buffer);
    }


The first line triggers the error:

      regexp rx (pat, opt, who);


That constructor is just a wrapper to compile_internal():

    regexp (const std::string& pat = "",
            const regexp::opts& opt = regexp::opts (),
            const std::string& w = "regexp")
      : m_pattern (pat), m_options (opt), m_data (nullptr), m_named_pats (),
        m_names (0), m_named_idx (), m_who (w)
    {
      compile_internal ();
    }


The out-of-range access happens in compile_internal() in lo-regexp.cc, in
this sequence:

    85      std::cerr << __LINE__ << '\n';
    86  
    87      while ((new_pos = m_pattern.find ("(?", pos)) != std::string::npos)
    88        {
    89          std::cerr << __LINE__ << '\n';
    90          if (m_pattern.at (new_pos + 2) == '<'
    91              && !(m_pattern.at (new_pos + 3) == '='
    92                   || m_pattern.at (new_pos + 3) == '!'))
    93            {
    94              // The syntax of named tokens in pcre is "(?P<name>...)" while
    95              // we need a syntax "(?<name>...)", so fix that here.  Also an
    96              // expression like
    97              //   "(?<first>\w+)\s+(?<last>\w+)|(?<last>\w+),\s+(?<first>\w+)"
    98              // should be perfectly legal, while pcre does not allow the same
    99              // named token name on both sides of the alternative.  Also fix
   100              // that here by replacing name tokens by dummy names, and dealing
   101              // with the dummy names later.
   102  
   103              std::cerr << __LINE__ << '\n';


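The cerr calls at lines 85 and 89 both print, then the if-statement at line 90
fails and the next cerr at line 103 is never reached.  Since std::string::at()
throws std::out_of_range for an index at or past the end of the string, the
pattern presumably ends too soon after "(?" for new_pos + 2 or new_pos + 3 to
be a valid index.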
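
For reference, std::string::at() is bounds-checked: it throws
std::out_of_range instead of reading past the buffer, and an uncaught
exception ends in std::terminate and a core dump.  The sketch below repeats
the condition from lines 90-92 on a standalone string.  The helper name
unguarded_check_throws is made up for illustration and is not Octave code,
and the patterns that trigger it ("(?" or "(?<" at the very end) are an
assumption, not the pattern from the original report.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of Octave: applies the unguarded test from
// compile_internal () to the first "(?" in pat and reports whether
// std::string::at threw.
bool unguarded_check_throws (const std::string& pat)
{
  std::size_t new_pos = pat.find ("(?");
  if (new_pos == std::string::npos)
    return false;   // no "(?" in the pattern, nothing to test

  try
    {
      // Same shape as the condition at lines 90-92.
      bool named = (pat.at (new_pos + 2) == '<'
                    && ! (pat.at (new_pos + 3) == '='
                          || pat.at (new_pos + 3) == '!'));
      (void) named;   // only the bounds behaviour matters here
      return false;
    }
  catch (const std::out_of_range&)
    {
      // at () throws when the index is >= size ().
      return true;
    }
}
```

Under these assumptions the helper returns true for "(?" and "(?<", and
false for a complete token such as "(?<name>x)".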

Does there need to be a length guard before we call m_pattern.at()?
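
One possible shape for such a guard, as a rough sketch only: check that
new_pos + 3 is a valid index before looking at the characters.  The function
name is_named_token_start is hypothetical and this is not a proposed patch
against lo-regexp.cc, just the same condition with a length check in front.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not Octave's code: true when pat has "(?<" at pos
// followed by something other than '=' or '!' (i.e. a named token rather
// than a lookbehind).  Returns false when the pattern is too short for
// the test instead of indexing out of range.
bool is_named_token_start (const std::string& pat, std::size_t pos)
{
  // pos + 3 must be a valid index; written this way to avoid overflow
  // in pos + 3 for very large pos.
  if (pat.size () < 4 || pos > pat.size () - 4)
    return false;

  return (pat[pos + 2] == '<'
          && pat[pos + 3] != '='
          && pat[pos + 3] != '!');
}
```

With this, the if-statement at line 90 could call the helper with
(m_pattern, new_pos).  A truncated "(?" or "(?<" would then fall through to
whatever the else branch and PCRE itself do with it, which is presumably a
normal compile error rather than a crash.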


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?62704>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/



