bug-glibc

MBS_SUPPORT in regex.c causes massive performance penalty


From: Marc Horowitz
Subject: MBS_SUPPORT in regex.c causes massive performance penalty
Date: 11 Apr 2001 22:17:36 -0400
User-agent: Gnus/5.0807 (Gnus v5.8.7) Emacs/20.3

Using the regex.c from libc 2.2.2, I compiled with and without
MBS_SUPPORT, and linked against this test program:

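#include <fstream>
#include <iostream>
#include <regex.h>
#include <stdlib.h>

using namespace std;

// static, so the 1 MB buffer does not live on the stack
static char re_in[1000000];

int main(int argc, char *argv[])
{
    if (argc != 4) {
        cerr << "usage: " << argv[0] << " regex file test-length" << endl;
        exit(1);
    }

    regex_t preg;

    int regerr = regcomp(&preg, argv[1], REG_EXTENDED);
    if (regerr) {
        cerr << "regcomp: " << regerr << endl;
        exit(1);
    }

    ifstream filtered(argv[2]);
    if (!filtered) {
        cerr << "cannot open " << argv[2] << endl;
        exit(1);
    }

    // read the whole file (up to the buffer size) as one string
    filtered.get(re_in, sizeof(re_in), '\0');
    // terminate at the end of the file
    cerr << "input length: " << filtered.gcount() << endl;
    re_in[filtered.gcount()] = 0;
    // terminate somewhere else for testing
    int test_len = atoi(argv[3]);
    if (test_len < 0 || test_len >= (int) sizeof(re_in)) {
        cerr << "test length out of range" << endl;
        exit(1);
    }
    cerr << "test length:  " << test_len << endl;
    re_in[test_len] = 0;

    regerr = regexec(&preg, re_in, 0, 0, 0);
    if (regerr && regerr != REG_NOMATCH) {
        cerr << "regexec: " << regerr << endl;
        exit(1);
    }

    if (!regerr)
        cout << "found one" << endl;

    regfree(&preg);
    exit(0);
}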

On a P3-700 running Debian testing, the performance difference is
substantial (0.01 s versus about 71 s of user time):

> time ./slow aqxz /usr/share/dict/words 100000
input length: 409067
test length:  100000
0.010u 0.000s 0:00.02 50.0%     0+0k 0+0io 168pf+0w
> time ./slowmbs aqxz /usr/share/dict/words 100000
input length: 409067
test length:  100000
70.950u 1.380s 1:13.55 98.3%    0+0k 0+0io 173pf+0w
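
The shell's time figure covers process startup and reading the file as
well as the match itself.  A minimal sketch for timing only the
regexec() call, assuming a C++11 compiler with std::chrono; the names
TimedResult and timed_regexec are hypothetical and not part of the test
program above:

```cpp
#include <cassert>
#include <chrono>
#include <regex.h>

// Hypothetical result type: what regexec() returned and how long it took.
struct TimedResult {
    int status;      // 0 on match, REG_NOMATCH, or another regexec() error
    double seconds;  // wall-clock time spent inside the single regexec() call
};

// Hypothetical helper: run regexec() once on text and time only that call.
TimedResult timed_regexec(const regex_t *preg, const char *text)
{
    assert(preg != nullptr && text != nullptr);

    auto start = std::chrono::steady_clock::now();
    int status = regexec(preg, text, 0, nullptr, 0);
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double> elapsed = stop - start;
    return TimedResult{status, elapsed.count()};
}
```

Calling timed_regexec(&preg, re_in) in place of the plain regexec()
call would let the two builds be compared on the match alone.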

As a separate issue, with the MBS_SUPPORT changes compiled in, it
appears that any code which expects regcomp/regexec to work with 8-bit
character sets such as ISO Latin-1 could be broken by the mbs to wc
conversion.  If this is true, then the whole idea of MBS_SUPPORT needs
to be thought out better.  Perhaps instead of a compile-time switch,
there should be a run-time switch based on a flag passed to regcomp().
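
A rough sketch of how such a run-time switch might look from the
caller's side.  Everything here is an assumption for illustration: the
flag REG_BYTES_HYPOTHETICAL, the ModalRegex wrapper and modal_regcomp()
do not exist in glibc or POSIX, and the real matcher is left unchanged.
The sketch only shows where the per-pattern decision would be recorded:

```cpp
#include <cassert>
#include <regex.h>

// HYPOTHETICAL flag, not defined by glibc or POSIX.  It is stripped
// before the real regcomp() is called, so regcomp() never sees it.
const int REG_BYTES_HYPOTHETICAL = 1 << 30;

// Which matcher a compiled pattern would ask for at regexec() time.
enum class MatchMode { Bytes, Multibyte };

// Hypothetical wrapper pairing a compiled pattern with its chosen mode.
struct ModalRegex {
    regex_t preg;
    MatchMode mode;
};

// Record the requested mode, then compile with the standard flags only.
int modal_regcomp(ModalRegex *re, const char *pattern, int cflags)
{
    assert(re != nullptr && pattern != nullptr);

    re->mode = (cflags & REG_BYTES_HYPOTHETICAL) ? MatchMode::Bytes
                                                 : MatchMode::Multibyte;
    return regcomp(&re->preg, pattern, cflags & ~REG_BYTES_HYPOTHETICAL);
}
```

A caller working with ISO Latin-1 data would pass
REG_EXTENDED | REG_BYTES_HYPOTHETICAL and an implementation could then
skip the mbs to wc conversion for that pattern only.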

                Marc


