MBS_SUPPORT in regex.c causes massive performance penalty
From: Marc Horowitz
Subject: MBS_SUPPORT in regex.c causes massive performance penalty
Date: 11 Apr 2001 22:17:36 -0400
User-agent: Gnus/5.0807 (Gnus v5.8.7) Emacs/20.3

I compiled the regex.c from libc 2.2.2 twice, once with MBS_SUPPORT
defined and once without, and linked each build against this test
program:
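
#include <fstream>
#include <iostream>
#include <regex.h>
#include <stdlib.h>

using namespace std;

int main(int argc, char *argv[])
{
    if (argc != 4) {
        cerr << "usage: " << argv[0] << " regex file test-length" << endl;
        exit(2);
    }

    regex_t preg;
    int regerr = regcomp(&preg, argv[1], REG_EXTENDED);
    if (regerr) {
        cerr << "regcomp: " << regerr << endl;
        exit(1);
    }

    ifstream filtered(argv[2]);
    static char re_in[1000000];   // static: too large to put on the stack safely

    // read the file (up to the buffer size); get() stops at a NUL byte
    filtered.get(re_in, sizeof(re_in), '\0');

    // terminate at the end of the file
    cerr << "input length: " << filtered.gcount() << endl;
    re_in[filtered.gcount()] = 0;

    // terminate somewhere else for testing
    long test_len = atol(argv[3]);
    if (test_len < 0 || test_len >= (long) sizeof(re_in)) {
        cerr << "test length out of range" << endl;
        exit(2);
    }
    cerr << "test length: " << test_len << endl;
    re_in[test_len] = 0;

    regerr = regexec(&preg, re_in, 0, 0, 0);
    if (regerr && regerr != REG_NOMATCH) {
        cerr << "regexec: " << regerr << endl;
        exit(1);
    }
    if (!regerr)
        cout << "found one" << endl;

    regfree(&preg);
    exit(0);
}
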
On a P3-700 running debian testing, the performance difference is
substantial:
> time ./slow aqxz /usr/share/dict/words 100000
input length: 409067
test length: 100000
0.010u 0.000s 0:00.02 50.0% 0+0k 0+0io 168pf+0w
> time ./slowmbs aqxz /usr/share/dict/words 100000
input length: 409067
test length: 100000
70.950u 1.380s 1:13.55 98.3% 0+0k 0+0io 173pf+0w
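
The numbers above come from the shell's time builtin, so they include
process startup, regcomp() and reading the file. To isolate the
regexec() call itself, something along these lines could be dropped
into the test program. This is only a sketch: time_regexec is a
hypothetical helper name, not part of regex.c or of the program above,
and clock() measures CPU time at fairly coarse resolution.

```cpp
#include <regex.h>
#include <time.h>

// Hypothetical helper (not from the original post): run one regexec()
// over `text` and return the CPU seconds it consumed. The regexec()
// result is stored in *rc so the caller can tell a match (0) from
// REG_NOMATCH or an error.
double time_regexec(const regex_t *preg, const char *text, int *rc)
{
    clock_t start = clock();
    *rc = regexec(preg, text, 0, 0, 0);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_regexec(&preg, re_in, &regerr) in place of the bare
regexec() call and printing the return value to cerr would show how
much of the run time is spent in matching.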
As a separate issue, with the MBS_SUPPORT changes compiled in, it
appears that any code which expects regcomp/regexec to work with 8-bit
character sets such as iso-latin1 could be broken by the mbs to wc
conversion. If this is true, then the whole idea of MBS_SUPPORT needs
to be thought out better. Perhaps instead of a compile-time switch,
there should be a run-time switch based on a flag passed to regcomp().
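
To make the run-time idea concrete, here is a rough sketch of one
possible decision point. It does not use a regcomp() flag (no such
flag exists, and I am not inventing one here). Instead it looks at the
standard MB_CUR_MAX macro, which is 1 whenever the current LC_CTYPE
locale uses a single-byte encoding. The names locale_is_multibyte,
match_path and choose_match_path are hypothetical and do not appear in
regex.c.

```cpp
#include <locale.h>
#include <stdlib.h>

// True when the current LC_CTYPE locale uses a multibyte encoding
// (MB_CUR_MAX > 1, e.g. UTF-8 or EUC-JP); false for single-byte
// encodings such as ISO Latin-1 and for the "C" locale.
bool locale_is_multibyte()
{
    return MB_CUR_MAX > 1;
}

// Hypothetical labels for the two matcher implementations.
enum match_path { PATH_BYTE, PATH_WIDE };

// Hypothetical dispatch: take the wide-character path only when the
// locale actually needs it, so 8-bit text keeps the byte-oriented
// matcher and never goes through the mbs to wc conversion.
match_path choose_match_path()
{
    return locale_is_multibyte() ? PATH_WIDE : PATH_BYTE;
}
```

A program that never calls setlocale() stays in the "C" locale, so
under this scheme it would always get PATH_BYTE and the old
performance.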
Marc