Re: [RFC PATCH] fall back to glibc matcher if a multibyte match is found
From: Jim Meyering
Date: Fri, 30 Apr 2010 18:31:57 +0200
Paolo Bonzini wrote:
> This patch works around the performance problems that are still in
> current grep. Red Hat will probably be using it in its own 2.6.x.
>
> For UTF-8 it should trigger only in the presence of MBCSET, e.g. [a-z]
> or [à] (and the latter case could be avoided).
>
> For other character sets all brackets, and `.' as well, will trigger it.
>
> Thoughts?
> ---
> src/dfa.c | 9 +++++++++
> 1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/src/dfa.c b/src/dfa.c
> index 2bc0c0e..775943c 100644
> --- a/src/dfa.c
> +++ b/src/dfa.c
> @@ -3213,6 +3213,15 @@ dfaexec (struct dfa *d, char const *begin, char *end,
> continue;
> }
>
> + if (backref)
> + {
> + *backref = 1;
> + free(mblen_buf);
> + free(inputwcs);
> + *end = saved_end;
> + return (char *) p;
> + }
> +
> /* Can match with a multibyte character (and multi character
> collating element). Transition table might be updated. */
> s = transit_state(d, s, &p);
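The patch makes dfaexec stop at the first point where it would need the slow multibyte transition, set *backref, and return. As the subject line says, the caller then falls back to the glibc (regex) matcher for that line. The sketch below models only that contract. The pattern [aà] and the names fast_scan, slow_scan and line_matches are hypothetical stand-ins, not grep code: fast_scan plays the DFA, and slow_scan plays the regex matcher.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pattern for this sketch: the bracket expression [aà],
   where à is U+00E0, encoded in UTF-8 as the bytes C3 A0.  */

/* Fast path: handles single-byte input only.  Returns a pointer to a
   definite match, or NULL for a definite non-match.  On the first byte
   that begins a multibyte sequence it gives up, sets *BACKREF to 1 and
   returns the current position, mirroring the early return in the patch.  */
static const char *
fast_scan (const char *begin, const char *end, int *backref)
{
  for (const char *p = begin; p < end; p++)
    {
      if ((unsigned char) *p >= 0x80)
        {
          *backref = 1;         /* undecided: defer to the slow matcher */
          return p;
        }
      if (*p == 'a')
        return p;               /* definite match */
    }
  return NULL;                  /* definite non-match */
}

/* Slow path: an exact matcher that also recognizes the two-byte
   sequence for à.  Stands in for the regex matcher used as fallback.  */
static bool
slow_scan (const char *begin, const char *end)
{
  for (const char *p = begin; p < end; p++)
    {
      if (*p == 'a')
        return true;
      if ((unsigned char) p[0] == 0xC3 && p + 1 < end
          && (unsigned char) p[1] == 0xA0)
        return true;
    }
  return false;
}

/* Driver: trust the fast path's answer unless it set the backref flag,
   in which case rescan the whole line with the slow matcher.  */
static bool
line_matches (const char *line, size_t len)
{
  int backref = 0;
  const char *end = line + len;
  const char *p = fast_scan (line, end, &backref);
  if (backref)
    return slow_scan (line, end);
  return p != NULL;
}
```

The design point is that the fast path never has to be right about multibyte input; it only has to recognize when it cannot decide and say so cheaply.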
Sounds like a good change, but please add a comment.
Can you suggest a pathologically bad example
with which we can try to come up with a performance-measuring
addition to the test suite?
I'll take a closer look next week.
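On the request for a pathologically bad example: one plausible candidate, offered as an assumption rather than a measured result, is a large input whose lines consist entirely of non-ASCII characters. With a pattern such as [a-z] in a UTF-8 locale, the DFA would then reach its multibyte transition on nearly every line. The helper fill_multibyte_lines below is hypothetical and simply builds such a buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill BUF (capacity CAP bytes) with LINES lines,
   each made of WIDTH copies of U+00E0 ("à", UTF-8 bytes C3 A0) followed
   by a newline.  Returns the number of bytes written, or 0 if LINES is 0
   or CAP is too small.  */
static size_t
fill_multibyte_lines (char *buf, size_t cap, size_t lines, size_t width)
{
  static const char agrave[] = "\xC3\xA0";
  size_t line_len = 2 * width + 1;

  /* Overflow-safe check that LINES * LINE_LEN fits in CAP.  */
  if (lines == 0 || line_len > cap / lines)
    return 0;

  char *p = buf;
  for (size_t i = 0; i < lines; i++)
    {
      for (size_t j = 0; j < width; j++)
        {
          memcpy (p, agrave, 2);
          p += 2;
        }
      *p++ = '\n';
    }
  return (size_t) (p - buf);
}
```

Writing such a buffer to a file and timing grep '[a-z]' on it under a UTF-8 locale (the locale name, e.g. en_US.UTF-8, varies by system), with and without the patch, could serve as the basis for a performance check. No timings are claimed here.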