bug-gnu-utils

Re: Bug in regular expression \B using DFA


From: Aharon Robbins
Subject: Re: Bug in regular expression \B using DFA
Date: Wed, 30 Jul 2008 23:15:01 +0300

Greetings. Re this:

> From: "T. X. G." <address@hidden>
> Subject: Bug in regular expression \B using DFA
> Date: Wed, 16 Jul 2008 05:23:09 -0700 (PDT)
> To: address@hidden
>
> ~ gawk --version
> GNU Awk 3.1.6
> Copyright (C) 1989, 1991-2007 Free Software Foundation.
>
> ......
>
> You should have received a copy of the GNU General Public License
> along with this program. If not, see http://www.gnu.org/licenses/.
>
> ~ LC_ALL=C gawk 'BEGIN{x="abcd";gsub(/\B/,":",x);print x}'
> a:b:cd
>
> ~ LC_ALL=en_US.UTF-8 gawk 'BEGIN{x="abcd";gsub(/\B/,":",x);print x}'
> a:b:c:d
>
> ~ GAWK_NO_DFA=1 gawk 'BEGIN{x="abcd";gsub(/\B/,":",x);print x}'
> a:b:c:d

This is indeed a bug. Please apply the following patch. It will
make its way to CVS shortly.

Thanks

Arnold
------------------------------------
Wed Jul 30 23:10:51 2008  Arnold D. Robbins  <address@hidden>

        * re.c (research): Don't ever use DFA if need_start. It can
        give wrong results in some corner cases.  Reported by
         "T. X. G." <address@hidden>.

--- re.c        11 Aug 2007 19:49:23 -0000      1.6
+++ re.c        30 Jul 2008 20:12:10 -0000
@@ -232,8 +232,11 @@
         * focused, perhaps we should relegate the DFA matcher to the
         * single byte case all the time. OTOH, the speed difference
         * between the matchers in non-trivial... Sigh.)
+        *
+        * 7/2008: Simplify: skip dfa matcher if need_start. The above
+        * problems are too much to deal with.
         */
-       if (rp->dfa && ! no_bol && (gawk_mb_cur_max == 1 || ! need_start)) {
+       if (rp->dfa && ! no_bol && ! need_start) {
                char save;
                int count = 0;
                /*

