bug-gnu-utils

Re: Match returns impossible character range


From: Aharon Robbins
Subject: Re: Match returns impossible character range
Date: Wed, 14 Jul 2010 22:54:05 +0300
User-agent: Heirloom mailx 12.4 7/29/08

Actually, the patch was bad. I should know by now to run my test suite
before sending out patches. Here is the full fix.

Arnold
------------
Wed Jul 14 22:31:53 2010  Arnold D. Robbins  <address@hidden>

        * node.c (str2wstr): Keep going if we get a bad multibyte sequence.
        Allows match to give correct answers for RSTART, RLENGTH.
        Add a lint warning.

Index: node.c
===================================================================
RCS file: /d/mongo/cvsrep/gawk-stable/node.c,v
retrieving revision 1.24
diff -u -r1.24 node.c
--- node.c      13 Apr 2010 19:39:23 -0000      1.24
+++ node.c      14 Jul 2010 19:52:19 -0000
@@ -755,6 +755,7 @@
        char *sp;
        mbstate_t mbs;
        wchar_t wc, *wsp;
+       static short warned = FALSE;
 
        assert((n->flags & (STRING|STRCUR)) != 0);
 
@@ -803,7 +804,24 @@
                switch (count) {
                case (size_t) -2:
                case (size_t) -1:
-                       goto done;
+                       /*
+                        * Just skip the bad byte and keep going, so that
+                        * we get a more-or-less full string, instead of
+                        * stopping early. This is particularly important
+                        * for match() where we need to build the indices.
+                        */
+                       sp++;
+                       /*
+                        * mbrtowc(3) says the state of mbs becomes undefined
+                        * after a bad character, so reset it.
+                        */
+                       memset(& mbs, 0, sizeof(mbs));
+                       /* And warn the user something's wrong */
+                       if (do_lint && ! warned) {
+                               warned = TRUE;
+                               lintwarn(_("Invalid multibyte data detected. There may be a mismatch between your data and your locale"));
+                       }
+                       break;
 
                case 0:
                        count = 1;
@@ -820,9 +838,8 @@
                }
        }
 
-done:
        *wsp = L'\0';
-       n->wstlen = i;
+       n->wstlen = wsp - n->wstptr;
        n->flags |= WSTRCUR;
 #define ARBITRARY_AMOUNT_TO_GIVE_BACK 100
        if (n->stlen - n->wstlen > ARBITRARY_AMOUNT_TO_GIVE_BACK)
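
For readers who want to try the idea outside of gawk, here is a minimal standalone sketch of the same skip-and-reset loop. It is not the code from node.c: the function name mb_to_wide_skip_bad and the skipped-byte counter are hypothetical, made up for illustration, and the caller is assumed to supply a destination buffer with room for len + 1 wide characters. Only standard C library calls (mbrtowc, memset) are used.

```c
#include <assert.h>
#include <locale.h>	/* callers pick the locale with setlocale() */
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, not gawk's str2wstr(): convert the len bytes at
 * src to wide characters in dst, skipping any byte that mbrtowc()
 * rejects instead of stopping at it.
 *
 * Assumption: dst has room for len + 1 wide characters.
 * Returns the number of wide characters stored (terminator excluded);
 * *bad receives the number of bytes that were skipped.
 */
static size_t
mb_to_wide_skip_bad(const char *src, size_t len, wchar_t *dst, size_t *bad)
{
	const char *sp = src;
	wchar_t *wsp = dst;
	mbstate_t mbs;
	size_t count;

	assert(src != NULL && dst != NULL && bad != NULL);

	memset(&mbs, 0, sizeof(mbs));
	*bad = 0;

	while (len > 0) {
		count = mbrtowc(wsp, sp, len, &mbs);
		switch (count) {
		case (size_t) -2:	/* incomplete sequence */
		case (size_t) -1:	/* invalid sequence */
			/* Skip one byte and keep going. */
			sp++;
			len--;
			(*bad)++;
			/* The conversion state is unspecified now, so reset it. */
			memset(&mbs, 0, sizeof(mbs));
			break;

		case 0:
			/* An embedded NUL was stored; it occupies one byte. */
			count = 1;
			/* FALLTHROUGH */
		default:
			wsp++;
			sp += count;
			len -= count;
			break;
		}
	}

	*wsp = L'\0';
	/* Take the length from the pointers, as the patch does for wstlen. */
	return (size_t) (wsp - dst);
}
```

In a UTF-8 locale, the three input bytes 'a', 0xFF, 'b' should come back as two wide characters with one byte counted as skipped, where stopping at the first bad byte would have returned only the 'a'. Computing the result from wsp - dst keeps the length in step with what was actually stored, no matter how many bytes were dropped along the way.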


