Re: Match returns impossible character range
From: Aharon Robbins
Subject: Re: Match returns impossible character range
Date: Wed, 14 Jul 2010 22:54:05 +0300
User-agent: Heirloom mailx 12.4 7/29/08
Actually, the patch was bad. I should know by now to run my test suite
before sending out patches. Here is the full fix.
Arnold
------------
Wed Jul 14 22:31:53 2010 Arnold D. Robbins <address@hidden>
* node.c (str2wstr): Keep going if we get a bad multibyte sequence.
Allows match to give correct answers for RSTART, RLENGTH.
Add a lint warning.
Index: node.c
===================================================================
RCS file: /d/mongo/cvsrep/gawk-stable/node.c,v
retrieving revision 1.24
diff -u -r1.24 node.c
--- node.c 13 Apr 2010 19:39:23 -0000 1.24
+++ node.c 14 Jul 2010 19:52:19 -0000
@@ -755,6 +755,7 @@
char *sp;
mbstate_t mbs;
wchar_t wc, *wsp;
+ static short warned = FALSE;
assert((n->flags & (STRING|STRCUR)) != 0);
@@ -803,7 +804,24 @@
switch (count) {
case (size_t) -2:
case (size_t) -1:
- goto done;
+ /*
+ * Just skip the bad byte and keep going, so that
+ * we get a more-or-less full string, instead of
+ * stopping early. This is particularly important
+ * for match() where we need to build the indices.
+ */
+ sp++;
+ /*
+ * mbrtowc(3) says the state of mbs becomes undefined
+ * after a bad character, so reset it.
+ */
+ memset(& mbs, 0, sizeof(mbs));
+ /* And warn the user something's wrong */
+ if (do_lint && ! warned) {
+ warned = TRUE;
+ lintwarn(_("Invalid multibyte data detected. There may be a mismatch between your data and your locale"));
+ }
+ break;
case 0:
count = 1;
@@ -820,9 +838,8 @@
}
}
-done:
*wsp = L'\0';
- n->wstlen = i;
+ n->wstlen = wsp - n->wstptr;
n->flags |= WSTRCUR;
#define ARBITRARY_AMOUNT_TO_GIVE_BACK 100
if (n->stlen - n->wstlen > ARBITRARY_AMOUNT_TO_GIVE_BACK)
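
For anyone who wants to try the idea outside of gawk, the skip-and-reset approach in the patch can be sketched as a standalone function. This is only an illustration under my own assumptions: the name bytes_to_wide_skipping_bad and its signature are hypothetical, the caller supplies the output buffer, and it is not the actual str2wstr() code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, not gawk's str2wstr(): convert the len bytes at
 * src into wide characters in dst, which must have room for at least
 * len + 1 elements. Bytes that do not form a valid multibyte character
 * in the current locale are skipped instead of ending the conversion.
 * Returns the number of wide characters stored, not counting the
 * terminating L'\0'.
 */
size_t
bytes_to_wide_skipping_bad(const char *src, size_t len, wchar_t *dst)
{
	const char *sp = src;
	size_t left = len;
	wchar_t *wsp = dst;
	wchar_t wc;
	mbstate_t mbs;
	size_t count;

	assert(src != NULL && dst != NULL);

	memset(&mbs, 0, sizeof(mbs));
	while (left > 0) {
		count = mbrtowc(&wc, sp, left, &mbs);
		switch (count) {
		case (size_t) -2:	/* incomplete sequence at end of input */
		case (size_t) -1:	/* invalid sequence */
			/* Skip one byte and keep going. */
			sp++;
			left--;
			/* The state is undefined after an error, so reset it. */
			memset(&mbs, 0, sizeof(mbs));
			break;
		case 0:
			/* An embedded NUL reports 0 but occupies one byte. */
			count = 1;
			/* FALLTHROUGH */
		default:
			*wsp++ = wc;
			sp += count;
			left -= count;
			break;
		}
	}

	*wsp = L'\0';
	/* Count what was actually stored, as the patch does for wstlen. */
	return (size_t) (wsp - dst);
}
```

The result depends on the locale in effect. In a UTF-8 locale a lone 0xFF byte is never valid, so the three bytes 'a', 0xFF, 'b' should come back as two wide characters; a single-byte locale may accept that byte and return three. Taking the length from the pointer difference rather than from a separate counter keeps it in step with what was stored, whichever way that goes.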