gawk: length return incorrect value when MB_CUR_MAX > 1
From: KIMURA Koichi
Subject: gawk: length return incorrect value when MB_CUR_MAX > 1
Date: Wed, 30 Nov 2005 09:29:56 +0900
Hi,
A user found a bug in the length function of gawk 3.1.5.
$ LANG=ja_JP.utf8 gawk 'BEGIN {print length("abc\0def")}'
This script prints '3', not '7'. I have tested it on Windows and GNU/Linux
(Fedora Core 3).
As far as I can tell, the cause is that the mbrtowc function returns 0 for
a '\0' character instead of the number of bytes it converted.
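The behaviour can be illustrated outside gawk with a small sketch. The
helper below, count_chars, is a hypothetical name and is not taken from
gawk's node.c; it only assumes the standard C mbrtowc interface. It counts
the characters in a buffer of known length and treats a return value of 0
(a decoded NUL) as one byte consumed, so counting continues past it.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not gawk's code): count the characters in the
 * len-byte buffer s, treating an embedded NUL as one character. */
size_t count_chars(const char *s, size_t len)
{
    mbstate_t mbs;
    size_t n = 0;

    memset(&mbs, 0, sizeof(mbs));
    while (len > 0) {
        wchar_t wc;
        size_t count = mbrtowc(&wc, s, len, &mbs);

        if (count == (size_t) -1 || count == (size_t) -2) {
            /* Invalid or incomplete sequence: count one byte and
             * reset the conversion state (an assumption of this sketch). */
            memset(&mbs, 0, sizeof(mbs));
            count = 1;
        } else if (count == 0) {
            /* mbrtowc() returns 0 when it decodes a NUL; the NUL
             * still occupies one byte of input. */
            count = 1;
        }
        s += count;
        len -= count;
        n++;
    }
    return n;
}
```

Called as count_chars("abc\0def", 7), the helper walks all seven bytes
rather than stopping at the NUL.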
Here is a patch.
--- node.c.1~ 2005-07-27 03:07:43.000000000 +0900
+++ node.c 2005-11-27 04:18:49.000000000 +0900
@@ -745,7 +745,13 @@
src_count = n->stlen;
memset(& mbs, 0, sizeof(mbs));
for (i = 0; src_count > 0; i++) {
- count = mbrtowc(& wc, sp, src_count, & mbs);
+ if (*sp != '\0') {
+ count = mbrtowc(& wc, sp, src_count, & mbs);
+ }
+ else { /* NUL character at middle of string */
+ wc = L'\0';
+ count = 1;
+ }
switch (count) {
case (size_t) -2:
case (size_t) -1:
Thank you,
--
KIMURA Koichi