bug-bash

[PATCH] parser handling of \^A


From: Grisha Levit
Subject: [PATCH] parser handling of \^A
Date: Thu, 12 Oct 2023 21:36:48 -0700

There are some issues with parser output when the input has an unquoted
backslash followed by a raw ^A character:

$ bash -c $'echo ${_+\\\1}' |& cat -v
bash: line 1: bad substitution: no closing `}' in ${_+\^A^A}

$ bash -c $'[[ \1 =~ (\\\1) ]]' |& cat -v
bash: line 1: [[: invalid regular expression `(^A\)': parentheses not balanced

$ bash -c $'echo $\'\\\1\'' | cat -v
\^A^A

$ bash -c $'echo "\\\1\177"' | cat -v
\^A^A^?

The main loop in read_token_word usually ^A-escapes ^A and ^?, but not
when they are escaped by a backslash -- the char pairs \^A and \^? are
stored as is.  OTOH, the loop in parse_matched_pair special-cases \^A,
outputting \^A^A.

However, when expand_word_internal subsequently encounters this \^A^A,
the backslash escapes the first ^A, and the second ^A escapes whatever
character happens to follow.

AFAICT, everything works fine if parse_matched_pair just stores \^A as
is (as long as dequote_string doesn't drop trailing ^A's).  This seems
a lot easier than the alternative of teaching the subst.c functions to
handle \^A^A and \^A^? specially, but maybe there's some other approach.
---
diff --git a/parse.y b/parse.y
index 3e5b814f..dd35ea76 100644
--- a/parse.y
+++ b/parse.y
@@ -3834,9 +3834,7 @@ parse_matched_pair (int qc, int open, int close, size_t *lenp, int flags)
              continue;
            }

-         RESIZE_MALLOCED_BUFFER (ret, retind, 2, retsize, 64);
-         if MBTEST(ch == CTLESC)
-           ret[retind++] = CTLESC;
+         RESIZE_MALLOCED_BUFFER (ret, retind, 1, retsize, 64);
          ret[retind++] = ch;
          continue;
        }
diff --git a/subst.c b/subst.c
index 89ec6eb7..f075380c 100644
--- a/subst.c
+++ b/subst.c
@@ -4810,14 +4810,6 @@ dequote_string (const char *string)
       return (result);
     }

-  /* A string consisting of only a single CTLESC should pass through unchanged */
-  if (string[0] == CTLESC && string[1] == 0)
-    {
-      result[0] = CTLESC;
-      result[1] = '\0';
-      return (result);
-    }
-
   /* If no character in the string can be quoted, don't bother examining
      each character.  Just return a copy of the string passed to us. */
   if (strchr (string, CTLESC) == NULL)
@@ -4827,12 +4819,8 @@ dequote_string (const char *string)
   s = (char *)string;
   while (*s)
     {
-      if (*s == CTLESC)
-       {
-         s++;
-         if (*s == '\0')
-           break;
-       }
+      if (*s == CTLESC && s[1])
+       s++;
       COPY_CHAR_P (t, s, send);
     }
