[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] Provide octal (e.g. \0123) and hex (e.g. \x1f) c-escape
From: |
Jose E. Marchesi |
Subject: |
Re: [PATCH] Provide octal (e.g. \0123) and hex (e.g. \x1f) c-escape |
Date: |
Fri, 04 Oct 2019 16:19:49 +0200 |
User-agent: |
Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) |
Hi Henner.
This is a patch to ad c-escape parsing for bytes written as octal or hex in
C-strings.
It is a somewhat in a more C99 style, so using const where possible, and
declaring variables where first needed to add clarity,
Let me know what you think
Thanks for the patch. Please find some comments below.
Cheers,
-henner
From 737474fc21f8f63a5876999b9957ab99c362ff7f Mon Sep 17 00:00:00 2001
From: Henner Zeller <address@hidden>
Date: Thu, 3 Oct 2019 20:16:26 -0700
Subject: [PATCH] Provide octal (e.g. \0123) and hex (e.g. \x1f) c-escape
string parsing.
2019-10-03 Henner Zeller <address@hidden>
* src/pkl-trans.c: extend pkl_trans1_ps_string to parse numeric
c-escape characters.
---
src/pkl-trans.c | 80 +++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 67 insertions(+), 13 deletions(-)
diff --git a/src/pkl-trans.c b/src/pkl-trans.c
index 2b3e26d..8ffa8d6 100644
--- a/src/pkl-trans.c
+++ b/src/pkl-trans.c
@@ -430,6 +430,46 @@ PKL_PHASE_BEGIN_HANDLER (pkl_trans1_ps_var)
}
PKL_PHASE_END_HANDLER
+static int parse_xdigit (char v) {
+ if (v >= '0' && v <= '9') return v - '0';
+ if (v >= 'A' && v <= 'F') return v - 'A' + 10;
+ if (v >= 'a' && v <= 'f') return v - 'a' + 10;
+ return 0xff; /* Arbitrary. Larger than 0xf: outside range indication */
+}
Please place the name of the function in the first column, and open
brace in it's own line.
+/* Parse a single octet from an octal (e.g. \0123) or hex (e.g. \xff)
+ c-escape until we consumed up to the maximum expected valid digits to
not
+ yield more than 255.
+ Input is the *pointer to the initial backslash; on return, it is
+ modified to point to the last valid character parsed.
+ (Handles the corner-case that \0 without following digits is a valid
escape
+ but \x is not.)
+ Returns single character in range 0..255 or -1 on error */
+
+static int consume_numeric_escape (const char **s)
+{
+ const char *begin = *s;
+ const int base = (begin[1] == '0') ? 8 : 16;
+ begin += (base == 8) ? 1 : 2; /* octal: include 0 */
+ const char *end = begin + ((base == 8) ? 4 : 2); /* max 'digit budget'
*/
+
+ int result = 0;
+ const char *it;
+ for (it = begin; it < end && *it; ++it)
+ {
+ const int digit = parse_xdigit (*it);
+ if (digit >= base)
+ break;
+ const int new_value = result * base + digit;
+ if (new_value > 0xff)
+ break;
+ result = new_value;
+ }
+ if (it == begin) return -1; /* Error: did not see any relevant digit */
+ *s = it - 1;
+ return result;
+}
Likewise regarding style.
Also, what about using strtol(3) instead to parse the numeric values after
\0 and \x? It supports different bases.
@@ -438,15 +478,16 @@ PKL_PHASE_BEGIN_HANDLER (pkl_trans1_ps_string)
pkl_ast_node string = PKL_PASS_NODE;
char *string_pointer = PKL_AST_STRING_POINTER (string);
char *new_string_pointer;
- char *p;
+ const char *p;
size_t string_length, i;
+ const char *errmsg = NULL;
/* Please keep this code in sync with the string printer in
pvm-val.c:pvm_print_val. */
/* First pass: calculate the size of the resulting string after
\-expansion, and report errors in the contents of the string. */
- for (p = string_pointer, string_length = 0; *p != '\0'; ++p)
+ for (p = string_pointer, string_length = 0; !errmsg && *p != '\0'; ++p)
{
if (p[0] == '\\')
{
@@ -455,18 +496,28 @@ PKL_PHASE_BEGIN_HANDLER (pkl_trans1_ps_string)
case '\\':
case 'n':
case 't':
- string_length++;
+ p++;
+ break;
+ case '0': /* octal escape */
+ case 'x': /* hex escape */
+ if (consume_numeric_escape (&p) < 0)
+ errmsg = "Invalid \\x-escape without digit.";
break;
+
default:
- pkl_error (PKL_PASS_AST, PKL_AST_LOC (string),
- "invalid \\%c sequence in string", p[1]);
- PKL_TRANS_PAYLOAD->errors++;
- PKL_PASS_ERROR;
+ errmsg = "Invalid \\-sequence in string.";
}
p++;
}
- else
- string_length++;
+
+ string_length++;
+ }
+
+ if (errmsg)
+ {
+ pkl_error (PKL_PASS_AST, PKL_AST_LOC (string), errmsg);
+ PKL_TRANS_PAYLOAD->errors++;
+ PKL_PASS_ERROR;
}
/* Second pass: compose the new string. */
@@ -478,13 +529,16 @@ PKL_PHASE_BEGIN_HANDLER (pkl_trans1_ps_string)
{
switch (p[1])
{
- case '\\': new_string_pointer[i] = '\\'; break;
- case 'n': new_string_pointer[i] = '\n'; break;
- case 't': new_string_pointer[i] = '\t'; break;
+ case '\\': new_string_pointer[i] = '\\'; p++; break;
+ case 'n': new_string_pointer[i] = '\n'; p++; break;
+ case 't': new_string_pointer[i] = '\t'; p++; break;
+ case 'x':
+ case '0':
+ new_string_pointer[i] = consume_numeric_escape (&p);
+ break;
default:
assert (0);
}
- p++;
}
else
new_string_pointer[i] = p[0];
On a side note, I think it would be good to include some tests for \0,
\x and the previously supported escapes in strings. I would place them
in the poke.pkl testsuite, named as strings-1.pk, strings-2.pk, etc. I
would suggest something like:
---->---- testsuite/poke.pkl/strings-1.pk ---->----
/* { dg-do run } */
/* { dg-command { "foo\012bar" } } */
/* { dg-output "foo\nbar" } */
----<---- testsuite/poke.pkl/strings-1.pk ----<----
also it would be good to have diagnostics tests in strings-diag-N.pk,
like:
---->---- testsuite/poke.pkl/strings-diag-1.pk ---->----
/* { dg-do compile } */
defvar foo = "foo\jbar"; /* { dg-error "Invalid.*sequence" } */
----<---- testsuite/poke.pkl/strings-diag-1.pk ----<----