poke-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] Provide octal (e.g. \0123) and hex (e.g. \x1f) c-escape


From: Jose E. Marchesi
Subject: Re: [PATCH] Provide octal (e.g. \0123) and hex (e.g. \x1f) c-escape
Date: Fri, 04 Oct 2019 16:19:49 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Hi Henner.

    This is a patch to ad c-escape parsing for bytes written as octal or hex in
    C-strings.
    
    It is a somewhat in a more C99 style, so using const where possible, and
    declaring variables where first needed to add clarity,
    
    Let me know what you think

Thanks for the patch.  Please find some comments below.

    Cheers,
      -henner
    From 737474fc21f8f63a5876999b9957ab99c362ff7f Mon Sep 17 00:00:00 2001
    From: Henner Zeller <address@hidden>
    Date: Thu, 3 Oct 2019 20:16:26 -0700
    Subject: [PATCH] Provide octal (e.g. \0123) and hex (e.g. \x1f) c-escape
     string parsing.
    
    2019-10-03  Henner Zeller  <address@hidden>
           * src/pkl-trans.c: extend pkl_trans1_ps_string to parse numeric
           c-escape characters.
    ---
     src/pkl-trans.c | 80 +++++++++++++++++++++++++++++++++++++++++--------
     1 file changed, 67 insertions(+), 13 deletions(-)
    
    diff --git a/src/pkl-trans.c b/src/pkl-trans.c
    index 2b3e26d..8ffa8d6 100644
    --- a/src/pkl-trans.c
    +++ b/src/pkl-trans.c
    @@ -430,6 +430,46 @@ PKL_PHASE_BEGIN_HANDLER (pkl_trans1_ps_var)
     }
     PKL_PHASE_END_HANDLER
     
    +static int parse_xdigit (char v) {
    +  if (v >= '0' && v <= '9') return v - '0';
    +  if (v >= 'A' && v <= 'F') return v - 'A' + 10;
    +  if (v >= 'a' && v <= 'f') return v - 'a' + 10;
    +  return 0xff;  /* Arbitrary. Larger than 0xf: outside range indication */
    +}

Please place the name of the function in the first column, and open
brace in it's own line.

    +/* Parse a single octet from an octal (e.g. \0123) or hex (e.g. \xff)
    +   c-escape until we consumed up to the maximum expected valid digits to 
not
    +   yield more than 255.
    +   Input is the *pointer to the initial backslash; on return, it is
    +   modified to point to the last valid character parsed.
    +   (Handles the corner-case that \0 without following digits is a valid 
escape
    +   but \x is not.)
    +   Returns single character in range 0..255 or -1 on error */
    +
    +static int consume_numeric_escape (const char **s)
    +{
    +  const char *begin = *s;
    +  const int base = (begin[1] == '0') ? 8 : 16;
    +  begin += (base == 8) ? 1 : 2;                     /* octal: include 0 */
    +  const char *end = begin + ((base == 8) ? 4 : 2);  /* max 'digit budget' 
*/
    +
    +  int result = 0;
    +  const char *it;
    +  for (it = begin; it < end && *it; ++it)
    +    {
    +      const int digit = parse_xdigit (*it);
    +      if (digit >= base)
    +        break;
    +      const int new_value = result * base + digit;
    +      if (new_value > 0xff)
    +        break;
    +      result = new_value;
    +    }
    +  if (it == begin) return -1;  /* Error: did not see any relevant digit */
    +  *s = it - 1;
    +  return result;
    +}

Likewise regarding style.

Also, what about using strtol(3) instead to parse the numeric values after
\0 and \x?  It supports different bases.

    @@ -438,15 +478,16 @@ PKL_PHASE_BEGIN_HANDLER (pkl_trans1_ps_string)
       pkl_ast_node string = PKL_PASS_NODE;
       char *string_pointer = PKL_AST_STRING_POINTER (string);
       char *new_string_pointer;
    -  char *p;
    +  const char *p;
       size_t string_length, i;
    +  const char *errmsg = NULL;
     
       /* Please keep this code in sync with the string printer in
          pvm-val.c:pvm_print_val.  */
     
       /* First pass: calculate the size of the resulting string after
          \-expansion, and report errors in the contents of the string.  */
    -  for (p = string_pointer, string_length = 0; *p != '\0'; ++p)
    +  for (p = string_pointer, string_length = 0; !errmsg && *p != '\0'; ++p)
         {
           if (p[0] == '\\')
             {
    @@ -455,18 +496,28 @@ PKL_PHASE_BEGIN_HANDLER (pkl_trans1_ps_string)
                 case '\\':
                 case 'n':
                 case 't':
    -              string_length++;
    +              p++;
    +              break;
    +            case '0':   /* octal escape */
    +            case 'x':   /* hex escape */
    +              if (consume_numeric_escape (&p) < 0)
    +                errmsg = "Invalid \\x-escape without digit.";
                   break;
    +
                 default:
    -              pkl_error (PKL_PASS_AST, PKL_AST_LOC (string),
    -                         "invalid \\%c sequence in string", p[1]);
    -              PKL_TRANS_PAYLOAD->errors++;
    -              PKL_PASS_ERROR;
    +              errmsg = "Invalid \\-sequence in string.";
                 }
               p++;
             }
    -      else
    -        string_length++;
    +
    +      string_length++;
    +    }
    +
    +  if (errmsg)
    +    {
    +      pkl_error (PKL_PASS_AST, PKL_AST_LOC (string), errmsg);
    +      PKL_TRANS_PAYLOAD->errors++;
    +      PKL_PASS_ERROR;
         }
     
       /* Second pass: compose the new string.  */
    @@ -478,13 +529,16 @@ PKL_PHASE_BEGIN_HANDLER (pkl_trans1_ps_string)
             {
               switch (p[1])
                 {
    -            case '\\': new_string_pointer[i] = '\\'; break;
    -            case 'n':  new_string_pointer[i] = '\n'; break;
    -            case 't':  new_string_pointer[i] = '\t'; break;
    +            case '\\': new_string_pointer[i] = '\\'; p++; break;
    +            case 'n':  new_string_pointer[i] = '\n'; p++; break;
    +            case 't':  new_string_pointer[i] = '\t'; p++; break;
    +            case 'x':
    +            case '0':
    +              new_string_pointer[i] = consume_numeric_escape (&p);
    +              break;
                 default:
                   assert (0);
                 }
    -          p++;
             }
           else
             new_string_pointer[i] = p[0];

On a side note, I think it would be good to include some tests for \0,
\x and the previously supported escapes in strings.  I would place them
in the poke.pkl testsuite, named as strings-1.pk, strings-2.pk, etc.  I
would suggest something like:

---->---- testsuite/poke.pkl/strings-1.pk ---->----
/* { dg-do run } */

/* { dg-command { "foo\012bar" } } */
/* { dg-output "foo\nbar" } */
----<---- testsuite/poke.pkl/strings-1.pk ----<----

also it would be good to have diagnostics tests in strings-diag-N.pk,
like:

---->---- testsuite/poke.pkl/strings-diag-1.pk ---->----
/* { dg-do compile } */

defvar foo = "foo\jbar"; /* { dg-error "Invalid.*sequence" } */
----<---- testsuite/poke.pkl/strings-diag-1.pk ----<----




reply via email to

[Prev in Thread] Current Thread [Next in Thread]