bug-m4
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: format bug


From: Eric Blake
Subject: Re: format bug
Date: Tue, 29 May 2007 07:03:07 -0600
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.10) Gecko/20070221 Thunderbird/1.5.0.10 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Eric Blake on 5/28/2007 10:15 PM:
> Try this for a fun time:
> 
> $ echo 'format(%*.*d,-1,-1,1)' | m4 | wc
>       1       1 2280281
> 
> Oops - that was 2 million+ characters that I wasn't expecting!  Patch
> coming up later.  I don't think the bug can be exploited to run arbitrary
> code, but executing printf without enough arguments is never a good idea.

Fixed on cygwin with this (some of it borrowed from head, the rest needs
porting to head):

2007-05-29  Eric Blake  <address@hidden>

        Improve format support.
        * m4/gnulib-cache.m4: Augment with 'gnulib-tool --import
        vasprintf-posix'.
        * src/format.c (format): Parse %'hhd, %a, %A.  Avoid calling
        printf with too few arguments, as in format(%*.*d,-1,-1,1).
        * doc/m4.texinfo (Format): Expand tests, and improve
        documentation.
        * NEWS: Document this change.

Still broken on mingw, since mingw has a broken atof.  Also, the gnulib
stdint and unistd modules are failing on mingw.  I'm thinking of cutting
m4 1.4.9b today, because of the extensive gnulib changes, to hopefully get
a bit more of a shakedown from random platforms before going with the
official m4 1.4.10.

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGXCSK84KuGfSFAYARAi4rAKCkN0B32VIYp9wZmweL7kat+eGnKwCdGcB/
6YCw4TUMzEEndU96jajsKwI=
=WTbs
-----END PGP SIGNATURE-----
Index: NEWS
===================================================================
RCS file: /sources/m4/m4/NEWS,v
retrieving revision 1.1.1.1.2.102
diff -u -p -r1.1.1.1.2.102 NEWS
--- NEWS        28 May 2007 20:24:27 -0000      1.1.1.1.2.102
+++ NEWS        29 May 2007 12:31:05 -0000
@@ -13,6 +13,9 @@ Version 1.4.10 - ?? ??? 2007, by ????  (
 * Work around a number of corner-case POSIX compliance bugs in various
   broken stdio libraries.  In particular, the `syscmd' builtin behaves
   more predictably when stdin is seekable.
+* The `format' builtin now understands formats such as %a, %A, and %'hhd,
+  and works around a number of platform printf bugs.  Furthermore, the
+  sequence format(%*.*d,-1,-1,1) no longer outputs random data.
 
 Version 1.4.9 - 23 Mar 2007, by Eric Blake  (CVS version 1.4.8c)
 
Index: m4/gnulib-cache.m4
===================================================================
RCS file: /sources/m4/m4/m4/Attic/gnulib-cache.m4,v
retrieving revision 1.1.2.26
diff -u -p -r1.1.2.26 gnulib-cache.m4
--- m4/gnulib-cache.m4  28 May 2007 17:07:04 -0000      1.1.2.26
+++ m4/gnulib-cache.m4  29 May 2007 12:31:05 -0000
@@ -15,11 +15,11 @@
 
 
 # Specification in the form of a command-line invocation:
-#   gnulib-tool --import --dir=. --lib=libm4 --source-base=lib --m4-base=m4 
--doc-base=doc --aux-dir=. --with-tests --no-libtool --macro-prefix=M4 
avltree-oset binary-io clean-temp cloexec close-stream closein config-h error 
fdl fflush fopen-safer free fseeko gendocs getopt gnupload mkstemp obstack 
regex stdbool stdint stdlib-safer strtol unlocked-io verror version-etc-fsf 
xalloc xvasprintf
+#   gnulib-tool --import --dir=. --lib=libm4 --source-base=lib --m4-base=m4 
--doc-base=doc --aux-dir=. --with-tests --no-libtool --macro-prefix=M4 
avltree-oset binary-io clean-temp cloexec close-stream closein config-h error 
fdl fflush fopen-safer free fseeko gendocs getopt gnupload mkstemp obstack 
regex stdbool stdint stdlib-safer strtol unlocked-io vasprintf-posix verror 
version-etc-fsf xalloc xvasprintf
 
 # Specification in the form of a few gnulib-tool.m4 macro invocations:
 gl_LOCAL_DIR([])
-gl_MODULES([avltree-oset binary-io clean-temp cloexec close-stream closein 
config-h error fdl fflush fopen-safer free fseeko gendocs getopt gnupload 
mkstemp obstack regex stdbool stdint stdlib-safer strtol unlocked-io verror 
version-etc-fsf xalloc xvasprintf])
+gl_MODULES([avltree-oset binary-io clean-temp cloexec close-stream closein 
config-h error fdl fflush fopen-safer free fseeko gendocs getopt gnupload 
mkstemp obstack regex stdbool stdint stdlib-safer strtol unlocked-io 
vasprintf-posix verror version-etc-fsf xalloc xvasprintf])
 gl_AVOID([])
 gl_SOURCE_BASE([lib])
 gl_M4_BASE([m4])
Index: doc/m4.texinfo
===================================================================
RCS file: /sources/m4/m4/doc/m4.texinfo,v
retrieving revision 1.1.1.1.2.126
diff -u -p -r1.1.1.1.2.126 m4.texinfo
--- doc/m4.texinfo      25 May 2007 20:29:17 -0000      1.1.1.1.2.126
+++ doc/m4.texinfo      29 May 2007 12:31:05 -0000
@@ -4887,15 +4887,27 @@ The macro @code{format} is recognized on
 
 Its use is best described by a few examples:
 
address@hidden This test is a bit fragile, if someone tries to port to a
address@hidden platform without infinity.
 @example
 define(`foo', `The brown fox jumped over the lazy dog')
 @result{}
 format(`The string "%s" uses %d characters', foo, len(foo))
 @result{}The string "The brown fox jumped over the lazy dog" uses 38 characters
+format(`%*.*d', `-1', `-1', `1')
address@hidden
 format(`%.0f', `56789.9876')
 @result{}56790
-len(format(`%-*X', `300', `1'))
address@hidden
+len(format(`%-*X', `5000', `1'))
address@hidden
+ifelse(format(`%010F', `infinity'), `       INF', `success',
+       format(`%010F', `infinity'), `  INFINITY', `success',
+       `fail')
address@hidden
+ifelse(format(`%.1A', `1.999'), `0X1.0P+1', `success',
+       format(`%.1A', `1.999'), `0X2.0P+0', `success',
+       `fail')
address@hidden
 @end example
 
 Using the @code{forloop} macro defined earlier (@pxref{Forloop}), this
@@ -4921,18 +4933,32 @@ forloop(`i', `1', `10', `format(`%6d squ
 @end example
 
 The builtin @code{format} is modeled after the ANSI C @samp{printf}
-function, and supports these @samp{%} specifiers: @samp{c},
address@hidden, @samp{d}, @samp{o}, @samp{x}, @samp{X}, @samp{u}, @samp{e},
address@hidden, @samp{f}, @samp{F}, @samp{g}, @samp{G}, and @samp{%}; it
-supports field widths and precisions, and the
-modifiers @samp{+}, @samp{-}, @address@hidden }}, @samp{0}, @samp{#}, @samp{h} 
and
address@hidden  For more details on the functioning of @code{printf}, see the
-C Library Manual.
-
-For now, unrecognized specifiers are silently ignored, but it is
-anticipated that a future release of @acronym{GNU} @code{m4} will support more
-specifiers, and give warnings when problems are encountered.  Likewise,
-escape sequences are not yet recognized.
+function, and supports these @samp{%} specifiers: @samp{c}, @samp{s},
address@hidden, @samp{o}, @samp{x}, @samp{X}, @samp{u}, @samp{a}, @samp{A},
address@hidden, @samp{E}, @samp{f}, @samp{F}, @samp{g}, @samp{G}, and
address@hidden; it supports field widths and precisions, and the flags
address@hidden, @samp{-}, @samp{ }, @samp{0}, @samp{#}, and @samp{'}.  For
+integer specifiers, the width modifiers @samp{hh}, @samp{h}, and
address@hidden are recognized, and for floating point specifiers, the width
+modifier @samp{l} is recognized.  Items not yet supported include
+positional arguments, the @samp{n}, @samp{p}, @samp{S}, and @samp{C}
+specifiers, the @samp{z}, @samp{t}, @samp{j}, @samp{L} and @samp{ll}
+modifiers, and any platform extensions available in the native
address@hidden  For more details on the functioning of @code{printf},
+see the C Library Manual, or the @acronym{POSIX} specification (for
+example, @samp{%a} is supported even on platforms that haven't yet
+implemented C99 hexadecimal floating point output natively).
+
+Unrecognized specifiers result in a warning.  It is anticipated that a
+future release of @acronym{GNU} @code{m4} will support more specifiers,
+and give better warnings when various problems such as overflow are
+encountered.  Likewise, escape sequences are not yet recognized.
+
address@hidden
+format(`%p', `0')
address@hidden:stdin:1: Warning: unrecognized specifier in `%p'
address@hidden
address@hidden example
 
 @node Arithmetic
 @chapter Macros for doing arithmetic
Index: src/format.c
===================================================================
RCS file: /sources/m4/m4/src/Attic/format.c,v
retrieving revision 1.1.1.1.2.6
diff -u -p -r1.1.1.1.2.6 format.c
--- src/format.c        26 Oct 2006 21:11:56 -0000      1.1.1.1.2.6
+++ src/format.c        29 May 2007 12:31:05 -0000
@@ -1,6 +1,6 @@
 /* GNU m4 -- A simple macro processor
 
-   Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2006
+   Copyright (C) 1989, 1990, 1991, 1992, 1993, 1994, 2006, 2007
    Free Software Foundation, Inc.
 
    This program is free software; you can redistribute it and/or modify
@@ -24,24 +24,17 @@
 #include "m4.h"
 #include "xvasprintf.h"
 
-/* Simple varargs substitute.  */
+/* Simple varargs substitute.  We assume int and unsigned int are the
+   same size; likewise for long and unsigned long.  */
 
 #define ARG_INT(argc, argv) \
        ((argc == 0) ? 0 : \
         (--argc, argv++, atoi (TOKEN_DATA_TEXT (argv[-1]))))
 
-#define ARG_UINT(argc, argv) \
-       ((argc == 0) ? 0 : \
-        (--argc, argv++, (unsigned int) atoi (TOKEN_DATA_TEXT (argv[-1]))))
-
 #define ARG_LONG(argc, argv) \
        ((argc == 0) ? 0 : \
         (--argc, argv++, atol (TOKEN_DATA_TEXT (argv[-1]))))
 
-#define ARG_ULONG(argc, argv) \
-       ((argc == 0) ? 0 : \
-        (--argc, argv++, (unsigned long) atol (TOKEN_DATA_TEXT (argv[-1]))))
-
 #define ARG_STR(argc, argv) \
        ((argc == 0) ? "" : \
         (--argc, argv++, TOKEN_DATA_TEXT (argv[-1])))
@@ -51,44 +44,60 @@
         (--argc, argv++, atof (TOKEN_DATA_TEXT (argv[-1]))))
 
 
-/*------------------------------------------------------------------------.
-| The main formatting function.  Output is placed on the obstack OBS, the |
-| first argument in ARGV is the formatting string, and the rest is       |
-| arguments for the string.                                              |
-`------------------------------------------------------------------------*/
+/*------------------------------------------------------------------.
+| The main formatting function.  Output is placed on the obstack    |
+| OBS, the first argument in ARGV is the formatting string, and the |
+| rest is arguments for the string.  Warn rather than invoke       |
+| unspecified behavior in the underlying printf when we do not     |
+| recognize a format.                                              |
+`------------------------------------------------------------------*/
 
 void
 format (struct obstack *obs, int argc, token_data **argv)
 {
-  char *fmt;                   /* format control string */
-  const char *fstart;          /* beginning of current format spec */
-  int c;                       /* a simple character */
+  const char *f;                       /* format control string */
+  const char *fmt;                     /* position within f */
+  char fstart[] = "%'+- 0#*.*hhd";     /* current format spec */
+  char *p;                             /* position within fstart */
+  unsigned char c;                     /* a simple character */
 
   /* Flags.  */
-  char flags;                  /* 1 iff treating flags */
+  char flags;                          /* flags to use in fstart */
+  enum {
+    THOUSANDS  = 0x01, /* ' */
+    PLUS       = 0x02, /* + */
+    MINUS      = 0x04, /* - */
+    SPACE      = 0x08, /*   */
+    ZERO       = 0x10, /* 0 */
+    ALT                = 0x20, /* # */
+    DONE       = 0x40  /* no more flags */
+  };
 
   /* Precision specifiers.  */
   int width;                   /* minimum field width */
   int prec;                    /* precision */
   char lflag;                  /* long flag */
-  char hflag;                  /* short flag */
+
+  /* Specifiers we are willing to accept.  ok['x'] implies %x is ok.
+     Various modifiers reduce the set, in order to avoid undefined
+     behavior in printf.  */
+  char ok[128];
 
   /* Buffer and stuff.  */
   char *str;                   /* malloc'd buffer of formatted text */
-  enum {INT, UINT, LONG, ULONG, DOUBLE, STR} datatype;
+  enum {INT, LONG, DOUBLE, STR} datatype;
 
-  fmt = (char *) ARG_STR (argc, argv);
+  f = fmt = (const char *) ARG_STR (argc, argv);
+  memset (ok, 0, sizeof ok);
   for (;;)
     {
       while ((c = *fmt++) != '%')
        {
-         if (c == 0)
+         if (c == '\0')
            return;
          obstack_1grow (obs, c);
        }
 
-      fstart = fmt - 1;
-
       if (*fmt == '%')
        {
          obstack_1grow (obs, '%');
@@ -96,73 +105,145 @@ format (struct obstack *obs, int argc, t
          continue;
        }
 
+      p = fstart + 1; /* % */
+      lflag = 0;
+      ok['a'] = ok['A'] = ok['c'] = ok['d'] = ok['e'] = ok['E']
+       = ok['f'] = ok['F'] = ok['g'] = ok['G'] = ok['i'] = ok['o']
+       = ok['s'] = ok['u'] = ok['x'] = ok['X'] = 1;
+
       /* Parse flags.  */
-      flags = 1;
+      flags = 0;
       do
        {
          switch (*fmt)
            {
-           case '-':           /* left justification */
+           case '\'':          /* thousands separator */
+             ok['a'] = ok['A'] = ok['c'] = ok['e'] = ok['E']
+               = ok['o'] = ok['s'] = ok['x'] = ok['X'] = 0;
+             flags |= THOUSANDS;
+             break;
+
            case '+':           /* mandatory sign */
+             ok['c'] = ok['o'] = ok['s'] = ok['u'] = ok['x'] = ok['X'] = 0;
+             flags |= PLUS;
+             break;
+
            case ' ':           /* space instead of positive sign */
+             ok['c'] = ok['o'] = ok['s'] = ok['u'] = ok['x'] = ok['X'] = 0;
+             flags |= SPACE;
+             break;
+
            case '0':           /* zero padding */
+             ok['c'] = ok['s'] = 0;
+             flags |= ZERO;
+             break;
+
            case '#':           /* alternate output */
+             ok['c'] = ok['d'] = ok['i'] = ok['s'] = ok['u'] = 0;
+             flags |= ALT;
+             break;
+
+           case '-':           /* left justification */
+             flags |= MINUS;
              break;
 
            default:
-             flags = 0;
+             flags |= DONE;
              break;
            }
        }
-      while (flags && fmt++);
-
-      /* Minimum field width.  */
-      width = -1;
+      while (!(flags & DONE) && fmt++);
+      if (flags & THOUSANDS)
+       *p++ = '\'';
+      if (flags & PLUS)
+       *p++ = '+';
+      if (flags & MINUS)
+       *p++ = '-';
+      if (flags & SPACE)
+       *p++ = ' ';
+      if (flags & ZERO)
+       *p++ = '0';
+      if (flags & ALT)
+       *p++ = '#';
+
+      /* Minimum field width; an explicit 0 is the same as not giving
+        the width.  */
+      width = 0;
+      *p++ = '*';
       if (*fmt == '*')
        {
          width = ARG_INT (argc, argv);
          fmt++;
        }
-      else if (isdigit (to_uchar (*fmt)))
+      else
        {
-         do
+         while (isdigit (to_uchar (*fmt)))
            {
+             width = 10 * width + *fmt - '0';
              fmt++;
            }
-         while (isdigit (to_uchar (*fmt)));
        }
 
-      /* Maximum precision.  */
+      /* Maximum precision; an explicit negative precision is the same
+        as not giving the precision.  A lone '.' is a precision of 0.  */
       prec = -1;
+      *p++ = '.';
+      *p++ = '*';
       if (*fmt == '.')
        {
+         ok['c'] = 0;
          if (*(++fmt) == '*')
            {
              prec = ARG_INT (argc, argv);
              ++fmt;
            }
-         else if (isdigit (to_uchar (*fmt)))
+         else
            {
-             do
+             prec = 0;
+             while (isdigit (to_uchar (*fmt)))
                {
+                 prec = 10 * prec + *fmt - '0';
                  fmt++;
                }
-             while (isdigit (to_uchar (*fmt)));
            }
        }
 
-      /* Length modifiers.  */
-      lflag = (*fmt == 'l');
-      hflag = (*fmt == 'h');
-      if (lflag || hflag)
-       fmt++;
-
-      switch (*fmt++)
+      /* Length modifiers.  We don't yet recognize ll, j, t, or z.  */
+      if (*fmt == 'l')
        {
+         *p++ = 'l';
+         lflag = 1;
+         fmt++;
+         ok['c'] = ok['s'] = 0;
+       }
+      else if (*fmt == 'h')
+       {
+         *p++ = 'h';
+         fmt++;
+         if (*fmt == 'h')
+           {
+             *p++ = 'h';
+             fmt++;
+           }
+         ok['a'] = ok['A'] = ok['c'] = ok['e'] = ok['E'] = ok['f'] = ok['F']
+           = ok['g'] = ok['G'] = ok['s'] = 0;
+       }
 
-       case '\0':
-         return;
+      c = *fmt++;
+      if (c > sizeof ok || !ok[c])
+       {
+         M4ERROR ((warning_status, 0,
+                   "Warning: unrecognized specifier in `%s'", f));
+         if (c == '\0')
+           fmt--;
+         continue;
+       }
+      *p++ = c;
+      *p = '\0';
 
+      /* Specifiers.  We don't yet recognize C, S, n, or p.  */
+      switch (c)
+       {
        case 'c':
          datatype = INT;
          break;
@@ -173,30 +254,15 @@ format (struct obstack *obs, int argc, t
 
        case 'd':
        case 'i':
-         if (lflag)
-           {
-             datatype = LONG;
-           }
-         else
-           {
-             datatype = INT;
-           }
-         break;
-
        case 'o':
        case 'x':
        case 'X':
        case 'u':
-         if (lflag)
-           {
-             datatype = ULONG;
-           }
-         else
-           {
-             datatype = UINT;
-           }
+         datatype = lflag ? LONG : INT;
          break;
 
+       case 'a':
+       case 'A':
        case 'e':
        case 'E':
        case 'f':
@@ -207,86 +273,31 @@ format (struct obstack *obs, int argc, t
          break;
 
        default:
-         continue;
+         abort ();
        }
 
-      c = *fmt;
-      *fmt = '\0';
-
       switch(datatype)
        {
        case INT:
-         if (width != -1 && prec != -1)
-           str = xasprintf (fstart, width, prec, ARG_INT(argc, argv));
-         else if (width != -1)
-           str = xasprintf (fstart, width, ARG_INT(argc, argv));
-         else if (prec != -1)
-           str = xasprintf (fstart, prec, ARG_INT(argc, argv));
-         else
-           str = xasprintf (fstart, ARG_INT(argc, argv));
-         break;
-
-       case UINT:
-         if (width != -1 && prec != -1)
-           str = xasprintf (fstart, width, prec, ARG_UINT(argc, argv));
-         else if (width != -1)
-           str = xasprintf (fstart, width, ARG_UINT(argc, argv));
-         else if (prec != -1)
-           str = xasprintf (fstart, prec, ARG_UINT(argc, argv));
-         else
-           str = xasprintf (fstart, ARG_UINT(argc, argv));
+         str = xasprintf (fstart, width, prec, ARG_INT(argc, argv));
          break;
 
        case LONG:
-         if (width != -1 && prec != -1)
-           str = xasprintf (fstart, width, prec, ARG_LONG(argc, argv));
-         else if (width != -1)
-           str = xasprintf (fstart, width, ARG_LONG(argc, argv));
-         else if (prec != -1)
-           str = xasprintf (fstart, prec, ARG_LONG(argc, argv));
-         else
-           str = xasprintf (fstart, ARG_LONG(argc, argv));
-         break;
-
-       case ULONG:
-         if (width != -1 && prec != -1)
-           str = xasprintf (fstart, width, prec, ARG_ULONG(argc, argv));
-         else if (width != -1)
-           str = xasprintf (fstart, width, ARG_ULONG(argc, argv));
-         else if (prec != -1)
-           str = xasprintf (fstart, prec, ARG_ULONG(argc, argv));
-         else
-           str = xasprintf (fstart, ARG_ULONG(argc, argv));
+         str = xasprintf (fstart, width, prec, ARG_LONG(argc, argv));
          break;
 
        case DOUBLE:
-         if (width != -1 && prec != -1)
-           str = xasprintf (fstart, width, prec, ARG_DOUBLE(argc, argv));
-         else if (width != -1)
-           str = xasprintf (fstart, width, ARG_DOUBLE(argc, argv));
-         else if (prec != -1)
-           str = xasprintf (fstart, prec, ARG_DOUBLE(argc, argv));
-         else
-           str = xasprintf (fstart, ARG_DOUBLE(argc, argv));
+         str = xasprintf (fstart, width, prec, ARG_DOUBLE(argc, argv));
          break;
 
        case STR:
-         if (width != -1 && prec != -1)
-           str = xasprintf (fstart, width, prec, ARG_STR(argc, argv));
-         else if (width != -1)
-           str = xasprintf (fstart, width, ARG_STR(argc, argv));
-         else if (prec != -1)
-           str = xasprintf (fstart, prec, ARG_STR(argc, argv));
-         else
-           str = xasprintf (fstart, ARG_STR(argc, argv));
+         str = xasprintf (fstart, width, prec, ARG_STR(argc, argv));
          break;
 
        default:
          abort();
        }
 
-      *fmt = c;
-
       /* NULL was returned on failure, such as invalid format string.  For
         now, just silently ignore that bad specifier.  */
       if (str == NULL)

reply via email to

[Prev in Thread] Current Thread [Next in Thread]