bug-m4

Re: HEAD: inclusion order wrong for input.c


From: Gary V. Vaughan
Subject: Re: HEAD: inclusion order wrong for input.c
Date: Tue, 3 Apr 2007 10:44:25 +0100

Hi Eric,

On 3 Apr 2007, at 03:53, Eric Blake wrote:
> According to Gary V. Vaughan on 4/2/2007 4:37 PM:
>>> Cast the subscript to unsigned char before using it as an index.
>>> Otherwise, on a system where char is signed, and its high bit is set,
>>> and you haven't adjusted the array range to allow for negative values,
>>> fun will ensue.
>>
>> If the table value for META-^A is held at element 128 of the array
>> (since the table was built assuming char is unsigned by default), and
>> we compile on a host with signed chars, does the signed char value of
>> META-^A still become 128 when cast to unsigned char?  Or does 2's
>> complement come into play and scramble the order of the negative
>> signed char values when casting them before doing a table lookup?
>
> As long as the table is handled consistently (in other words, as long as
> ALL uses of characters as indices occur as unsigned char or within
> to_uchar), then META-^A (usually encoded as -128 in signed char) will
> always appear at the same index, regardless of whether that index is 128
> (as it will be on a 2's complement machine, the bulk of what exists
> today) or 255 (which is what (unsigned char) -128 might become on a 1's
> complement machine, mostly theoretical).  You only run into the bug that
> you were describing if you also reference the array based on a given
> integer encoding of characters.

My point exactly.  Here's a violation of that consistency in syntax.c:

    m4_syntax_table *
    m4_syntax_create (void)
    {
      m4_syntax_table *syntax = xzalloc (sizeof *syntax);
      int ch;

      /* Set up default table. This table never changes during operation. */
      for (ch = 256; --ch >= 0;)
        switch (ch)
          {
          case '(':
            syntax->orig[ch] = M4_SYNTAX_OPEN;
            break;

In this case, we compare the int loop index against character literals,
which can be negative for high-bit characters where char is signed, but
assume that those values with the high bit set will map to the same slots
that to_uchar produces when we do lookups in that table.
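To make the risk concrete, here is a reduced sketch of the same loop-and-switch pattern with one extra case for a high-bit character.  The names build_by_switch, lookup, SYNTAX_OTHER and SYNTAX_META are hypothetical, and the sketch assumes 8-bit chars and a compiler such as GCC, which gives the character constant '\200' the value -128 when char is signed.  On such a host the case can never match, because the loop index only runs from 0 to 255.

```c
#include <limits.h>

enum { SYNTAX_OTHER, SYNTAX_META };   /* hypothetical categories */

/* Build a table the way the quoted m4_syntax_create does: loop over
   int indices 0..UCHAR_MAX and switch on the index.  The case label
   is a high-bit character constant whose value is implementation-
   defined: typically -128 where char is signed, 128 where unsigned.  */
static void
build_by_switch (unsigned char table[UCHAR_MAX + 1])
{
  int ch;

  for (ch = UCHAR_MAX + 1; --ch >= 0;)
    switch (ch)
      {
      case '\200':               /* unreachable if '\200' == -128 */
        table[ch] = SYNTAX_META;
        break;
      default:
        table[ch] = SYNTAX_OTHER;
        break;
      }
}

/* Look a character up the consistent way, through unsigned char.  */
static int
lookup (const unsigned char table[UCHAR_MAX + 1], char c)
{
  return table[(unsigned char) c];
}
```

With this builder, looking up '\200' yields SYNTAX_OTHER on a signed-char host and SYNTAX_META where char is unsigned: the platform-dependent result that consistent indexing is meant to rule out.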

In practice, we don't have any case labels for high-bit-set chars
inside the switch, so it hasn't caught us out.  Even so, as a matter of
portable, defensive coding style, it seems better to compute indices the
same way when building the table as when looking up entries in it...
I've probably made this same bad assumption in a few other places
where I wrote code to do table lookups for char values :-(
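One way to follow that advice, sketched under stated assumptions (the names build_consistently, lookup and the SYNTAX_* values are hypothetical, and to_uchar is written here as a plain char-to-unsigned-char conversion rather than copied from m4), is to give every entry a default and then assign the special entries through to_uchar, so that building and lookup compute indices identically.  The switch cannot simply be kept, because to_uchar ('(') is a function call, not a constant expression, and so cannot serve as a case label.

```c
#include <limits.h>
#include <string.h>

enum { SYNTAX_OTHER, SYNTAX_OPEN, SYNTAX_META };   /* hypothetical */

/* Hypothetical stand-in for to_uchar: C converts the char to unsigned
   char by value, modulo UCHAR_MAX + 1, so -128 becomes 128.  */
static inline unsigned char
to_uchar (char ch)
{
  return ch;
}

/* Fill the table with defaults, then index the special entries with
   to_uchar, exactly as the lookups below do.  */
static void
build_consistently (unsigned char table[UCHAR_MAX + 1])
{
  memset (table, SYNTAX_OTHER, UCHAR_MAX + 1);
  table[to_uchar ('(')] = SYNTAX_OPEN;
  table[to_uchar ('\200')] = SYNTAX_META;   /* same slot on any host */
}

static int
lookup (const unsigned char table[UCHAR_MAX + 1], char c)
{
  return table[to_uchar (c)];
}
```

The cost is giving up the single switch, but with 8-bit chars the high-bit entry now lands at index 128 whether char is signed or not, because both sides of the table go through the same conversion.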

Cheers,
        Gary
--
  ())_.              Email me: address@hidden
  ( '/           Read my blog: http://blog.azazil.net
  / )=         ...and my book: http://sources.redhat.com/autobook
`(_~)_ Join my AGLOCO Network: http://www.agloco.com/r/BBBS7912



