Re: [Tinycc-devel] parsing 0x1e+1 as 0x1e +1
From: Vincent Lefevre
Subject: Re: [Tinycc-devel] parsing 0x1e+1 as 0x1e +1
Date: Wed, 27 Apr 2016 17:50:22 +0200
User-agent: Mutt/1.6.0-6623-vl-r87826 (2016-04-14)
On 2016-04-27 18:21:11 +0300, Sergey Korshunoff wrote:
> > CompCert 2.4 outputs 0x10 0x1e (following its interpretation of 6.4.8)
> > though if the user expects a subtraction in both cases, he probably
> > expects that E yields the same value in both cases
>
> pcc outputs 0x10 0x1e too.
Note that this is fixed in CompCert 2.5, as the authors now agree that
the previous interpretation was wrong.
> But tcc with first patch outputs 0x10 0x1d (as user expects)
This may be the most intuitive behavior, but it does not conform to
the ISO C standard. 0x1e-E is a single preprocessing token (Clause
6.4.8), more precisely a pp-number, and "each preprocessing token is
converted into a token" (Clause 5.1.1.2). Since 0x1e-E is not a valid
token, one gets an error. Initially, CompCert didn't take Clause
5.1.1.2 into account here: the pp-number 0x1e-E was further parsed
(*after* preprocessing) into 3 tokens 0x1e, - and E; hence the result
0x1e.
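To make the pp-number rule concrete, here is a minimal sketch of a
scanner for the grammar in Clause 6.4.8. The function name
pp_number_length is hypothetical (it is not taken from tcc, pcc or
CompCert), and universal character names are ignored for brevity.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the pp-number starting at s, or 0 if none starts there.
 * Grammar (Clause 6.4.8), ignoring universal character names:
 *   pp-number: digit | . digit
 *            | pp-number digit
 *            | pp-number identifier-nondigit
 *            | pp-number e sign | pp-number E sign
 *            | pp-number p sign | pp-number P sign
 *            | pp-number .
 * Hypothetical helper for illustration only. */
size_t pp_number_length(const char *s)
{
    size_t i;

    if (isdigit((unsigned char)s[0]))
        i = 1;
    else if (s[0] == '.' && isdigit((unsigned char)s[1]))
        i = 2;
    else
        return 0;

    for (;;) {
        unsigned char c = (unsigned char)s[i];

        /* An exponent letter followed by a sign is consumed as a pair,
         * even when the number is hexadecimal and 'e' is just a digit. */
        if ((c == 'e' || c == 'E' || c == 'p' || c == 'P')
            && (s[i + 1] == '+' || s[i + 1] == '-'))
            i += 2;
        else if (isalnum(c) || c == '_' || c == '.')
            i += 1;
        else
            break;
    }
    return i;
}
```

With this sketch, pp_number_length("0x1e-E") is 6 (the whole string is
one preprocessing token), whereas for an input such as "0x1d-E" it is 4,
since the '-' does not follow an e, E, p or P and therefore ends the
pp-number.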
One may find the rules for pp-number awkward, but they have been
designed on purpose. According to the C rationale:
  The notion of preprocessing numbers was introduced to simplify the
  description of preprocessing. It provides a means of talking about
  the tokenization of strings that look like numbers, or initial
  substrings of numbers, prior to their semantic interpretation. In
  the interests of keeping the description simple, occasional spurious
  forms are scanned as preprocessing numbers. For example, 0x123E+1 is
  a single token under the rules. The C89 Committee felt that it was
  better to tolerate such anomalies than burden the preprocessor with
  a more exact, and exacting, lexical specification. It felt that this
  anomaly was no worse than the principle under which the characters
  a+++++b are tokenized as a ++ ++ + b (an invalid expression), even
  though the tokenization a ++ + ++ b would yield a syntactically
  correct expression. In both cases, exercise of reasonable precaution
  in coding style avoids surprises.
It is also important that things like 1m be seen as a single
preprocessing token (thus a pp-number), so that the ## operator in
the following code yields the identifier int1m:
#define mkident(s) s ## 1m
int mkident(int) = 0;
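One way to check what the paste produces is to stringify the expansion.
The sketch below repeats the mkident macro from above; the STR and XSTR
macros and the function mkident_name are hypothetical helpers added
only for this illustration.

```c
#define mkident(s) s ## 1m

/* Two-level stringification: XSTR expands its argument first,
 * then STR turns the resulting token into a string literal. */
#define STR(x)  #x
#define XSTR(x) STR(x)

/* The pp-number 1m is pasted onto int, so this declares int1m. */
int mkident(int) = 0;

/* Returns the spelling of the pasted token. */
const char *mkident_name(void)
{
    return XSTR(mkident(int));
}
```

Under these assumptions, mkident_name() returns the string "int1m", and
the variable int1m can be used like any other int. If 1m were split
into the two tokens 1 and m, the paste would only apply to 1 and the
declaration would no longer name a single identifier.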
--
Vincent Lefèvre <address@hidden> - Web: <https://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <https://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)