[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[PATCH 0/3] yacc: compute the best type for the state number
From: |
Paul Eggert |
Subject: |
[PATCH 0/3] yacc: compute the best type for the state number |
Date: |
Tue, 1 Oct 2019 01:35:32 -0700 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.8.0 |
On 9/29/19 11:34 AM, Akim Demaille wrote:
As a matter of fact, we used two types:
- most arrays (such as state stack, and its correspondance in the LAC
infrastructure) are using int16_t. A few "temporary" variables also
have this type.
- yystate, which is an intermediate variable, was an int.
Actually those arrays use int_fast16_t not int16_t, as C99/C11/C18 does not
require support for int16_t. It could well be more efficient for them to use
int_least16_t instead, for better caching; see below.
In my proposal below, I fuse both cases to use a single type,
yy_state_num, which is the smallest type that contains the set of
states: an unsigned type.
In other GNU applications, we've been moving away from using unsigned types due
to their confusing behavior (particularly when comparing signed vs unsigned),
and because modern tools such as 'gcc -fsanitize=undefined' can check for signed
integer overflow but (obviously) not for unsigned integer overflow. In this
newer style, it's OK to use unsigned types for bit vectors and hash values since
these typically don't involve integer comparisons and integer overflow is
typically harmless; for indexes, though, unsigned types are so error-prone that
they should be avoided.
In the past, Bison has gotten away with using uint8_fast_t and uint16_fast_t for
storing indexes, because on machines with 32-bit int (which is the minimum
supported by GNU) these types typically promote to 'int' and so avoid most of
the signed-vs-unsigned confusion. But if Bison starts using 32-bit unsigned
types it won't have this luxury since these types won't promote to 'int'. And
anyway, Bison shouldn't assume that uint8_fast_t etc. promote to 'int', because
the C standard doesn't guarantee that.
So, I suggest using signed types for indexes in Bison templates; please see the
attached patch. I haven't checked for compatibility between this and your
proposed patches, but I assume that's straightforward.
This patch does not address the int_fast16_t versus int_least16_t issue, but
that should be easy enough to fix in a followup patch.
That's where I need you Paul: do you think that, for sake of
performances, I should keep the scalar variable as an 'int', and move
to the unsigned type only for arrays? Is it ok to move to stacks of
uint8_t instead of uint16_t, or should we stick to larger types? I
can see how using small types is cache friendly, but I can also
imagine that cpus don't like small types.
For local scalar variables it doesn't matter much these days; 'int' is fine and
is well-understood, but if the index might not fit into 'int' then 'ptrdiff_t'
is needed instead.
Typically, if you have a large number of integers that all fit into 8 bits,
you're better off using int_least8_t than wider types since that decreases cache
pressure. Cache lines are typically wider than 64 bits anyway, so you're not
saving CPU cycles by using 64-bit rather than 8-bit integers.
0001-Prefer-signed-types-for-indexes-in-skeletons.patch
Description: Text Data
- [PATCH 0/3] yacc: compute the best type for the state number,
Paul Eggert <=