[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: sizeof in Macros. (Was: Specifying dependencies more clearly)
From: |
Alejandro Colomar |
Subject: |
Re: sizeof in Macros. (Was: Specifying dependencies more clearly) |
Date: |
Tue, 15 Nov 2022 13:56:45 +0100 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.1 |
Hi Ralph,
As you noticed, if you don't address me in To or Cc, I may not notice the email.
I try to, but sometimes I miss it. Sorry.
On 11/10/22 10:31, Ralph Corderoy wrote:
Howdy Alejandro,
Okay, here we go for a rant.
Consider the cost of lost opportunities.
Since I wrote the code from memory, I had a few typos, but the idea
was there...
typedef struct {
size_t length;
u_char *start;
} str_t;
#define length(s) (sizeof(s) - 1)
#define str_set(str, text) do \
{ \
(str)->length = length(test); \
(str)->start = (u_char *) text; \
} while (0)
And s/test/text/.
The lack of parenthesis around ‘text’ in the assignment to ‘start’ looks
wrong.
Yeah, I had a few typos. That's common in me when I write from memory, and
don't have a compiler to remind me of stupid bugs. :)
Of course, cowboy programmers don't need terminating NUL bytes
The Unix filesystem didn't terminate full-length filenames in the
sixteen-byte directory entries; two for the inode number, fourteen bytes
for the filename.
utmp(5) still has a lot of entries that don't have a NUL terminator, which is
the reason I wrote this:
<https://github.com/shadow-maint/shadow/pull/569/files#diff-12b560bab6b4fb8f7f3a16f01aaa994de539a8bed3058c976be0daebe16405c1>
<https://github.com/shadow-maint/shadow/pull/569>
Not using terminating NULs allows substrings to be sliced from the
original without copying the bytes.
Certainly, it has it's benefits, but it better be accompanied of some way to get
the compiler on your side and warn you before you write some important bugs.
And the use of that macro is typically for things like:
str_t str;
str_set(&str, "foo");
And then some new programmer in the team writes a line of code
that's something like:
str_set(&str, cond ? "someword" : "another");
Then testing reveals some issue if cond is true. You see only
"somewor"; hmm.
...
I quickly realize that it is due to the ternary operator decaying
the array into a pointer and sizeof later doing shit.
...
My patch just changes
#define length(s) (sizeof(s) - 1)
to:
#define length(s) (nitems(s) - 1)
(nitems() is defined to be the obvious sizeof division (called
ARRAY_SIZE(9) in the Linux kernel)
The bug is the original macro str_set() doesn't document its ‘text’
parameter must be a string constant.
I disagree. Comments tend to be obsolete from the moment they are written. My
code usually has quite good documentation in the commit message that adds the
macro[1], and if I need to fix the macro, I take some time to make sure the
commit message also documents it again completely if the changes are big enough
that it's necessary to refresh it, and then the code goes plain, with no
comments, but with plenty of features that static analyzers and compilers can
use to warn me.
Macros I have to hand are more
explicit: ‘string constant s’.
/* DIM gives the number of elements in the one-dimensional array a. */
That is obvious, I'd rather remove the comment.
#define DIM(a) (sizeof (a) / sizeof (*(a)))
/* LEN gives the strlen() of string constant s, excluding the
* terminating NUL. */
That is obvious, and moreover, the compiler can't enforce it. Rather, I suggest
the following.
#define LEN(s) (sizeof (s) - 1)
[1]: This is written from today, since nitems() wasn't readable enough in that
context, it seems.
/*
* SYNOPSIS
* size_t sizeof_array(array);
*
* ARGUMENTS
* array Array to be sized.
*
* DESCRIPTION
* Measure the size of an array --equivalent to sizeof(array)--.
* This macro has the benefit that it triggers the warning
* '-Wsizeof-pointer-div' when the argument is a pointer instead
* of an array, which can cause subtle bugs, sometimes impossible
* to detect through tests.
*
* EXAMPLES
* int *p;
* int a[1];
* size_t sp, sa;
*
* sp = sizeof_array(p); // Warning: -Wsizeof-pointer-div
* sa = sizeof_array(a); // OK
*
* SEE ALSO
*
<https://stackoverflow.com/questions/37538/how-do-i-determine-the-size-of-my-array-in-c/57537491#57537491>
*/
The above comment is really in the commit message, not in the code.
#define nitems(a) (sizeof (a) / sizeof (*(a)))
#define sizeof_array(a) (sizeof(a) + (0 * nxt_nitems(a)))
And then, the change for the infamous length() macro:
-#define length(s) (sizeof(s) - 1)
+#define length(s) (nxt_sizeof_array(s) - 1)
By passing a char pointer on what's presumably a 64-bit pointer machine,
P64, sizeof gives the pointer's eight bytes, one is knocked off as if
there were a terminating NUL, and the first seven bytes of "someword"
result.
Your patch switches to nitems() and your description makes me think it's
like DIM() above. That takes the eight bytes of the pointer given by
the first sizeof and divides it by one, doing nothing, as the second
sizeof gives the size of a u_char. So we still arrive at eight, knock
one off for the NUL as you write ‘nitems(s) - 1’, giving seven. So I
don't see why this wouldn't also give "somewor" instead of "someword".
How did it fix the problem?
-Wsizeof-pointer-div
It would have been caught much earlier. :)
Unless the value needs to be known at compile time, using strlen(3)
would work in all cases, save much human time at the cost of a little
accumulated machine time, and in the simple common case of a string
constant it will probably be evaluated by any optimising compiler into a
constant.
I agree. I bet this code has its origins when compilers weren't that good at
optimizing. Now that compilers also warn on the sizeof division, the sizeof
macro can be good for guaranteeing that it's a constexpr.
Cheers,
Alex
typedef struct {
size_t length;
u_char *start;
} str_t;
#define str_set(str, text) \
do { \
(str)->length = strlen(text); \
(str)->start = (u_char *)(text); \
} while (0)
- Specifying dependencies more clearly, Alejandro Colomar, 2022/11/08
- Re: Specifying dependencies more clearly, G. Branden Robinson, 2022/11/08
- Pascal rides again (was: Specifying dependencies more clearly), G. Branden Robinson, 2022/11/10
- Re: Pascal rides again (was: Specifying dependencies more clearly), Alejandro Colomar, 2022/11/10
- Re: Pascal rides again (was: Specifying dependencies more clearly), Alejandro Colomar, 2022/11/10
- Re: Pascal rides again (was: Specifying dependencies more clearly), G. Branden Robinson, 2022/11/10
- Re: Pascal rides again (was: Specifying dependencies more clearly), Dave Kemper, 2022/11/11
- Re: Pascal rides again (was: Specifying dependencies more clearly), Alejandro Colomar, 2022/11/12
- C Strings and String Literals. (Was: Pascal rides again), Ralph Corderoy, 2022/11/13
- Re: C Strings and String Literals. (Was: Pascal rides again), Larry McVoy, 2022/11/13
- Re: C Strings and String Literals. (Was: Pascal rides again), Alejandro Colomar, 2022/11/13
- Re: C Strings and String Literals. (Was: Pascal rides again), Alejandro Colomar, 2022/11/13