Re: Pascal rides again (was: Specifying dependencies more clearly)


From: Alejandro Colomar
Subject: Re: Pascal rides again (was: Specifying dependencies more clearly)
Date: Fri, 11 Nov 2022 03:09:51 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.1

On 11/10/22 22:19, G. Branden Robinson wrote:
(Some of this is off-topic for the groff list.)

Hi Branden,


Hi Alex,

At 2022-11-08T22:05:25+0100, Alejandro Colomar wrote:
Okay, here we go for a rant.

Let's say there's some software with cowboy programmers, which has
things like:

     typedef struct {
         size_t  length;
         u_char  *start;
     } str_t;

Those who do not learn Pascal strings are condemned to reinvent them.

     #define length(s)   (sizeof(s) - 1)

     #define str_set(str, text)  do       \
     {                                    \
         (str)->length = length(text);    \
         (str)->start = (u_char *) text;  \
     } while (0)

(Of course, cowboy programmers don't need terminating NUL bytes,
that's for newbies, but that's not today's rant.)

There are some advantages to Pascal strings.  Having determination of
any string's size be O(1) is a big win over C string-scanning functions
(or loops doing repeated "*ch != '\0'" comparisons),

Yeah, I can understand that, and considering that it's a high-performance server, it's even expected (I remember that Ulrich Drepper rejected adding any string copy functions to glibc on the grounds that programmers should just use memcpy(3) and remember the lengths). But since sooner or later many of those strings will be passed to some syscall or libc function, having both might make more sense. Sure, keep the size for optimizing passes, but leave a trailing NUL so that you can always pass the string to libc. The current implementation seems to do that _only_ when it's strictly necessary, which means having to check unreadable code initialized miles away from its use, so I can't promise that there's no buffer overrun for a given string (and I'm pretty sure there's more than one, since there have been patches in the past fixing buffer overruns).
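A minimal sketch of the "keep both" idea, assuming a hypothetical struct cstr and helpers (cstr_init, cstr_free, cstr_demo) that are not taken from the project being discussed: the length is stored for O(1) access, and a terminator is always written so the buffer can go straight to libc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: a counted string that also stays NUL-terminated. */
struct cstr {
    size_t  length;   /* bytes before the terminator */
    char    *start;   /* invariant: start[length] == '\0' */
};

/* Copy n bytes from src and append a terminator.
 * Returns 0 on success, -1 if the allocation fails. */
int cstr_init(struct cstr *s, const char *src, size_t n)
{
    s->start = malloc(n + 1);
    if (s->start == NULL)
        return -1;
    memcpy(s->start, src, n);
    s->start[n] = '\0';
    s->length = n;
    return 0;
}

void cstr_free(struct cstr *s)
{
    free(s->start);
    s->start = NULL;
    s->length = 0;
}

/* Usage: the stored length and strlen(3) agree, so either API works. */
size_t cstr_demo(void)
{
    struct cstr s;
    size_t len;

    if (cstr_init(&s, "groff rocks", 5) != 0)
        return 0;
    assert(s.length == strlen(s.start));
    len = s.length;
    cstr_free(&s);
    return len;
}
```

The cost is one extra byte per string; in exchange, no caller has to trace where the buffer was initialized before handing it to a function that expects a terminator.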

especially when
these are used by the naïve or slack-jawed.[1]

I developed this stpecpy()[1] to fix this performance problem present in strlcpy(3). I plan to start using it (or similar ones adapted to the strings used in this project) in the project some day, when waters calm down, and my changes for adding readability and safety to the code are viewed with better eyes.

[1]: <https://software.codidact.com/posts/285946>
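To illustrate the general idea only: the sketch below is not the stpecpy() from the link above, but a hypothetical copy_to_end() that takes a pointer to the end of the buffer instead of a size. Unlike strlcpy(3), it never scans the part of the source that does not fit, and chained calls need neither a strlen(3) nor any size arithmetic. The name, the signature, and the choice of returning NULL on truncation are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy src into [dst, end), always terminating.
 * Returns a pointer to the terminator on success, or NULL if the text
 * was truncated. A NULL dst is passed through, so an earlier truncation
 * propagates down a chain of calls. */
char *copy_to_end(char *dst, char *end, const char *src)
{
    if (dst == NULL || dst >= end)
        return NULL;
    while (dst < end - 1 && *src != '\0')
        *dst++ = *src++;
    *dst = '\0';
    return (*src == '\0') ? dst : NULL;
}

/* Usage: two chained copies with a single error check at the end.
 * Returns the resulting length, or -1 on truncation. */
int build_greeting(char *buf, size_t size)
{
    char *end = buf + size;
    char *p = buf;

    assert(size > 0);
    p = copy_to_end(p, end, "Hello, ");
    p = copy_to_end(p, end, "world");
    return (p == NULL) ? -1 : (int) (p - buf);
}
```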

[...]

     str = str_set(cond ? "someword" : "another");

Right.  C doesn't _really_ have strings, except at the library level.
It has character arrays and one grain of syntactic sugar for encoding
"string literals", which should not have been called that because
whether they get the null terminator is context-dependent.

         char a[5] = "fooba";
         char *b = "bazqux";

Oh yeah, I hate that one. I wish the compiler had a flag to warn about that (maybe there is and I don't know it).


I see some Internet sources claim that C is absolutely reliable about
null-terminating such literals, but I can't agree.  The assignment to
`b` above adds a null terminator, and the one to `a` does not.  This is
the opposite of absolute reliability.  Since I foresee someone calling
me a liar for saying that, I'll grant that if you carry a long enough
list of exceptional cases for the syntax in your head, both are
predictable.  But it's simply a land mine for the everyday programmer.
Worse, the hype around C that "arrays are really pointers in disguise!1!
They're interchangeable!1!" constitutes a neon sign directing the
learner directly into the mine field.
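The difference between the two declarations can be checked directly. This is a small sketch with hypothetical accessor functions; `c` is added for comparison and is not part of the original example.

```c
#include <stddef.h>
#include <string.h>

/* Exactly five elements: no room for the terminator, so none is stored. */
static const char a[5] = "fooba";
/* Size deduced from the literal: six elements, terminator included. */
static const char c[]  = "fooba";
/* Pointer to a literal whose storage does include the terminator. */
static const char *b   = "bazqux";

size_t size_of_a(void) { return sizeof(a); }   /* 5 */
size_t size_of_c(void) { return sizeof(c); }   /* 6 */
size_t size_of_b(void) { return sizeof(b); }   /* size of a pointer */
size_t text_len_b(void) { return strlen(b); }  /* 6 */

/* Search only within the array: there is no '\0' to find in a. */
int a_has_terminator(void)
{
    return memchr(a, '\0', sizeof(a)) != NULL;
}
```

Passing `a` to strlen(3) or printf("%s") would read past the end of the array, which is the land mine described above.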

Maybe it's because I learnt C mostly on Stack Overflow, but I learnt the distinction between arrays and pointers from day 1 (programmers there seem to have a tendency to avoid UB and keep a puritan correctness that I haven't seen so much outside of that forum), and have always had the difference quite clear. They should never be considered the same thing, certainly (but syntactic sugar in function parameters is fine once you know the difference).

In fact, the enhanced VLA syntax for function prototypes, combined with an old paper that someone proposed for C2x to add a _Lengthof() operator to C (it seems to have been delayed to C3x), might bring some array features to pointers, such as getting the number of elements in the false array, as if it were a fat pointer.
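As a point of reference, here is a sketch using only the array-parameter notation that C already has, not the proposed operator. The function names are hypothetical. The `[n]` documents the expected element count, but the parameter is still a plain pointer, so the count has to travel separately.

```c
#include <stddef.h>

/* The [n] tells the reader (and some compilers' diagnostics) how many
 * elements are expected; inside the function, a is still a pointer. */
long sum_items(size_t n, const int a[n])
{
    long total = 0;

    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Usage: the caller computes the element count from a real array. */
long sum_demo(void)
{
    int v[] = { 1, 2, 3, 4 };

    return sum_items(sizeof(v) / sizeof(v[0]), v);
}
```

An element-count operator that worked on such parameters would let sum_items() recover `n` from `a` itself, which is the fat-pointer-like behaviour mentioned above.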

That would be very interesting for C, and might end up having safer arrays and pointers than C++. It would be ironic, considering most of the C++ hate for C comes from pointers and arrays, from what I've seen.

[...]


"There are two fundamentally difficult problems in computer science:
cache invalidation, naming things, and off-by-one errors." -- anon.

:P

[...]

#include <stdio.h>

int main(void) {
   char s[] = { 'H', 'e', 'l', 'l', 'o', ',', ' ',
                'W', 'o', 'r', 'l', 'd', '!', '\n' + 0x80 };

   char *p = s;
   do {
     putchar(*p & 0x7F);
   } while (*p++ >= 0); // (*p++ < 0x80) on systems w/ unsigned chars
}

Good that computers have 64 GiB of RAM these days, rather than 64 KiB (or less). I haven't found this code, thank $DEITY.


You may notice that there's no way of encoding an empty string with this
mechanism.  That was by design.  Why would you ever point to an empty
string?  That wastes not one but TWO bytes (16-bit pointers)!

Heh, wasting one size_t and a NUL byte at the end of the string at the same time would be a shooting offense, I guess.


The author of the patch decides to completely rewrite that line even
if the bug is not really understood, and it just works after it.

Yes.  A sloppy lexicon, combined with cultural and managerial
preoccupations with "cadence" (always implicitly a higher one), manures
the ground thickly for kludges, black magic, and a habit of individual
contributors abandoning projects so that they experience no
accountability for their coding errors.  And I don't mean "punishment"
as a synonym for "accountability"--though that is a substitution typical
of hard-driving, "type A", "get 'er done" engineers and managers alike.
I mean accountability in terms of someone being able to find out _that_
they erred, and _learning_ from it, without a thick gravy of operant
conditioning ladled over it.

I love having to dict(1) every other word of your emails. Nice English lessons. 8-)


God forbid we have _that_ sort of personal development in our industry.

I may have said this before on this list, since it's one of my favorite
things to hold forth about, but, at least in the U.S., civilian air
traffic controllers have a maxim.

Safe, orderly, efficient.[2]

Having been in the air force for half a decade, that sounds familiar to me  :)


You meet these criteria in order from left to right, and you satisfy one
completely, or to some accepted, documented, and well-known standard
measure, before you move on to the next.  The obvious reason for this is
that when aircraft meet each other at cruise altitudes, many people die.

I haven't yet settled on a counterpart for software engineering that I
like, but my latest stab at it is this.

Comprehensible, correct, efficient.

Sounds reasonable.

[...]


and I am blaspheming by insinuating that it was unsafe code.

Write and demonstrate an exploit[6].  It won't make you any more popular
than your present approach, but it will knock the jocks' cowboy hats
askew.

Then I need to defend my one-line patch (I already defended it in the
commit message with a somewhat extended explanation, including a
dissection of the bug that would have been prevented by a compiler
warning) 2 times with what would be more than what I would write in
two hypothetical manual pages about sizeof() and the ternary operator.
Just imagine around 10 terminal pages of rationale for that change.
And then 3 meetings with different people.  And so we decide to bring
this issue to one of the oldest programmers in the group. Then things
go as follows.

Some people don't like reading long messages, and will hector you about
opportunity costs while spending time in code reviews (or on mailing
lists) that they might prefer on a golf course.

Yeah, seen that. The most ironic part is that some of them mentioned that I might have introduced risk in the project with the code change, referring to code as a liability, while I'm by far the contributor who has chopped the most code compared to additions (around +1000 -1600 total changes; ignoring changelogs, it is even more noticeable), which would mean I'm removing a lot of that liability.


I get a review that starts by saying that this makes the macro
unreadable (seriously, wtf?  I mean, the length() name is probably the
least useful name that could be given to such a macro, and my change is
making it unreadable?  okay, okay).

"nitems" is unreadable?  I guess if emails and code review web forms try
the patience of a reader, books are right out.

https://www.google.com/search?q=%22nitems%22+variable&tbm=bks

It's not my favorite name for an lvalue but I've seen it my entire
career.  It's hard to read much C without hitting it.
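For readers who have not met it, a sketch of the idiom and of the bug under discussion. NITEMS and LITERAL_LEN are hypothetical names chosen here; the element-count macro has the same shape as the nitems() that some BSD systems provide in <sys/param.h>. It is not the patch discussed in this thread.

```c
#include <stddef.h>

/* Number of elements in an array. Only meaningful for real arrays. */
#define NITEMS(a)       (sizeof(a) / sizeof((a)[0]))
/* Length of a string literal, excluding its terminator. */
#define LITERAL_LEN(s)  (NITEMS(s) - 1)

/* A literal is an array, so this counts its characters. */
size_t len_of_literal(void)
{
    return LITERAL_LEN("someword");   /* 8 */
}

/* Both operands of ?: decay to char *, so sizeof measures a pointer,
 * whatever text the pointer refers to. */
size_t len_of_ternary(int cond)
{
    return sizeof(cond ? "someword" : "another") - 1;
}
```

On a typical 64-bit system len_of_ternary() yields 7 for either branch, which happens to match "another" and silently truncates "someword".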

Then the review continues by saying that the reviewers are so bad that
"actually do allow such trivial bugs to happen".

And goes on to say that it's sad but it's expected of "new
developers".

You might ask this person what they believe the purpose of code reviews
to be.  Don't bring preconceptions to this conversation, and don't get
drawn into a discussion with them right away.  Find out what they think.

I have a feeling from the latest discussions that he simply doesn't want to make it easy to write safe code in his project, because contributors to his project should be at a level where they don't need the safety. That's of course flawed, but we've seen this before, haven't we?

In fact, it's funny that my commit made reference to a Stack Overflow post[2] of mine, which references an occasion where the bug made it into some tree of the Linux kernel, and was only caught when it reached Linus. Well, he downvoted it right after (or someone else reading the mailing list did; the timings are quite obvious), with no comments; a childish response, I should say.

[2]: <https://stackoverflow.com/a/57537491>

There's a chance they'll have some great insight, but many times I've
found that people with great reputations have a shockingly superficial
understanding of certain things.  (We all do about _something_.)

At least, it seems my old manager (who is now a higher-level manager, but is still around), contrary to the old programmers and some lower managers, is starting to realize that my contributions of that type do make sense, and decided to take my changes. He didn't want to confront the old programmers, which made it difficult, but it finally worked. He also seems to have read my full explanation of why the change is necessary, and agreed with it, while still trying not to sound too disruptive to the old programmers. :-)

Passive reception of answers to open-ended questions can tell you a lot
about a person.

I feel about your INSTALL.* (and other files) what I felt about the
same man-pages files.  RST is not the easiest thing to read.

I'd hesitate to call it RST.

Yeah, I guess it's not written to be RST; I meant more that it resembles RST (probably RST just took what already existed at some point in the past).

[...]


If you're reading it like a book, it might make sense.  If you have
technical documentation, which is likely to be organized in unrelated
sections that you may want to consult independently,

There may be a generational shift in evidence here.  :)  Roughly, each
plain text documentation file in the root of the groff source tree
should be read in its entirety if it need be read at all.[7]

The first time, yes. But afterwards, using it as a reference should be easy, I guess.

 True,
there are cases where you can bail out early, or skip a section if its
title suggests irrelevance to the reader's needs.  It's pretty recently
that I began seriously attacking this aspect of the groff documentation,
having started out much more concerned with end-user- (rather than
developer-) facing materials like man pages.  Maybe I can further
improve this stuff.

indentation can make a big difference.

Possibly.  I don't see much of a role for it in these text files at
present, but others may have different vision.

What's your opinion of the man-like files in the man-pages repo?


That's why I rewrote the man-pages repo documentation in a
man-pages\[en]like (:P) document.  I find it much easier now to see
the organization of the files at a short glance, and look for what you
need.

Does it make sense to you?

Somewhat.  There is a place for plain text (_truly_ plain text)
documentation, and with groff there's a bit of a bootstrapping issue; a
configuration and installation manual written in a roff macro language
would deter users who thought they needed to have the system built first
before they could read it.[8]  (Some people are easily discouraged.)

The plain text documents in the man-pages repo root are truly plain text (I wrote them by hand; they are not generated from any other source), but I wrote them to resemble man(7) output, which (this might be corruption by experience as well) I find extremely readable. You may notice that they are not filled, which is because I write them by hand.

While I find *roff source documents plenty readable as-is in a text
editor, I acknowledge that I may have been corrupted by experience.  :P

man(7) and its diff(1)s are quite readable to me.  Can't blame you.  :P


[...]

[6] But do it in a sandbox lest you become the next Tom Christiansen.

I didn't understand the reference. I guess some exploit of his escaped his computer, but I couldn't find anything in a web search.

[...]

[8] Our Texinfo manual had a section for this.  It sat empty for over 20
     years.
     
https://git.savannah.gnu.org/cgit/groff.git/commit/?id=e6736968552aa98b0aa602460a3c08de47adfe87

I wouldn't call that a problem. Texinfo is still unreadable after one has the system installed. Maybe it's that I've never used emacs(1), but I never liked it.

Cheers,

Alex


--
<http://www.alejandro-colomar.es/>
