Re: Pascal rides again (was: Specifying dependencies more clearly)

groff



From:	Alejandro Colomar
Subject:	Re: Pascal rides again (was: Specifying dependencies more clearly)
Date:	Fri, 11 Nov 2022 03:09:51 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.1

On 11/10/22 22:19, G. Branden Robinson wrote:

(Some of this is off-topic for the groff list.)


Hi Branden,


Hi Alex,

At 2022-11-08T22:05:25+0100, Alejandro Colomar wrote:

Okay, here we go for a rant.

Let's say there's some software with cowboy programmers, which has
things like:

     typedef struct {
         size_t  length;
         u_char  *start;
     } str_t;


Those who do not learn Pascal strings are condemned to reinvent them.

     #define length(s)   (sizeof(s) - 1)

     #define str_set(str, text)  do        \
     {                                     \
         (str)->length = length(text);     \
         (str)->start = (u_char *) (text); \
     } while (0)

(Of course, cowboy programmers don't need terminating NUL bytes,
that's for newbies, but that's not today's rant.)


There are some advantages to Pascal strings.  Having determination of
any string's size be O(1) is a big win over C string-scanning functions
(or loops doing repeated "*ch != '\0'" comparisons),

Yeah, I can understand that, and considering that it's a high-performance server, it's even expected (I remember that Ulrich Drepper rejected adding any string-copy functions to glibc on the grounds that programmers should just use memcpy(3) and remember the lengths). But since many of those strings will sooner or later be passed to some syscall or libc function, having both might make more sense. Sure, keep the size for optimizing passes, but leave a trailing NUL so that you can always pass the string to libc. The current implementation seems to add one _only_ when it's strictly necessary, which means having to check unreadable code initialized miles away from its use, and so I can't promise that there's no buffer overrun for a given string (and I'm pretty sure there's more than one, since there have been patches in the past fixing buffer overruns).

especially when
these are used by the naïve or slack-jawed.[1]
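A minimal sketch of the "keep both" suggestion above: a counted string that also stores a trailing NUL, so finding the length stays O(1) while `start` can still be handed straight to libc. The type `struct cstr` and the helper `cstr_from()` are hypothetical names for illustration; they are not the project's `str_t` API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counted string that is also NUL-terminated. */
struct cstr {
    size_t  length;  /* bytes in use, not counting the NUL */
    char   *start;   /* length + 1 bytes; start[length] == '\0' */
};

/*
 * Copy the first len bytes of src into a freshly allocated buffer and
 * terminate it, so s->length and strlen(s->start) agree (assuming those
 * len bytes contain no embedded NUL).
 * Returns 0 on success, -1 if allocation fails.
 */
static int
cstr_from(struct cstr *s, const char *src, size_t len)
{
    char  *p;

    assert(s != NULL && src != NULL);

    p = malloc(len + 1);
    if (p == NULL)
        return -1;

    memcpy(p, src, len);
    p[len] = '\0';          /* the one extra byte that keeps libc calls safe */

    s->length = len;
    s->start = p;
    return 0;
}
```

The cost is one byte per string; in exchange, no call site has to reason about whether a particular string happens to be terminated.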

I developed stpecpy()[1] to fix this performance problem present in strlcpy(3). I plan to start using it (or similar functions adapted to the strings used in this project) in the project some day, when the waters calm down and my changes adding readability and safety to the code are viewed more favorably.


[1]: <https://software.codidact.com/posts/285946>
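For readers who don't follow the link, the general idea of a function like stpecpy() is: copy into a buffer delimited by a pointer to its end, return a pointer to the new terminating NUL so that calls can be chained, and stop reading `src` once the buffer is full. strlcpy(3), by contrast, must walk all of `src` to compute its return value. The following is a hypothetical `copy_chain()` written only from that description; its truncation conventions are assumptions and may differ from the version at [1].

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical chaining copy in the spirit of stpecpy().
 *   dst: where to write (may equal end after an earlier truncation)
 *   end: one past the last byte of the whole buffer
 * Returns a pointer to the NUL written into dst, or end if src did not
 * fit (the buffer is still NUL-terminated, just truncated).
 * Never reads more of src than fits in the buffer.
 */
static char *
copy_chain(char *dst, char *end, const char *src)
{
    assert(dst != NULL && end != NULL && src != NULL);
    assert(dst <= end);

    if (dst == end)                 /* an earlier call already truncated */
        return end;

    while (dst < end - 1 && *src != '\0')
        *dst++ = *src++;
    *dst = '\0';

    return (*src == '\0') ? dst : end;
}
```

Chaining then looks like `p = copy_chain(buf, end, "Hello"); p = copy_chain(p, end, ", ");`, with a single `p == end` check at the very end to detect truncation.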

[...]

     str_set(&str, cond ? "someword" : "another");
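The trap in passing a conditional expression to `str_set()` can be checked in isolation. Assuming `length()` is the `sizeof(s) - 1` macro quoted above, it measures a string literal correctly because a literal is an array, but a conditional expression has pointer type (both arrays are converted to `char *`), so `sizeof` reports the size of a pointer instead. The helper `length_of_choice()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Same definition as the macro quoted above. */
#define length(s)   (sizeof(s) - 1)

/* A literal is an array: "someword" occupies 9 bytes, so 9 - 1 == 8. */
static_assert(length("someword") == 8, "array operand: correct length");

/*
 * The conditional operator converts both arrays to char *, so this is
 * sizeof(char *) - 1 whichever branch cond would select.  sizeof does
 * not evaluate its operand, so cond is never actually tested.
 */
static size_t
length_of_choice(int cond)
{
    return length(cond ? "someword" : "another");
}
```

On a typical 64-bit target (8-byte pointers) that is 7: it silently truncates "someword" by one byte and is only right for "another" by coincidence.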


Right.  C doesn't _really_ have strings, except at the library level.
It has character arrays and one grain of syntactic sugar for encoding
"string literals", which should not have been called that because
whether they get the null terminator is context-dependent.

         char a[5] = "fooba";
         char *b = "bazqux";

Oh yeah, I hate that one. I wish the compiler had a flag to warn about that (maybe there is one and I don't know about it).


I see some Internet sources claim that C is absolutely reliable about
null-terminating such literals, but I can't agree.  The assignment to
`b` above adds a null terminator, and the one to `a` does not.  This is
the opposite of absolute reliability.  Since I foresee someone calling
me a liar for saying that, I'll grant that if you carry a long enough
list of exceptional cases for the syntax in your head, both are
predictable.  But it's simply a land mine for the everyday programmer.
Worse, the hype around C that "arrays are really pointers in disguise!1!
They're interchangeable!1!" constitutes a neon sign directing the
learner directly into the mine field.
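The two declarations above can be compared directly. In this sketch, `has_nul()` is a hypothetical helper that uses memchr(3) to look for a terminator within an object's own bytes. `a` receives exactly the five characters that fit and no NUL (valid C, though not C++), so passing it to strlen(3) would read past its end; the literal that `b` points to is a seven-byte array whose last byte is NUL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The declarations from above, at file scope. */
char a[5] = "fooba";    /* exactly 5 bytes: the NUL does not fit, so none is stored */
char *b = "bazqux";     /* points to a 7-byte array ending in NUL */

/* True if the n bytes starting at buf contain a NUL terminator. */
static bool
has_nul(const char *buf, size_t n)
{
    assert(buf != NULL);
    return memchr(buf, '\0', n) != NULL;
}
```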

Maybe it's because I learnt C mostly on Stack Overflow, but I learnt the distinction between arrays and pointers from day 1 (programmers there seem to have a tendency to avoid UB and keep a puritan correctness that I haven't seen much outside of that forum), and I've been quite clear on the difference ever since. They should certainly never be considered the same thing (but the syntactic sugar in function parameters is fine once you know the difference).

In fact, the enhanced VLA syntax in function prototypes, combined with an old paper proposed for C2x that would add a _Lengthof() operator to C (it seems it's been delayed to C3x), might bring some array features to pointers, such as getting the number of elements in the false array, as if it were a fat pointer.

That would be very interesting for C, and C might end up with safer arrays and pointers than C++. It would be ironic, considering that, from what I've seen, most of the C++ hate for C comes from pointers and arrays.
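For context, the VLA syntax in function parameters already works in C99 and later (it needs a compiler with VLA support, such as GCC or Clang). A parameter written as `buf[n]` still has pointer type inside the function; the bound only documents how many elements the caller must supply, and some compilers may use it for diagnostics. A `_Lengthof()` operator is not shown, since it was only a proposal at the time. `count_byte()` is a hypothetical example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * n comes first so that it can appear in buf's declarator.  Despite
 * the brackets, buf is an ordinary `const char *` here; the [n] only
 * states how many elements the caller must provide.
 */
static size_t
count_byte(size_t n, const char buf[n], char c)
{
    size_t  count = 0;

    assert(n == 0 || buf != NULL);
    for (size_t i = 0; i < n; i++)
        if (buf[i] == c)
            count++;
    return count;
}
```

The caller still has to pass the right `n`; what a length operator on such parameters would add is letting the callee recover it from `buf` itself.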


[...]


"There are two fundamentally difficult problems in computer science:
cache invalidation, naming things, and off-by-one errors." -- anon.


:P

[...]

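#include <stdio.h>

int main(void) {
   /* No NUL terminator: the last character is flagged by its high bit. */
   char s[] = { 'H', 'e', 'l', 'l', 'o', ',', ' ',
                'W', 'o', 'r', 'l', 'd', '!', '\n' + 0x80 };

   char *p = s;
   do {
     putchar(*p & 0x7F);  /* clear the flag bit before printing */
   } while (*p++ >= 0); // (*p++ < 0x80) on systems w/ unsigned chars
}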

Good thing that computers have 64 GiB of RAM these days, rather than 64 KiB (or less). I haven't come across this code, thank $DEITY.


You may notice that there's no way of encoding an empty string with this
mechanism.  That was by design.  Why would you ever point to an empty
string?  That wastes not one but TWO bytes (16-bit pointers)!

Heh, wasting one size_t and a NUL byte at the end of the string at the same time would be a shooting offense, I guess.

The author of the patch decides to completely rewrite that line even
though the bug is not really understood, and after that it just works.


Yes.  A sloppy lexicon, combined with cultural and managerial
preoccupations with "cadence" (always implicitly a higher one), manures
the ground thickly for kludges, black magic, and a habit of individual
contributors abandoning projects so that they experience no
accountability for their coding errors.  And I don't mean "punishment"
as a synonym for "accountability"--though that is a substitution typical
of hard-driving, "type A", "get 'er done" engineers and managers alike.
I mean accountability in terms of someone being able to find out _that_
they erred, and _learning_ from it, without a thick gravy of operant
conditioning ladled over it.

I love having to dict(1) every other word of your emails. Nice English lessons. 8-)


God forbid we have _that_ sort of personal development in our industry.

I may have said this before on this list, since it's one of my favorite
things to hold forth about, but, at least in the U.S., civilian air
traffic controllers have a maxim.

Safe, orderly, efficient.[2]


Having been in the air force for half a decade, I find that familiar  :)


You meet these criteria in order from left to right, and you satisfy one
completely, or to some accepted, documented, and well-known standard
measure, before you move on to the next.  The obvious reason for this is
that when aircraft meet each other at cruise altitudes, many people die.

I haven't yet settled on a counterpart for software engineering that I
like, but my latest stab at it is this.

Comprehensible, correct, efficient.


Sounds reasonable.

[...]

and I am blaspheming by insinuating that it was unsafe code.


Write and demonstrate an exploit[6].  It won't make you any more popular
than your present approach, but it will knock the jocks' cowboy hats
askew.

Then I need to defend my one-line patch (I already defended it in the
commit message with a somewhat extended explanation, including a
dissection of the bug that would have been prevented by a compiler
warning) twice, with text that would fill more than I would write in
two hypothetical manual pages about sizeof() and the ternary operator.
Just imagine around 10 terminal pages of rationale for that change.
And then 3 meetings with different people.  And so we decide to bring
this issue to one of the oldest programmers in the group. Then things
go as follows.


Some people don't like reading long messages, and will hector you about
opportunity costs while spending time in code reviews (or on mailing
lists) that they might prefer on a golf course.

Yeah, I've seen that. The most ironic part is that some of them said that I might have introduced risk into the project with the code change, referring to code as a liability, while I'm by far the contributor who has removed the most code relative to additions (around +1000 -1600 lines in total; ignoring changelogs, it is even more noticeable), which would mean I'm removing a lot of that liability.

I get a review that starts by saying that this makes the macro
unreadable (seriously, wtf?  I mean, the length() name is probably the
least useful name that could be given to such a macro, and my change is
making it unreadable?  okay, okay).


"nitems" is unreadable?  I guess if emails and code review web forms try
the patience of a reader, books are right out.

https://www.google.com/search?q=%22nitems%22+variable&tbm=bks

It's not my favorite name for an lvalue but I've seen it my entire
career.  It's hard to read much C without hitting it.
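For reference, the conventional definition behind names like `nitems` divides the array's size by the size of one element. Applied to a pointer it silently returns a wrong answer, the same class of bug discussed in this thread, so the sketch below adds a compile-time guard. The guard relies on the GCC/Clang extensions `__builtin_types_compatible_p` and `__typeof__` (the Linux kernel uses a similar trick); the macro names are illustrative and not taken from the patch under discussion.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Evaluates to 0 for arrays.  For a pointer, the two types are
 * compatible, the bit-field width becomes -1, and compilation fails.
 * GCC/Clang extensions only.
 */
#define MUST_BE_ARRAY(a)                                            \
    (0 * sizeof(struct { int:(-__builtin_types_compatible_p(        \
        __typeof__(a), __typeof__(&(a)[0]))); }))

/* Number of elements in array a; refuses to compile for pointers. */
#define NITEMS(a)  (sizeof(a) / sizeof((a)[0]) + MUST_BE_ARRAY(a))
```

With this, `NITEMS(p)` for a `char *p` is a compile-time error instead of quietly yielding `sizeof(char *)`.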

Then the review continues by saying that the reviewers are so bad that
"actually do allow such trivial bugs to happen".

And goes on to say that it's sad but it's expected of "new
developers".


You might ask this person what they believe the purpose of code reviews
to be.  Don't bring preconceptions to this conversation, and don't get
drawn into a discussion with them right away.  Find out what they think.

I have a feeling from the latest discussions that he simply doesn't want to make it easy to write safe code in his project, because contributors to his project should be at a level where they don't need the safety. That's of course flawed, but we've seen this before, haven't we?

In fact, it's funny: my commit referenced a Stack Overflow post[2] of mine, which mentions an occasion where the bug made it into some tree of the Linux kernel and was only caught when it reached Linus. Well, he downvoted it right after (or someone else reading the mailing list did; the timing is quite obvious), without comment; a childish response, I must say.


[2]: <https://stackoverflow.com/a/57537491>

There's a chance they'll have some great insight, but many times I've
found that people with great reputations have a shockingly superficial
understanding of certain things.  (We all do about _something_.)

At least, it seems my old manager (who is now a higher-level manager, but is still around), unlike the old programmers and some lower-level managers, is starting to realize that my contributions of that type do make sense, and decided to take my changes. He didn't want to confront the old programmers, which made it difficult, but it finally worked. He also seems to have read my full explanation of why the change is necessary, and agreed with it, while still trying not to sound too disruptive to the old programmers. :-)

Passive reception of answers to open-ended questions can tell you a lot
about a person.

I feel about your INSTALL.* (and other files) what I felt about the
same man-pages files.  RST is not the easiest thing to read.


I'd hesitate to call it RST.

Yeah, I guess it's not written to be RST; I meant more that it resembles RST (probably RST just adopted conventions that already existed at some point in the past).


[...]

If you're reading it like a book, it might make sense.  If you have
technical documentation, which is likely to be organized in unrelated
sections that you may want to consult independently,


There may be a generational shift in evidence here.  :)  Roughly, each
plain text documentation file in the root of the groff source tree
should be read in its entirety if it need be read at all.[7]

The first time, yes. But afterwards, using it as a reference should be easy, I guess.

True,
there are cases where you can bail out early, or skip a section if its
title suggests irrelevance to the reader's needs.  It's pretty recently
that I began seriously attacking this aspect of the groff documentation,
having started out much more concerned with end-user- (rather than
developer-) facing materials like man pages.  Maybe I can further
improve this stuff.

indentation can play a big role.


Possibly.  I don't see much of a role for it in these text files at
present, but others may have different vision.


What's your opinion of the man-like files in the man-pages repo?

That's why I rewrote the man-pages repo documentation in a
man-pages\[en]like (:P) document.  I find it much easier now to see
the organization of the files at a short glance, and look for what you
need.

Does it make sense to you?


Somewhat.  There is a place for plain text (_truly_ plain text)
documentation, and with groff there's a bit of a bootstrapping issue; a
configuration and installation manual written in a roff macro language
would deter users who thought they needed to have the system built first
before they could read it.[8]  (Some people are easily discouraged.)

The plain text documents in the man-pages repo root are truly plain text (I wrote them by hand rather than generating them from any other source), but I wrote them to resemble man(7) output, which (this might be corruption by experience as well) I find extremely readable. You may notice that they are not filled, which is because I write them by hand.

While I find *roff source documents plenty readable as-is in a text
editor, I acknowledge that I may have been corrupted by experience.  :P


man(7) and its diff(1)s are quite readable to me.  Can't blame you.  :P


[...]

[6] But do it in a sandbox lest you become the next Tom Christiansen.

I didn't understand the reference. I guess some exploit of his escaped his computer, but I couldn't find anything in a web search.


[...]

[8] Our Texinfo manual had a section for this.  It sat empty for over 20
     years.
     
https://git.savannah.gnu.org/cgit/groff.git/commit/?id=e6736968552aa98b0aa602460a3c08de47adfe87

I wouldn't call that a problem. Texinfo is still unreadable after one has the system installed. Maybe it's that I've never used emacs(1), but I never liked it.


Cheers,

Alex


--
<http://www.alejandro-colomar.es/>
