groff
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] src/: ceil_prime(): Add function to get the lowest prime


From: Alejandro Colomar
Subject: Re: [PATCH v2] src/: ceil_prime(): Add function to get the lowest prime not smaller than n
Date: Wed, 13 Mar 2024 13:27:15 +0100

Hi Branden,

On Wed, Mar 13, 2024 at 06:52:10AM -0500, G. Branden Robinson wrote:
> Hi Alex,
> 
> At 2024-03-13T12:01:32+0100, Alejandro Colomar wrote:
> > I didn't write an exploit, as that would be a little bit more work
> > (including learning what is indxbib at all).
> 
> Go ahead!  This, I want to see!  3:)

If you add an "Examples" section to indxbib(1), with some exemplifying
shell session with an input and output, it might help reduce the time to
learn how I'd want to use that program, and what to expect from it.

        $ indxbib --some-usual-flags <<__EOF__;
        some usual input
        more usual input
        __EOF__
        corresponding output
        more corresponding output

> 
> > But by reading the code, I'd say yes.
> > 
> > $ git show master:src/utils/indxbib/indxbib.cpp \
> > | grepc main \
> > | grep -C5 is_prime;
> >     int requested_hash_table_size;
> >     check_integer_arg('h', optarg, 1, &requested_hash_table_size);
> >     hash_table_size = requested_hash_table_size;
> >     if ((hash_table_size > 2) && (hash_table_size % 2) == 0)
> >             hash_table_size++;
> >     while (!is_prime(hash_table_size))
> >       hash_table_size += 2;
> >     if (hash_table_size != requested_hash_table_size)
> >       warning("requested hash table size %1 is not prime: using %2"
> >               " instead", optarg, hash_table_size);
> >       }
> > 
> > Let's imagine 'requested_hash_table_size' is 1.
> 
> You missed the change I made, and mentioned at the end of
> <https://lists.gnu.org/archive/html/groff/2024-03/msg00060.html>.

I didn't miss it, I considered it an alternative fix, so my patch
applies on top of the code that still has a minimum of 1, which has the
bogus behavior.

Especially, since adding a library function, we never know when someone
will grab that function and use it elsewhere, where 1 is not rejected.
So, a library function should behave well for 1 (or also assert it, if
you want, but then you'd need to state $reasons for rejecting 1, or we
might find ourselves with the same problem as the assertion that we
triggered recently from 30 years ago which nobody understands).

> > You had is_prime() with the same exact number of call sites: 2.
> 
> Well, "you" is kind of "the vague collective of groff developers" here.

Yep, and I've mixed several meanings of "you" in this email.  :)

> > If you want to misbehave for an input of 0 from a library call, by
> > adding an assertion, well that's not too bad, since you're the only
> > user.  In fact, is_prime() has such an assertion, even being a library
> > call.
> 
> I do.  I love assertions.  I believe that misbehaving programmers who
> slouch their way through software engineering without checking their
> assumptions should be punished at every occasion.

But I'm not sure ceil_prime() is the place for such an assertion.
Mathematically, it's fine to ask for the ceil_prime() of 0.  What's the
smallest prime not less than 0?  2.  No domain problems; no poles.

So, the right place to reject 0 is check_integer_arg(), IMO, which is
already done.

> Because eventually, those assumptions _will_ be tested in the field.
> 
> Possibly in flight or while reactor core temperature is rising rapidly.
> 
> > *Mis*quoting someone (was it Torvalds who wrote the Linux coding
> > style?  I guess he was):
> > 
> >     First off, I'd suggest printing out a copy of the _C++_
> >     standard, and NOT read it.  Burn it, it's a great symbolic
> >     gesture.
> 
> Linus's fans and C++ deserve each other.  :-P

:)

> > I don't like idiomatic C++.  In fact, the only thing I like from C++
> > is that I can write idiomatic C in it (or mostly; some things don't
> > compile, like using VLA notation in function parameters that are
> > pointers).
> 
> C++ does have a problem in that its idioms change--with greater velocity
> than many software practitioners would like.

Indeed.

> That doesn't change the fact that groff is mostly implemented in it, and
> the C++98 standardized version of the language is more widely understood
> than the circa-1990 Annotated Reference Manual version of it.

Well, idiomatic C++98 isn't so bad.  It's close to idiomatic modern C.

> The advantage to switching to STL containers for these hash tables would
> be to get groff out of the business of maintaining its own
> implementations in favor of those that have been more rigorously tested
> and have a chance of being considerably more performant (because
> optimization-happy lunatics like those in Boost have been chasing the
> bleeding edge for decades, whereas groff's containers have only
> stability to recommend them).
> 
> Obviously I'm somewhat lukewarm on that point.  The real advantage to me
> is the prospect of deleting a lot of code that doesn't directly serve a
> typesetting objective, making swaths of code Someone Else's Problem.
> 
> C has gone through a similar thing.  Slowly, slowly, its practitioners
> have decided that, huh, maybe every project shouldn't carry around its
> own implementations of elementary data structures.  Hence BSD's
> sys/queue.h and GNU's glib.
> 
> It seems that, eventually, kids in industry got tired of doing on their
> employer's dime the Data Structures and Algorithms homework assignments
> they plagiarized in their sophomore year.  Or their managers finally
> cottoned on to what was happening and learned what "software modularity"
> was, apart from a buzzword.
> 
> (I remember those days.  What actually happened is that they and their
> managers took turns counting their unvested stock options and taking
> long hits from a huge crack pipe labelled "J2EE".)
> 
> Anyway, my point is that I'm not willing to do violence to groff's
> codebase as a vehicle for expressing my dissatisfaction with C++,[1] and
> I don't think anyone serving as its maintainer or lead developer should
> allow that, either.  People are still getting work done with groff.

I never claimed aversion to using a library.  I claimed aversion to
using a modern C++ library.  If you find a C library (probably
compatible with C++98) that implements what you need, then let's use it.

I think that using modern idiomatic C++ would be more violent to groff's
code base than using modern idiomatic C.  I think C23 has (will have,
we're in 2024 still waiting for its release) taken most of the good
things of the C++ language (not the library), so maybe you could
consider changing direction, and put modern C as the future of groff.

> At 2024-03-13T12:05:37+0100, Alejandro Colomar wrote:
> > On Wed, Mar 13, 2024 at 05:46:38AM -0500, G. Branden Robinson wrote:
> > > Forgot to attach my working copy, of course.
> > 
> > You could have attached/inlined a diff; it would be smaller.  :)
> 
> Sorry.  I was in haste.
> 
> > This is a breaking change.
> 
> I disagree.  It's tighter validation, now rejecting an input value that
> it should have been rejecting all along.
> 
> > My suggestion keeps accepting 1, with a warning --as before--, and
> > translates it to 2 (instead of 3, which was being done previously).
> > 
> > So, I don't like this change, I think, unless you *want* to break 1,
> > which I'm neutral to, since I don't know what this program does at
> > all.
> 
> A hash table (or map) of size 1 is not a hash table, it's a list.

Hmm, agree.  check_integer_arg() should reject 1.

Cheers,
Alex

> Creating a hashed index of bibliographic references that doesn't speed
> lookups by being O(n) instead of O(1) in a linear search, but is instead
> O(n) itself, complete with the overhead of additional storage (the
> database file on disk) and I/O operations, is not an optimization, but a
> pessimization.  Recalling indxbib's purpose, one might as well let
> refer(1) read the text files in the first place, instead of diverting
> its attention to binary database files that will be unhelpful.
> 
> Maurice J. Bach's _The Design of the Unix Operating System_ (1986), the
> first book-length exposition of the Unix kernel that AT&T tolerated
> sale of on the open market, contains a practical-minded exploration of
> this subject in its discussion of the System V kernel's "buffer heads".
> 
> That's where I first learned about hash maps, because I tend to come at
> things the wrong way around.
> 
> The BSD and Linux kernels do similar things.
> 
> Regards,
> Branden
> 
> [1] Outside of source code comments, where it's fine to be opinionated.
>     I don't regard those as "violence to the code base"; the code will
>     work or it won't, irrespective of the intemperance of code comments.



-- 
<https://www.alejandro-colomar.es/>
Looking for a remote C programming job at the moment.

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]