bug-groff

[bug #64576] [pdf.tmac] pdf*href option handling insufficiently flexible


From: G. Branden Robinson
Subject: [bug #64576] [pdf.tmac] pdf*href option handling insufficiently flexible
Date: Mon, 21 Aug 2023 05:50:42 -0400 (EDT)

URL:
  <https://savannah.gnu.org/bugs/?64576>

                 Summary: [pdf.tmac] pdf*href option handling insufficiently
flexible
                   Group: GNU roff
               Submitter: gbranden
               Submitted: Mon 21 Aug 2023 09:50:40 AM UTC
                Category: Macro - others/general
                Severity: 3 - Normal
              Item Group: Incorrect behaviour
                  Status: In Progress
                 Privacy: Public
             Assigned to: gbranden
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None


    _______________________________________________________

Follow-up Comments:


-------------------------------------------------------
Date: Mon 21 Aug 2023 09:50:40 AM UTC By: G. Branden Robinson <gbranden>
This code:


.\"
.\" Macros "pdf:href.flag" and "pdf:href.option"
.\" provide a generic mechanism for switching on flag type options,     
.\" and for decoding options with arguments, respectively
.\"
.de pdf:href.flag
.\" ----------------------------------------------------------------------
.\" ----------------------------------------------------------------------
.nr pdf:href\\$1 1
.nr pdf:href.argc 1
..
.de pdf:href.option
.\" ----------------------------------------------------------------------
.\" ----------------------------------------------------------------------
.ds pdf:href\\$1 \\$2
.nr pdf:href.argc 2
..


...is insufficiently flexible.  It assumes that its inputs will consist only of
ordinary characters, but special characters and escape sequences, particularly
for callers of `pdf:href.option`, are conceivable.
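
To make the failure mode concrete, here is a minimal sketch that mimics the shape of `pdf:href.option` outside of _pdf.tmac_.  The `demo:` names are hypothetical stand-ins invented for illustration; they are not part of _pdf.tmac_ or _pdfmark.tmac_.  The second call is expected to draw a diagnostic of the same kind as the one quoted later in this report, though the exact wording may vary between _groff_ versions.

```roff
.\" Hypothetical stand-in for pdf:href.option; the "demo:" names are
.\" assumptions made for this sketch, not pdf.tmac internals.
.de demo:option
.  ds demo:href\\$1 \\$2
..
.\" Ordinary characters only: this defines the string "demo:href-D".
.demo:option -D some-target
.tm -D is '\*[demo:href-D]'
.\" Here an escape sequence lands inside the name that .ds is asked to
.\" define, which troff is expected to reject.
.demo:option x\:y some-target
```

Running this with something like `groff -z` should suffice to see the diagnostics, since nothing is formatted for output.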

For example, a macro like _groff man_(7)'s `UR`, when used with no link text
(which is a bit lazy, but accepted), will run into problems in cases like the
following.


.P
.I ps2eps
is available from CTAN mirrors, e.g.,
.UR ftp://\:ftp\:.dante\:.de/\:tex\-archive/\:support/\:ps2eps/
.UE .


That's a real example from our _pic_(1) page.  One approach to resolving it
implies laboriously walking the arguments to macros that call `pdf:href.flag`
and `pdf:href.option` (which are internals--not externally documented and
therefore not an API), attempting to scrub them of unexpected content, and
getting peevish with other _groff_ developers when encountering arbitrary
_roff_ input that is *unexpectedly* unexpected; see, e.g., bug #64202.

That it is so tedious to iterate through strings in _groff_ (and as I have
said elsewhere, nigh-impossible in AT&T _troff_) is doubtless one of the
factors that turns up the temperature on this problem.  See bug #62264 for a
proposed, but not yet implemented, quality-of-life improvement in this area.

Another possibility is simply for _pdf.tmac_- or _pdfmark.tmac_-using
documents and macro packages to be aware of the intolerance/irritability of
those packages' internals, and work around them--for instance, _groff_'s
_an.tmac_, when seeing that a `UR` or `MT` has no link text, could simply
inject some known, well-behaved link text like "(link)" that the aforementioned
internals won't barf on.  This works (I tried it), but it is pretty lame.

1.  That text isn't localized.
2.  That text might not be appropriate or clear in all situations.

Now, one _could_ kick both of the above back into the user's face.  ("Just
supply some link text, damn it!")  But for another problem...

3.  Worst, you can't format punctuation after it without intervening space. 
To do that, you need the `\c` escape sequence, which becomes part of one of
`pdfhref`'s arguments, and _pdfmark.tmac_ / _pdf.tmac_ insist on populating
_roff_ register or string names incorporating each such argument, and we're
back to the original problem of escape sequences.


troff:<standard input>:1473: error: an escaped 'c' is not allowed in an
identifier


And in fact use of `\c` is wholly defeated here--you'll get space (and
possibly a break) before the punctuation anyway.  So tossing the burden of
specifying link text--which is supposed to be formatted output in the first
place--on the user and then going aggro on them if they dare to use escape
sequences that are wholly valid in formatted output is not a satisfactory
solution.

Intriguingly, the `\A` escape sequence to test a character sequence for
validity as a _groff_ identifier name has been around since 1991, but
_pdfmark.tmac_ and _pdf.tmac_ don't bother to use it.  Possibly this problem
would have been recognized and addressed long ago if they had.  It certainly
seems to me like a Recommended Best Practice if one is going to be populating
_groff_ identifiers based on user input (or even _any_ external input, like a
macro package written by someone who isn't as careful as you are).  But nobody
ever got a fellowship for validating input, did they?
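
As a rough illustration of what such a check could look like, the sketch below guards the string definition with `\A`, which interpolates 1 if its argument is a valid identifier and 0 otherwise.  The `demo:` names are hypothetical and not part of _pdf.tmac_.  Note that this only detects the problem and fails politely; it does not make arguments containing escape sequences usable.

```roff
.\" Hypothetical guarded variant; the "demo:" names are assumptions
.\" made for this sketch, not pdf.tmac internals.
.de demo:option
.  ie \A'demo:href\\$1' \{\
.    ds demo:href\\$1 \\$2
.    nr demo:href.ok 1
.  \}
.  el \{\
.    tm demo:option: first argument is not usable in an identifier
.    nr demo:href.ok 0
.  \}
..
.\" Accepted: every character of the would-be name is ordinary.
.demo:option -D some-target
.tm ok=\n[demo:href.ok] value='\*[demo:href-D]'
.\" Refused: the \: escape sequence makes the \A test interpolate 0.
.demo:option x\:y some-target
.tm ok=\n[demo:href.ok]
```

The diagnostic deliberately does not echo the offending argument, since writing arbitrary escape sequences through `tm` invites problems of its own.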

Moreover, it appears that the main reason _pdfmark.tmac_ / _pdf.tmac_ are
taking this approach is because the _roff_ language doesn't have a list type,
so it's a pain in the ass to search for things.  _pdfmark.tmac_ / _pdf.tmac_'s
solution, to use the macro/request/string name space as a dictionary, with the
identifiers as keys and the string contents as values, does have obvious
appeal given that limitation...but for blundering into the other limitations
of assuming either that (a) any input makes a valid identifier, or (b) your
users won't wander off the lit path of ordinary characters.  And as noted
above, scrubbing a character sequence for things that are invalid (in _any_
context)--the "sanitization problem"--is Yet Another pain in the ass.  See
bug #62264 again.
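
For readers unfamiliar with the idiom, the name-space-as-dictionary approach can be sketched as follows.  The `demo:` names and the fruit data are invented for illustration.  It works nicely exactly as long as every key consists of ordinary characters.

```roff
.\" Keys become part of string names; values are the string contents.
.\" All "demo:" names here are assumptions made for this sketch.
.ds demo:color-apple red
.ds demo:color-banana yellow
.\" Look up a key held in another string via nested interpolation,
.\" using the "d" conditional operator to test for the entry first.
.ds demo:key banana
.ie d demo:color-\*[demo:key] .tm \*[demo:key] -> \*[demo:color-\*[demo:key]]
.el .tm no entry for \*[demo:key]
```

The lookup costs nothing to write, which is the appeal; the catch is that the key is never checked before it is pasted into an identifier.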

Fortunately, the use of this mechanism, in _pdf.tmac_ at least, appears to be
fairly limited.

`pdf:href.flag` would seem to be okay, since its values only ever come from
macro arguments that identify "flags", and these are going to have
straightforward names.

For instance, these seem okay (includes annotations from my working copy).


.\" XXX: predefined flag
.if !dpdf:href-D .pdf:href.option -D \\$1
.if '\\*[pdf:href-D]'' \{\
.   pdf:error pdfhref has no destination
.   nr pdf:href.ok 0
.   \}

.\" XXX: predefined flag
.if dpdf:href-P \&\\*[pdf:href-P]\c
.ie \\n[pdf:href.ok] \{\
.   \"
[~40 lines of brace scope follow]


No, the problem seems to be limited to eating what, on the Unix command line,
we'd call operands and option arguments, but which can be URLs with escape
sequences like `\:` and `\c` in them, and spitting them verbatim into suffixes
on _roff_ identifiers, and that just doesn't work in general.


.   \"
.   \" Handle the case where subcommand is specified as "-class",
.   \" setting up appropriate macro aliases for subcommand handlers.
.   \"
.\" XXX
.      if dpdf*href\\$1       .als pdf*href      pdf*href\\$1
.      if dpdf*href\\$1.link  .als pdf*href.link pdf*href\\$1.link
.      if dpdf*href\\$1.file  .als pdf*href.file pdf*href\\$1.file
.   \"
.   \" Repeat macro alias setup
.   \" for the case where the subcommand is specified as "class",
.   \" (without a leading hyphen)
.   \"
.\" XXX
.      if dpdf*href-\\$1      .als pdf*href      pdf*href-\\$1
.      if dpdf*href-\\$1.link .als pdf*href.link pdf*href-\\$1.link
.      if dpdf*href-\\$1.file .als pdf*href.file pdf*href-\\$1.file


An immense amount of code in _pdf.tmac_ seems to be dedicated to an
exploration of the question "hey, what if we chucked established _roff_
programming idioms out the window and re-implemented _getopt_long_(3) in it so
that shell script programmers had macro interfaces that looked vaguely
familiar"?










