groff-commit
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[groff] 53/115: [pdfpic]: Fix Savannah #64061.


From: G. Branden Robinson
Subject: [groff] 53/115: [pdfpic]: Fix Savannah #64061.
Date: Thu, 1 Jun 2023 10:46:08 -0400 (EDT)

gbranden pushed a commit to branch branden-2022-06-01
in repository groff.

commit e0c27d6c19152f245b4886f92957c128c2fa8565
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
AuthorDate: Mon Apr 17 16:41:33 2023 -0500

    [pdfpic]: Fix Savannah #64061.
    
    * tmac/pdfpic.tmac: Refactor to make comprehensible some woefully
      undocumented cleverness and improve efficiency.
    
      (PDFPIC): Break out flaming-hoop-leaping "clever" bit of `sy` usage
      into its own macro, calling from here and relocating its requests from
      here...
    
      (pdfpic@get-image-dimensions): ...to here.  When using `sy` request to
      collect and munge output of pdfinfo(1), (a) disable the escape
      character while defining the macro; (b) construct the command in a
      roff string, appending to it in discrete, hopefully comprehensible
      chunks; (c) disable the escape character during macro interpretation
      wherever possible (most of it); (d) retain doubled backslashes so that
      they survive subsequent string interpolation; (e) stop using grep(1)
      in the pipeline when sed(1) is perfectly capable of performing its own
      input filtering; (f) invoke sed with '-n' option and emit output only
      upon a successful substitution; (g) replace unportable(!) POSIX BRE
      character class '[:digit:]' in substitution match text with '[0-9]';
      and most importantly (h) replace multi-line sed 's' replacement text
      (see below for the reason we can't use it) with single roff control
      line employing the groff extension escape sequence `\R` to assign
      multiple registers.  Annotate portability and escaping challenges.
      Tested on GNU/Linux, macOS 12, and (with simulated pdfinfo(1) output)
      Solaris 11.
    
    There is a problem with trying to embed true newlines into the arguments
    of a `sy` request.  The C++ function that GNU troff uses to assemble the
    command string (character by character) _does not recognize C/C++ string
    literal escape sequences_.  This means that you _cannot_ embed "\n" in
    `sy`'s arguments and have it survive, as a newline character, into the
    command string passed to the standard C library's system(3) function.
    ("A\nB" gets encoded as 'A', '\\', 'n', 'B', not 'A', '\n', 'B'.)
    Unfortunately, this appears to be AT&T troff-compatible behavior.  But
    it means that you _cannot_ portably construct multi-line replacement
    text for sed's 's' command.  (Other sed commands like 'a', 'c', and 'i'
    will be similarly affected.)  See Savannah #64071.
    
    * PROBLEMS: Drop item.
    
    Fixes <https://savannah.gnu.org/bugs/?64061>.  Thanks to Bruno Haible
    for the report, and to him and Ralph Corderoy for the discussion of
    portable and efficient sed constructs.
---
 ChangeLog        | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 PROBLEMS         | 20 --------------------
 tmac/pdfpic.tmac | 49 ++++++++++++++++++++++++++++++++++++-------------
 3 files changed, 82 insertions(+), 33 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 85ee6bdbc..9246c0dff 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,49 @@
+2023-04-21  G. Branden Robinson <g.branden.robinson@gmail.com>
+
+       * tmac/pdfpic.tmac: Refactor to make comprehensible some
+       woefully undocumented cleverness and improve efficiency.
+       (PDFPIC): Break out flaming-hoop-leaping "clever" bit of `sy`
+       usage into its own macro, calling from here and relocating its
+       requests from here...
+       (pdfpic@get-image-dimensions): ...to here.  When using `sy`
+       request to collect and munge output of pdfinfo(1), (a) disable
+       the escape character while defining the macro; (b) construct the
+       command in a roff string, appending to it in discrete, hopefully
+       comprehensible chunks; (c) disable the escape character during
+       macro interpretation wherever possible (most of it); (d) retain
+       doubled backslashes so that they survive subsequent string
+       interpolation; (e) stop using grep(1) in the pipeline when
+       sed(1) is perfectly capable of performing its own input
+       filtering; (f) invoke sed with '-n' option and emit output only
+       upon a successful substitution; (g) replace unportable(!) POSIX
+       BRE character class '[:digit:]' in substitution match text with
+       '[0-9]'; and most importantly (h) replace multi-line sed 's'
+       replacement text (see below for the reason we can't use it) with
+       single roff control line employing the groff extension escape
+       sequence `\R` to assign multiple registers.  Annotate
+       portability and escaping challenges.  Tested on GNU/Linux, macOS
+       12, and (with simulated pdfinfo(1) output) Solaris 11.
+
+       There is a problem with trying to embed true newlines into the
+       arguments of a `sy` request.  The C++ function that GNU troff
+       uses to assemble the command string (character by character)
+       _does not recognize C/C++ string literal escape sequences_.
+       This means that you _cannot_ embed "\n" in `sy`'s arguments and
+       have it survive, as a newline character, into the command string
+       passed to the standard C library's system(3) function.  ("A\nB"
+       gets encoded as 'A', '\\', 'n', 'B', not 'A', '\n', 'B'.)
+       Unfortunately, this appears to be AT&T troff-compatible
+       behavior.  But it means that you _cannot_ portably construct
+       multi-line replacement text for sed's 's' command.  (Other sed
+       commands like 'a', 'c', and 'i' will be similarly affected.)
+       See Savannah #64071.
+
+       * PROBLEMS: Drop item.
+
+       Fixes <https://savannah.gnu.org/bugs/?64061>.  Thanks to Bruno
+       Haible for the report, and to him and Ralph Corderoy for the
+       discussion of portable and efficient sed constructs.
+
 2023-04-27  G. Branden Robinson <g.branden.robinson@gmail.com>
 
        [pdfpic]: Clean up better.
diff --git a/PROBLEMS b/PROBLEMS
index c40d49915..a5a3d7090 100644
--- a/PROBLEMS
+++ b/PROBLEMS
@@ -59,26 +59,6 @@ Some portability issues are known to affect groff's gdiffmk 
utility.
 
 ----------------------------------------------------------------------
 
-* The `PDFPIC` macro doesn't work / its automated test fails.
-
-  FAIL: tmac/tests/pdfpic_does-not-choke-on-bad-pdfinfo-output.sh
-
-Due to a limitation (shared by AT&T troff) in the way the `sy` request
-constructs a C string argument to the C library's system(3) function,
-groff requires a GNU sed(1) extension that interprets "\n" as a newline
-in the replacement text of the 's' command.  (We might enhance GNU
-troff's `sy` request to avoid this dependency in the future.)  We have
-observed this problem on Solaris 10 and 11 and Mac OS X 10.11.6, but not
-macOS 12: the last's sed supports the extension in question.
-
-Install GNU sed in the default $PATH as "gsed" and edit
-tmac/pdfpic.tmac.  On line 172, change "sed" to "gsed".  Alternatively,
-you can use the absolute path to GNU sed's location.  (`system()`
-sanitizes $PATH to avoid privilege escalation.)  Then (re-)make the
-"check" target or format your PDFPIC-employing document again.
-
-----------------------------------------------------------------------
-
 [groff 1.19.2]
 
 * When viewing man pages, some characters on my UTF-8 terminal emulator
diff --git a/tmac/pdfpic.tmac b/tmac/pdfpic.tmac
index 08d04d277..1ba5c004e 100644
--- a/tmac/pdfpic.tmac
+++ b/tmac/pdfpic.tmac
@@ -55,6 +55,41 @@
 .  rr pdfpic*desired-height
 ..
 .
+.\" Get image dimensions.  The `tr` command to strip null bytes is
+.\" distasteful, but its necessity is imposed on us.  See
+.\" <https://gitlab.freedesktop.org/poppler/poppler/-/issues/776>.
+.\"
+.\" The following is a circus of portability and escaping constraints.
+.\" See <https://savannah.gnu.org/bugs/index.php?64061>.  Modify it
+.\" only with great caution and after testing with a variety of seds.
+.\" (See the "HACKING" file in groff's source distribution.)  Keep in
+.\" mind that the argument(s) to `sy` are assembled into a C string
+.\" and then passed to system(3), which invokes "sh -c" on the string.
+.\"
+.\" We therefore shut off roff's escape mechanism _twice_: once while
+.\" defining the macro, and once while interpreting part of it, to
+.\" preserve backslashes that we need in the constructed C string.
+.\"
+.\" We _still_ must escape the backslashes in the string appendments to
+.\" keep them from being interpreted as commencing roff escape sequences
+.\" when the string they populate is later interpolated.
+.eo
+.de pdfpic@get-image-dimensions
+.  ds pdfpic*command pdfinfo \$1
+.  eo
+.  as pdfpic*command " | tr -d '\\000'
+.  as pdfpic*command " | sed -n -e '/Page  *size:/
+.  as pdfpic*command s/Page  *size:  *\\([0-9.]*\\)  *x  *\([0-9.]*\\).*$/
+.  as pdfpic*command . \\\\R@pdfpic*width  (p;\\1)@
+.  as pdfpic*command " \\\\R@pdfpic*height (p;\\2)@
+.  as pdfpic*command /p'
+.  ec
+.  as pdfpic*command " > \*[pdfpic*temporary-file]
+.  sy \*[pdfpic*command]
+.  rm pdfpic*command
+..
+.ec
+.
 .de PDFPIC
 .  if !\\n[.U] \{\
 .    pdfpic@error use of \\$0 requires GNU troff's unsafe mode \
@@ -165,19 +200,7 @@ skipping '\\$1'
 .    return
 .  \}
 .  ds pdfpic*temporary-file \\*[pdfpic*temporary-directory]/pdfpic\n[$$]
-.
-.  \" Get image dimensions.  The `tr` command to strip null bytes is
-.  \" distasteful, but its necessity is imposed on us.  See
-.  \" <https://gitlab.freedesktop.org/poppler/poppler/-/issues/776>.
-.  ec @
-.  sy pdfinfo @$1 | \
-tr -d '\000' | \
-grep "Page *size" | \
-sed -e 's/Page *size: *\\([[:digit:].]*\\) *x *\\([[:digit:].]*\\).*$/\
-.nr pdfpic*width (p;\\1)\\n\
-.nr pdfpic*height  (p;\\2)/' \
-> @*[pdfpic*temporary-file]
-.  ec
+.  pdfpic@get-image-dimensions \\$1
 .  if \\n[systat] \{\
 .    pdfpic@error retrieval of '\\$1' image dimensions failed with \
 exit status \\n[systat]; skipping



reply via email to

[Prev in Thread] Current Thread [Next in Thread]