[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[groff] 53/115: [pdfpic]: Fix Savannah #64061.
From: |
G. Branden Robinson |
Subject: |
[groff] 53/115: [pdfpic]: Fix Savannah #64061. |
Date: |
Thu, 1 Jun 2023 10:46:08 -0400 (EDT) |
gbranden pushed a commit to branch branden-2022-06-01
in repository groff.
commit e0c27d6c19152f245b4886f92957c128c2fa8565
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
AuthorDate: Mon Apr 17 16:41:33 2023 -0500
[pdfpic]: Fix Savannah #64061.
* tmac/pdfpic.tmac: Refactor to make comprehensible some woefully
undocumented cleverness and improve efficiency.
(PDFPIC): Break out flaming-hoop-leaping "clever" bit of `sy` usage
into its own macro, calling from here and relocating its requests from
here...
(pdfpic@get-image-dimensions): ...to here. When using `sy` request to
collect and munge output of pdfinfo(1), (a) disable the escape
character while defining the macro; (b) construct the command in a
roff string, appending to it in discrete, hopefully comprehensible
chunks; (c) disable the escape character during macro interpretation
wherever possible (most of it); (d) retain doubled backslashes so that
they survive subsequent string interpolation; (e) stop using grep(1)
in the pipeline when sed(1) is perfectly capable of performing its own
input filtering; (f) invoke sed with '-n' option and emit output only
upon a successful substitution; (g) replace unportable(!) POSIX BRE
character class '[:digit:]' in substitution match text with '[0-9]';
and most importantly (h) replace multi-line sed 's' replacement text
(see below for the reason we can't use it) with single roff control
line employing the groff extension escape sequence `\R` to assign
multiple registers. Annotate portability and escaping challenges.
Tested on GNU/Linux, macOS 12, and (with simulated pdfinfo(1) output)
Solaris 11.
There is a problem with trying to embed true newlines into the arguments
of a `sy` request. The C++ function that GNU troff uses to assemble the
command string (character by character) _does not recognize C/C++ string
literal escape sequences_. This means that you _cannot_ embed "\n" in
`sy`'s arguments and have it survive, as a newline character, into the
command string passed to the standard C library's system(3) function.
("A\nB" gets encoded as 'A', '\\', 'n', 'B', not 'A', '\n', 'B'.)
Unfortunately, this appears to be AT&T troff-compatible behavior. But
it means that you _cannot_ portably construct multi-line replacement
text for sed's 's' command. (Other sed commands like 'a', 'c', and 'i'
will be similarly affected.) See Savannah #64071.
* PROBLEMS: Drop item.
Fixes <https://savannah.gnu.org/bugs/?64061>. Thanks to Bruno Haible
for the report, and to him and Ralph Corderoy for the discussion of
portable and efficient sed constructs.
---
ChangeLog | 46 ++++++++++++++++++++++++++++++++++++++++++++++
PROBLEMS | 20 --------------------
tmac/pdfpic.tmac | 49 ++++++++++++++++++++++++++++++++++++-------------
3 files changed, 82 insertions(+), 33 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 85ee6bdbc..9246c0dff 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,49 @@
+2023-04-21 G. Branden Robinson <g.branden.robinson@gmail.com>
+
+ * tmac/pdfpic.tmac: Refactor to make comprehensible some
+ woefully undocumented cleverness and improve efficiency.
+ (PDFPIC): Break out flaming-hoop-leaping "clever" bit of `sy`
+ usage into its own macro, calling from here and relocating its
+ requests from here...
+ (pdfpic@get-image-dimensions): ...to here. When using `sy`
+ request to collect and munge output of pdfinfo(1), (a) disable
+ the escape character while defining the macro; (b) construct the
+ command in a roff string, appending to it in discrete, hopefully
+ comprehensible chunks; (c) disable the escape character during
+ macro interpretation wherever possible (most of it); (d) retain
+ doubled backslashes so that they survive subsequent string
+ interpolation; (e) stop using grep(1) in the pipeline when
+ sed(1) is perfectly capable of performing its own input
+ filtering; (f) invoke sed with '-n' option and emit output only
+ upon a successful substitution; (g) replace unportable(!) POSIX
+ BRE character class '[:digit:]' in substitution match text with
+ '[0-9]'; and most importantly (h) replace multi-line sed 's'
+ replacement text (see below for the reason we can't use it) with
+ single roff control line employing the groff extension escape
+ sequence `\R` to assign multiple registers. Annotate
+ portability and escaping challenges. Tested on GNU/Linux, macOS
+ 12, and (with simulated pdfinfo(1) output) Solaris 11.
+
+ There is a problem with trying to embed true newlines into the
+ arguments of a `sy` request. The C++ function that GNU troff
+ uses to assemble the command string (character by character)
+ _does not recognize C/C++ string literal escape sequences_.
+ This means that you _cannot_ embed "\n" in `sy`'s arguments and
+ have it survive, as a newline character, into the command string
+ passed to the standard C library's system(3) function. ("A\nB"
+ gets encoded as 'A', '\\', 'n', 'B', not 'A', '\n', 'B'.)
+ Unfortunately, this appears to be AT&T troff-compatible
+ behavior. But it means that you _cannot_ portably construct
+ multi-line replacement text for sed's 's' command. (Other sed
+ commands like 'a', 'c', and 'i' will be similarly affected.)
+ See Savannah #64071.
+
+ * PROBLEMS: Drop item.
+
+ Fixes <https://savannah.gnu.org/bugs/?64061>. Thanks to Bruno
+ Haible for the report, and to him and Ralph Corderoy for the
+ discussion of portable and efficient sed constructs.
+
2023-04-27 G. Branden Robinson <g.branden.robinson@gmail.com>
[pdfpic]: Clean up better.
diff --git a/PROBLEMS b/PROBLEMS
index c40d49915..a5a3d7090 100644
--- a/PROBLEMS
+++ b/PROBLEMS
@@ -59,26 +59,6 @@ Some portability issues are known to affect groff's gdiffmk
utility.
----------------------------------------------------------------------
-* The `PDFPIC` macro doesn't work / its automated test fails.
-
- FAIL: tmac/tests/pdfpic_does-not-choke-on-bad-pdfinfo-output.sh
-
-Due to a limitation (shared by AT&T troff) in the way the `sy` request
-constructs a C string argument to the C library's system(3) function,
-groff requires a GNU sed(1) extension that interprets "\n" as a newline
-in the replacement text of the 's' command. (We might enhance GNU
-troff's `sy` request to avoid this dependency in the future.) We have
-observed this problem on Solaris 10 and 11 and Mac OS X 10.11.6, but not
-macOS 12: the last's sed supports the extension in question.
-
-Install GNU sed in the default $PATH as "gsed" and edit
-tmac/pdfpic.tmac. On line 172, change "sed" to "gsed". Alternatively,
-you can use the absolute path to GNU sed's location. (`system()`
-sanitizes $PATH to avoid privilege escalation.) Then (re-)make the
-"check" target or format your PDFPIC-employing document again.
-
-----------------------------------------------------------------------
-
[groff 1.19.2]
* When viewing man pages, some characters on my UTF-8 terminal emulator
diff --git a/tmac/pdfpic.tmac b/tmac/pdfpic.tmac
index 08d04d277..1ba5c004e 100644
--- a/tmac/pdfpic.tmac
+++ b/tmac/pdfpic.tmac
@@ -55,6 +55,41 @@
. rr pdfpic*desired-height
..
.
+.\" Get image dimensions. The `tr` command to strip null bytes is
+.\" distasteful, but its necessity is imposed on us. See
+.\" <https://gitlab.freedesktop.org/poppler/poppler/-/issues/776>.
+.\"
+.\" The following is a circus of portability and escaping constraints.
+.\" See <https://savannah.gnu.org/bugs/index.php?64061>. Modify it
+.\" only with great caution and after testing with a variety of seds.
+.\" (See the "HACKING" file in groff's source distribution.) Keep in
+.\" mind that the argument(s) to `sy` are assembled into a C string
+.\" and then passed to system(3), which invokes "sh -c" on the string.
+.\"
+.\" We therefore shut off roff's escape mechanism _twice_: once while
+.\" defining the macro, and once while interpreting part of it, to
+.\" preserve backslashes that we need in the constructed C string.
+.\"
+.\" We _still_ must escape the backslashes in the string appendments to
+.\" keep them from being interpreted as commencing roff escape sequences
+.\" when the string they populate is later interpolated.
+.eo
+.de pdfpic@get-image-dimensions
+. ds pdfpic*command pdfinfo \$1
+. eo
+. as pdfpic*command " | tr -d '\\000'
+. as pdfpic*command " | sed -n -e '/Page *size:/
+. as pdfpic*command s/Page *size: *\\([0-9.]*\\) *x *\([0-9.]*\\).*$/
+. as pdfpic*command . \\\\R@pdfpic*width (p;\\1)@
+. as pdfpic*command " \\\\R@pdfpic*height (p;\\2)@
+. as pdfpic*command /p'
+. ec
+. as pdfpic*command " > \*[pdfpic*temporary-file]
+. sy \*[pdfpic*command]
+. rm pdfpic*command
+..
+.ec
+.
.de PDFPIC
. if !\\n[.U] \{\
. pdfpic@error use of \\$0 requires GNU troff's unsafe mode \
@@ -165,19 +200,7 @@ skipping '\\$1'
. return
. \}
. ds pdfpic*temporary-file \\*[pdfpic*temporary-directory]/pdfpic\n[$$]
-.
-. \" Get image dimensions. The `tr` command to strip null bytes is
-. \" distasteful, but its necessity is imposed on us. See
-. \" <https://gitlab.freedesktop.org/poppler/poppler/-/issues/776>.
-. ec @
-. sy pdfinfo @$1 | \
-tr -d '\000' | \
-grep "Page *size" | \
-sed -e 's/Page *size: *\\([[:digit:].]*\\) *x *\\([[:digit:].]*\\).*$/\
-.nr pdfpic*width (p;\\1)\\n\
-.nr pdfpic*height (p;\\2)/' \
-> @*[pdfpic*temporary-file]
-. ec
+. pdfpic@get-image-dimensions \\$1
. if \\n[systat] \{\
. pdfpic@error retrieval of '\\$1' image dimensions failed with \
exit status \\n[systat]; skipping
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [groff] 53/115: [pdfpic]: Fix Savannah #64061.,
G. Branden Robinson <=