[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Fwd: [groff] 28/28: [pdf]: Implement linear bookmark tag search.
From: |
G. Branden Robinson |
Subject: |
Fwd: [groff] 28/28: [pdf]: Implement linear bookmark tag search. |
Date: |
Mon, 4 Mar 2024 06:52:42 -0600 |
Hi folks,
The following change could probably use some additional eyeballs. It's
one of the things Deri and I disagreed about.
First, I'd like to know if it horrifies anyone, or if anyone has
concerns about the performance hit. I'm willing to pay a little bit of
performance if I can buy robustness with it, and I think that's what
this does. But that's me.
The other thing to ask is of Peter: assuming you are among the
non-horrified, would you like me to prepare a patch to om.tmac to
migrate it to this new `pdf:lookup` macro?
If that is done (regardless of who does it), I can also chop out the `ie
d PRINTSTYLE` branches from pdf.tmac shown below.
It's becoming clear to me how to add hyperlink support to the ms, me,
and mm packages.
Regards,
Branden
----- Forwarded message from "G. Branden Robinson"
<g.branden.robinson@gmail.com> -----
Date: Mon, 4 Mar 2024 07:42:40 -0500 (EST)
From: "G. Branden Robinson" <g.branden.robinson@gmail.com>
To: groff-commit@gnu.org
Subject: [groff] 28/28: [pdf]: Implement linear bookmark tag search.
gbranden pushed a commit to branch master
in repository groff.
commit cd9fde325fd889f673f9603b43e1204b1fde42fc
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
AuthorDate: Mon Mar 4 05:24:40 2024 -0600
[pdf]: Implement linear bookmark tag search.
* tmac/pdf.tmac (pdf:lookup): Given an argument, search defined bookmark
tags for a match and return one (if found) in the string
`pdf:lookup-result`, which is defined but empty if there is no match.
Previously (and, actually, still--see below), lookups were O(1)
because strings named `pdf:look($TAG_NAME)` were defined. The speed
was great but unfortunately, in practical use, tags often got *roff
escape sequences stuck into them, which drew diagnostic messages from
the formatter and could defeat the matches.
(pdfbookmark, pdf*href-M): Use the new mechanism to record a bookmark
tag if `PRINTSTYLE` (a mom(7) macro) is _not_ defined, so as to not
regress documents using that package. Store the tag text in the
string `pdf:bm\\n[pdf:bm.nr].tag`; every PDF bookmark in a groff
document gets a serial number already.
(pdf*href): Use the new mechanism to call `pdf:lookup` and locate a
match for the desired tag.
* doc/GMPfront.t.in (an*cln): Delete this macro. It iterated through a
string, scrubbing it of `\%` escape sequences. Much more exotic
things could be placed in bookmark tags; we were fortunate that man
page cross references tend to stick to ASCII, plus `\%` to suppress
hyphenation. But that's only suitable for man page references; if we
want taggable (sub)section headings, like groff_mmse(7)'s "Se ocksÄ",
we need strings, not just valid groff identifiers.
(an*bookmark): Populate `an*page-ref-nm` without cleaning it first.
So what's the performance hit? I built the tree before this commit,
then ran the following simple script and saved the output. Applied the
commit and did it again.
target=doc/groff-man-pages.pdf
date
for n in $(seq 20)
do
rm -f $target
time -v make $target
done
date
This 381-page PDF's generation slowed from 5.99-6.71 seconds to
7.58-8.49 seconds.
Scraping out the elapsed time and flinging these to GNU datamash:
$ datamash mean 1 sstdev 1 < 20times-before.data
6.3565 0.22806451072587
$ datamash mean 1 sstdev 1 < 20times-after.data
8.0925 0.2556904995868
...for a mean performance hit of 27.3%.
If this is intolerable, then the priority for resolving Savannah #62264
could be increased. Once we have properly iterable strings (and a
strings.tmac "standard library") it will be straightforward for people
to write (and contribute!) string-hashing algorithms in *roff.
---
ChangeLog | 32 ++++++++++++++++++++++++++++++++
doc/GMPfront.t.in | 13 +------------
tmac/pdf.tmac | 48 ++++++++++++++++++++++++++++++++++++++++++------
3 files changed, 75 insertions(+), 18 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index a58151907..f6e0587ea 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,35 @@
+2024-03-04 G. Branden Robinson <g.branden.robinson@gmail.com>
+
+ [pdf]: Implement linear bookmark tag search.
+
+ * tmac/pdf.tmac (pdf:lookup): Given an argument, search defined
+ bookmark tags for a match and return one (if found) in the
+ string `pdf:lookup-result`, which is defined but empty if there
+ is no match. Previously (and, actually, still--see below),
+ lookups were O(1) because strings named `pdf:look($TAG_NAME)`
+ were defined. The speed was great but unfortunately, in
+ practical use, tags often got *roff escape sequences stuck into
+ them, which drew diagnostic messages from the formatter and
+ could defeat the matches.
+ (pdfbookmark, pdf*href-M): Use the new mechanism to record a
+ bookmark tag if `PRINTSTYLE` (a mom(7) macro) is _not_ defined,
+ so as to not regress documents using that package. Store the
+ tag text in the string `pdf:bm\\n[pdf:bm.nr].tag`; every PDF
+ bookmark in a groff document gets a serial number already.
+ (pdf*href): Use the new mechanism to call `pdf:lookup` and
+ locate a match for the desired tag.
+
+ * doc/GMPfront.t.in (an*cln): Delete this macro. It iterated
+ through a string, scrubbing it of `\%` escape sequences. Much
+ more exotic things could be placed in bookmark tags; we were
+ fortunate that man page cross references tend to stick to ASCII,
+ plus `\%` to suppress hyphenation. But that's only suitable for
+ man page references; if we want taggable (sub)section headings,
+ like groff_mmse(7)'s "Se ocksÄ", we need strings, not just valid
+ groff identifiers.
+ (an*bookmark): Populate `an*page-ref-nm` without cleaning it
+ first.
+
2024-03-04 G. Branden Robinson <g.branden.robinson@gmail.com>
* doc/GMPfront.t.in: Resync `MR` replacement with its
diff --git a/doc/GMPfront.t.in b/doc/GMPfront.t.in
index 367190478..4390ea5f3 100644
--- a/doc/GMPfront.t.in
+++ b/doc/GMPfront.t.in
@@ -7,17 +7,6 @@
.nr PDFOUTLINE.FOLDLEVEL 1
.defcolor pdf:href.colour rgb 0.00 0.25 0.75
.pdfinfo /Title "groff Collected Reference Pages"
-.de an*cln
-. ds \\$1
-. als an*cln:res \\$1
-. shift
-. ds an*cln:res \\$*\"
-. ds an*cln:chr \\$*
-. substring an*cln:chr 0 0
-. if '\%'\\*[an*cln:chr]' \{\
-. substring an*cln:res 1
-. \}
-..
.
.de END
..
@@ -25,7 +14,7 @@
.am reload-man END
.de an*bookmark
. ie (\\\\$1=1) \{\
-. an*cln an*page-ref-nm \\\\$2\"
+. ds an*page-ref-nm \\\\$2\"
. pdfbookmark -T "\\\\*[an*page-ref-nm]" \\\\$1 \\\\$2
. \}
. el .pdfbookmark \\\\$1 \\\\$2
diff --git a/tmac/pdf.tmac b/tmac/pdf.tmac
index e26cd1cd0..3824e7c6d 100644
--- a/tmac/pdf.tmac
+++ b/tmac/pdf.tmac
@@ -191,6 +191,24 @@ am solely responsible for any bugs I may have introduced
into this file.
.\"
.nr PDFOUTLINE.FOLDLEVEL 10000
.\"
+.\" Search defined bookmarks for a tag matching $1. This gets around
+.\" problems with *roff escape sequences embedded in identifiers (which
+.\" is not allowed by the language syntax), and the need to tediously
+.\" scrub the strings of them (and throw diagnostics in the user's face
+.\" if we don't do a good enough job) at the expense of an O(n) search
+.\" every time we reference a bookmark instead of an O(1) one.
+.\"
+.de pdf:lookup
+.nr pdf:index 0 1
+.ds pdf:lookup-result \" empty
+.while d pdf:bm\\n+[pdf:index].tag \{\
+. if '\\$1'\\*[pdf:bm\\n[pdf:index].tag]' \{\
+. ds pdf:lookup-result \\*[pdf:bm\\n[pdf:index].tag]\"
+. break
+. \}
+. \}
+..
+.
.\" The actual job of creating an outline reference
.\" is performed by the "pdfbookmark" macro.
.\"
@@ -267,8 +285,14 @@ am solely responsible for any bugs I may have introduced
into this file.
. rm pdf:clean
. ev
. tr \[em]\[em]
-. ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\*[pdf:cleaned]
-. if dPDF.EXPORT .tm .ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\*[pdf:cleaned]
+. ie d PRINTSTYLE \{\
+. ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\*[pdf:cleaned]
+. if dPDF.EXPORT .tm .ds pdf:look(\\*[PDFBOOKMARK.NAME])
\\*[pdf:cleaned]
+. \}
+. el \{\
+. ds pdf:bm\\n[pdf:bm.nr].tag \\*[PDFBOOKMARK.NAME]
+. if d PDF.EXPORT .tm .ds pdf:bm\\n[pdf:bm.nr].tag
\\*[PDFBOOKMARK.NAME]
+. \}
. pdfmark /Dest /\\*[PDFBOOKMARK.NAME] /View [\\*[PDFBOOKMARK.VIEW]] /DEST
. nop \!x X ps:exec [/Dest /\\*[PDFBOOKMARK.NAME] /Title
(\\*[pdf:cleaned]) /Level \\n[pdf:bm.lev] /OUT pdfmark
. pdf:href.options.clear
@@ -589,8 +613,14 @@ am solely responsible for any bugs I may have introduced
into this file.
.\"
.ds PDFBOOKMARK.NAME "\\*[pdf:href-N]\\*[pdf:href-D]
.pdf*href.set \\*[PDFBOOKMARK.NAME] \\$1
-.ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\$*
-.if dPDF.EXPORT .tm .ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\$*
+.ie d PRINTSTYLE \{\
+. ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\$*
+. if dPDF.EXPORT .tm .ds pdf:look(\\*[PDFBOOKMARK.NAME]) \\$*
+. \}
+.el \{\
+. ds pdf:bm\\n[pdf:bm.nr].tag \\*[PDFBOOKMARK.NAME]
+. if d PDF.EXPORT .tm .ds pdf:bm\\n[pdf:bm.nr].tag \\*[PDFBOOKMARK.NAME]
+. \}
.\"
.\"
.\" Irrespective of whether this marker is created, or not,
@@ -704,8 +734,14 @@ am solely responsible for any bugs I may have introduced
into this file.
. ds PDFHREF.DESC \\$*
. \}
. el \{\
-. ie dpdf:look(\\*[pdf:href-D]) .ds PDFHREF.DESC
\\*[pdf:look(\\*[pdf:href-D])]
-. el .ds PDFHREF.DESC Unknown
+. ds PDFHREF.DESC Unknown
+. ie d PRINTSTYLE \{\
+. if dpdf:look(\\*[pdf:href-D]) .ds PDFHREF.DESC
\\*[pdf:look(\\*[pdf:href-D])]
+. \}
+. el \{\
+. pdf:lookup \\*[pdf:href-D]
+. ie !'\\*[pdf:lookup-result]'' .ds PDFHREF.DESC
\\*[\\*pdf:lookup-result]
+. \}
. \}
. \" Apply border and colour specifications to the PDFMARK string
. \" definition, as required.
_______________________________________________
Groff-commit mailing list
Groff-commit@gnu.org
https://lists.gnu.org/mailman/listinfo/groff-commit
----- End forwarded message -----
signature.asc
Description: PGP signature
- Fwd: [groff] 28/28: [pdf]: Implement linear bookmark tag search.,
G. Branden Robinson <=