"transparent" output and throughput, demystified
From: G. Branden Robinson
Subject: "transparent" output and throughput, demystified
Date: Fri, 30 Aug 2024 18:07:57 -0500
Hi folks,
Those of us who worked with groff 1.22.4 may remember a couple of
diagnostic messages that gobsmacked one with their incomprehensibility.
Here's the source code that produced them.
    error("can't transparently output node at top level");

    error("can't translate %1 to special character '%2'"
          " in transparent throughput",
          input_char_description(cc),
          ci->nm.contents());
For groff 1.23.0, I silenced them with a blunt instrument, since I
concluded that they weren't entirely spurious, but nobody seemed to have
any idea what, exactly, anyone was supposed to do about them. In
general usage scenarios, a diagnostic that the user cannot address is a
diagnostic that should not be issued.
In a recent thread, Peter noted the resurrection of these messages.[1]
At that time, I promised an explanation of these long-vexing problems.
In the course of documenting my fix for Savannah #63074[2]--a process
that is fairly involved and not yet complete--I ended up writing a
change to the groff_diff(7) man page that seems to cover the bases.
Here's some language that I have queued up for my next push. First, a
comment from the man page source that summarizes where we (I) still have
work to do.
.\" TODO: When we get this giant headache generalized and adapted to the
.\" `\!` escape sequence and `device`, `output`, `cf`, and `trf`
.\" requests, move this discussion into a dedicated subsection above.
And now, the explanation.
groff_diff(7):
\X'contents'
    GNU troff transforms the argument to the device
    control escape sequence to avoid leaking to device‐
    independent output data that are unrepresentable in
    that format, and to address the problem of expressing
    character code points outside of the Unicode basic
    Latin range in an output file format that restricts
    itself to that range. (See subsection “Basic Latin”
    of groff_char(7).) The typesetting of such
    characters is a problem long‐solved in device‐
    independent troff by the “C” command; see
    groff_out(5). The expression of such characters in
    other contexts, such as device extension commands,
    was not addressed by the same design. Where
    possible, GNU troff represents such characters in
    device‐independent but non‐typesetting contexts using
    its notation for Unicode special character escape
    sequences; see subsection “Special character escape
    forms” of groff_char(7).

    GNU troff converts several ordinary characters that
    typeset as non‐basic Latin code points to code points
    outside that range to avoid confusion when these
    characters are used in ways that are ultimately
    visible, as in tag names for PDF bookmarks, which can
    appear in a viewer’s navigation pane. These ordinary
    characters are “'”, “-”, “^”, “`”, and “~”; others
    are written as‐is.

    Special characters that typeset as Unicode basic
    Latin characters are translated to basic Latin
    characters accordingly. For this transformation,
    character translations and definitions are ignored.

    So that any Unicode code point can be represented in
    device extension commands, for example in an author’s
    name in document metadata or as a usefully named
    bookmark or hyperlink anchor, GNU troff translates
    other special characters into their Unicode special
    character notation. Special characters without a
    Unicode representation, and escape sequences that do
    not interpolate a sequence of ordinary and/or special
    characters, produce warnings in category “char”.
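
The rules in that excerpt can be modeled outside the formatter. What follows is a minimal Python sketch, not GNU troff's implementation: the function name, the three small tables, and the toy grammar (which recognizes only \[name], \v'...', \h'...', and single-character escapes) are all assumptions made for illustration. The code points for the five ordinary characters follow groff_char(7); the special character table is a tiny subset, and the formatter may spell accented letters differently (for instance in a decomposed form).

```python
import re

# Ordinary characters that the excerpt says are converted, because they
# typeset as glyphs outside basic Latin (code points per groff_char(7)).
ORDINARY_TO_UNICODE = {"'": 0x2019, "-": 0x2010, "^": 0x02C6,
                       "`": 0x2018, "~": 0x02DC}

# Special characters that typeset as basic Latin characters.
SPECIAL_TO_BASIC_LATIN = {"aq": "'", "dq": '"', "ga": "`",
                          "ha": "^", "ti": "~", "rs": "\\"}

# Illustrative subset only (assumption); a real table would be far larger.
SPECIAL_TO_UNICODE = {"em": 0x2014, "co": 0x00A9, "bu": 0x2022, "'e": 0x00E9}

# Toy grammar: \[name], then \v'...' or \h'...' or any other \x escape,
# then any single ordinary character.
TOKEN = re.compile(
    r"\\\[(?P<special>[^\]]+)\]"
    r"|(?P<escape>\\[vh]'[^']*'|\\.)"
    r"|(?P<ordinary>.)",
    re.DOTALL,
)


def unicode_notation(code_point):
    """Spell a code point in Unicode special character notation."""
    return "\\[u%04X]" % code_point


def transform_device_argument(text):
    """Return (transformed text, list of warning strings) for `text`."""
    out, warnings = [], []
    for match in TOKEN.finditer(text):
        special = match.group("special")
        escape = match.group("escape")
        ordinary = match.group("ordinary")
        if special is not None:
            if re.fullmatch(r"u[0-9A-F]{4,6}", special):
                out.append("\\[%s]" % special)  # already Unicode notation
            elif special in SPECIAL_TO_BASIC_LATIN:
                out.append(SPECIAL_TO_BASIC_LATIN[special])
            elif special in SPECIAL_TO_UNICODE:
                out.append(unicode_notation(SPECIAL_TO_UNICODE[special]))
            else:
                warnings.append("no Unicode representation: \\[%s]" % special)
        elif escape is not None:
            # Motions and the like cannot be expressed; warn and drop them.
            warnings.append("cannot represent escape sequence: %s" % escape)
        elif ordinary in ORDINARY_TO_UNICODE:
            out.append(unicode_notation(ORDINARY_TO_UNICODE[ordinary]))
        else:
            out.append(ordinary)  # everything else is written as-is
    return "".join(out), warnings
```

In this model an accented letter in an author's name survives as Unicode notation, while a vertical motion in the same string is dropped and reported in the returned warning list, which stands in for a warning in category "char".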
I hope to boil some fat off of that.
I want to emphasize that error and warning diagnostics will remain
possible. But they should not occur when a document or macro package
attempts to do things that are "sane", like storing accented letters
appearing in an author's name or a section heading into a string and
then interpolating that string in a device extension/control escape
sequence.
Peter's mom(7) package is more muscular even than that; I've noticed
that in his example documents he is not shy even of including vertical
motions in author names.
contrib/mom/examples/mom-pdf.mom:.AUTHOR "Deri James" "\*[UP .5p]and" "Peter Schaffter"
Inside the formatter, a vertical motion becomes a "node" and has no
possible representation in a device extension/control escape sequence.
In the future, I want the formatter to complain about such
impossibilities, but not yet--it's not fair to document and macro
package authors to be so prescriptive without providing a handy
mechanism for cleaning such things out of strings that are destined for
device extension/control commands. To date, solutions have included
creating a diversion, interpolating the string inside it, then using the
`asciify` or `unformat` format requests on it to strip things like
vertical motions (and `chop` of course to rip out the undesired newline
at the end of the diversion). This is painful because formatting things
into a diversion as a rule _creates_ more nodes than it eliminates.
Hence the unformatting.
It would be cleaner and simpler to provide a mechanism for processing a
string directly, discarding escape sequences (like vertical motions or
break points [with or without hyphenation]). This point is even more
emphatic because of the heavy representation of special characters in
known use cases. To "sanitize" (or "pdfclean") such strings by
round-tripping them through a process that converts a sequence of easily
handled bytes like "\ [ 'a ]" or "\ [ u 0 4 1 1 ]" into a special
character node and then back again seems wasteful and fragile to me.
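
To make the idea of cleaning a string directly more concrete, here is a rough Python sketch that works on the text of a string and never builds nodes. It is not a proposal for groff syntax and not how any existing macro package does it. The function name and the deliberately tiny grammar are assumptions: it recognizes only \[name] special characters, \v'...' and \h'...' motions, and the \% and \: break point escapes, and it assumes any string interpolations have already been expanded.

```python
import re

# Hypothetical, deliberately tiny grammar; real roff input has many more
# escape forms than the few recognized here.
ELEMENT = re.compile(
    r"(?P<special>\\\[[^\]]+\])"    # special character, kept byte for byte
    r"|(?P<motion>\\[vh]'[^']*')"   # vertical or horizontal motion
    r"|(?P<breakpoint>\\[%:])"      # break point with or without hyphenation
    r"|(?P<ordinary>.)",            # any other single character
    re.DOTALL,
)


def sanitize(text):
    """Drop motions and break points; keep characters untouched."""
    kept = []
    for match in ELEMENT.finditer(text):
        # lastgroup names whichever alternative matched this element.
        if match.lastgroup in ("special", "ordinary"):
            kept.append(match.group())
    return "".join(kept)
```

Because special characters are copied through as text, nothing is converted into a node and back again; only the elements a device extension command could not use are discarded.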
But, to get things where I'd like to see them, we need an in-language
string iterator for the groff language. And because strings, macros, and
diversions can be punned with each other, in practice that means we will
need an iterator that can handle any of these. That, in turn, means
that we will also require a new conditional expression operator to test
whether an element of a string is a "node".
I haven't been able to get all of that together in the year since we
released groff 1.23. My understanding of the formatter still has
significant lacunae. Ah well. Maybe for 1.25.
So, in the meantime, my plan is to silently discard things from device
extension/control commands that an output device would not be able to do
anything useful with.
Thanks for your patience with this explan-a-thon.[3]
Regards,
Branden
[1] https://lists.gnu.org/archive/html/groff/2024-08/msg00045.html
[2] https://savannah.gnu.org/bugs/?63074
[3] In my opinion, the words "special" and "transparent" are the two
most relentlessly and unhelpfully overused terms in troff
literature. They each mean several different things, and make me a
cross and grumpy (semi-official) groff maintainer. Expect
documentary reforms addressing these ambiguities.