groff
From: G. Branden Robinson
Subject: UTF-8 in grout and a performance regression (was: synchronous and asynchronous grout)
Date: Thu, 19 Dec 2024 12:15:18 -0600

Hi Deri,

At 2024-12-19T17:20:09+0000, Deri wrote:
> On Thursday, 19 December 2024 00:34:49 GMT G. Branden Robinson wrote:
> > Is [UTF-8-encoded characters in grout] something you want?  What
> > should the consequences of invalid or incomplete UTF-8 sequences be?
>
> I don't mind, I just thought you considered readability of the grout
> file important [1],

I do.  ISO 646/ASCII is readable practically everywhere.  There remain
places where UTF-8, at least when exercising code points greater than
U+007F, is not, and even if the processing stream supports UTF-8
perfectly, lack of font coverage can make UTF-8 unreadable again.

> currently lots of \[uXXXX] and \(XX are harder to read than UTF-8.

I don't intend for the latter form to survive into grout (and I don't
think it does--please correct me if you observe a case), unless someone
deliberately puts it there literally, as when overriding the *roff
escape character...

$ printf '\\e(xx\n' | nroff -Z | grep '^t'
t\(xx

...or exercising provisions for escaping interpretation by the
formatter, like .cf, .trf, .output, and `\!`.

> It could be a setting in the DESC file so you don't need to change
> output drivers in one go.

I'd be fine with that, for the same reason I mused about a "caveman
mode" where the "tcommand" DESC directive is ignored and we fall back to
fully synchronous 'c', 'C', and 'h' sequences.

Supporting emission of "plain ASCII, damn it" is a high priority for me.
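As an illustration of the trade-off under discussion (not groff code), the following Python sketch converts between UTF-8 text and the `\[uXXXX]` notation mentioned above. The function names `to_ascii` and `to_utf8` are hypothetical, and the sketch handles only single code points written as four to six uppercase hexadecimal digits; it does not attempt groff's decomposed composite names of the form `uXXXX_YYYY`.

```python
import re

# Matches \[uXXXX] with four to six uppercase hex digits (an assumption
# made for this sketch; composite uXXXX_YYYY names are not handled).
_ESCAPE = re.compile(r"\\\[u([0-9A-F]{4,6})\]")


def to_ascii(text: str) -> str:
    """Replace every code point above U+007F with a \\[uXXXX] escape."""
    return "".join(
        c if ord(c) < 0x80 else f"\\[u{ord(c):04X}]" for c in text
    )


def to_utf8(text: str) -> str:
    """Replace \\[uXXXX] escapes with the characters they name."""
    def expand(match: re.Match) -> str:
        code = int(match.group(1), 16)
        # Leave out-of-range values untouched rather than raising.
        return chr(code) if code <= 0x10FFFF else match.group(0)
    return _ESCAPE.sub(expand, text)


if __name__ == "__main__":
    sample = "tさざ波"
    ascii_form = to_ascii(sample)
    print(ascii_form)           # pure ASCII rendering of the same text
    print(to_utf8(ascii_form))  # round-trips to the original
```

The ASCII form survives any transport or terminal, while the UTF-8 form is easier on the eye where the environment and fonts cooperate; the round trip loses nothing either way.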

I concede that being able to opt in to production of grout like this:

tさざ波

would deny advocates of Heirloom Doctools and Plan 9 troffs one thing to
punk on groff for, even if I think that, with respect to this point,
such advocates spend a lot more time reading and posting about formatter
feature comparison checklists than using any program to set type.

Attending too closely to fanboi culture can distort one's priorities.

(UTF-8 in document _input_ and TTF/OTF support are much more serious and
practical concerns, in my opinion.)

> I would have hoped that troff would barf at illegal UTF-8 sequences
> rather than pass them on through grout!

Certainly my intention (except via the now-unsafe-mode-only `cf`).[A]
Hobnail boots get mushy if one doesn't go stomping on invalid input once
in a while.
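For readers wondering what "stomping on invalid input" could look like in practice, here is a minimal Python sketch of strict UTF-8 validation. It is not how troff does it; the function name `check_utf8` is hypothetical, and the example relies on Python's built-in strict decoder, which rejects both ill-formed and truncated sequences.

```python
def check_utf8(data: bytes):
    """Return (ok, offset, reason) for a byte string.

    offset is the index of the first offending byte, and reason is the
    decoder's own description; both are None when the input is valid.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        return (False, exc.start, exc.reason)
    return (True, None, None)


if __name__ == "__main__":
    samples = [
        "tさざ波".encode("utf-8"),  # well-formed
        b"t\xe3\x81",               # incomplete three-byte sequence
        b"t\xff",                   # byte that can never start a sequence
    ]
    for raw in samples:
        print(raw, check_utf8(raw))
```

Reporting the byte offset along with the rejection makes it possible to point the user at the bad input instead of silently passing it downstream.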

> [1] This was your stated reason for not committing "stringhex" from my
> branch, even though I told you I had a version of pdf.tmac which did
> not pollute the grout file with hex,

Does this mean we agreed that emitting hexadecimal sequences in the way
"stringhex" did was not great for readability?  Its "pollution" was not
even limited to grout, but showed up inside the formatter too.

<https://lists.gnu.org/archive/html/groff/2024-02/msg00027.html>:
>>> If I'm debugging using troff and dump the string/macro list, then I
>>> envision it being disheartening to see something like this.
>>>
>>> .pm
>>> PDFLB   9
>>> pdfswitchtopage 32
>>> pdfnote 380
>>> pdf:note-T      57
>>> pdfpause        29
>>> PDFBOOKMARK.VIEW        21
>>> pdf:look(0073007500700065007200630061006c006900660072006100670069006c0069007300740069006300650078007000690061006c00690064006f00630069006f007500732602)  41
>>> pdfmark 31
>>> pdftransition   58
>>> pdfbackground   40
>>> pdfpagenumbering        37
>>> pdfbookmark     1677

I regarded the foregoing naming convention as an uncomfortable barrier
to observability.  Do you disagree?
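To make the observability point concrete, the hexadecimal name in the quoted `.pm` listing can be decoded outside the formatter. The Python sketch below assumes, from the look of the digits rather than from any documentation of "stringhex", that each group of four hexadecimal digits is one UTF-16 big-endian code unit; the function name `decode_hex_name` is hypothetical.

```python
def decode_hex_name(hexdigits: str) -> str:
    """Decode a run of 4-hex-digit groups as UTF-16BE text.

    Assumption for this sketch: the name is a plain sequence of
    big-endian 16-bit code units with no separators.
    """
    return bytes.fromhex(hexdigits).decode("utf-16-be")


if __name__ == "__main__":
    name = (
        "0073007500700065007200630061006c0069006600720061"
        "00670069006c0069007300740069006300650078007000690061"
        "006c00690064006f00630069006f007500732602"
    )
    print(decode_hex_name(name))
```

That a helper script is needed to read a string name at all is the kind of barrier the paragraph above describes.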

> instead you decided to introduce a substandard solution during my
> sabbatical. When I told you that one particular document was now
> taking over 13 minutes to produce, you did say you would need to do
> something about it.

I haven't been able to reproduce it or anything like it.  I haven't
experienced any noticeable rendering time degradation at all, and I use
bleeding-edge groff pretty much daily.  I also haven't heard complaints
from Alex Colomar, who produces documents even more gigantic than
groff-man-pages.pdf.

Can you send me an exhibit that reproduces the problem?

Also, can you tell me what apart from your complaint about performance
renders my solution substandard?  Equivalently, assuming I can make my
solution performant again, would you regard it as standard?  If not,
why not?

Regards,
Branden

[A] https://lists.gnu.org/archive/html/groff-commit/2024-12/msg00082.html
