bug-groff

From: G. Branden Robinson
Subject: [bug #63074] [troff] support expression of arbitrary byte sequences in device control commands
Date: Tue, 12 Nov 2024 14:32:27 -0500 (EST)

Follow-up Comment #41, bug #63074 (group groff):

Hi Deri,

At 2024-11-12T13:47:33-0500, Deri James wrote:
> Follow-up Comment #40, bug #63074 (group groff):
>
> [comment #38:]
>> One could envision three levels of support for encoding arbitrary
>> characters.
>>
>> 1.  By Unicode code point.  Reusing _groff_'s own syntax for Unicode special
>> character escape sequences was irresistibly tempting, so that's what I
>> implemented.  We have that in Git HEAD.
>> 2.  By (simple) _groff_ special character escape sequence, like \['o'] (in
>> "Cicerón").  We have that in Git HEAD too.
>> 3.  By composite special character escape sequence, like "\[o aa]", which we
>> might also use to write "Cicerón"--"Cicer\[o aa]n".  We don't have that.  It
>> proved to be difficult.  (The formatter warns if it encounters this syntax
>> where it can't handle it.)
>
> If you implement (3), you will find that searching a document for
> "Cicerón" (formed using \[o aa]) may fail to find it.

I think that's not correct, except in `output`/`\!` arguments, where you
get exactly the "grout" you ask for.

Here's the commented out test, preceded by the corresponding input:

input='.
.ds h Caf\[e aa] Hyphen-Minus and \[rs]\[u2010]
\X"ps:exec 5:\\X     [/Dest /pdf:bm1 /Title (\*[h]) /Level 1 /OUT pdfmark"
\!x X ps:exec 6:\!     [/Dest /pdf:bm1 /Title (\*[h]) /Level 1 /OUT pdfmark
.device ps:exec 7:device [/Dest /pdf:bm1 /Title (\*[h]) /Level 1 /OUT pdfmark
.output x X ps:exec 8:output [/Dest /pdf:bm1 /Title (\*[h]) /Level 1 /OUT pdfmark
.'

#echo "checking practical bookmarking with device request" >&2
#printf "%s\n" "$output" \
#  | grep -Fqx 'x X ps:exec 7:device [/Dest /pdf:bm1 /Title (Caf\[u00E9] Hyphen-Minus and \\[u2010]) /Level 1 /OUT pdfmark' \
#  || wail

As you can see, I expect `\[e aa]` to be transformed to `\[u00E9]`.  But
this doesn't work presently, because the `device` request reads its
argument in copy mode, unlike the `\X` escape sequence.

The goal is to have `\[u00E9]` show up in the output no matter how it
was spelled in the input: `\['e]`, `\[e aa]`, `\[e ']`, `\[u0065_0301]`,
or indeed `\[u00E9]`.

That should present no complications for searching.
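
To illustrate the intended outcome outside of _groff_, here is a minimal
Python sketch, not the formatter's implementation.  It assumes that canonical
composition (NFC) is the desired target form, and it handles only the
`\[uXXXX]` and `\[uXXXX_YYYY]` spellings; resolving `\['e]` or `\[e aa]`
would need _groff_'s own glyph name tables.  The names `compose_escape` and
`normalize_escapes` are hypothetical.

```python
import re
import unicodedata

# Matches groff-style Unicode special character escapes such as
# \[u00E9] or \[u0065_0301] (code points joined by underscores).
_UNICODE_ESCAPE = re.compile(r"\\\[u([0-9A-F]{4,6}(?:_[0-9A-F]{4,6})*)\]")


def compose_escape(match):
    """Rewrite one matched escape in composed (NFC) form."""
    code_points = match.group(1).split("_")
    text = "".join(chr(int(cp, 16)) for cp in code_points)
    # Assumption: NFC is the target form; groff's actual policy may differ.
    composed = unicodedata.normalize("NFC", text)
    return r"\[u" + "_".join(f"{ord(ch):04X}" for ch in composed) + "]"


def normalize_escapes(line):
    """Apply compose_escape to every Unicode escape found in line."""
    return _UNICODE_ESCAPE.sub(compose_escape, line)


if __name__ == "__main__":
    print(normalize_escapes(r"Caf\[u0065_0301]"))  # prints Caf\[u00E9]
    print(normalize_escapes(r"Caf\[u00E9]"))       # prints Caf\[u00E9]
```

Both spellings yield the same bytes, so a search for the bookmark title
matches regardless of how the author typed it.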

(We may want to someday migrate to a different policy for Unicode
decomposition--I leave that problem for when it ripens.)

> So I prefer not allowing composites, unless you have a zinger argument
> for them.

My zinger is simply that no user should have to remember this absurdly
esoteric detail.  Right now, they get a warning if they bump into it.

> Yes, both of these work perfectly:-
>
> printf "Caf\\['e]\n.br\n.output x X ps:exec [/Dest /pdf:bm1 /Title (Eat at Joe's Caf\\['e].) /Level 1 /OUT pdfmark\n" | test-groff -T pdf | okular -
>
> printf "Caf\\['e]\n.br\n.device ps:exec [/Dest /pdf:bm1 /Title (Eat at Joe's Caf\\['e].) /Level 1 /OUT pdfmark\n" | test-groff -T pdf | okular -

That was one of my goals!

I'm pleased that _something_ worked out after all this struggle.  :-O

Best,
Branden



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?63074>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/
