Re: an observation and proposal about hyphenation codes
From: G. Branden Robinson
Subject: Re: an observation and proposal about hyphenation codes
Date: Tue, 6 Aug 2024 13:33:28 -0500
Hi Dave,
At 2024-08-06T12:08:29-0500, Dave Kemper wrote:
> On Tue, Aug 6, 2024 at 9:48 AM G. Branden Robinson wrote:
> I'm [...]certain it has to do with when latin1.tmac is loaded and when
> it isn't.
>
> $ echo ".tm Hi, I'm latin1.tmac!" >> tmac/latin1.tmac
> $ groff-latest -a < /dev/null
> $ groff-latest -Tutf8 < /dev/null
> Hi, I'm latin1.tmac!
> $ groff-latest -Tascii < /dev/null
> $
[...]
> You DID reproduce it. Look at the first output line of each of your
> test cases:
Yes, you've got it. I:
1. hyperfocused on the full-caps RÉSUMÉ case because that was the
failing instance in a regression test recently added to the suite (a
case contributed by you, as I recall), and
2. forgot that "en.tmac" is going to have to select a character
encoding even if none of the hyphenation patterns in "hyphen.en"
actually use characters from the Latin-1 Supplement (and they
don't).
You can still override the language's choice of character encoding.
Caveat dictator.
$ ./build/test-groff -Tps -a -m latin1 -ww -Wbreak \
    EXPERIMENTS/resume-special.groff
.hy=4
<beginning of page>
r<'e><hy>
sum<'e>
r<'e><hy>
sum<'e>
R<'E><hy>
SUM<'E>
$ ./build/test-groff -Tps -a -m latin9 -ww -Wbreak \
    EXPERIMENTS/resume-special.groff
.hy=4
<beginning of page>
r<'e><hy>
sum<'e>
r<'e><hy>
sum<'e>
R<'E><hy>
SUM<'E>
> OK, now I'm certain.
>
> > But as it happens I can't reproduce this misbehavior anyway.
>
> > $ ./build/test-groff -Tutf8 -ww -Wbreak EXPERIMENTS/resume-special.groff
> > troff:EXPERIMENTS/resume-special.groff:2: warning: setting computed line
> > length 0u to device horizontal motion quantum
> > ré‐
> > sumé
>
> vs
>
> > $ ./build/test-groff -Tps -a -ww -Wbreak EXPERIMENTS/resume-special.groff
> > <beginning of page>
> > r<'e>sum<'e>
>
> This is the only line in your test file output before any .hcode
> requests were run, so this shows the default hyphenation for the
> system.
Well, kind of. The hyphenation language (`.hla`) and hyphenation mode
(`.hy`) are the same for these two scenarios. What's happened is that
these requests in "latin1.tmac" didn't get read, because the file wasn't
sourced at all.
.hcode é é
.hcode É é
Therefore these characters never acquired nonzero hyphenation codes,
and so were not valid hyphenation breakpoints.
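One rough way to picture the effect is the toy model below. It is a
Python sketch, not troff's code. It assumes that a character with a zero
hyphenation code splits a word into separate runs, and that only those
runs are handed to the hyphenator. The function names and the two-entry
stand-in for "latin1.tmac" are hypothetical, made up for illustration.

```python
# Toy model only: NOT groff's implementation. Assumption: a character
# whose hyphenation code is zero ends the run of letters that the
# hyphenator gets to look at.
import string

E_ACUTE = "\u00e9"        # é
E_ACUTE_UPPER = "\u00c9"  # É


def default_hcodes():
    """ASCII letters map to their lowercase form. A character with no
    entry stands in for a hyphenation code of zero."""
    codes = {c: c for c in string.ascii_lowercase}
    codes.update({c: c.lower() for c in string.ascii_uppercase})
    return codes


def load_latin1_subset(codes):
    """Hypothetical stand-in for the two `.hcode` requests quoted above."""
    codes[E_ACUTE] = E_ACUTE        # .hcode é é
    codes[E_ACUTE_UPPER] = E_ACUTE  # .hcode É é
    return codes


def hyphenable_runs(word, codes):
    """Split `word` at zero-code characters and return the remaining
    runs, each written as a string of hyphenation codes."""
    runs, current = [], []
    for ch in word:
        code = codes.get(ch)
        if code is None:
            # Zero code: close the current run, drop this character.
            if current:
                runs.append("".join(current))
            current = []
        else:
            current.append(code)
    if current:
        runs.append("".join(current))
    return runs


without = default_hcodes()
with_latin1 = load_latin1_subset(default_hcodes())

lower = f"r{E_ACUTE}sum{E_ACUTE}"
upper = f"R{E_ACUTE_UPPER}SUM{E_ACUTE_UPPER}"

print(hyphenable_runs(lower, without))      # ['r', 'sum']
print(hyphenable_runs(lower, with_latin1))  # ['résumé']
print(hyphenable_runs(upper, with_latin1))  # ['résumé']
```

In this picture, without the two requests the pattern matcher only ever
sees "r" and "sum", so it has no chance to find a break next to the
accented letters. With them, both spellings reach it as one word.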
Does this make sense?
If so, what I will do is make "en.tmac" `.mso latin1.tmac`.
And add another regression test case.
Thanks for the report!
The subtleties involved in machine-driven hyphenation seem to be
endless. Someone ought to write a Ph.D. thesis about how hard it is.[1]
Regards,
Branden
[1] Yes, I know they did. I added a citation of it to the groff Texinfo
manual a while back.