Re: 1.23 prints some strange error
From: G. Branden Robinson
Subject: Re: 1.23 prints some strange error
Date: Thu, 26 Oct 2023 09:35:07 -0500
At 2023-10-25T16:20:27+0200, Walter Alejandro Iglesias wrote:
> What you did above is not the step by step way I posted to reproduce
> the bug. Of course it won't be helpful if you overlook it.
You've already gotten what I would have thought to be a sufficient
explanation of the diagnostic messages you saw.
1. GNU troff (the formatter program) doesn't accept UTF-8 input;
2. Your list of hyphenation exceptions (`hw` requests) is formatted in
UTF-8;
3. Your document is using `mso` rather than `so` requests to load the
list of hyphenation exceptions;
4. The soelim(1) program does not operate on `mso` requests (nor should
it, in my opinion); therefore,
5. Your input confuses the formatter, producing diagnostics.
Why these exact diagnostics?
Well, let's have a look at the first.
    $ nroff -M. ./doc.tr
    troff:./list.tr:1: error: expected ordinary or special character, got an
    escaped '%'
    $ head -n 1 list.tr
    .hw a-hí
    $ hd list.tr
    00000000 2e 68 77 20 61 2d 68 c3 ad 0a 2e 68 77 20 61 2d |.hw a-h....hw a-|
    00000010 c3 b1 6f 0a 2e 68 77 20 c3 a1 72 2d 62 6f 6c 0a |..o..hw ..r-bol.|
    00000020 2e 68 77 20 63 75 2d 62 72 c3 ad 2d 61 0a 2e 68 |.hw cu-br..-a..h|
    00000030 77 20 65 2d 74 c3 a9 2d 72 65 2d 6f 0a 2e 68 77 |w e-t..-re-o..hw|
    00000040 20 63 61 2d 6d 69 c3 b3 6e 0a 2e 68 77 20 c3 ba | ca-mi..n..hw ..|
    00000050 2d 74 65 2d 72 6f 0a 2e 68 77 20 70 69 6e 2d 67 |-te-ro..hw pin-g|
    00000060 c3 bc 69 2d 6e 6f 0a |..i-no.|
    00000067
GNU troff reads line 1 of list.tr, interpreting it as ISO Latin-1. The
bytes of interest are therefore 0xc3 and 0xad.
C3 is "LATIN CAPITAL LETTER A WITH TILDE".
AD is "SOFT HYPHEN".
groff_char(7) explains what the formatter does with the latter.
    Eight‐bit encodings and Latin‐1 supplement
        ISO 646 is a seven‐bit code encoding 128 code points; eight‐bit
        codes are twice the size. ISO 8859‐1 and code page 1047
        allocated the additional space to what Unicode calls “C1
        controls” (control characters) and the “Latin‐1 supplement”. The
        C1 controls are neither printable nor usable as groff input.
        Two Latin‐1 supplement characters are handled specially on input.
        troff never produces them as output.
        NBSP  encodes a no‐break space; it is mapped to \~, the
              adjustable non‐breaking space escape sequence.
        SHY   encodes a soft hyphen; it is mapped to \%, the hyphenation
              control escape sequence.
The formatter does not expect to see a hyphen control escape sequence
inside the definition of a hyphenation exception, and it complains if it
gets one.
That is why you got the error message you did.
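The chain of events above can be replayed outside the formatter. What
follows is only a sketch in Python, not how GNU troff is implemented: it
decodes the first line of list.tr as Latin-1 and applies the two input
mappings that groff_char(7) describes. The names `SPECIAL_INPUT` and
`read_as_latin1` are made up for illustration.

```python
import unicodedata

# First line of list.tr, byte for byte as shown by hd (without the newline).
LINE = bytes.fromhex("2e 68 77 20 61 2d 68 c3 ad")

# The two Latin-1 supplement characters that groff_char(7) says are
# handled specially on input (hypothetical table, for illustration only).
SPECIAL_INPUT = {
    "\u00a0": r"\~",  # NBSP -> adjustable non-breaking space escape
    "\u00ad": r"\%",  # SHY  -> hyphenation control escape
}


def read_as_latin1(raw: bytes) -> str:
    """Decode bytes as ISO Latin-1, then substitute NBSP and SHY."""
    text = raw.decode("latin-1")
    return "".join(SPECIAL_INPUT.get(ch, ch) for ch in text)


if __name__ == "__main__":
    # Name the two bytes of interest as Latin-1 characters.
    for byte in LINE[-2:]:
        print(f"{byte:02X} is {unicodedata.name(chr(byte))}")
    # Show the line as the hw request would end up seeing it.
    print(read_as_latin1(LINE))
```

The last line printed ends in "Ã" followed by \%, which is the escaped
'%' the diagnostic complains about.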
That is why my advice is to either maintain files you `mso` in Latin-1
(or ASCII), or go ahead and maintain them in UTF-8, but as ".in" files
that your Makefile converts to input GNU troff will accept, using
preconv.
    list.tr: list.tr.in
            preconv -e utf-8 $< > $@
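For a sense of what that rule produces, here is a rough Python
approximation of the conversion, under the assumption that every
non-ASCII character becomes a `\[uXXXX]` special character escape, which
is how preconv(1) describes its output. The real program does more
(encoding detection, for one), so treat `to_groff_escapes` as a
hypothetical stand-in, not a replacement.

```python
def to_groff_escapes(utf8_bytes: bytes) -> str:
    """Replace each non-ASCII character with a \\[uXXXX] escape."""
    text = utf8_bytes.decode("utf-8")
    return "".join(
        ch if ord(ch) < 0x80 else f"\\[u{ord(ch):04X}]"
        for ch in text
    )


if __name__ == "__main__":
    # First two lines of list.tr, taken from the hd output.
    sample = bytes.fromhex("2e687720612d68c3ad0a2e687720612dc3b16f0a")
    print(to_groff_escapes(sample), end="")
```

The result is pure ASCII, so the Latin-1 reading of the file no longer
matters.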
GNU troff does not reject code points A0-FF as invalid because they
aren't invalid; every single one might be found in a valid Latin-1
document. The formatter _does_ reject code points 80-9F as input. That
did not come up with your list because each accented letter in it
encodes as C3 followed by a byte in the range A0-BF. It can come up
with other (valid) UTF-8 input, however: continuation bytes span 80-BF,
so "É" (U+00C9), for example, is encoded as C3 89. These are not
"overlong encodings" of Basic Latin code points; those involve the lead
bytes C0 and C1, which valid UTF-8 never contains.
See, e.g., "Canonicalization of Non-Shortest Form UTF-8".
https://websec.github.io/unicode-security-guide/character-transformations/
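The byte arithmetic behind this is short enough to sketch. Assuming
Python's UTF-8 codec as the reference, the following checks which
characters, once encoded as UTF-8, contain a byte in 80-9F;
`has_c1_byte` is a name invented here.

```python
def has_c1_byte(ch: str) -> bool:
    """True if the UTF-8 encoding of ch contains a byte in 0x80..0x9F."""
    return any(0x80 <= b <= 0x9F for b in ch.encode("utf-8"))


if __name__ == "__main__":
    # Accented letters from list.tr, plus two capitals for comparison.
    for ch in "íñáéóúüÉÑ":
        encoded = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
        print(f"U+{ord(ch):04X} {ch} -> {encoded}  C1 byte: {has_c1_byte(ch)}")

    # Overlong forms are a different matter: a strict decoder refuses them.
    try:
        b"\xc0\xad".decode("utf-8")
    except UnicodeDecodeError as err:
        print("overlong C0 AD rejected:", err.reason)
```

Lowercase letters such as "í" (C3 AD) stay clear of that range, while
capitals such as "É" (C3 89) do not, since UTF-8 continuation bytes span
80-BF.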
As it happens, GNU troff uses the C1 Control block (U+0080..U+009F) for
internal purposes. That is one of the reasons it's non-trivial to
convert it to understand UTF-8 natively, an outcome pretty much everyone
desires.
https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/input.h?h=1.23.0
Regards,
Branden