Hi Alejandro,
If you know a (hopefully trivial) filter that transforms any
multi-byte sequence into exactly the number of bytes that will be
visible (and hopefully those bytes should be similar to the original
UTF-8 content), that would greatly help.
...
tbl man1/memusage.1 \
| eqn -Tutf8 \
| troff -man -t -M ./etc/groff/tmac -m checkstyle -rCHECKSTYLE=3 \
-ww -Tutf8 -rLL=78n \
| grotty -c \
| col -b -x \
| toplaintext \
| (! grep -n '.\{80\}.' >&2)
I'm unclear on what problem this is trying to solve. In a UTF-8
locale, grep(1) already matches the multi-byte UTF-8 sequence for one
rune with a single ‘.’. That leaves the terminal's escape sequences,
but grotty's ‘-c’ has disabled them, and the over-striking used for
underlining, which col(1) deals with.
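A quick way to check the claim about ‘.’, as a rough sketch rather than
anything definitive. It assumes `locale -a` lists at least one UTF-8
locale on your system; the variable names are just for illustration.
The test line is three copies of U+00E9, written as octal escapes so
the snippet doesn't depend on how it is saved.

```shell
# Assumption: at least one UTF-8 locale is installed; take the first listed.
utf8_locale=$(locale -a 2>/dev/null | grep -i 'utf-*8' | head -n 1)

# Count the bytes in a line of 3 characters (each U+00E9 is 2 bytes in UTF-8).
byte_count=$(printf '\303\251\303\251\303\251' | wc -c | tr -d ' ')

# In a UTF-8 locale, '^.\{3\}$' spans the whole line: '.' matched per character.
char_match=$(printf '\303\251\303\251\303\251\n' \
    | LC_ALL="$utf8_locale" grep -c '^.\{3\}$' || true)

# So a "more than 3 characters" check, shaped like the 80-column one, is silent.
overlong=$(printf '\303\251\303\251\303\251\n' \
    | LC_ALL="$utf8_locale" grep -c '.\{3\}.' || true)

echo "bytes=$byte_count char_match=$char_match overlong=$overlong"
```

If the counts come out as 6 bytes, one match and no overlong line, then
grep is counting characters rather than bytes, and no extra filter
should be needed for that part.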
In other words, what's wrong with
zcat man7/groff_char.7.gz |
eqn -Tutf8 |
troff -man -t -ww -Tutf8 -rLL=78n |
grotty -c |
col -pbx |
(! grep -n '.\{80\}.' >&2)
Does it miss overlong lines or wrongly report a short line as too long?
If so, an example would help target further suggestions.