groff-commit

[groff] 08/10: [troff]: Fix Savannah #61401.


From: G. Branden Robinson
Subject: [groff] 08/10: [troff]: Fix Savannah #61401.
Date: Sat, 30 Oct 2021 19:26:35 -0400 (EDT)

gbranden pushed a commit to branch master
in repository groff.

commit eb695ab2b5e2bae54afa102355c493bda6e29d3e
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
AuthorDate: Sat Oct 30 15:45:29 2021 +1100

    [troff]: Fix Savannah #61401.
    
    [troff]: Handle special character escape sequences that map to basic
    Latin glyphs in device control escape sequences consistently among
    output devices.
    
    * src/roff/troff/input.cpp (encode_char): Rearrange conditionals.  This
      is the logic that puts the "whatever" within a \X'whatever' escape
      sequence into GNU troff's intermediate output.  Handle stretchable and
      unstretchable space escape sequences ("\ " and "\~") first.  Then, if
      the token is a special character escape sequence, retrieve its
      "contents" (glyph name).  Move the basic Latin mapping for the seven
      glyph names '-', 'aq', 'dq', 'ga', 'ha', 'rs', and 'ti' here, before
      checking whether the device description issued the
      'use_charnames_in_special' directive.  This way, the 'html' and
      'xhtml' output devices can straightforwardly embed these basic Latin
      characters in device control escapes (notably, "html:", for which the
      present convention is to follow this tag immediately with a
      literal HTML URI, complete with `<a href>` element syntax).  If the
      special character is none of these and we should
      'use_charnames_in_special', proceed as groff 1.22.4 and earlier did.
      This is a behavior change, as was my addition of this translation
      mechanism in the first place, so...
    
    * doc/groff.texi (Postprocessor Access): Document it.
    
    * src/roff/groff/tests/device_control_escapes_express_basic_latin.sh:
      Test it.
    * src/roff/groff/groff.am (groff_TESTS): Run test.
    
    Fixes <https://savannah.gnu.org/bugs/?61401>.
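The reordered decision logic described above can be modeled outside GNU troff as a standalone function. This is a minimal sketch under stated assumptions, not the code in `input.cpp`: the `Token` struct, `TokenKind` enum, `basic_latin_for`, and `encode_token` are hypothetical stand-ins for troff's token class and `macro::append`, and the sketch throws an exception where troff would call `error()`.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical token categories standing in for GNU troff's token class.
enum class TokenKind { StretchableSpace, UnstretchableSpace, Special };

// Hypothetical token: its kind plus, for special characters, a glyph name.
struct Token {
  TokenKind kind;
  std::string glyph_name;
};

// The seven glyph names that this commit maps to basic Latin characters.
std::optional<char> basic_latin_for(const std::string &name) {
  if (name == "-") return '-';
  if (name == "aq") return '\'';
  if (name == "dq") return '"';
  if (name == "ga") return '`';
  if (name == "ha") return '^';
  if (name == "rs") return '\\';
  if (name == "ti") return '~';
  return std::nullopt;
}

// Append the encoding of one token to `out` in the order the commit
// message describes: spaces first, then the basic Latin mapping, and
// only then the device's use_charnames_in_special setting.  Throws
// where GNU troff would instead issue a diagnostic.
void encode_token(const Token &tok, bool use_charnames_in_special,
                  std::string &out) {
  // Space tokens carry no glyph name in this model.
  assert(tok.kind == TokenKind::Special || tok.glyph_name.empty());
  if (tok.kind != TokenKind::Special) {
    out += ' ';  // "\ " and "\~" both become a single space
    return;
  }
  if (std::optional<char> c = basic_latin_for(tok.glyph_name)) {
    out += *c;  // identical on every output device
    return;
  }
  if (use_charnames_in_special && !tok.glyph_name.empty()) {
    out += "\\[" + tok.glyph_name + "]";  // name passed through verbatim
    return;
  }
  throw std::runtime_error("special character '" + tok.glyph_name
                           + "' cannot be used within device control"
                           " escape sequence");
}
```

Because the mapping is consulted before `use_charnames_in_special`, `\[ti]` yields `~` whether or not the device's DESC file carries the directive; only unmapped names such as `\[em]` depend on it.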
---
 ChangeLog                                          | 33 ++++++++++++
 doc/groff.texi                                     | 50 ++++++++++--------
 src/roff/groff/groff.am                            |  1 +
 .../device_control_escapes_express_basic_latin.sh  | 60 ++++++++++++++++++++++
 src/roff/troff/input.cpp                           | 44 +++++++++-------
 5 files changed, 148 insertions(+), 40 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index cef7428..100a3f8 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,38 @@
 2021-10-30  G. Branden Robinson <g.branden.robinson@gmail.com>
 
+       [troff]: Handle special character escape sequences that map to
+       basic Latin glyphs in device control escape sequences
+       consistently among output devices.
+
+       * src/roff/troff/input.cpp (encode_char): Rearrange
+       conditionals.  This is the logic that puts the "whatever" within
+       a \X'whatever' escape sequence into GNU troff's intermediate
+       output.  Handle stretchable and unstretchable space escape
+       sequences ("\ " and "\~") first.  Then, if the token is a special
+       character escape sequence, retrieve its "contents" (glyph name).
+       Move the basic Latin mapping for the seven glyph names '-',
+       'aq', 'dq', 'ga', 'ha', 'rs', and 'ti' here, before checking
+       whether the device description issued the
+       'use_charnames_in_special' directive.  This way, the 'html' and
+       'xhtml' output devices can straightforwardly embed these basic
+       Latin characters in device control escapes (notably, "html:",
+       for which the present convention is to follow this tag
+       immediately with a literal HTML URI, complete with `<a href>`
+       element syntax).  If the special character is none of these and
+       we should 'use_charnames_in_special', proceed as groff 1.22.4
+       and earlier did.  This is a behavior change, as was my addition
+       of this translation mechanism in the first place, so...
+
+       * doc/groff.texi (Postprocessor Access): Document it.
+
+       * src/roff/groff/tests/\
+       device_control_escapes_express_basic_latin.sh: Test it.
+       * src/roff/groff/groff.am (groff_TESTS): Run test.
+
+       Fixes <https://savannah.gnu.org/bugs/?61401>.
+
+2021-10-30  G. Branden Robinson <g.branden.robinson@gmail.com>
+
        [troff]: Map \[ti] correctly in device control escape sequences.
 
        * src/roff/troff/input.cpp (encode_char): Fix copy-and-paste
diff --git a/doc/groff.texi b/doc/groff.texi
index 7a8dfdf..4640f6a 100644
--- a/doc/groff.texi
+++ b/doc/groff.texi
@@ -14882,11 +14882,18 @@ There are two escapes that give information directly to the
 postprocessor.  This is particularly useful for embedding PostScript
 into the final document.
 
-@DefreqList {device, xxx}
-@DefescListEndx {\\X, @code{'}, xxx, @code{'}}
-Embeds its argument into the @code{gtroff} output preceded with
-@w{@samp{x X}}.
+@DefreqList {device, xxx @r{@dots{}}}
+@DefescListEndx {\\X, @code{'}, xxx @r{@dots{}}, @code{'}}
+Embed all @var{xxx} arguments into GNU @code{troff} output as parameters
+to a device control command @w{@samp{x X}}.  The meaning and
+interpretation of such parameters is determined by the output driver or
+other postprocessor.
 
+@cindex @code{device} request, and copy mode
+@cindex copy mode, and @code{device} request
+@cindex mode, copy, and @code{device} request
+The @code{device} request processes its arguments in copy mode
+(@pxref{Copy Mode}).
 @cindex @code{\&}, in @code{\X}
 @cindex @code{\)}, in @code{\X}
 @cindex @code{\%}, in @code{\X}
@@ -14896,27 +14903,28 @@ Embeds its argument into the @code{gtroff} output preceded with
 @ifinfo
 @cindex @code{\@r{<colon>}}, in @code{\X}
 @end ifinfo
-The escapes @code{\&}, @code{\)}, @code{\%}, and @code{\:} are ignored
-within @code{\X}, @w{@samp{\ }} and @code{\~} are converted to single
-space characters.  All other escapes (except @code{\\}, which produces a
-backslash) cause an error.
-
-@cindex @code{device} request, and copy mode
-@cindex copy mode, and @code{device} request
-@cindex mode, copy, and @code{device} request
-Contrary to @code{\X}, the @code{device} request simply processes its
-argument in copy mode (@pxref{Copy Mode}).
+By contrast, within @code{\X} arguments, the escape sequences @code{\&},
+@code{\)}, @code{\%}, and @code{\:} are ignored, @code{\SP} and
+@code{\~} are converted to single space characters, and @code{\\} has
+its escape character stripped.  So that the basic Latin subset of the
+Unicode character set@footnote{that is, ISO@tie{}646:1991-IRV or,
+popularly, ``US-ASCII''} can be reliably encoded in device control
+commands, seven special character escape sequences (@samp{\-},
+@samp{\[aq]}, @samp{\[dq]}, @samp{\[ga]}, @samp{\[ha]}, @samp{\[rs]},
+and @samp{\[ti]}) are mapped to basic Latin glyphs; see the
+@cite{groff_char@r{(7)}} man page.  The use of any other escape sequence
+in @code{\X} arguments is normally an error.
 
 @kindex use_charnames_in_special
 @pindex DESC@r{, and @code{use_charnames_in_special}}
 @cindex @code{\X}, and special characters
-If the @samp{use_charnames_in_special} keyword is set in the @file{DESC}
-file, special characters no longer cause an error; they are simply
-output verbatim.  Additionally, the backslash is represented as
-@code{\\}.
-
-@samp{use_charnames_in_special} is currently used by @code{grohtml}
-only.
+If the @code{use_charnames_in_special} directive appears in the output
+device's @file{DESC} file, the use of special character escape sequences
+is @emph{not} an error; they are simply output verbatim (with the
+exception of the seven mapped to Unicode basic Latin characters,
+discussed above).  For convenience, the backslash can be represented as
+@samp{\\}.  @code{use_charnames_in_special} is currently used only by
+@code{grohtml}.
 @endDefesc
 
 @DefreqList {devicem, xx}
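The `\X` argument rules documented in this hunk can be illustrated with a small scanner over the argument text. This is a hedged C++ sketch, not GNU troff's implementation (which works on already-tokenized input in `encode_char`): the names `encode_x_argument` and `map_special` are hypothetical, the sketch assumes a device whose DESC file lacks `use_charnames_in_special` (for example, `utf8`), it recognizes only the `\-`, `\(xx`, and `\[name]` special character forms, and it throws where troff would issue a diagnostic.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Basic Latin characters for the seven mapped special characters.
const std::map<std::string, char> basic_latin = {
    {"-", '-'},  {"aq", '\''}, {"dq", '"'}, {"ga", '`'},
    {"ha", '^'}, {"rs", '\\'}, {"ti", '~'},
};

// Return the mapped character, or throw for any other glyph name.
char map_special(const std::string &name) {
  auto it = basic_latin.find(name);
  if (it == basic_latin.end())
    throw std::runtime_error("special character '" + name
                             + "' cannot be used in \\X");
  return it->second;
}

// Reduce the text between \X delimiters to an "x X" payload, assuming
// a device WITHOUT use_charnames_in_special.  Only the escapes named
// in the manual are modeled; any other escape throws.
std::string encode_x_argument(const std::string &arg) {
  std::string out;
  for (std::size_t i = 0; i < arg.size(); ++i) {
    if (arg[i] != '\\') {
      out += arg[i];  // ordinary characters pass through
      continue;
    }
    if (++i == arg.size())
      throw std::runtime_error("incomplete escape sequence");
    switch (arg[i]) {
    case '&': case ')': case '%': case ':':
      break;  // ignored
    case ' ': case '~':
      out += ' ';  // converted to a single space
      break;
    case '\\':
      out += '\\';  // escape character stripped
      break;
    case '-':
      out += map_special("-");
      break;
    case '(':  // \(xx: two-character glyph name
      if (i + 2 >= arg.size())
        throw std::runtime_error("incomplete \\( escape sequence");
      out += map_special(arg.substr(i + 1, 2));
      i += 2;
      break;
    case '[': {  // \[name]: bracketed glyph name
      std::size_t close = arg.find(']', i + 1);
      if (close == std::string::npos)
        throw std::runtime_error("unterminated \\[ escape sequence");
      out += map_special(arg.substr(i + 1, close - i - 1));
      i = close;
      break;
    }
    default:
      throw std::runtime_error(std::string("escape sequence \\") + arg[i]
                               + " cannot be used in \\X");
    }
  }
  return out;
}
```

For example, under these assumptions the argument `a\ b\[ti]c` reduces to the payload `a b~c`, while `\[em]` is rejected.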
diff --git a/src/roff/groff/groff.am b/src/roff/groff/groff.am
index 3140acd..fa5ef52 100644
--- a/src/roff/groff/groff.am
+++ b/src/roff/groff/groff.am
@@ -39,6 +39,7 @@ groff_TESTS = \
   src/roff/groff/tests/ab_works.sh \
   src/roff/groff/tests/adjustment_works.sh \
   src/roff/groff/tests/break_zero-length_output_line_sanely.sh \
+  src/roff/groff/tests/device_control_escapes_express_basic_latin.sh \
   src/roff/groff/tests/do_not_loop_infinitely_when_breaking_cjk.sh \
   src/roff/groff/tests/dot-cp_register_works.sh \
   src/roff/groff/tests/dot-nm_register_works.sh \
diff --git a/src/roff/groff/tests/device_control_escapes_express_basic_latin.sh b/src/roff/groff/tests/device_control_escapes_express_basic_latin.sh
new file mode 100755
index 0000000..99d1cee
--- /dev/null
+++ b/src/roff/groff/tests/device_control_escapes_express_basic_latin.sh
@@ -0,0 +1,60 @@
+#!/bin/sh
+#
+# Copyright (C) 2021 Free Software Foundation, Inc.
+#
+# This file is part of groff.
+#
+# groff is free software; you can redistribute it and/or modify it under
+# the terms of the GNU General Public License as published by the Free
+# Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# groff is distributed in the hope that it will be useful, but WITHOUT
+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+# for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+
+groff="${abs_top_builddir:-.}/test-groff"
+fail=
+
+# Confirm translation of a groff special character escape sequence to a
+# basic Latin character when used in a device control escape sequence.
+#
+# $1 is the special character escape _without_ the leading backslash.
+# $2 is the expected output character _shell-quoted as necessary_.
+# $3 is a human-readable glyph description for the test log.
+# $4 is the groff -T device name under test.
+check_char () {
+  sc=$1
+  output=$2
+  description=$3
+  device=$4
+  printf 'checking conversion of \%s to %s (%s) on device %s' \
+    "$sc" "$output" "$description" "$device"
+  if ! printf "\\X#\\%s %s#\n" "$sc" "$desc" | "$groff" -T$device -Z \
+    | grep -Fqx 'x X '$output' '
+  then
+    printf '...failed'
+    fail=yes
+  fi
+  printf '\n'
+}
+
+for device in utf8 html
+do
+  check_char - - "minus sign" $device
+  check_char '[aq]' "'" "neutral apostrophe" $device
+  check_char '[dq]' '"' "double quote" $device
+  check_char '[ga]' '`' "grave accent" $device
+  check_char '[ha]' ^ "caret/hat" $device
+  check_char '[rs]' '\' "reverse solidus/backslash" $device
+  check_char '[ti]' '~' "tilde" $device
+done
+
+test -z "$fail" || exit 1
+
+# vim:set autoindent expandtab shiftwidth=2 tabstop=2 textwidth=72:
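The test above passes when GNU troff's intermediate output (requested with `-Z`) contains an exact line `x X <char> `; the trailing space appears because the test's input places a space before the closing `#` delimiter. A minimal C++ sketch of the same kind of check, collecting the payload of every `x X` device control command, might look like the following. The function name is hypothetical, and the sketch assumes each command occupies a single line (it does not handle commands continued onto later lines).

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Collect the payloads of device control commands ("x X ...") from
// GNU troff intermediate output, assuming one command per line.
std::vector<std::string> device_control_payloads(
    const std::string &intermediate) {
  const std::string prefix = "x X ";
  std::vector<std::string> payloads;
  std::istringstream in(intermediate);
  std::string line;
  while (std::getline(in, line))
    if (line.compare(0, prefix.size(), prefix) == 0)
      payloads.push_back(line.substr(prefix.size()));  // keep trailing space
  return payloads;
}
```

Keeping the payload byte-for-byte, including trailing spaces, mirrors the exact-line semantics of `grep -Fqx` in the shell test.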
diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
index 7f31f9e..23748c2 100644
--- a/src/roff/troff/input.cpp
+++ b/src/roff/troff/input.cpp
@@ -5397,25 +5397,17 @@ static node *do_non_interpreted()
 static void encode_char(macro *mac, char c)
 {
   if (c == '\0') {
-    if ((font::use_charnames_in_special) && tok.is_special()) {
-      charinfo *ci = tok.get_char(true /* required */);
-      const char *s = ci->get_symbol()->contents();
-      if (s[0] != (char)0) {
-       mac->append('\\');
-       mac->append('[');
-       int i = 0;
-       while (s[i] != (char)0) {
-         mac->append(s[i]);
-         i++;
-       }
-       mac->append(']');
-      }
-    }
-    else if (tok.is_stretchable_space()
+    if (tok.is_stretchable_space()
             || tok.is_unstretchable_space())
       mac->append(' ');
     else if (tok.is_special()) {
-      const char *sc = tok.get_char()->get_symbol()->contents();
+      const char *sc;
+      if (font::use_charnames_in_special) {
+       charinfo *ci = tok.get_char(true /* required */);
+       sc = ci->get_symbol()->contents();
+      }
+      else
+       sc = tok.get_char()->get_symbol()->contents();
       if (strcmp("-", sc) == 0)
        mac->append('-');
       else if (strcmp("aq", sc) == 0)
@@ -5430,9 +5422,23 @@ static void encode_char(macro *mac, char c)
        mac->append('\\');
       else if (strcmp("ti", sc) == 0)
        mac->append('~');
-      else
-       error("special character '%1' cannot be used within device"
-             " control escape sequence", sc);
+      else {
+       if (font::use_charnames_in_special) {
+         if (sc[0] != (char)0) {
+           mac->append('\\');
+           mac->append('[');
+           int i = 0;
+           while (sc[i] != (char)0) {
+             mac->append(sc[i]);
+             i++;
+           }
+           mac->append(']');
+         }
+         else
+             error("special character '%1' cannot be used within"
+                   " device control escape sequence", sc);
+       }
+      }
     }
     else if (!(tok.is_hyphen_indicator()
               || tok.is_dummy()


