groff-commit

[groff] 08/11: [grog]: Refactor preprocessor inference.


From: G. Branden Robinson
Subject: [groff] 08/11: [grog]: Refactor preprocessor inference.
Date: Sat, 31 Jul 2021 10:36:30 -0400 (EDT)

gbranden pushed a commit to branch master
in repository groff.

commit 53a996449742e3b50ae93393088c8d362f8bec20
Author: G. Branden Robinson <g.branden.robinson@gmail.com>
AuthorDate: Sat Jul 31 18:03:39 2021 +1000

    [grog]: Refactor preprocessor inference.
    
    * src/utils/grog/grog.pl: Refactor preprocessor inference.
      - Add new list, `inferred_preprocessor`.
      - Drop preprocessor-related keys from `Groff` hash.
      - Drop scalar `inside_tbl_table`.
    
      (do_line): Set up hash `preprocessor_for_macro`.  Detect preprocessor
      macros the way the preprocessors do, explained in a comment.  Respect
      AT&T compatibility mode when doing so.  Build list of inferred
      preprocessors.  This replaces the extensive and gaseous series of `if`
      statements that manipulated `Groff` hash.  Be more careful when
      pattern-matching request/macro names, since roff identifiers can
      include Perl regex metacharacters.
    
      (infer_preprocessors): Completely replace.  Set up a hash
      `option_for_preprocessor` mapping preprocessor names to groff options
      (where applicable).  Append to `command` and `preprocessor` lists as
      appropriate.  Sort the preprocessor options so they don't move around
      in the argument list depending on the order of their macros'
      appearance in the input.
    
    * src/utils/grog/tests/smoke-test.sh: Update test cases to
      expect preprocessor options to show up in sorted order.
    
    Thanks to Dave Kemper for pointing out to me that the preprocessors
    don't parse roff control lines the way a roff does.
---
 ChangeLog                          |  21 ++++
 src/utils/grog/grog.pl             | 206 ++++++++++++-------------------------
 src/utils/grog/tests/smoke-test.sh |   4 +-
 3 files changed, 88 insertions(+), 143 deletions(-)
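
The detection logic that the commit message describes for do_line can be tried in isolation. The following is a minimal sketch, not the code from grog.pl: the helper name `preprocessor_for_line` and the abbreviated table are assumptions made for illustration. Like the regex in the patch, it only considers two-character macro names plus `.[`, requires the default control character, and allows no space between it and the macro name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Abbreviated copy of the macro-to-preprocessor table from the patch.
my %preprocessor_for_macro = (
  'EQ' => 'eqn',
  'PS' => 'pic',
  'TS' => 'tbl',
  '['  => 'refer',
);

# Hypothetical helper (not in grog.pl): return the name of the
# preprocessor implied by an input line, or undef if there is none.
sub preprocessor_for_line {
  my ($line, $use_compatibility_mode) = @_;

  # Outside compatibility mode the macro name must end at a word
  # boundary; in AT&T compatibility mode nothing need follow it.
  my $boundary = $use_compatibility_mode ? '' : '\\b';

  if ($line =~ /^\.(\w\w)$boundary/ || $line =~ /^\.(\[)/) {
    # A plain hash lookup sidesteps regex quoting of the macro name.
    return $preprocessor_for_macro{$1};
  }
  return;
}

foreach my $line ('.TS', '.TSx', '. TS', "'TS", '.[', '.PP') {
  foreach my $compat (0, 1) {
    my $preproc = preprocessor_for_line($line, $compat);
    printf "%-5s compat=%d -> %s\n", $line, $compat,
      defined $preproc ? $preproc : '(none)';
  }
}
```

In this sketch, `.TSx` is reported as tbl input only in compatibility mode, while `. TS` and `'TS` are never recognized, which mirrors the "rump interpretation" of roff syntax mentioned in the new comment in do_line.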

diff --git a/ChangeLog b/ChangeLog
index 1d4228b..05238df 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,26 @@
 2021-07-31  G. Branden Robinson <g.branden.robinson@gmail.com>
 
+       * src/utils/grog/grog.pl: Refactor preprocessor inference.
+         - Add new list, `inferred_preprocessor`.
+         - Drop preprocessor-related keys from `Groff` hash.
+         - Drop scalar `inside_tbl_table`.
+       (do_line): Set up hash `preprocessor_for_macro`.  Detect
+       preprocessor macros the way the preprocessors do, explained in a
+       comment.  Respect AT&T compatibility mode when doing so.  Build
+       list of inferred preprocessors.  This replaces the extensive and
+       gaseous series of `if` statements that manipulated `Groff` hash.
+       (infer_preprocessors): Completely replace.  Set up a hash
+       `option_for_preprocessor` mapping preprocessor names to groff
+       options (where applicable).  Append to `command` and
+       `preprocessor` lists as appropriate.  Sort the preprocessor
+       options so they don't move around in the argument list depending
+       on the order of their macros' appearance in the input.
+
+       * src/utils/grog/tests/smoke-test.sh: Update test cases to
+       expect preprocessor options to show up in sorted order.
+
+2021-07-31  G. Branden Robinson <g.branden.robinson@gmail.com>
+
        * src/utils/grog/grog.pl: Drop dead code.  Delete global
        hash `preprocs_tmacs`, unused since commit b0de53c9, 30 June.
 
diff --git a/src/utils/grog/grog.pl b/src/utils/grog/grog.pl
index c8a0930..e6cad5b 100644
--- a/src/utils/grog/grog.pl
+++ b/src/utils/grog/grog.pl
@@ -42,6 +42,7 @@ my $groff_version = 'DEVELOPMENT';
 
 my @command = ();              # the constructed groff command
 my @requested_package = ();    # arguments to '-m' grog options
+my @inferred_preprocessor = ();        # preprocessors the document uses
 my $do_run = 0;                        # run generated 'groff' command
 my $use_compatibility_mode = 0;        # is -C being passed to groff?
 
@@ -109,27 +110,6 @@ my @macro_man_or_ms = ('B', 'I', 'BI',
 my %user_macro;
 my %Groff =
   (
-   # preprocessors
-   'chem' => 0,
-   'eqn' => 0,
-   'gperl' => 0,
-   'grap' => 0,
-   'grn' => 0,
-   'gideal' => 0,
-   'gpinyin' => 0,
-   'lilypond' => 0,
-
-   'pic' => 0,
-   'PS' => 0,          # opening for pic
-   'PE' => 0,          # closing for pic
-   'PF' => 0,          # alternative closing for pic
-
-   'refer' => 0,
-   'refer_open' => 0,
-   'refer_close' => 0,
-   'soelim' => 0,
-   'tbl' => 0,
-
    # for mdoc and mdoc-old
    # .Oo and .Oc for modern mdoc, only .Oo for mdoc-old
    'Oo' => 0,          # mdoc and mdoc-old
@@ -152,7 +132,6 @@ my $inferred_main_package = '';
 # find .TH in ms(7) documents only between .TS and .TE calls, and in
 # man(7) documents only as the first macro call.
 my $have_seen_first_macro_call = 0;
-my $inside_tbl_table = 0;
 # man(7) and ms(7) use many of the same macro names; do extra checking.
 my $man_score = 0;
 my $ms_score = 0;
@@ -314,6 +293,42 @@ sub do_line {
 
   return unless ($line =~ /^[.']/);    # Ignore text lines.
 
+  # Perform preprocessor checks; they scan their inputs using a rump
+  # interpretation of roff(7) syntax that requires the default control
+  # character and no space between it and the macro name.  In AT&T
+  # compatibility mode, no space (or newline!) is required after the
+  # macro name, either.  We mimic the preprocessors themselves; eqn(1),
+  # for instance, does not recognize '.EN' if '.EQ' has not been seen.
+  my %preprocessor_for_macro = (
+    'EQ', 'eqn',
+    'G1', 'grap',
+    'GS', 'grn',
+    'PS', 'pic',
+    '[',  'refer',
+    #'so', 'soelim', # Can't be inferred this way; see grog man page.
+    'TS', 'tbl',
+    'cstart',   'chem',
+    'lilypond', 'glilypond',
+    'Perl',     'gperl',
+    'pinyin',   'gpinyin',
+  );
+
+  my $boundary = '\\b';
+  $boundary = '' if ($use_compatibility_mode);
+
+  if ($line =~ /^\.(\w\w)$boundary/ || $line =~ /^\.(\[)/) {
+    my $macro = $1;
+    # groff identifiers can have extremely weird characters in them.
+    # The ones we care about are conventionally named, but me(7)
+    # documents can call macros like '+c', so quote carefully.
+    if (grep(/^\Q$macro\E$/, keys %preprocessor_for_macro)) {
+      my $preproc = $preprocessor_for_macro{$macro};
+      if (!grep(/$preproc/, @inferred_preprocessor)) {
+       push @inferred_preprocessor, $preproc;
+      }
+    }
+  }
+
   # Normalize control lines; convert no-break control character to the
   # regular one and remove unnecessary whitespace.
   $line =~ s/^['.]\s*/./;
@@ -373,93 +388,14 @@ sub do_line {
     return;
   }
 
-  # Ignore all other requests.
-  return if (grep(/$command/, @request));
+  # Ignore all other requests.  Again, macro names can contain Perl
+  # regex metacharacters, so be careful.
+  return if (grep(/^\Q$command\E$/, @request));
 
   $have_seen_first_macro_call = 1;
 
 
   ######################################################################
-  # preprocessors
-
-  if ( $command =~ /^(cstart)|(begin\s+chem)$/ ) {
-    $Groff{'chem'}++;          # for chem
-    return;
-  }
-  if ( $command =~ /^EQ$/ ) {
-    $Groff{'eqn'}++;           # for eqn
-    return;
-  }
-  if ( $command =~ /^G1$/ ) {
-    $Groff{'grap'}++;          # for grap
-    return;
-  }
-  if ( $command =~ /^Perl/ ) {
-    $Groff{'gperl'}++;         # for gperl
-    return;
-  }
-  if ( $command =~ /^pinyin/ ) {
-    $Groff{'gpinyin'}++;               # for gperl
-    return;
-  }
-  if ( $command =~ /^GS$/ ) {
-    $Groff{'grn'}++;           # for grn
-    return;
-  }
-  if ( $command =~ /^IS$/ ) {
-    $Groff{'gideal'}++;                # preproc gideal for ideal
-    return;
-  }
-  if ( $command =~ /^lilypond$/ ) {
-    $Groff{'lilypond'}++;      # for glilypond
-    return;
-  }
-
-  # pic is opened by .PS and can be closed by either .PE or .PF
-  if ( $command =~ /^PS$/ ) {
-    $Groff{'PS'}++;            # opening for pic
-    return;
-  }
-  if ( $command =~ /^PE$/ ) {
-    $Groff{'PE'}++;            # closing for pic
-    return;
-  }
-  if ( $command =~ /^PF$/ ) {
-    $Groff{'PF'}++;            # alternate closing for pic
-    return;
-  }
-
-  if ( $command =~ /^R1$/ ) {
-    $Groff{'refer'}++;         # for refer
-    return;
-  }
-  if ( $command =~ /^\[$/ ) {
-    $Groff{'refer_open'}++;    # for refer open
-    return;
-  }
-  if ( $command =~ /^\]$/ ) {
-    $Groff{'refer_close'}++;   # for refer close
-    return;
-  }
-  if ( $command =~ /^TS$/ ) {
-    $Groff{'tbl'}++;           # for tbl
-    $inside_tbl_table = 1;
-    return;
-  }
-  if ( $command =~ /^TE$/ ) {
-    $Groff{'tbl'}++;           # for tbl
-    $inside_tbl_table = 0;
-    return;
-  }
-  if ( $command =~ /^TH$/ ) {
-    if ($inside_tbl_table) {
-      $Groff{'tbl'}++;         # for tbl
-    }
-    return;
-  }
-
-
-  ######################################################################
   # macro package (tmac)
   ######################################################################
 
@@ -581,44 +517,32 @@ my @preprocessor = ();
 
 
 sub infer_preprocessors {
-  # preprocessors without 'groff' option
-  if ( $Groff{'lilypond'} ) {
-    push @preprocessor, 'glilypond';
-  }
-  if ( $Groff{'gperl'} ) {
-    push @preprocessor, 'gperl';
-  }
-  if ( $Groff{'gpinyin'} ) {
-    push @preprocessor, 'gpinyin';
-  }
-
-  # preprocessors with 'groff' option
-  if ( $Groff{'PS'} &&  ( $Groff{'PE'} ||  $Groff{'PF'} ) ) {
-    $Groff{'pic'} = 1;
-  }
-  if ( $Groff{'gideal'} ) {
-    $Groff{'pic'} = 1;
-  }
-
-  $Groff{'refer'} ||= $Groff{'refer_open'} && $Groff{'refer_close'};
-
-  if ( $Groff{'chem'} || $Groff{'eqn'} ||  $Groff{'gideal'} ||
-       $Groff{'grap'} || $Groff{'grn'} || $Groff{'pic'} ||
-       $Groff{'refer'} || $Groff{'tbl'} ) {
-    push(@command, '-s') if $Groff{'soelim'};
-
-    push(@command, '-R') if $Groff{'refer'};
-
-    push(@command, '-t') if $Groff{'tbl'};     # tbl before eqn
-    push(@command, '-e') if $Groff{'eqn'};
-
-    push(@command, '-j') if $Groff{'chem'};    # chem produces pic code
-    push(@command, '-J') if $Groff{'gideal'};  # gideal produces pic
-    push(@command, '-G') if $Groff{'grap'};
-    push(@command, '-g') if $Groff{'grn'};     # gremlin files for -me
-    push(@command, '-p') if $Groff{'pic'};
-
+  my %option_for_preprocessor =  (
+    'eqn', '-e',
+    'grap', '-G',
+    'grn', '-g',
+    'pic', '-p',
+    'refer', '-R',
+    #'soelim', '-s', # Can't be inferred this way; see grog man page.
+    'tbl', '-t',
+    'chem', '-j'
+  );
+
+  # Use a temporary list we can sort later.  We want the options to show
+  # up in a stable order for testing purposes instead of the order their
+  # macros turn up in the input.  groff doesn't care about the order.
+  my @opt = ();
+
+  foreach my $preproc (@inferred_preprocessor) {
+    my $preproc_option = $option_for_preprocessor{$preproc};
+
+    if ($preproc_option) {
+      push @opt, $preproc_option;
+    } else {
+      push @preprocessor, $preproc;
+    }
   }
+  push @command, sort @opt;
 } # infer_preprocessors()
 
 
diff --git a/src/utils/grog/tests/smoke-test.sh b/src/utils/grog/tests/smoke-test.sh
index b598ab7..fdd5cc7 100755
--- a/src/utils/grog/tests/smoke-test.sh
+++ b/src/utils/grog/tests/smoke-test.sh
@@ -107,7 +107,7 @@ echo "testing mom(7) document $doc" >&2
 doc=$src/contrib/mom/examples/slide-demo.mom
 echo "testing mom(7) document $doc" >&2
 "$grog" "$doc" | \
-    grep -Fqx 'groff -t -e -p -mom '"$doc"
+    grep -Fqx 'groff -e -p -t -mom '"$doc"
 
 doc=$src/contrib/mom/examples/typesetting.mom
 echo "testing mom(7) document $doc" >&2
@@ -133,7 +133,7 @@ doc=$src/doc/pic.ms
 echo "testing tbl(1)-, eqn(1)-, and pic(1)-using ms(7) document $doc" \
     >&2
 "$grog" "$doc" | \
-    grep -Fqx 'groff -t -e -p -ms '"$doc"
+    grep -Fqx 'groff -e -p -t -ms '"$doc"
 
 doc=$src/doc/webpage.ms
 echo "testing ms(7) document $doc" >&2
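
The change in expected test output (from `-t -e -p` to `-e -p -t`) follows from sorting the inferred options. Below is a minimal sketch of that step under stated assumptions: the helper name `split_preprocessors` and its return convention are hypothetical, and only the option table is taken from the patch.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Option table as in the patch; preprocessors absent from it have no
# groff option and must be run as separate commands.
my %option_for_preprocessor = (
  'eqn'   => '-e',
  'grap'  => '-G',
  'grn'   => '-g',
  'pic'   => '-p',
  'refer' => '-R',
  'tbl'   => '-t',
  'chem'  => '-j',
);

# Hypothetical helper (not in grog.pl): split a list of inferred
# preprocessors into sorted groff options and standalone commands.
sub split_preprocessors {
  my @inferred = @_;
  my (@opt, @standalone);

  foreach my $preproc (@inferred) {
    my $option = $option_for_preprocessor{$preproc};
    if (defined $option) {
      push @opt, $option;
    } else {
      push @standalone, $preproc;
    }
  }
  # Sorting makes the result independent of input order.
  return ([sort @opt], \@standalone);
}

# Order of appearance in the input: tbl, gperl, eqn, pic.
my ($opts, $standalone) = split_preprocessors('tbl', 'gperl', 'eqn', 'pic');
print "groff @$opts\n";          # groff -e -p -t
print "standalone: @$standalone\n";  # standalone: gperl
```

Perl's default `sort` compares strings by code point, so uppercase options such as `-G` and `-R` sort ahead of lowercase ones. The order has no effect on groff; it only keeps the generated command line stable for the tests.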

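The switch from `grep(/$command/, @request)` to `grep(/^\Q$command\E$/, @request)` in do_line matters for two reasons: an unanchored pattern matches substrings, and an unquoted name may not even be a valid regex. The sketch below illustrates both; the three-element `@request` list and the helper names are assumptions for illustration, not grog's real data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Abbreviated, hypothetical request list.
my @request = ('br', 'sp', 'ds');

# Old style: the name is interpolated as a raw, unanchored pattern.
# Returns the match count, or undef if the pattern fails to compile.
sub count_unquoted {
  my ($command) = @_;
  my $count = eval { scalar grep(/$command/, @request) };
  return $count;
}

# New style: the name is quoted with \Q...\E and anchored at both ends.
sub count_quoted {
  my ($command) = @_;
  return scalar grep(/^\Q$command\E$/, @request);
}

foreach my $command ('br', 'b', '+c') {
  my $old = count_unquoted($command);
  my $new = count_quoted($command);
  printf "%-3s unquoted=%s quoted=%d\n", $command,
    defined $old ? $old : 'regex error', $new;
}
```

Here a macro named `b` is wrongly treated as a request by the unquoted form because it is a substring of `br`, and `+c` cannot be compiled as a pattern at all; the quoted, anchored form reports no match for either.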