From: Patrice Dumas
Subject: branch master updated: * tp/Texinfo/Convert/ParagraphNonXS.pm (new): rename 'last_char' as 'last_letter'. Always set 'last_letter' to an empty string when it was previously undef.
Date: Tue, 25 Jul 2023 09:54:55 -0400

This is an automated email from the git hooks/post-receive script.

pertusus pushed a commit to branch master
in repository texinfo.

The following commit(s) were added to refs/heads/master by this push:
     new 20d3dd4822 * tp/Texinfo/Convert/ParagraphNonXS.pm (new): rename 'last_char' as 'last_letter'. Always set 'last_letter' to an empty string when it was previously undef.
20d3dd4822 is described below

commit 20d3dd4822f479efa1c2bb00e2a463cbea986433
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Tue Jul 25 15:52:32 2023 +0200

    * tp/Texinfo/Convert/ParagraphNonXS.pm (new): rename 'last_char'
    as 'last_letter'.  Always set 'last_letter' to an empty string
    when it was previously undef.
    
    * tp/Texinfo/Convert/ParagraphNonXS.pm (end, _add_next, add_text),
    tp/Texinfo/XS/xspara.c (xspara__end_line, xspara_end)
    (xspara_add_text): set last_letter to the last character when it is
    space or end of line, or fullwidth character.  Do not unset
    last_letter in _add_pending_word.  Unset last_letter in end.
    
    * tp/Texinfo/Convert/ParagraphNonXS.pm ($end_sentence_characters):
    rename $end_sentence_character as $end_sentence_characters.
    
    * tp/Texinfo/XS/xspara.c (after_punctuation_characters)
    (end_sentence_characters): use define for the strings to be sure to
    avoid errors in code (] was missing in one place).
    
    * tp/Texinfo/Convert/ParagraphNonXS.pm: try to have all debug messages
    as one string.
    
    * tp/t/paragraph.t: more tests with fullwidth characters, including
    latin fullwidth characters that can be upper case.
    
    * tp/Makefile.tres, tp/t/plaintext_tests.t
    (split_punctuation_detection_in_commands): new test with all the
    punctuation-related characters tested and with @-commands.
---
 ChangeLog                                          |  29 +++++
 tp/Makefile.tres                                   |   1 +
 tp/TODO                                            |   2 -
 tp/Texinfo/Convert/ParagraphNonXS.pm               |  99 ++++++++-------
 tp/Texinfo/Convert/Plaintext.pm                    |   2 +
 tp/Texinfo/XS/xspara.c                             |  57 +++++----
 tp/t/paragraph.t                                   |  12 +-
 tp/t/plaintext_tests.t                             |  11 +-
 .../split_punctuation_detection_in_commands.pl     | 138 +++++++++++++++++++++
 9 files changed, 275 insertions(+), 76 deletions(-)
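
Note on the rule being aligned here: the sentence-end detection that this
commit makes consistent between ParagraphNonXS.pm and xspara.c can be
sketched in isolation.  What follows is a minimal illustration under stated
assumptions, not the code from xspara.c: ends_sentence and check_examples
are hypothetical names, only ASCII input is handled (so the fullwidth cases
added to tp/t/paragraph.t are out of scope), and the paragraph state is
reduced to two local variables.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Same character sets as the defines added to xspara.c. */
#define after_punctuation_characters "\"')]"
#define end_sentence_characters ".?!"

/* Hypothetical helper, not part of xspara.c: return 1 if TEXT, read
   from left to right, finishes in an end-of-sentence state, else 0.
   ASCII only. */
static int
ends_sentence (const char *text)
{
  char last_letter = '\0';  /* last character that is not punctuation */
  int end_sentence = 0;
  const char *p;

  for (p = text; *p; p++)
    {
      if (isspace ((unsigned char) *p))
        last_letter = ' ';  /* a space is never upper case */
      else if (strchr (end_sentence_characters, *p))
        {
          /* Doesn't count if preceded by an upper-case letter. */
          if (!isupper ((unsigned char) last_letter))
            end_sentence = 1;
        }
      else if (strchr (after_punctuation_characters, *p))
        ;                   /* ignored: the state is left unchanged */
      else
        {
          last_letter = *p;
          end_sentence = 0; /* "aaaa.bbbb" does not end a sentence */
        }
    }
  return end_sentence;
}

/* Hypothetical usage examples for the sketch above. */
static void
check_examples (void)
{
  assert (ends_sentence ("word."));
  assert (ends_sentence ("aa.)"));
  assert (!ends_sentence ("B."));
  assert (!ends_sentence ("aaaa.bbbb"));
  /* Pieces of the new split_punctuation_detection_in_commands input,
     with the @-commands already removed. */
  assert (!ends_sentence ("Before B).]]?"));
  assert (!ends_sentence ("AfteR\"!'?"));
}
```

The last two checks agree with the expected plaintext result at the end of
the patch, where a single space follows each "?".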

diff --git a/ChangeLog b/ChangeLog
index 393d7a6183..413085cc85 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,32 @@
+2023-07-25  Patrice Dumas  <pertusus@free.fr>
+
+       * tp/Texinfo/Convert/ParagraphNonXS.pm (new): rename 'last_char'
+       as 'last_letter'. a Always set 'last_letter', to an empty string
+       when it was previously undef.
+
+       * tp/Texinfo/Convert/ParagraphNonXS.pm (end, _add_next, add_text),
+       tp/Texinfo/XS/xspara.c (xspara__end_line, xspara_end)
+       (xspara_add_text): set last_letter to the last character when it is
+       space or end of line, or fullwidth character.  Do not unset
+       last_letter in _add_pending_word.  Unset last_letter in end.
+
+       * tp/Texinfo/Convert/ParagraphNonXS.pm ($end_sentence_characters):
+       rename $end_sentence_character as $end_sentence_characters.
+
+       * tp/Texinfo/XS/xspara.c (after_punctuation_characters)
+       (end_sentence_characters): use defin for the strings to be sure to
+       avoid errors in code (] was missing in one place).
+
+       * tp/Texinfo/Convert/ParagraphNonXS.pm: try to have all debug messages
+       as one string.
+
+       * tp/t/paragraph.t: more tests with fullwidth characters, including
+       latin fullwidth characters that can be upper case.
+
+       * tp/Makefile.tres, tp/t/plaintext_tests.t
+       (split_punctuation_detection_in_commands): new test with all the
+       puctuation related character tested and with @-commands.
+
 2023-07-24  Gavin Smith <gavinsmith0123@gmail.com>
 
        No casts in isascii_* calls
diff --git a/tp/Makefile.tres b/tp/Makefile.tres
index 69e5e32cba..af1ff2b78c 100644
--- a/tp/Makefile.tres
+++ b/tp/Makefile.tres
@@ -1664,6 +1664,7 @@ test_files_generated_list = $(test_tap_files_generated_list) \
   t/results/plaintext_tests/sc_with_utf8_enable_encoding.pl \
   t/results/plaintext_tests/settitle_and_empty_top.pl \
   t/results/plaintext_tests/sp_with_text_before_in_example.pl \
+  t/results/plaintext_tests/split_punctuation_detection_in_commands.pl \
   t/results/plaintext_tests/star_at_command_formatting.pl \
   t/results/plaintext_tests/tab_in_table_in_example.pl \
   t/results/plaintext_tests/tab_item_in_example.pl \
diff --git a/tp/TODO b/tp/TODO
index 65c4e62466..030dc7e9c2 100644
--- a/tp/TODO
+++ b/tp/TODO
@@ -12,8 +12,6 @@ Before next release
 
 comment on linemacro call at top level comments until end of line.
 
-ParagraphNonXS different from XS for punctuation, check if bug/difference.
-
 Bugs
 ====
 
diff --git a/tp/Texinfo/Convert/ParagraphNonXS.pm b/tp/Texinfo/Convert/ParagraphNonXS.pm
index e9d4b3d0ce..e210fcc337 100644
--- a/tp/Texinfo/Convert/ParagraphNonXS.pm
+++ b/tp/Texinfo/Convert/ParagraphNonXS.pm
@@ -18,7 +18,7 @@
 # Original author: Patrice Dumas <pertusus@free.fr>
 
 # this module has nothing Texinfo specific.  In contrast with existing
-# modules Text::Wrap, Text::Format, it keeps a state of the paragraph 
+# modules Text::Wrap, Text::Format, it keeps a state of the paragraph
 # and waits for text to be fed into it.
 
 package Texinfo::Convert::Paragraph;
@@ -43,7 +43,7 @@ sub new($;$)
   my $self = {'max' => 72, 'indent_length' => 0, 'counter' => 0, 
               'word_counter' => 0, 'space' => '', 'frenchspacing' => 0,
               'lines_counter' => 0, 'end_line_count' => 0,
-              'unfilled' => 0 };
+              'unfilled' => 0, 'last_letter' => '' };
   if (defined($conf)) {
     foreach my $key (keys(%$conf)) {
       $self->{$key} = $conf->{$key};
@@ -56,13 +56,11 @@ sub new($;$)
 sub dump($)
 {
   my $self = shift;
-  my $word = 'UNDEF';
-  if (defined($self->{'word'})) {
-    $word = $self->{'word'};
-  }
-  my $end_sentence = 'UNDEF';
-  $end_sentence = $self->{'end_sentence'} if (defined($self->{'end_sentence'}));
-  print STDERR "para ($self->{'counter'}+$self->{'word_counter'}) word: $word, space `$self->{'space'}' end_sentence: $end_sentence\n"; 
+  print STDERR "para ($self->{'counter'}+$self->{'word_counter'}) "
+    ."word: ".(defined($self->{'word'}) ? $self->{'word'} : 'UNDEF')
+    .", space `$self->{'space'}' "
+    ."end_sentence: ".(defined($self->{'end_sentence'})
+                                ? $self->{'end_sentence'} : 'UNDEF')."\n";
 }
 
 sub _cut_line($)
@@ -103,6 +101,8 @@ sub _end_line($)
   }
   $paragraph->{'lines_counter'}++;
   $paragraph->{'end_line_count'}++;
+  # could be set to other values, anything that is not upper case.
+  $paragraph->{'last_letter'} = "\n";
   print STDERR "END_LINE\n" if ($paragraph->{'DEBUG'});
   return "\n";
 }
@@ -156,10 +156,10 @@ sub _add_pending_word($;$)
   if (defined($paragraph->{'word'})) {
     $result .= $paragraph->{'word'};
     $paragraph->{'counter'} += $paragraph->{'word_counter'};
-    print STDERR "ADD_WORD[$paragraph->{'word'}]+$paragraph->{'word_counter'} ($paragraph->{'counter'})\n"
-      if ($paragraph->{'DEBUG'});
+    print STDERR "ADD_WORD[$paragraph->{'word'}]+$paragraph->{'word_counter'}"
+      ." ($paragraph->{'counter'})\n"
+        if ($paragraph->{'DEBUG'});
     $paragraph->{'word'} = undef;
-    $paragraph->{'last_char'} = undef;
     $paragraph->{'word_counter'} = 0;
   }
   return $result;
@@ -172,6 +172,8 @@ sub end($)
   $paragraph->{'end_line_count'} = 0;
   print STDERR "PARA END\n" if ($paragraph->{'DEBUG'});
   my $result = _add_pending_word($paragraph, $paragraph->{'add_final_space'});
+  # probably not really useful, but cleaner
+  $paragraph->{'last_letter'} = '';
   if (!$paragraph->{'no_final_newline'} and $paragraph->{'counter'} != 0) {
     $result .= "\n"; 
     $paragraph->{'lines_counter'}++;
@@ -180,7 +182,7 @@ sub end($)
   return $result;
 }
 
-my $end_sentence_character = quotemeta('.?!');
+my $end_sentence_characters = quotemeta('.?!');
 my $after_punctuation_characters = quotemeta('"\')]');
 
 # Add $WORD to paragraph, returning the text to be added to the paragraph.
@@ -220,21 +222,18 @@ sub _add_next($;$$$)
 
   if (!$transparent) {
     if ($disinhibit) {
-      $paragraph->{'last_char'} = 'a';
+      $paragraph->{'last_letter'} = 'a';
     } elsif ($word =~
-         /([^$end_sentence_character$after_punctuation_characters])
-          [$end_sentence_character$after_punctuation_characters]*$/ox) {
+         /([^$end_sentence_characters$after_punctuation_characters])
+          [$end_sentence_characters$after_punctuation_characters]*$/ox) {
       # Save the last character in $word before punctuation
-      $paragraph->{'last_char'} = $1;
+      $paragraph->{'last_letter'} = $1;
     }
   }
 
   if (!$newlines_impossible and $word =~ /\n/) {
     $result .= _add_pending_word ($paragraph);
     _end_line($paragraph);
-    $paragraph->{'word_counter'} = 0;
-    $paragraph->{'word'} = undef;
-    $paragraph->{'last_char'} = undef;
   } else {
     $paragraph->{'word_counter'}
       += Texinfo::Convert::Unicode::string_width($word);
@@ -248,11 +247,8 @@ sub _add_next($;$$$)
     }
   }
   if ($paragraph->{'DEBUG'}) {
-    my $para_word = 'UNDEF';;
-    if (defined($paragraph->{'word'})) {
-      $para_word = $paragraph->{'word'};
-    }
-    print STDERR "WORD+ $word -> $para_word\n";
+    print STDERR "WORD+ $word -> "
+      .(defined($paragraph->{'word'}) ? $paragraph->{'word'} : 'UNDEF')."\n";
   }
 
   return $result;
@@ -274,7 +270,7 @@ sub allow_end_sentence($)
 {
   my $paragraph = shift;
   printf STDERR "ALLOW END SENTENCE\n" if $paragraph->{'DEBUG'};
-  $paragraph->{'last_char'} = 'a'; # lower-case
+  $paragraph->{'last_letter'} = 'a'; # lower-case
 }
 
 sub set_space_protection($$;$$$$)
@@ -329,11 +325,11 @@ sub add_text($$)
      = splice (@segments, 0, 4);
 
     if ($debug_flag) {
-      my $word = 'UNDEF';
-      $word = $paragraph->{'word'} if (defined($paragraph->{'word'}));
-      print STDERR "p ($paragraph->{'counter'}+$paragraph->{'word_counter'}) s `"
-          ._print_escaped_spaces($paragraph->{'space'})."', w `$word'\n";
-      #print STDERR "TEXT: "._print_escaped_spaces($text)."|\n"
+      print STDERR "p ($paragraph->{'counter'}+$paragraph->{'word_counter'}) "
+       ."s `" . _print_escaped_spaces($paragraph->{'space'})."', "
+       ."l `$paragraph->{'last_letter'}', "
+       ."w `".(defined($paragraph->{'word'}) ? $paragraph->{'word'}
+                                 : 'UNDEF')."'\n";
     }
     if (defined $spaces) {
       print STDERR "SPACES($paragraph->{'counter'}) `"
@@ -354,7 +350,6 @@ sub add_text($$)
           if (substr($paragraph->{'word'}, -1) ne ' ') {
             my $new_spaces = $at_end_sentence ? '  ' : ' ';
             $paragraph->{'word'} .= $new_spaces;
-            $paragraph->{'last_char'} = ' ';
             $paragraph->{'word_counter'} += length($new_spaces);
 
             # The $paragraph->{'counter'} != 0 is here to avoid having an
@@ -385,9 +380,6 @@ sub add_text($$)
           }
         }
       }
-      #print STDERR "delete END_SENTENCE($paragraph->{'end_sentence'}): spaces\n" 
-      #  if (defined($paragraph->{'end_sentence'}) and $paragraph->{'DEBUG'});
-      #delete $paragraph->{'end_sentence'};
       if ($paragraph->{'counter'} + length($paragraph->{'space'}) 
                       > $paragraph->{'max'}) {
         $result .= _cut_line($paragraph);
@@ -396,12 +388,16 @@ sub add_text($$)
           and $paragraph->{'keep_end_lines'} and $spaces =~ /\n/) {
         $result .= _end_line($paragraph);
       }
+      $paragraph->{'last_letter'} = ' ';
     } elsif (defined $added_word) {
       my $tmp = $added_word;
-      if (defined $paragraph->{'last_char'}) {
-        # Use 'last_char' here because _add_next overwrites it.
-        $tmp = $paragraph->{'last_char'} . $tmp;
-      }
+      # Prepend 'last_letter' to add the information on the last
+      # letter even if it was read as part of a previous string
+      # Add it here because _add_next overwrites it.  Note that
+      # if _add_next overwrited it, it wouldn't lead to an invalid
+      # result, as the wrong prepended 'last_letter' would not match
+      # at the end of the $added_word in the regex below anyway.
+      $tmp = $paragraph->{'last_letter'} . $tmp;
 
       $result .= _add_next($paragraph, $added_word, undef,
                            !$newline_possible_flag);
@@ -414,9 +410,9 @@ sub add_text($$)
         # do nothing in the case of a continuation of after_punctuation_characters
       } elsif (!$paragraph->{'unfilled'}
           and $tmp =~
-        /(^|[^\p{Upper}$after_punctuation_characters$end_sentence_character])
-         [$after_punctuation_characters]*[$end_sentence_character]
-         [$end_sentence_character\x08$after_punctuation_characters]*$/ox) {
+        /(^|[^\p{Upper}$after_punctuation_characters$end_sentence_characters])
+         [$after_punctuation_characters]*[$end_sentence_characters]
+         [$end_sentence_characters\x08$after_punctuation_characters]*$/ox) {
         if ($paragraph->{'frenchspacing'}) {
           $paragraph->{'end_sentence'} = -1;
         } else {
@@ -424,27 +420,34 @@ sub add_text($$)
         }
         print STDERR "END_SENTENCE\n" if ($paragraph->{'DEBUG'});
       } else {
-        delete $paragraph->{'end_sentence'};
-        print STDERR "delete END_SENTENCE($paragraph->{'end_sentence'}): text\n" 
+        print STDERR "delete END_SENTENCE($paragraph->{'end_sentence'})\n"
           if (defined($paragraph->{'end_sentence'}) and $paragraph->{'DEBUG'});
+        delete $paragraph->{'end_sentence'};
       }
     } elsif (defined $fullwidth_segment) {
-      print STDERR "EAST_ASIAN\n" if ($paragraph->{'DEBUG'});
+      print STDERR "FULLWIDTH\n" if ($paragraph->{'DEBUG'});
+
       if (!defined($paragraph->{'word'})) {
         $paragraph->{'word'} = '';
       }
       $paragraph->{'word'} .= $fullwidth_segment;
-      $paragraph->{'last_char'} = $fullwidth_segment;
       $paragraph->{'word_counter'} += 2;
+
+      # fullwidth latin letters can be upper case, so it is important to
+      # use the actual characters here.
+      $paragraph->{'last_letter'} = $fullwidth_segment;
+
+      # We allow a line break in between Chinese characters even if
+      # there was no space between them, unlike single-width
+      # characters.
       if ($paragraph->{'counter'} != 0 and
-          $paragraph->{'counter'} + $paragraph->{'word_counter'} 
+          $paragraph->{'counter'} + $paragraph->{'word_counter'}
                                > $paragraph->{'max'}) {
         $result .= _cut_line($paragraph);
       }
       if (!$paragraph->{'no_break'}
           and !$paragraph->{'double_width_no_break'}) {
         $result .= _add_pending_word($paragraph);
-        $paragraph->{'space'} = '';
       }
       delete $paragraph->{'end_sentence'};
     }
diff --git a/tp/Texinfo/Convert/Plaintext.pm b/tp/Texinfo/Convert/Plaintext.pm
index b46e2dfbf6..0dcc5952ec 100644
--- a/tp/Texinfo/Convert/Plaintext.pm
+++ b/tp/Texinfo/Convert/Plaintext.pm
@@ -621,6 +621,8 @@ sub _protect_sentence_ends ($) {
   my $text = shift;
   # Avoid suppressing end of sentence, by inserting a control character
   # in front of the full stop.  The choice of BS for this is arbitrary.
+  # Note that the use of ?: is not crucial but since we do not use the
+  # grouping value, setting no backtracking could be more efficient.
   $text =~ s/(?<=[^\p{Upper}])
              (?=[$end_sentence][$after_punctuation]*(?:\s|$))
              /\x08/xg;
diff --git a/tp/Texinfo/XS/xspara.c b/tp/Texinfo/XS/xspara.c
index 5ca8260f8a..ea25f25a39 100644
--- a/tp/Texinfo/XS/xspara.c
+++ b/tp/Texinfo/XS/xspara.c
@@ -570,6 +570,8 @@ xspara__end_line (void)
 
   state.lines_counter++;
   state.end_line_count++;
+  /* could be set to other values, anything that is not upper case. */
+  state.last_letter = L'\n';
 }
 
 char *
@@ -678,6 +680,9 @@ xspara_end (void)
   if (debug)
     fprintf (stderr, "PARA END\n");
 
+  /* probably not really useful, but cleaner */
+  state.last_letter = L'\0';
+
   xspara__add_pending_word (&ret, state.add_final_space);
   if (!state.no_final_newline && state.counter != 0)
     {
@@ -702,6 +707,12 @@ xspara_end (void)
 /* check if a byte is in the printable ASCII range */
 #define PRINTABLE_ASCII(c) (0x20 <= (c) && (c) <= 0x7E)
 
+/* ignored after end sentence character to determine if
+   at the end of a sentence */
+#define after_punctuation_characters "\"')]"
+/* characters triggering an end of sentence */
+#define end_sentence_characters ".?!"
+
 /* Add WORD to paragraph in RESULT, not refilling WORD.  If we go past the end 
    of the line start a new one.  TRANSPARENT means that the letters in WORD
    are ignored for the purpose of deciding whether a full stop ends a sentence
@@ -743,7 +754,8 @@ xspara__add_next (TEXT *result, char *word, int word_len, int transparent)
                 }
               while ((*p & 0xC0) == 0x80 && p > word);
 
-              if (!strchr (".?!\"')", *p))
+              if (!strchr (end_sentence_characters
+                           after_punctuation_characters, *p))
                 {
                   if (!PRINTABLE_ASCII(*p))
                     {
@@ -917,18 +929,13 @@ xspara_add_text (char *text, int len)
     {
       if (debug)
         {
-          char *word = "UNDEF";
-          if (state.word.end > 0)
-            word = state.word.text;
-          fprintf(stderr, "p (%d+%d) s `%s', w `%s'\n", state.counter,
-                  state.word_counter, state.space.end == 0 ? ""
-                   : xspara__print_escaped_spaces (state.space.text),
-                  word);
+          fprintf(stderr, "p (%d+%d) s `%s', l `%lc', w `%s'\n", state.counter,
+              state.word_counter, state.last_letter, state.space.end == 0 ? ""
+               : xspara__print_escaped_spaces (state.space.text),
+              state.word.end > 0 ? state.word.text : "UNDEF");
         }
       if (isspace ((unsigned char) *p))
         {
-          state.last_letter = L'\0';
-
           if (debug)
             {
               char t[2];
@@ -968,7 +975,6 @@ xspara_add_text (char *text, int len)
                       text_append_n (&state.word, " ", 1);
                       state.word_counter += 1;
                     }
-                  state.last_letter = ' ';
 
                   if (state.counter != 0
                       && state.counter + state.word_counter
@@ -1027,6 +1033,7 @@ xspara_add_text (char *text, int len)
               text_append (&result, "\n");
             }
           p++; len--;
+          state.last_letter = ' ';
           continue;
         }
 
@@ -1054,17 +1061,20 @@ xspara_add_text (char *text, int len)
       /*************** Double width character. *********************/
       if (width == 2)
         {
-          state.last_letter = L'\0';
+          if (debug)
+            fprintf (stderr, "FULLWIDTH\n");
+
+          text_append_n (&state.word, p, char_len);
+          state.word_counter += 2;
+
+          /* fullwidth latin letters can be upper case, so it is important to
+             use the actual characters here. */
+          state.last_letter = wc;
 
           /* We allow a line break in between Chinese characters even if
              there was no space between them, unlike single-width
              characters. */
 
-          /* Append wc to state.word. */
-          text_append_n (&state.word, p, char_len);
-
-          state.word_counter += 2;
-
           if (state.counter != 0
               && state.counter + state.word_counter > state.max)
             {
@@ -1075,8 +1085,8 @@ xspara_add_text (char *text, int len)
           if (!state.no_break && !state.double_width_no_break)
             {
               xspara__add_pending_word (&result, 0);
-              state.end_sentence = -2;
             }
+          state.end_sentence = -2;
         }
       else if (wc == L'\b')
         {
@@ -1099,7 +1109,7 @@ xspara_add_text (char *text, int len)
           /* Now check if it is considered as an end of sentence, and
              set state.end_sentence if it is. */
 
-          if (strchr (".?!", *p) && !state.unfilled)
+          if (strchr (end_sentence_characters, *p) && !state.unfilled)
             {
               /* Doesn't count if preceded by an upper-case letter. */
               if (!iswupper (state.last_letter))
@@ -1108,9 +1118,11 @@ xspara_add_text (char *text, int len)
                     state.end_sentence = -1;
                   else
                     state.end_sentence = 1;
+                  if (debug)
+                    fprintf (stderr, "END_SENTENCE\n");
                 }
             }
-          else if (strchr ("\"')]", *p))
+          else if (strchr (after_punctuation_characters, *p))
             {
               /* '"', '\'', ']' and ')' are ignored for the purpose
                of deciding whether a full stop ends a sentence. */
@@ -1120,8 +1132,11 @@ xspara_add_text (char *text, int len)
               /* Otherwise reset the end of sentence marker: a full stop in
                  a string like "aaaa.bbbb" doesn't mark an end of
                  sentence. */
-              state.end_sentence = -2;
               state.last_letter = wc;
+              if (debug && state.end_sentence != -2)
+                fprintf (stderr, "delete END_SENTENCE(%d)\n",
+                                  state.end_sentence);
+              state.end_sentence = -2;
             }
         }
       else
diff --git a/tp/t/paragraph.t b/tp/t/paragraph.t
index c89f17184c..97c6ca75ce 100644
--- a/tp/t/paragraph.t
+++ b/tp/t/paragraph.t
@@ -7,7 +7,7 @@ use File::Basename;
 use lib '.';
 use Texinfo::ModulePath (undef, undef, undef, 'updirs' => 2);
 
-BEGIN { plan tests => 119 ; }
+BEGIN { plan tests => 122 ; }
 
 use Texinfo::Convert::Paragraph;
 
@@ -21,8 +21,8 @@ sub test_para($$$;$)
   my $conf = shift;
 
   my $result = '';
-  #$conf = {'DEBUG' => 1} if (!defined($conf));
   $conf = {} if (!defined($conf));
+  $conf->{'DEBUG'} = 1;
   my $para = Texinfo::Convert::Paragraph->new($conf);
   foreach my $arg (@$args) {
     $result .= add_text($para, $arg);
@@ -70,6 +70,14 @@ test_para(['word',' other'], "word\nother\n", 'two_elements_space_max', {'max' =
 test_para(["\x{7b2c}\x{4e00} ",'other'], "\x{7b2c}\n\x{4e00}\nother\n", 'east_asian', {'max' => 2});
 test_para(['word.  other'], "word. other\n", 'two_words_dot_frenchspacing', {'frenchspacing' => 1});
 test_para(["aa.)\x{7b2c} b"], "aa.)\x{7b2c} b\n", 'end_sentence_east_asian');
+test_para(["B\x{7b2c}. After\x{7b2c}. Last"], "B\x{7b2c}.  After\x{7b2c}.  Last\n",
+          'east_asian_before_end_sentence');
+# uses a fullwidth b
+test_para(["B\x{ff42}. After\x{ff42}. Last"], "B\x{ff42}.  After\x{ff42}.  Last\n",
+          'fullwidth_lower_case_latin_before_end_sentence');
+# uses a fullwidth R
+test_para(["B\x{ff32}. After\x{ff32}. Last"], "B\x{ff32}. After\x{ff32}. Last\n",
+           'fullwidth_upper_case_latin_before_end_sentence');
 test_para(["aaaa bbbbbbb cccccccc dddddddddddd eeeeeeeeeeee fffffffff ggggggg"],
 "   aaaa
  bbbbbbb
diff --git a/tp/t/plaintext_tests.t b/tp/t/plaintext_tests.t
index ff8010cb95..ceba400026 100644
--- a/tp/t/plaintext_tests.t
+++ b/tp/t/plaintext_tests.t
@@ -84,6 +84,10 @@ Before samp. @samp{a}. after samp, w @w{in   w. after dot}  afterw
 @exdent before samp. @samp{a}. after samp, w @w{in   w. after dot}  afterw
 @end quotation
 '],
+# This tests all the possibilities for end sentence related characters
+# ans splitting by commands (also tested in other tests)
+['split_punctuation_detection_in_commands',
+'Before @asis{B}@asis{)}@asis{.}]]? Afte@strong{R}@emph{"!}\'? Last'],
 ['html_expanded',
 'Before
 @html
@@ -383,6 +387,10 @@ and in emph.}
 'Text.@asis{)
 follows}.
 '],
+# tests that upper case letter in code ends a sentence
+['code_commands_and_punctuation',
+'@code{AA}. @samp{aa}. After.
+'],
 ['sc_with_utf8_enable_encoding',
 '@documentencoding utf-8
 @sc{in sc}.
@@ -459,9 +467,6 @@ end footnote}
 ['command_brace_no_arg_punctuation',
 '@TeX{}. And @LaTeX{}. @copyright{}. @registeredsymbol{}. End.
 '],
-['code_commands_and_punctuation',
-'@code{AA}. @samp{aa}. After.
-'],
 ['sp_with_text_before_in_example',
 '
 @example
diff --git a/tp/t/results/plaintext_tests/split_punctuation_detection_in_commands.pl b/tp/t/results/plaintext_tests/split_punctuation_detection_in_commands.pl
new file mode 100644
index 0000000000..7062a49c90
--- /dev/null
+++ b/tp/t/results/plaintext_tests/split_punctuation_detection_in_commands.pl
@@ -0,0 +1,138 @@
+use vars qw(%result_texis %result_texts %result_trees %result_errors 
+   %result_indices %result_sectioning %result_nodes %result_menus
+   %result_floats %result_converted %result_converted_errors 
+   %result_elements %result_directions_text %result_indices_sort_strings);
+
+use utf8;
+
+$result_trees{'split_punctuation_detection_in_commands'} = {
+  'contents' => [
+    {
+      'contents' => [
+        {
+          'contents' => [
+            {
+              'text' => 'Before '
+            },
+            {
+              'args' => [
+                {
+                  'contents' => [
+                    {
+                      'text' => 'B'
+                    }
+                  ],
+                  'type' => 'brace_command_arg'
+                }
+              ],
+              'cmdname' => 'asis',
+              'source_info' => {
+                'file_name' => '',
+                'line_nr' => 1,
+                'macro' => ''
+              }
+            },
+            {
+              'args' => [
+                {
+                  'contents' => [
+                    {
+                      'text' => ')'
+                    }
+                  ],
+                  'type' => 'brace_command_arg'
+                }
+              ],
+              'cmdname' => 'asis',
+              'source_info' => {
+                'file_name' => '',
+                'line_nr' => 1,
+                'macro' => ''
+              }
+            },
+            {
+              'args' => [
+                {
+                  'contents' => [
+                    {
+                      'text' => '.'
+                    }
+                  ],
+                  'type' => 'brace_command_arg'
+                }
+              ],
+              'cmdname' => 'asis',
+              'source_info' => {
+                'file_name' => '',
+                'line_nr' => 1,
+                'macro' => ''
+              }
+            },
+            {
+              'text' => ']]? Afte'
+            },
+            {
+              'args' => [
+                {
+                  'contents' => [
+                    {
+                      'text' => 'R'
+                    }
+                  ],
+                  'type' => 'brace_command_arg'
+                }
+              ],
+              'cmdname' => 'strong',
+              'source_info' => {
+                'file_name' => '',
+                'line_nr' => 1,
+                'macro' => ''
+              }
+            },
+            {
+              'args' => [
+                {
+                  'contents' => [
+                    {
+                      'text' => '"!'
+                    }
+                  ],
+                  'type' => 'brace_command_arg'
+                }
+              ],
+              'cmdname' => 'emph',
+              'source_info' => {
+                'file_name' => '',
+                'line_nr' => 1,
+                'macro' => ''
+              }
+            },
+            {
+              'text' => '\'? Last'
+            }
+          ],
+          'type' => 'paragraph'
+        }
+      ],
+      'type' => 'before_node_section'
+    }
+  ],
+  'type' => 'document_root'
+};
+
+$result_texis{'split_punctuation_detection_in_commands'} = 'Before @asis{B}@asis{)}@asis{.}]]? Afte@strong{R}@emph{"!}\'? Last';
+
+
+$result_texts{'split_punctuation_detection_in_commands'} = 'Before B).]]? AfteR"!\'? Last';
+
+$result_errors{'split_punctuation_detection_in_commands'} = [];
+
+
+$result_floats{'split_punctuation_detection_in_commands'} = {};
+
+
+
+$result_converted{'plaintext'}->{'split_punctuation_detection_in_commands'} = 'Before B).]]? Afte*R*_"!_\'? Last
+';
+
+1;
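
Note on the xspara.c defines: replacing the repeated literals with two
defines works because the C compiler concatenates adjacent string literals,
so strchr sees one combined string.  A small sketch of that idiom follows.
The two macros are copied from the patch and old_literal is the string from
the removed strchr call in xspara__add_next; is_sentence_punctuation and
check_punctuation are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Copied from the patch. */
#define after_punctuation_characters "\"')]"
#define end_sentence_characters ".?!"

/* The literal used in xspara__add_next before this commit; the
   closing bracket is absent from it. */
static const char old_literal[] = ".?!\"')";

/* Hypothetical helper: 1 if C is one of the characters relevant to
   sentence-end detection, 0 otherwise. */
static int
is_sentence_punctuation (char c)
{
  /* Adjacent string literals are merged at compile time, so this
     searches the single string ".?!\"')]".  The explicit test for
     '\0' is needed because strchr also matches the terminator. */
  return c != '\0'
         && strchr (end_sentence_characters
                    after_punctuation_characters, c) != NULL;
}

/* Hypothetical usage examples for the sketch above. */
static void
check_punctuation (void)
{
  assert (is_sentence_punctuation ('.'));
  assert (is_sentence_punctuation (']'));
  assert (!is_sentence_punctuation ('a'));
  assert (!is_sentence_punctuation ('\0'));
  /* The old hand-written literal did not contain ']'. */
  assert (strchr (old_literal, ']') == NULL);
}
```

With a single definition per character set, a character added to or missing
from one set is added or missing at every call site at once.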


