branch master updated: * tp/Texinfo/Convert/ParagraphNonXS.pm (new): ren
From: Patrice Dumas
Subject: branch master updated: * tp/Texinfo/Convert/ParagraphNonXS.pm (new): rename 'last_char' as 'last_letter'. Always set 'last_letter', to an empty string when it was previously undef.
Date: Tue, 25 Jul 2023 09:54:55 -0400
This is an automated email from the git hooks/post-receive script.
pertusus pushed a commit to branch master
in repository texinfo.
The following commit(s) were added to refs/heads/master by this push:
new 20d3dd4822 * tp/Texinfo/Convert/ParagraphNonXS.pm (new): rename
'last_char' as 'last_letter'. Always set 'last_letter', to an empty string
when it was previously undef.
20d3dd4822 is described below
commit 20d3dd4822f479efa1c2bb00e2a463cbea986433
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Tue Jul 25 15:52:32 2023 +0200
* tp/Texinfo/Convert/ParagraphNonXS.pm (new): rename 'last_char'
as 'last_letter'. Always set 'last_letter', to an empty string
when it was previously undef.
* tp/Texinfo/Convert/ParagraphNonXS.pm (end, _add_next, add_text),
tp/Texinfo/XS/xspara.c (xspara__end_line, xspara_end)
(xspara_add_text): set last_letter to the last character when it is
a space, an end of line, or a fullwidth character. Do not unset
last_letter in _add_pending_word. Unset last_letter in end.
* tp/Texinfo/Convert/ParagraphNonXS.pm ($end_sentence_characters):
rename $end_sentence_character as $end_sentence_characters.
* tp/Texinfo/XS/xspara.c (after_punctuation_characters)
(end_sentence_characters): use defines for the strings to be sure to
avoid errors in code (] was missing in one place).
* tp/Texinfo/Convert/ParagraphNonXS.pm: try to have all debug messages
as one string.
* tp/t/paragraph.t: more tests with fullwidth characters, including
latin fullwidth characters that can be upper case.
* tp/Makefile.tres, tp/t/plaintext_tests.t
(split_punctuation_detection_in_commands): new test with all the
punctuation related characters tested and with @-commands.
---
ChangeLog | 29 +++++
tp/Makefile.tres | 1 +
tp/TODO | 2 -
tp/Texinfo/Convert/ParagraphNonXS.pm | 99 ++++++++-------
tp/Texinfo/Convert/Plaintext.pm | 2 +
tp/Texinfo/XS/xspara.c | 57 +++++----
tp/t/paragraph.t | 12 +-
tp/t/plaintext_tests.t | 11 +-
.../split_punctuation_detection_in_commands.pl | 138 +++++++++++++++++++++
9 files changed, 275 insertions(+), 76 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index 393d7a6183..413085cc85 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,32 @@
+2023-07-25 Patrice Dumas <pertusus@free.fr>
+
+ * tp/Texinfo/Convert/ParagraphNonXS.pm (new): rename 'last_char'
+ as 'last_letter'. Always set 'last_letter', to an empty string
+ when it was previously undef.
+
+ * tp/Texinfo/Convert/ParagraphNonXS.pm (end, _add_next, add_text),
+ tp/Texinfo/XS/xspara.c (xspara__end_line, xspara_end)
+ (xspara_add_text): set last_letter to the last character when it is
+ a space, an end of line, or a fullwidth character. Do not unset
+ last_letter in _add_pending_word. Unset last_letter in end.
+
+ * tp/Texinfo/Convert/ParagraphNonXS.pm ($end_sentence_characters):
+ rename $end_sentence_character as $end_sentence_characters.
+
+ * tp/Texinfo/XS/xspara.c (after_punctuation_characters)
+ (end_sentence_characters): use defines for the strings to be sure to
+ avoid errors in code (] was missing in one place).
+
+ * tp/Texinfo/Convert/ParagraphNonXS.pm: try to have all debug messages
+ as one string.
+
+ * tp/t/paragraph.t: more tests with fullwidth characters, including
+ latin fullwidth characters that can be upper case.
+
+ * tp/Makefile.tres, tp/t/plaintext_tests.t
+ (split_punctuation_detection_in_commands): new test with all the
+ punctuation related characters tested and with @-commands.
+
2023-07-24 Gavin Smith <gavinsmith0123@gmail.com>
No casts in isascii_* calls
diff --git a/tp/Makefile.tres b/tp/Makefile.tres
index 69e5e32cba..af1ff2b78c 100644
--- a/tp/Makefile.tres
+++ b/tp/Makefile.tres
@@ -1664,6 +1664,7 @@ test_files_generated_list =
$(test_tap_files_generated_list) \
t/results/plaintext_tests/sc_with_utf8_enable_encoding.pl \
t/results/plaintext_tests/settitle_and_empty_top.pl \
t/results/plaintext_tests/sp_with_text_before_in_example.pl \
+ t/results/plaintext_tests/split_punctuation_detection_in_commands.pl \
t/results/plaintext_tests/star_at_command_formatting.pl \
t/results/plaintext_tests/tab_in_table_in_example.pl \
t/results/plaintext_tests/tab_item_in_example.pl \
diff --git a/tp/TODO b/tp/TODO
index 65c4e62466..030dc7e9c2 100644
--- a/tp/TODO
+++ b/tp/TODO
@@ -12,8 +12,6 @@ Before next release
comment on linemacro call at top level comments until end of line.
-ParagraphNonXS different from XS for punctuation, check if bug/difference.
-
Bugs
====
diff --git a/tp/Texinfo/Convert/ParagraphNonXS.pm b/tp/Texinfo/Convert/ParagraphNonXS.pm
index e9d4b3d0ce..e210fcc337 100644
--- a/tp/Texinfo/Convert/ParagraphNonXS.pm
+++ b/tp/Texinfo/Convert/ParagraphNonXS.pm
@@ -18,7 +18,7 @@
# Original author: Patrice Dumas <pertusus@free.fr>
# this module has nothing Texinfo specific. In contrast with existing
-# modules Text::Wrap, Text::Format, it keeps a state of the paragraph
+# modules Text::Wrap, Text::Format, it keeps a state of the paragraph
# and waits for text to be fed into it.
package Texinfo::Convert::Paragraph;
@@ -43,7 +43,7 @@ sub new($;$)
my $self = {'max' => 72, 'indent_length' => 0, 'counter' => 0,
'word_counter' => 0, 'space' => '', 'frenchspacing' => 0,
'lines_counter' => 0, 'end_line_count' => 0,
- 'unfilled' => 0 };
+ 'unfilled' => 0, 'last_letter' => '' };
if (defined($conf)) {
foreach my $key (keys(%$conf)) {
$self->{$key} = $conf->{$key};
@@ -56,13 +56,11 @@ sub new($;$)
sub dump($)
{
my $self = shift;
- my $word = 'UNDEF';
- if (defined($self->{'word'})) {
- $word = $self->{'word'};
- }
- my $end_sentence = 'UNDEF';
- $end_sentence = $self->{'end_sentence'} if (defined($self->{'end_sentence'}));
- print STDERR "para ($self->{'counter'}+$self->{'word_counter'}) word: $word, space `$self->{'space'}' end_sentence: $end_sentence\n";
+ print STDERR "para ($self->{'counter'}+$self->{'word_counter'}) "
+ ."word: ".(defined($self->{'word'}) ? $self->{'word'} : 'UNDEF')
+ .", space `$self->{'space'}' "
+ ."end_sentence: ".(defined($self->{'end_sentence'})
+ ? $self->{'end_sentence'} : 'UNDEF')."\n";
}
sub _cut_line($)
@@ -103,6 +101,8 @@ sub _end_line($)
}
$paragraph->{'lines_counter'}++;
$paragraph->{'end_line_count'}++;
+ # could be set to other values, anything that is not upper case.
+ $paragraph->{'last_letter'} = "\n";
print STDERR "END_LINE\n" if ($paragraph->{'DEBUG'});
return "\n";
}
@@ -156,10 +156,10 @@ sub _add_pending_word($;$)
if (defined($paragraph->{'word'})) {
$result .= $paragraph->{'word'};
$paragraph->{'counter'} += $paragraph->{'word_counter'};
- print STDERR "ADD_WORD[$paragraph->{'word'}]+$paragraph->{'word_counter'} ($paragraph->{'counter'})\n"
- if ($paragraph->{'DEBUG'});
+ print STDERR "ADD_WORD[$paragraph->{'word'}]+$paragraph->{'word_counter'}"
+ ." ($paragraph->{'counter'})\n"
+ if ($paragraph->{'DEBUG'});
$paragraph->{'word'} = undef;
- $paragraph->{'last_char'} = undef;
$paragraph->{'word_counter'} = 0;
}
return $result;
@@ -172,6 +172,8 @@ sub end($)
$paragraph->{'end_line_count'} = 0;
print STDERR "PARA END\n" if ($paragraph->{'DEBUG'});
my $result = _add_pending_word($paragraph, $paragraph->{'add_final_space'});
+ # probably not really useful, but cleaner
+ $paragraph->{'last_letter'} = '';
if (!$paragraph->{'no_final_newline'} and $paragraph->{'counter'} != 0) {
$result .= "\n";
$paragraph->{'lines_counter'}++;
@@ -180,7 +182,7 @@ sub end($)
return $result;
}
-my $end_sentence_character = quotemeta('.?!');
+my $end_sentence_characters = quotemeta('.?!');
my $after_punctuation_characters = quotemeta('"\')]');
# Add $WORD to paragraph, returning the text to be added to the paragraph.
@@ -220,21 +222,18 @@ sub _add_next($;$$$)
if (!$transparent) {
if ($disinhibit) {
- $paragraph->{'last_char'} = 'a';
+ $paragraph->{'last_letter'} = 'a';
} elsif ($word =~
- /([^$end_sentence_character$after_punctuation_characters])
- [$end_sentence_character$after_punctuation_characters]*$/ox) {
+ /([^$end_sentence_characters$after_punctuation_characters])
+ [$end_sentence_characters$after_punctuation_characters]*$/ox) {
# Save the last character in $word before punctuation
- $paragraph->{'last_char'} = $1;
+ $paragraph->{'last_letter'} = $1;
}
}
if (!$newlines_impossible and $word =~ /\n/) {
$result .= _add_pending_word ($paragraph);
_end_line($paragraph);
- $paragraph->{'word_counter'} = 0;
- $paragraph->{'word'} = undef;
- $paragraph->{'last_char'} = undef;
} else {
$paragraph->{'word_counter'}
+= Texinfo::Convert::Unicode::string_width($word);
@@ -248,11 +247,8 @@ sub _add_next($;$$$)
}
}
if ($paragraph->{'DEBUG'}) {
- my $para_word = 'UNDEF';;
- if (defined($paragraph->{'word'})) {
- $para_word = $paragraph->{'word'};
- }
- print STDERR "WORD+ $word -> $para_word\n";
+ print STDERR "WORD+ $word -> "
+ .(defined($paragraph->{'word'}) ? $paragraph->{'word'} : 'UNDEF')."\n";
}
return $result;
@@ -274,7 +270,7 @@ sub allow_end_sentence($)
{
my $paragraph = shift;
printf STDERR "ALLOW END SENTENCE\n" if $paragraph->{'DEBUG'};
- $paragraph->{'last_char'} = 'a'; # lower-case
+ $paragraph->{'last_letter'} = 'a'; # lower-case
}
sub set_space_protection($$;$$$$)
@@ -329,11 +325,11 @@ sub add_text($$)
= splice (@segments, 0, 4);
if ($debug_flag) {
- my $word = 'UNDEF';
- $word = $paragraph->{'word'} if (defined($paragraph->{'word'}));
- print STDERR "p ($paragraph->{'counter'}+$paragraph->{'word_counter'}) s `"
- ._print_escaped_spaces($paragraph->{'space'})."', w `$word'\n";
- #print STDERR "TEXT: "._print_escaped_spaces($text)."|\n"
+ print STDERR "p ($paragraph->{'counter'}+$paragraph->{'word_counter'}) "
+ ."s `" . _print_escaped_spaces($paragraph->{'space'})."', "
+ ."l `$paragraph->{'last_letter'}', "
+ ."w `".(defined($paragraph->{'word'}) ? $paragraph->{'word'}
+ : 'UNDEF')."'\n";
}
if (defined $spaces) {
print STDERR "SPACES($paragraph->{'counter'}) `"
@@ -354,7 +350,6 @@ sub add_text($$)
if (substr($paragraph->{'word'}, -1) ne ' ') {
my $new_spaces = $at_end_sentence ? ' ' : ' ';
$paragraph->{'word'} .= $new_spaces;
- $paragraph->{'last_char'} = ' ';
$paragraph->{'word_counter'} += length($new_spaces);
# The $paragraph->{'counter'} != 0 is here to avoid having an
@@ -385,9 +380,6 @@ sub add_text($$)
}
}
}
- #print STDERR "delete END_SENTENCE($paragraph->{'end_sentence'}): spaces\n"
- # if (defined($paragraph->{'end_sentence'}) and $paragraph->{'DEBUG'});
- #delete $paragraph->{'end_sentence'};
if ($paragraph->{'counter'} + length($paragraph->{'space'})
> $paragraph->{'max'}) {
$result .= _cut_line($paragraph);
@@ -396,12 +388,16 @@ sub add_text($$)
and $paragraph->{'keep_end_lines'} and $spaces =~ /\n/) {
$result .= _end_line($paragraph);
}
+ $paragraph->{'last_letter'} = ' ';
} elsif (defined $added_word) {
my $tmp = $added_word;
- if (defined $paragraph->{'last_char'}) {
- # Use 'last_char' here because _add_next overwrites it.
- $tmp = $paragraph->{'last_char'} . $tmp;
- }
+ # Prepend 'last_letter' to add the information on the last
+ # letter even if it was read as part of a previous string
+ # Add it here because _add_next overwrites it. Note that
+ # if _add_next overwrote it, it wouldn't lead to an invalid
+ # result, as the wrong prepended 'last_letter' would not match
+ # at the end of the $added_word in the regex below anyway.
+ $tmp = $paragraph->{'last_letter'} . $tmp;
$result .= _add_next($paragraph, $added_word, undef,
!$newline_possible_flag);
@@ -414,9 +410,9 @@ sub add_text($$)
# do nothing in the case of a continuation of after_punctuation_characters
} elsif (!$paragraph->{'unfilled'}
and $tmp =~
- /(^|[^\p{Upper}$after_punctuation_characters$end_sentence_character])
- [$after_punctuation_characters]*[$end_sentence_character]
- [$end_sentence_character\x08$after_punctuation_characters]*$/ox) {
+ /(^|[^\p{Upper}$after_punctuation_characters$end_sentence_characters])
+ [$after_punctuation_characters]*[$end_sentence_characters]
+ [$end_sentence_characters\x08$after_punctuation_characters]*$/ox) {
if ($paragraph->{'frenchspacing'}) {
$paragraph->{'end_sentence'} = -1;
} else {
@@ -424,27 +420,34 @@ sub add_text($$)
}
print STDERR "END_SENTENCE\n" if ($paragraph->{'DEBUG'});
} else {
- delete $paragraph->{'end_sentence'};
- print STDERR "delete END_SENTENCE($paragraph->{'end_sentence'}): text\n"
+ print STDERR "delete END_SENTENCE($paragraph->{'end_sentence'})\n"
if (defined($paragraph->{'end_sentence'}) and $paragraph->{'DEBUG'});
+ delete $paragraph->{'end_sentence'};
}
} elsif (defined $fullwidth_segment) {
- print STDERR "EAST_ASIAN\n" if ($paragraph->{'DEBUG'});
+ print STDERR "FULLWIDTH\n" if ($paragraph->{'DEBUG'});
+
if (!defined($paragraph->{'word'})) {
$paragraph->{'word'} = '';
}
$paragraph->{'word'} .= $fullwidth_segment;
- $paragraph->{'last_char'} = $fullwidth_segment;
$paragraph->{'word_counter'} += 2;
+
+ # fullwidth latin letters can be upper case, so it is important to
+ # use the actual characters here.
+ $paragraph->{'last_letter'} = $fullwidth_segment;
+
+ # We allow a line break in between Chinese characters even if
+ # there was no space between them, unlike single-width
+ # characters.
if ($paragraph->{'counter'} != 0 and
- $paragraph->{'counter'} + $paragraph->{'word_counter'}
+ $paragraph->{'counter'} + $paragraph->{'word_counter'}
> $paragraph->{'max'}) {
$result .= _cut_line($paragraph);
}
if (!$paragraph->{'no_break'}
and !$paragraph->{'double_width_no_break'}) {
$result .= _add_pending_word($paragraph);
- $paragraph->{'space'} = '';
}
delete $paragraph->{'end_sentence'};
}
diff --git a/tp/Texinfo/Convert/Plaintext.pm b/tp/Texinfo/Convert/Plaintext.pm
index b46e2dfbf6..0dcc5952ec 100644
--- a/tp/Texinfo/Convert/Plaintext.pm
+++ b/tp/Texinfo/Convert/Plaintext.pm
@@ -621,6 +621,8 @@ sub _protect_sentence_ends ($) {
my $text = shift;
# Avoid suppressing end of sentence, by inserting a control character
# in front of the full stop. The choice of BS for this is arbitrary.
+ # Note that the use of ?: is not crucial, but since we do not use the
+ # grouping value, a non-capturing group could be more efficient.
$text =~ s/(?<=[^\p{Upper}])
(?=[$end_sentence][$after_punctuation]*(?:\s|$))
/\x08/xg;
diff --git a/tp/Texinfo/XS/xspara.c b/tp/Texinfo/XS/xspara.c
index 5ca8260f8a..ea25f25a39 100644
--- a/tp/Texinfo/XS/xspara.c
+++ b/tp/Texinfo/XS/xspara.c
@@ -570,6 +570,8 @@ xspara__end_line (void)
state.lines_counter++;
state.end_line_count++;
+ /* could be set to other values, anything that is not upper case. */
+ state.last_letter = L'\n';
}
char *
@@ -678,6 +680,9 @@ xspara_end (void)
if (debug)
fprintf (stderr, "PARA END\n");
+ /* probably not really useful, but cleaner */
+ state.last_letter = L'\0';
+
xspara__add_pending_word (&ret, state.add_final_space);
if (!state.no_final_newline && state.counter != 0)
{
@@ -702,6 +707,12 @@ xspara_end (void)
/* check if a byte is in the printable ASCII range */
#define PRINTABLE_ASCII(c) (0x20 <= (c) && (c) <= 0x7E)
+/* ignored after end sentence character to determine if
+ at the end of a sentence */
+#define after_punctuation_characters "\"')]"
+/* characters triggering an end of sentence */
+#define end_sentence_characters ".?!"
+
/* Add WORD to paragraph in RESULT, not refilling WORD. If we go past the end
of the line start a new one. TRANSPARENT means that the letters in WORD
are ignored for the purpose of deciding whether a full stop ends a sentence
@@ -743,7 +754,8 @@ xspara__add_next (TEXT *result, char *word, int word_len, int transparent)
}
while ((*p & 0xC0) == 0x80 && p > word);
- if (!strchr (".?!\"')", *p))
+ if (!strchr (end_sentence_characters
+ after_punctuation_characters, *p))
{
if (!PRINTABLE_ASCII(*p))
{
@@ -917,18 +929,13 @@ xspara_add_text (char *text, int len)
{
if (debug)
{
- char *word = "UNDEF";
- if (state.word.end > 0)
- word = state.word.text;
- fprintf(stderr, "p (%d+%d) s `%s', w `%s'\n", state.counter,
- state.word_counter, state.space.end == 0 ? ""
- : xspara__print_escaped_spaces (state.space.text),
- word);
+ fprintf(stderr, "p (%d+%d) s `%s', l `%lc', w `%s'\n", state.counter,
+ state.word_counter, state.space.end == 0 ? ""
+ : xspara__print_escaped_spaces (state.space.text), state.last_letter,
+ state.word.end > 0 ? state.word.text : "UNDEF");
}
if (isspace ((unsigned char) *p))
{
- state.last_letter = L'\0';
-
if (debug)
{
char t[2];
@@ -968,7 +975,6 @@ xspara_add_text (char *text, int len)
text_append_n (&state.word, " ", 1);
state.word_counter += 1;
}
- state.last_letter = ' ';
if (state.counter != 0
&& state.counter + state.word_counter
@@ -1027,6 +1033,7 @@ xspara_add_text (char *text, int len)
text_append (&result, "\n");
}
p++; len--;
+ state.last_letter = ' ';
continue;
}
@@ -1054,17 +1061,20 @@ xspara_add_text (char *text, int len)
/*************** Double width character. *********************/
if (width == 2)
{
- state.last_letter = L'\0';
+ if (debug)
+ fprintf (stderr, "FULLWIDTH\n");
+
+ text_append_n (&state.word, p, char_len);
+ state.word_counter += 2;
+
+ /* fullwidth latin letters can be upper case, so it is important to
+ use the actual characters here. */
+ state.last_letter = wc;
/* We allow a line break in between Chinese characters even if
there was no space between them, unlike single-width
characters. */
- /* Append wc to state.word. */
- text_append_n (&state.word, p, char_len);
-
- state.word_counter += 2;
-
if (state.counter != 0
&& state.counter + state.word_counter > state.max)
{
@@ -1075,8 +1085,8 @@ xspara_add_text (char *text, int len)
if (!state.no_break && !state.double_width_no_break)
{
xspara__add_pending_word (&result, 0);
- state.end_sentence = -2;
}
+ state.end_sentence = -2;
}
else if (wc == L'\b')
{
@@ -1099,7 +1109,7 @@ xspara_add_text (char *text, int len)
/* Now check if it is considered as an end of sentence, and
set state.end_sentence if it is. */
- if (strchr (".?!", *p) && !state.unfilled)
+ if (strchr (end_sentence_characters, *p) && !state.unfilled)
{
/* Doesn't count if preceded by an upper-case letter. */
if (!iswupper (state.last_letter))
@@ -1108,9 +1118,11 @@ xspara_add_text (char *text, int len)
state.end_sentence = -1;
else
state.end_sentence = 1;
+ if (debug)
+ fprintf (stderr, "END_SENTENCE\n");
}
}
- else if (strchr ("\"')]", *p))
+ else if (strchr (after_punctuation_characters, *p))
{
/* '"', '\'', ']' and ')' are ignored for the purpose
of deciding whether a full stop ends a sentence. */
@@ -1120,8 +1132,11 @@ xspara_add_text (char *text, int len)
/* Otherwise reset the end of sentence marker: a full stop in
a string like "aaaa.bbbb" doesn't mark an end of
sentence. */
- state.end_sentence = -2;
state.last_letter = wc;
+ if (debug && state.end_sentence != -2)
+ fprintf (stderr, "delete END_SENTENCE(%d)\n",
+ state.end_sentence);
+ state.end_sentence = -2;
}
}
else
diff --git a/tp/t/paragraph.t b/tp/t/paragraph.t
index c89f17184c..97c6ca75ce 100644
--- a/tp/t/paragraph.t
+++ b/tp/t/paragraph.t
@@ -7,7 +7,7 @@ use File::Basename;
use lib '.';
use Texinfo::ModulePath (undef, undef, undef, 'updirs' => 2);
-BEGIN { plan tests => 119 ; }
+BEGIN { plan tests => 122 ; }
use Texinfo::Convert::Paragraph;
@@ -21,8 +21,8 @@ sub test_para($$$;$)
my $conf = shift;
my $result = '';
- #$conf = {'DEBUG' => 1} if (!defined($conf));
$conf = {} if (!defined($conf));
+ $conf->{'DEBUG'} = 1;
my $para = Texinfo::Convert::Paragraph->new($conf);
foreach my $arg (@$args) {
$result .= add_text($para, $arg);
@@ -70,6 +70,14 @@ test_para(['word',' other'], "word\nother\n", 'two_elements_space_max', {'max' =
test_para(["\x{7b2c}\x{4e00} ",'other'], "\x{7b2c}\n\x{4e00}\nother\n",
'east_asian', {'max' => 2});
test_para(['word. other'], "word. other\n", 'two_words_dot_frenchspacing',
{'frenchspacing' => 1});
test_para(["aa.)\x{7b2c} b"], "aa.)\x{7b2c} b\n", 'end_sentence_east_asian');
+test_para(["B\x{7b2c}. After\x{7b2c}. Last"], "B\x{7b2c}. After\x{7b2c}.
Last\n",
+ 'east_asian_before_end_sentence');
+# uses a fullwidth b
+test_para(["B\x{ff42}. After\x{ff42}. Last"], "B\x{ff42}. After\x{ff42}.
Last\n",
+ 'fullwidth_lower_case_latin_before_end_sentence');
+# uses a fullwidth R
+test_para(["B\x{ff32}. After\x{ff32}. Last"], "B\x{ff32}. After\x{ff32}.
Last\n",
+ 'fullwidth_upper_case_latin_before_end_sentence');
test_para(["aaaa bbbbbbb cccccccc dddddddddddd eeeeeeeeeeee fffffffff
ggggggg"],
" aaaa
bbbbbbb
diff --git a/tp/t/plaintext_tests.t b/tp/t/plaintext_tests.t
index ff8010cb95..ceba400026 100644
--- a/tp/t/plaintext_tests.t
+++ b/tp/t/plaintext_tests.t
@@ -84,6 +84,10 @@ Before samp. @samp{a}. after samp, w @w{in w. after dot} afterw
@exdent before samp. @samp{a}. after samp, w @w{in w. after dot} afterw
@end quotation
'],
+# This tests all the possibilities for end sentence related characters
+# and splitting by commands (also tested in other tests)
+['split_punctuation_detection_in_commands',
+'Before @asis{B}@asis{)}@asis{.}]]? Afte@strong{R}@emph{"!}\'? Last'],
['html_expanded',
'Before
@html
@@ -383,6 +387,10 @@ and in emph.}
'Text.@asis{)
follows}.
'],
+# tests that upper case letter in code ends a sentence
+['code_commands_and_punctuation',
+'@code{AA}. @samp{aa}. After.
+'],
['sc_with_utf8_enable_encoding',
'@documentencoding utf-8
@sc{in sc}.
@@ -459,9 +467,6 @@ end footnote}
['command_brace_no_arg_punctuation',
'@TeX{}. And @LaTeX{}. @copyright{}. @registeredsymbol{}. End.
'],
-['code_commands_and_punctuation',
-'@code{AA}. @samp{aa}. After.
-'],
['sp_with_text_before_in_example',
'
@example
diff --git a/tp/t/results/plaintext_tests/split_punctuation_detection_in_commands.pl b/tp/t/results/plaintext_tests/split_punctuation_detection_in_commands.pl
new file mode 100644
index 0000000000..7062a49c90
--- /dev/null
+++ b/tp/t/results/plaintext_tests/split_punctuation_detection_in_commands.pl
@@ -0,0 +1,138 @@
+use vars qw(%result_texis %result_texts %result_trees %result_errors
+ %result_indices %result_sectioning %result_nodes %result_menus
+ %result_floats %result_converted %result_converted_errors
+ %result_elements %result_directions_text %result_indices_sort_strings);
+
+use utf8;
+
+$result_trees{'split_punctuation_detection_in_commands'} = {
+ 'contents' => [
+ {
+ 'contents' => [
+ {
+ 'contents' => [
+ {
+ 'text' => 'Before '
+ },
+ {
+ 'args' => [
+ {
+ 'contents' => [
+ {
+ 'text' => 'B'
+ }
+ ],
+ 'type' => 'brace_command_arg'
+ }
+ ],
+ 'cmdname' => 'asis',
+ 'source_info' => {
+ 'file_name' => '',
+ 'line_nr' => 1,
+ 'macro' => ''
+ }
+ },
+ {
+ 'args' => [
+ {
+ 'contents' => [
+ {
+ 'text' => ')'
+ }
+ ],
+ 'type' => 'brace_command_arg'
+ }
+ ],
+ 'cmdname' => 'asis',
+ 'source_info' => {
+ 'file_name' => '',
+ 'line_nr' => 1,
+ 'macro' => ''
+ }
+ },
+ {
+ 'args' => [
+ {
+ 'contents' => [
+ {
+ 'text' => '.'
+ }
+ ],
+ 'type' => 'brace_command_arg'
+ }
+ ],
+ 'cmdname' => 'asis',
+ 'source_info' => {
+ 'file_name' => '',
+ 'line_nr' => 1,
+ 'macro' => ''
+ }
+ },
+ {
+ 'text' => ']]? Afte'
+ },
+ {
+ 'args' => [
+ {
+ 'contents' => [
+ {
+ 'text' => 'R'
+ }
+ ],
+ 'type' => 'brace_command_arg'
+ }
+ ],
+ 'cmdname' => 'strong',
+ 'source_info' => {
+ 'file_name' => '',
+ 'line_nr' => 1,
+ 'macro' => ''
+ }
+ },
+ {
+ 'args' => [
+ {
+ 'contents' => [
+ {
+ 'text' => '"!'
+ }
+ ],
+ 'type' => 'brace_command_arg'
+ }
+ ],
+ 'cmdname' => 'emph',
+ 'source_info' => {
+ 'file_name' => '',
+ 'line_nr' => 1,
+ 'macro' => ''
+ }
+ },
+ {
+ 'text' => '\'? Last'
+ }
+ ],
+ 'type' => 'paragraph'
+ }
+ ],
+ 'type' => 'before_node_section'
+ }
+ ],
+ 'type' => 'document_root'
+};
+
+$result_texis{'split_punctuation_detection_in_commands'} = 'Before @asis{B}@asis{)}@asis{.}]]? Afte@strong{R}@emph{"!}\'? Last';
+
+
+$result_texts{'split_punctuation_detection_in_commands'} = 'Before B).]]? AfteR"!\'? Last';
+
+$result_errors{'split_punctuation_detection_in_commands'} = [];
+
+
+$result_floats{'split_punctuation_detection_in_commands'} = {};
+
+
+
+$result_converted{'plaintext'}->{'split_punctuation_detection_in_commands'} = 'Before B).]]? Afte*R*_"!_\'? Last
+';
+
+1;
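
The sentence-end rule that this commit aligns between ParagraphNonXS.pm
and xspara.c can be sketched in isolation as below. This is only a
minimal illustration under stated assumptions, not code from the commit:
in_set(), is_sentence_punctuation() and ends_sentence() are hypothetical
helpers, and line filling, frenchspacing and the \b protection character
are left out. Only the two #define strings, the adjacent string literal
concatenation passed to strchr, and the iswupper test on the remembered
last letter mirror the patch. Whether iswupper recognizes fullwidth
latin letters depends on the C library and the current locale.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Same strings as the defines added to xspara.c by this commit. */
#define after_punctuation_characters "\"')]"
#define end_sentence_characters ".?!"

/* Hypothetical helper: true if WC is an ASCII character found in SET. */
static int
in_set (const char *set, wchar_t wc)
{
  return wc > 0 && wc < 0x80 && strchr (set, (int) wc) != NULL;
}

/* Adjacent string literals are concatenated by the compiler, so both
   defines can be handed to a single lookup. */
int
is_sentence_punctuation (wchar_t wc)
{
  return in_set (end_sentence_characters after_punctuation_characters, wc);
}

/* Hypothetical helper: return 1 if TEXT ends with an end of sentence. */
int
ends_sentence (const wchar_t *text)
{
  wchar_t last_letter = L'\0';
  int end_sentence = 0;
  const wchar_t *p;

  assert (text != NULL);
  for (p = text; *p != L'\0'; p++)
    {
      if (iswspace ((wint_t) *p))
        {
          /* A space is remembered as the last letter; it is not upper case. */
          last_letter = L' ';
        }
      else if (in_set (end_sentence_characters, *p))
        {
          /* Does not count if preceded by an upper-case letter. */
          if (!iswupper ((wint_t) last_letter))
            end_sentence = 1;
        }
      else if (in_set (after_punctuation_characters, *p))
        {
          /* Closing quotes and brackets are ignored for the decision. */
        }
      else
        {
          /* Any other character becomes the last letter and resets the
             marker, so "aaaa.bbbb" does not end a sentence. */
          last_letter = *p;
          end_sentence = 0;
        }
    }
  return end_sentence;
}
```

With these definitions, ends_sentence (L"aa.)") is true, while
ends_sentence (L"B.") is false because the full stop follows an
upper-case letter, and ends_sentence (L"aaaa.bbbb") is false because
text follows the full stop.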