texinfo-commits


From: Gavin D. Smith
Subject: branch master updated: * tp/Texinfo/Convert/Unicode.pm (string_width): Reset count at a newline. Add a comment saying what the different character classes mean.
Date: Sun, 31 Dec 2023 15:54:47 -0500

This is an automated email from the git hooks/post-receive script.

gavin pushed a commit to branch master
in repository texinfo.

The following commit(s) were added to refs/heads/master by this push:
     new 4cbd328c8f * tp/Texinfo/Convert/Unicode.pm (string_width): Reset count at a newline.  Add a comment saying what the different character classes mean.
4cbd328c8f is described below

commit 4cbd328c8fb07f38ca52b5eaab6a76d297c1d3be
Author: Gavin Smith <gavinsmith0123@gmail.com>
AuthorDate: Sun Dec 31 20:54:39 2023 +0000

    * tp/Texinfo/Convert/Unicode.pm (string_width):
    Reset count at a newline.  Add a comment saying what the
    different character classes mean.
---
 ChangeLog                                          |   7 ++-
 tp/Texinfo/Convert/Unicode.pm                      |  21 +++++---
 .../formats_encodings/at_commands_in_refs.pl       |  56 ++++++++++-----------
 .../at_commands_in_refs_latin1.pl                  |   2 +-
 .../res_info/at_commands_in_refs_latin1.info       | Bin 8004 -> 7999 bytes
 .../formats_encodings/at_commands_in_refs_utf8.pl  |   2 +-
 .../res_info/at_commands_in_refs_utf8.info         | Bin 8401 -> 8396 bytes
 .../unclosed_verb_on_section_line.pl               |   2 +-
 8 files changed, 50 insertions(+), 40 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 345f8b2513..857e059e18 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2023-12-31  Gavin Smith <gavinsmith0123@gmail.com>
+
+       * tp/Texinfo/Convert/Unicode.pm (string_width):
+       Reset count at a newline.  Add a comment saying what the
+       different character classes mean.
+
 2023-12-31  Gavin Smith <gavinsmith0123@gmail.com>
 
        * tp/Texinfo/Translations.pm (gdt_string_columns): Adjust to
@@ -18,7 +24,6 @@
        (convert_xtable_command), commands_internal_conversion_table):
        implement convert_multitable_command and convert_xtable_command.
 
-
 2023-12-31  Patrice Dumas  <pertusus@free.fr>
 
        * tp/Texinfo/XS/convert/convert_html.c (convert_enumerate_command)
diff --git a/tp/Texinfo/Convert/Unicode.pm b/tp/Texinfo/Convert/Unicode.pm
index 4ccf70440b..bc977e61b1 100644
--- a/tp/Texinfo/Convert/Unicode.pm
+++ b/tp/Texinfo/Convert/Unicode.pm
@@ -1692,20 +1692,23 @@ sub string_width($)
   # Optimise for the common case where we can just return the length
   # of the string.  These regexes are faster than making the substitutions
   # below.
-  # IsPrint without \pM
+  # IsPrint without \p{Mark}.  Matches classes Letter, Number, Punct, Symbol,
+  # and Space_Separator.
   if ($string =~ /^[\p{L}\p{N}\p{P}\p{S}\p{Zs}]*$/
       and $string !~ /[\p{InFullwidth}]/) {
     return length($string);
   }
 
-  $string =~ s/\p{InFullwidth}/\x{02}/g;
-  $string =~ s/[\p{L}\p{N}\p{P}\p{S}\p{Zs}]/\x{01}/g;
-  $string =~ s/[^\x{01}\x{02}]/\x{00}/g;
+  if ($string !~ /\n/) {
+    $string =~ s/\p{InFullwidth}/\x{02}/g;
+    $string =~ s/[\p{L}\p{N}\p{P}\p{S}\p{Zs}]/\x{01}/g;
+    $string =~ s/[^\x{01}\x{02}]/\x{00}/g;
 
-  # This sums up the byte values of the bytes in $string, which now are
-  # all either 0, 1 or 2.  This is faster.  The original, more readable
-  # version is below.
-  return unpack("U0%32A*", $string);
+    # This sums up the byte values of the bytes in $string, which now are
+    # all either 0, 1 or 2.  This is faster.  The original, more readable
+    # version is below.
+    return unpack("U0%32A*", $string);
+  }
 
   if (! defined($string)) {
     cluck();
@@ -1716,6 +1719,8 @@ sub string_width($)
       $width += 2;
     } elsif ($character =~ /[\p{L}\p{N}\p{P}\p{S}\p{Zs}]/) {
       $width += 1;
+    } elsif ($character eq "\n") {
+      $width = 0;
     } else {
       # zero width character: \pC (including controls), \pM, \p{Zl}, \p{Zp}
     }
diff --git a/tp/t/results/formats_encodings/at_commands_in_refs.pl b/tp/t/results/formats_encodings/at_commands_in_refs.pl
index 64ac2c45e7..c41eaef674 100644
--- a/tp/t/results/formats_encodings/at_commands_in_refs.pl
+++ b/tp/t/results/formats_encodings/at_commands_in_refs.pl
@@ -13821,7 +13821,7 @@ $result_texts{'at_commands_in_refs'} = 'Top
 
 2     ! 
  .  . ? @
-*****************
+*********
 
 3 @ { } \\ #
 ***********
@@ -15603,7 +15603,7 @@ $result_converted{'plaintext'}->{'at_commands_in_refs'} = 'Top
 
 2     !
 . . ? @
-**************
+*********
 
 3 @ { } \\ #
 ***********
@@ -16760,7 +16760,7 @@ File: ,  Node:     ! . . ? @,  Next: @ { } \\ #,  Prev: { },  Up: Top
 
 2     !
 . . ? @
-**************
+*********
 
 
 File: ,  Node: @ { } \\ #,  Next: LaTeX TeX • , © ... ...,  Prev:     ! . . ? @,  Up: Top
@@ -16974,31 +16974,31 @@ Tag Table:
 Node: Top27
 Node: { }783
 Node:     ! . . ? @862
-Node: @ { } \\ #966
-Node: LaTeX TeX • , © ... ...1085
-Node: ≡ error→ € ¡ ↦ −1235
-Node: ≥ ≤ →1367
-Node: ª º ⋆ £ ⊣ ¿ ®1465
-Node: ⇒ ° a b a sunny day å1584
-Node: Å æ œ Æ Œ ø Ø ß ł Ł Ð ð Þ þ1741
-Node: ä ẽ î â à é ç ē e̊ e̋ ę1920
-Node: ė ĕ e̲ ẹ ě ȷ e͡e2086
-Node: ı Ḕ Ḉ2216
-Node: “ ” ‘ ’ „ ‚2314
-Node: « » « » ‹ ›2419
-Node: `` \'\' --- -- ` \'2535
-Node: AAA (fff) AAA BBB2659
-Node: CCC (rrr) CCC DDD2799
-Node: the someone <someone@somewher> <no_explain@there>2972
-Node: [f--ile1] [image src="f--ile.png" alt="alt" text="Image description\\"\\"\\\\."]3272
-Node:  @ {} . 3622
-Node: cite asis in @w b in r SC *str* t VAR dfn i3825
-Node: env code option samp command file C-x <ESC>4069
-Node: 8.27in4331
-Node: sansserif slanted4465
-Node: indicateurl4589
-Node: _{g}H 3^{rd}4711
-Node: <http://somewhere_aaa> text (url) ls4850
+Node: @ { } \\ #961
+Node: LaTeX TeX • , © ... ...1080
+Node: ≡ error→ € ¡ ↦ −1230
+Node: ≥ ≤ →1362
+Node: ª º ⋆ £ ⊣ ¿ ®1460
+Node: ⇒ ° a b a sunny day å1579
+Node: Å æ œ Æ Œ ø Ø ß ł Ł Ð ð Þ þ1736
+Node: ä ẽ î â à é ç ē e̊ e̋ ę1915
+Node: ė ĕ e̲ ẹ ě ȷ e͡e2081
+Node: ı Ḕ Ḉ2211
+Node: “ ” ‘ ’ „ ‚2309
+Node: « » « » ‹ ›2414
+Node: `` \'\' --- -- ` \'2530
+Node: AAA (fff) AAA BBB2654
+Node: CCC (rrr) CCC DDD2794
+Node: the someone <someone@somewher> <no_explain@there>2967
+Node: [f--ile1] [image src="f--ile.png" alt="alt" text="Image description\\"\\"\\\\."]3267
+Node:  @ {} . 3617
+Node: cite asis in @w b in r SC *str* t VAR dfn i3820
+Node: env code option samp command file C-x <ESC>4064
+Node: 8.27in4326
+Node: sansserif slanted4460
+Node: indicateurl4584
+Node: _{g}H 3^{rd}4706
+Node: <http://somewhere_aaa> text (url) ls4845
 
 End Tag Table
 
diff --git a/tp/t/results/formats_encodings/at_commands_in_refs_latin1.pl b/tp/t/results/formats_encodings/at_commands_in_refs_latin1.pl
index 4c59d7f069..4699c78b33 100644
--- a/tp/t/results/formats_encodings/at_commands_in_refs_latin1.pl
+++ b/tp/t/results/formats_encodings/at_commands_in_refs_latin1.pl
@@ -13893,7 +13893,7 @@ Top
 
 2     ! 
  .  . ? @
-*****************
+*********
 
 3 @ { } \\ #
 ***********
diff --git a/tp/t/results/formats_encodings/at_commands_in_refs_latin1/res_info/at_commands_in_refs_latin1.info b/tp/t/results/formats_encodings/at_commands_in_refs_latin1/res_info/at_commands_in_refs_latin1.info
index 9ca877b9ba..e0a4dcd3fe 100644
Binary files a/tp/t/results/formats_encodings/at_commands_in_refs_latin1/res_info/at_commands_in_refs_latin1.info and b/tp/t/results/formats_encodings/at_commands_in_refs_latin1/res_info/at_commands_in_refs_latin1.info differ
diff --git a/tp/t/results/formats_encodings/at_commands_in_refs_utf8.pl b/tp/t/results/formats_encodings/at_commands_in_refs_utf8.pl
index 9745f7a371..c96fbc1d30 100644
--- a/tp/t/results/formats_encodings/at_commands_in_refs_utf8.pl
+++ b/tp/t/results/formats_encodings/at_commands_in_refs_utf8.pl
@@ -13893,7 +13893,7 @@ Top
 
 2     ! 
  .  . ? @
-*****************
+*********
 
 3 @ { } \\ #
 ***********
diff --git a/tp/t/results/formats_encodings/at_commands_in_refs_utf8/res_info/at_commands_in_refs_utf8.info b/tp/t/results/formats_encodings/at_commands_in_refs_utf8/res_info/at_commands_in_refs_utf8.info
index 91d9958a6e..ca68180970 100644
Binary files a/tp/t/results/formats_encodings/at_commands_in_refs_utf8/res_info/at_commands_in_refs_utf8.info and b/tp/t/results/formats_encodings/at_commands_in_refs_utf8/res_info/at_commands_in_refs_utf8.info differ
diff --git a/tp/t/results/invalid_nestings/unclosed_verb_on_section_line.pl b/tp/t/results/invalid_nestings/unclosed_verb_on_section_line.pl
index 62b098f04c..4ebc85d3ab 100644
--- a/tp/t/results/invalid_nestings/unclosed_verb_on_section_line.pl
+++ b/tp/t/results/invalid_nestings/unclosed_verb_on_section_line.pl
@@ -78,7 +78,7 @@ T}';
 $result_texts{'unclosed_verb_on_section_line'} = '1 in section ruc
 
 Now text.
-=========================
+=========
 ';
 
 $result_sectioning{'unclosed_verb_on_section_line'} = {
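
The shortened underlines in the test results above follow from the width count being reset at each newline, so that only the text after the last newline is measured. The following is a minimal standalone sketch of that per-character loop, not the code in Unicode.pm: the name width_of_last_line is hypothetical, and the East_Asian_Width Wide and Fullwidth properties are assumed here as a stand-in for Texinfo's own InFullwidth property, which may cover a different set of characters.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only (not Texinfo's string_width).
# Returns the width in screen columns of the text after the last newline.
sub width_of_last_line {
  my ($string) = @_;
  my $width = 0;
  foreach my $character (split //, $string) {
    if ($character =~ /[\p{Ea=W}\p{Ea=F}]/) {
      # Assumption: East Asian Wide/Fullwidth stands in for InFullwidth.
      $width += 2;
    } elsif ($character =~ /[\p{L}\p{N}\p{P}\p{S}\p{Zs}]/) {
      # Letter, Number, Punctuation, Symbol or Space_Separator.
      $width += 1;
    } elsif ($character eq "\n") {
      # A newline starts a new line, so the running count starts over.
      $width = 0;
    }
    # Anything else (controls, marks, line and paragraph separators)
    # is treated as zero width.
  }
  return $width;
}

print width_of_last_line("abc\ndefgh"), "\n";          # second line only
print width_of_last_line("\x{65E5}\x{672C}"), "\n";    # two wide characters
print width_of_last_line("e\x{301}"), "\n";            # base letter + mark
```

With this loop a two-line section title is measured by its second line alone, which is why a heading underline sized from the result no longer spans the combined length of both lines.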


