grep-commit

grep branch, master, updated. v3.7-14-gb7d83f4


From: Paul Eggert
Subject: grep branch, master, updated. v3.7-14-gb7d83f4
Date: Tue, 24 Aug 2021 03:43:32 -0400 (EDT)

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "grep".

The branch, master has been updated
       via  b7d83f46d81a304e188c82877430765c29a75610 (commit)
       via  643e5573888c49ec7022c2cee04a312b70166d0d (commit)
       via  869989fa834c34ca2d5602555111c11f179ec8e4 (commit)
       via  70b84b9294480c8b5f12d7a0cd95e54584b08288 (commit)
       via  01b7b13f8376187bc870f4b5c6d91ded35a151d0 (commit)
       via  2b455da03fb9de9af75fecf5792844a1de108899 (commit)
      from  33b2d2eded9c9679853631ec94825247aae711ac (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.savannah.gnu.org/cgit/grep.git/commit/?id=b7d83f46d81a304e188c82877430765c29a75610


commit b7d83f46d81a304e188c82877430765c29a75610
Author: Paul Eggert <eggert@cs.ucla.edu>
Date:   Tue Aug 24 00:37:01 2021 -0700

    grep: scan back thru UTF-8 a bit faster
    
    * src/searchutils.c (mb_goback): When scanning backward through
    UTF-8, check the length implied by the putative byte 1 before
    bothering to invoke mb_clen.  This length check also lets us use
    mbrlen directly rather than calling mb_clen, which would
    eventually defer to mbrlen anyway.

diff --git a/src/searchutils.c b/src/searchutils.c
index f16dd84..0080dd7 100644
--- a/src/searchutils.c
+++ b/src/searchutils.c
@@ -107,13 +107,20 @@ mb_goback (char const **mb_start, size_t *mbclen, char const *cur,
         for (int i = 1; i <= 3; i++)
           if ((cur[-i] & 0xc0) != 0x80)
             {
-              mbstate_t mbs = { 0 };
-              size_t clen = mb_clen (cur - i, end - (cur - i), &mbs);
-              if (i < clen && clen <= MB_LEN_MAX)
+              /* True if the length implied by the putative byte 1 at
+                 CUR[-I] extends at least through *CUR.  */
+              bool long_enough = (~cur[-i] & 0xff) >> (7 - i) == 0;
+
+              if (long_enough)
                 {
-                  /* This multibyte character contains *CUR.  */
-                  p0 = cur - i;
-                  p = p0 + clen;
+                  mbstate_t mbs = { 0 };
+                  size_t clen = mbrlen (cur - i, end - (cur - i), &mbs);
+                  if (clen <= MB_LEN_MAX)
+                    {
+                      /* This multibyte character contains *CUR.  */
+                      p0 = cur - i;
+                      p = p0 + clen;
+                    }
                 }
               break;
             }
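
The length check in the hunk above can be tried in isolation. The
following is a minimal sketch, not code from grep: long_enough wraps the
same expression as the patch, while implied_len, plausible_clen and
check_long_enough are hypothetical helpers added here for illustration.
It assumes UTF-8 lead-byte conventions (0xC0.. starts a 2-byte sequence,
0xE0.. a 3-byte one, 0xF0.. a 4-byte one) and relies on mbrlen reporting
errors as (size_t) -1 and (size_t) -2, both of which exceed MB_LEN_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Same expression as in the patch: true if the UTF-8 length implied by
   the putative lead byte B is greater than I, so that a character
   starting I bytes before CUR would extend at least through *CUR.  */
static bool
long_enough (unsigned char b, int i)
{
  return (~b & 0xff) >> (7 - i) == 0;
}

/* Hypothetical reference function: the sequence length implied by B,
   or 0 if B is a continuation byte (10xxxxxx).  Bytes 0xF8..0xFF are
   not valid lead bytes; they are lumped in with 4 here, which matches
   what the bit test sees, and mbrlen would reject them later.  */
static int
implied_len (unsigned char b)
{
  if (b < 0x80)
    return 1;
  if (b < 0xc0)
    return 0;
  if (b < 0xe0)
    return 2;
  if (b < 0xf0)
    return 3;
  return 4;
}

/* Hypothetical helper mirroring the patch's 'clen <= MB_LEN_MAX' test,
   which filters out mbrlen's (size_t) -1 and (size_t) -2 returns.  */
static bool
plausible_clen (size_t clen)
{
  return clen <= MB_LEN_MAX;
}

/* Check the bit test against the reference for every byte value and
   every offset I that mb_goback tries (1 through 3).  */
static void
check_long_enough (void)
{
  for (int b = 0; b < 256; b++)
    for (int i = 1; i <= 3; i++)
      assert (long_enough ((unsigned char) b, i)
              == (implied_len ((unsigned char) b) > i));
}
```

For instance, long_enough (0xe2, 2) holds because 0xE2 starts a 3-byte
sequence, whereas long_enough (0xc3, 2) does not because 0xC3 starts
only a 2-byte one, so the mbrlen call can be skipped in the latter case.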

http://git.savannah.gnu.org/cgit/grep.git/commit/?id=643e5573888c49ec7022c2cee04a312b70166d0d


http://git.savannah.gnu.org/cgit/grep.git/commit/?id=869989fa834c34ca2d5602555111c11f179ec8e4


http://git.savannah.gnu.org/cgit/grep.git/commit/?id=70b84b9294480c8b5f12d7a0cd95e54584b08288


http://git.savannah.gnu.org/cgit/grep.git/commit/?id=01b7b13f8376187bc870f4b5c6d91ded35a151d0


http://git.savannah.gnu.org/cgit/grep.git/commit/?id=2b455da03fb9de9af75fecf5792844a1de108899


-----------------------------------------------------------------------

Summary of changes:
 src/grep.c        |  8 +++---
 src/kwset.c       |  4 ---
 src/search.h      |  4 +--
 src/searchutils.c | 79 +++++++++++++++++++++++++++++++++++++------------------
 4 files changed, 60 insertions(+), 35 deletions(-)


hooks/post-receive
-- 
grep


