GNU logs - #41535, boring messages


Message sent to bug-gzip@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#41535: [PATCH] performance optimization for aarch64
Resent-From: l00374334 <liqiang64@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gzip@HIDDEN
Resent-Date: Tue, 26 May 2020 05:18:02 +0000
Resent-Message-ID: <handler.41535.B.159047027910148 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: report 41535
X-GNU-PR-Package: gzip
X-GNU-PR-Keywords: patch
To: 41535 <at> debbugs.gnu.org, eggert@HIDDEN
Cc: luanjianhai@HIDDEN, liqiang64@HIDDEN, sangyan@HIDDEN, luchunhua@HIDDEN
X-Debbugs-Original-To: <bug-gzip@HIDDEN>, <eggert@HIDDEN>
Received: via spool by submit <at> debbugs.gnu.org id=B.159047027910148
          (code B ref -1); Tue, 26 May 2020 05:18:02 +0000
Received: (at submit) by debbugs.gnu.org; 26 May 2020 05:17:59 +0000
Received: from localhost ([127.0.0.1]:43369 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1jdRyQ-0002db-Be
	for submit <at> debbugs.gnu.org; Tue, 26 May 2020 01:17:58 -0400
Received: from lists.gnu.org ([209.51.188.17]:56748)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <liqiang64@HIDDEN>) id 1jdPVi-0002uY-He
 for submit <at> debbugs.gnu.org; Mon, 25 May 2020 22:40:11 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:41510)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <liqiang64@HIDDEN>)
 id 1jdPVi-0005rV-BR
 for bug-gzip@HIDDEN; Mon, 25 May 2020 22:40:10 -0400
Received: from szxga06-in.huawei.com ([45.249.212.32]:33558 helo=huawei.com)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <liqiang64@HIDDEN>)
 id 1jdPVg-0000f4-DY
 for bug-gzip@HIDDEN; Mon, 25 May 2020 22:40:09 -0400
Received: from DGGEMS402-HUB.china.huawei.com (unknown [172.30.72.60])
 by Forcepoint Email with ESMTP id 8CC6788B7A8D9320AB1D;
 Tue, 26 May 2020 10:39:52 +0800 (CST)
Received: from huawei.com (10.108.222.92) by DGGEMS402-HUB.china.huawei.com
 (10.3.19.202) with Microsoft SMTP Server id 14.3.487.0; Tue, 26 May 2020
 10:39:42 +0800
From: l00374334 <liqiang64@HIDDEN>
Date: Tue, 26 May 2020 10:39:40 +0800
Message-ID: <20200526023940.1967-1-liqiang64@HIDDEN>
X-Mailer: git-send-email 2.23.0.windows.1
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain
X-Originating-IP: [10.108.222.92]
X-CFilter-Loop: Reflected
Received-SPF: pass client-ip=45.249.212.32; envelope-from=liqiang64@HIDDEN;
 helo=huawei.com
X-detected-operating-system: by eggs.gnu.org: First seen = 2020/05/25 22:39:53
X-ACL-Warn: Detected OS   = Linux 3.11 and newer [fuzzy]
X-Spam_score_int: -41
X-Spam_score: -4.2
X-Spam_bar: ----
X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3,
 RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_PASS=-0.001,
 SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=_AUTOLEARN
X-Spam_action: no action
X-Spam-Score: -1.4 (-)
X-Mailman-Approved-At: Tue, 26 May 2020 01:17:56 -0400
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.4 (--)

From: liqiang <liqiang64@HIDDEN>

By analyzing the compression and decompression process of gzip, I found
that CRC32 and the longest_match function are major hot spots.

On the aarch64 architecture, we can speed up crc32 through the interface
provided by the neon instruction set (12x faster on aarch64), and improve
the performance of random-access code through prefetch instructions
(about 5%~8% improvement). In some compression scenarios, loop unrolling
also gives a further performance improvement (about 10%).

Modified by Li Qiang.

---
 configure | 14 ++++++++++++++
 deflate.c | 30 +++++++++++++++++++++++++++++-
 util.c    | 45 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+), 1 deletion(-)
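
The prefetch idea described above (load the next chain link early and hint the cache about the link after it) can be sketched without inline assembly. This is only an illustration, not the patch itself: `walk_chain` is a hypothetical helper, `WSIZE`/`WMASK` are assumed to mirror gzip's window constants, and `__builtin_prefetch` is the GCC/Clang builtin, which on aarch64 is typically lowered to a PRFM instruction.

```c
#include <stddef.h>

#define WSIZE 0x8000u          /* assumed window size, mirroring gzip */
#define WMASK (WSIZE - 1u)

/* Hypothetical helper: follow a hash chain starting at cur until the
 * position drops to <= limit or max_chain links have been visited.
 * Returns the number of links visited. */
unsigned walk_chain(const unsigned short *prev, unsigned cur,
                    unsigned limit, unsigned max_chain)
{
    unsigned visited = 0;

    do {
        /* Read the next link now, before the (omitted) match comparison. */
        unsigned next = prev[cur & WMASK];
#if defined(__GNUC__)
        /* Hint: read access, no temporal locality, for the link after next. */
        __builtin_prefetch(&prev[next & WMASK], 0, 0);
#endif
        visited++;             /* the match comparison would go here */
        cur = next;
    } while (cur > limit && --max_chain != 0);

    return visited;
}
```

Using the builtin instead of a hard-coded `PRFM` keeps the loop compilable on other architectures, so the `#ifdef __aarch64__` split of the loop condition may not be needed; whether the hint helps still has to be measured per platform.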

diff --git a/configure b/configure
index cab3daf..dc80cb6 100644
--- a/configure
+++ b/configure
@@ -14555,6 +14555,20 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
            ;;
 
          arm* | aarch64 )
+           cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#if defined __ARM_NEON__ || defined __ARM_NEON
+                   int ok;
+                  #else
+                   error fail
+                  #endif
+
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"
+then :
+  CFLAGS="$CFLAGS -march=armv8-a+crc"
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
            # Assume arm with EABI.
            # On arm64 systems, the C compiler may be generating code in one of
            # these ABIs:
diff --git a/deflate.c b/deflate.c
index 9d379e9..ee77ffd 100644
--- a/deflate.c
+++ b/deflate.c
@@ -378,6 +378,9 @@ longest_match(IPos cur_match)
     register int len;                           /* length of current match */
     int best_len = prev_length;                 /* best match length so far */
     IPos limit = strstart > (IPos)MAX_DIST ? strstart - (IPos)MAX_DIST : NIL;
+#ifdef __aarch64__
+    IPos next_match;
+#endif
     /* Stop when cur_match becomes <= limit. To simplify the code,
      * we prevent matches with the string of window index 0.
      */
@@ -411,6 +414,10 @@ longest_match(IPos cur_match)
     do {
         Assert(cur_match < strstart, "no future");
         match = window + cur_match;
+#ifdef __aarch64__
+        next_match = prev[cur_match & WMASK];
+        __asm__("PRFM PLDL1STRM, [%0]"::"r"(&(prev[next_match & WMASK])));
+#endif
 
         /* Skip to next match if the match length cannot increase
          * or if the match length is less than 2:
@@ -488,8 +495,14 @@ longest_match(IPos cur_match)
             scan_end   = scan[best_len];
 #endif
         }
-    } while ((cur_match = prev[cur_match & WMASK]) > limit
+    }
+#ifdef __aarch64__
+    while ((cur_match = next_match) > limit
+             && --chain_length != 0);
+#else
+    while ((cur_match = prev[cur_match & WMASK]) > limit
              && --chain_length != 0);
+#endif
 
     return best_len;
 }
@@ -777,7 +790,22 @@ deflate (int pack_level)
             lookahead -= prev_length-1;
             prev_length -= 2;
             RSYNC_ROLL(strstart, prev_length+1);
+            while (prev_length >= 4) {
+                /* After actual verification, expanding this loop
+                 * can improve its performance in certain scenarios.
+                 */
+                prev_length -= 4;
+                strstart++;
+                INSERT_STRING(strstart, hash_head);
+                strstart++;
+                INSERT_STRING(strstart, hash_head);
+                strstart++;
+                INSERT_STRING(strstart, hash_head);
+                strstart++;
+                INSERT_STRING(strstart, hash_head);
+            }
             do {
+                if (prev_length == 0) break;
                 strstart++;
                 INSERT_STRING(strstart, hash_head);
                 /* strstart never exceeds WSIZE-MAX_MATCH, so there are
diff --git a/util.c b/util.c
index 0a0fc21..c9f0e52 100644
--- a/util.c
+++ b/util.c
@@ -38,6 +38,12 @@
 
 static int write_buffer (int, voidp, unsigned int);
 
+#if defined __ARM_NEON__ || defined __ARM_NEON
+#define CRC32D(crc, val) __asm__("crc32x %w[c], %w[c], %x[v]":[c]"+r"(crc):[v]"r"(val))
+#define CRC32W(crc, val) __asm__("crc32w %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(val))
+#define CRC32H(crc, val) __asm__("crc32h %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(val))
+#define CRC32B(crc, val) __asm__("crc32b %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(val))
+#else
 /* ========================================================================
  * Table of CRC-32's of all single-byte values (made by makecrc.c)
  */
@@ -95,6 +101,7 @@ static const ulg crc_32_tab[] = {
   0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL,
   0x2d02ef8dL
 };
+#endif
 
 /* Shift register contents.  */
 static ulg crc = 0xffffffffL;
@@ -132,6 +139,43 @@ ulg updcrc(s, n)
     const uch *s;           /* pointer to bytes to pump through */
     unsigned n;             /* number of bytes in s[] */
 {
+#if defined __ARM_NEON__ || defined __ARM_NEON
+    register ulg c;
+    static ulg crc = (ulg)0xffffffffL;
+    register const uint8_t  *buf1;
+    register const uint16_t *buf2;
+    register const uint32_t *buf4;
+    register const uint64_t *buf8;
+    int64_t length = (int64_t)n;
+    buf8 = (const  uint64_t *)(const void *)s;
+
+    if (s == NULL) {
+        c = 0xffffffffL;
+    } else {
+        c = crc;
+        while(length >= sizeof(uint64_t)) {
+            CRC32D(c, *buf8++);
+            length -= sizeof(uint64_t);
+        }
+        buf4 = (const uint32_t *)(const void *)buf8;
+        if (length >= sizeof(uint32_t)) {
+            CRC32W(c, *buf4++);
+            length -= sizeof(uint32_t);
+        }
+        buf2 = (const uint16_t *)(const void *)buf4;
+        if(length >= sizeof(uint16_t)) {
+            CRC32H(c, *buf2++);
+            length -= sizeof(uint16_t);
+        }
+        buf1 = (const uint8_t *)(const void *)buf2;
+        if (length >= sizeof(uint8_t)) {
+            CRC32B(c, *buf1);
+            length -= sizeof(uint8_t);
+        }
+    }
+    crc = c;
+    return (c ^ 0xffffffffL);
+#else
     register ulg c;         /* temporary variable */
 
     if (s == NULL) {
@@ -144,6 +188,7 @@ ulg updcrc(s, n)
     }
     crc = c;
     return c ^ 0xffffffffL;       /* (instead of ~c for 64-bit machines) */
+#endif
 }
 
 /* Return a current CRC value.  */
-- 
2.17.1
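
The CRC32 part of the patch above uses inline assembly guarded by `__ARM_NEON`. As a rough sketch of the same idea, the ACLE intrinsics from `<arm_acle.h>` (`__crc32d`, `__crc32b`) can be used instead, guarded by the `__ARM_FEATURE_CRC32` macro that compilers define when the CRC instructions are enabled. The function names below (`crc32_sw`, `crc32_update`, `crc32_buf`) are hypothetical and not part of gzip; the bitwise fallback stands in for gzip's table-driven loop, and the hardware path assumes little-endian aarch64.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_FEATURE_CRC32) && defined(__AARCH64EL__)
#include <arm_acle.h>
#define HAVE_HW_CRC32 1
#else
#define HAVE_HW_CRC32 0
#endif

/* Bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320), used as the
 * portable fallback. c is the running, pre-inverted shift register. */
uint32_t crc32_sw(uint32_t c, const unsigned char *p, size_t n)
{
    while (n--) {
        c ^= *p++;
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
    }
    return c;
}

/* Update the running CRC over n bytes, using the CRC32 instructions
 * when the compiler says they are available. */
uint32_t crc32_update(uint32_t c, const unsigned char *p, size_t n)
{
#if HAVE_HW_CRC32
    while (n >= 8) {
        uint64_t v;
        memcpy(&v, p, sizeof v);   /* safe for unaligned input */
        c = __crc32d(c, v);
        p += 8;
        n -= 8;
    }
    while (n--)                    /* at most 7 trailing bytes */
        c = __crc32b(c, *p++);
    return c;
#else
    return crc32_sw(c, p, n);
#endif
}

/* One-shot CRC-32 of a buffer, as stored in a gzip trailer. */
uint32_t crc32_buf(const void *buf, size_t n)
{
    return crc32_update(0xffffffffu, buf, n) ^ 0xffffffffu;
}
```

Loading through `memcpy` sidesteps the alignment and aliasing questions raised by casting `s` to `const uint64_t *`, and keying on `__ARM_FEATURE_CRC32` ties the fast path to the CRC extension itself rather than to NEON.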






Message sent:


Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
X-Mailer: MIME-tools 5.505 (Entity 5.505)
Content-Type: text/plain; charset=utf-8
X-Loop: help-debbugs@HIDDEN
From: help-debbugs@HIDDEN (GNU bug Tracking System)
To: l00374334 <liqiang64@HIDDEN>
Subject: bug#41535: Acknowledgement ([PATCH] performance optimization for
 aarch64)
Message-ID: <handler.41535.B.159047027910148.ack <at> debbugs.gnu.org>
References: <20200526023940.1967-1-liqiang64@HIDDEN>
X-Gnu-PR-Message: ack 41535
X-Gnu-PR-Package: gzip
X-Gnu-PR-Keywords: patch
Reply-To: 41535 <at> debbugs.gnu.org
Date: Tue, 26 May 2020 05:18:02 +0000

Thank you for filing a new bug report with debbugs.gnu.org.

This is an automatically generated reply to let you know your message
has been received.

Your message is being forwarded to the package maintainers and other
interested parties for their attention; they will reply in due course.

Your message has been sent to the package maintainer(s):
 bug-gzip@HIDDEN

If you wish to submit further information on this problem, please
send it to 41535 <at> debbugs.gnu.org.

Please do not send mail to help-debbugs@HIDDEN unless you wish
to report a problem with the Bug-tracking system.

-- 
41535: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=41535
GNU Bug Tracking System
Contact help-debbugs@HIDDEN with problems


Message sent to bug-gzip@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#41535: [PATCH] performance optimization for aarch64
Resent-From: Li Qiang <liqiang64@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gzip@HIDDEN
Resent-Date: Sat, 30 May 2020 09:19:02 +0000
Resent-Message-ID: <handler.41535.B41535.15908302926026 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 41535
X-GNU-PR-Package: gzip
X-GNU-PR-Keywords: patch
To: <41535 <at> debbugs.gnu.org>
Cc: luanjianhai@HIDDEN, eggert@HIDDEN, sangyan@HIDDEN, colordev.jiang@HIDDEN, luchunhua@HIDDEN, huxinwei@HIDDEN, meyering@HIDDEN
Received: via spool by 41535-submit <at> debbugs.gnu.org id=B41535.15908302926026
          (code B ref 41535); Sat, 30 May 2020 09:19:02 +0000
Received: (at 41535) by debbugs.gnu.org; 30 May 2020 09:18:12 +0000
Received: from localhost ([127.0.0.1]:56954 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1jexd5-0001Z7-Hd
	for submit <at> debbugs.gnu.org; Sat, 30 May 2020 05:18:11 -0400
Received: from szxga05-in.huawei.com ([45.249.212.191]:2296 helo=huawei.com)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <liqiang64@HIDDEN>) id 1jexd2-0001Yf-RA
 for 41535 <at> debbugs.gnu.org; Sat, 30 May 2020 05:18:09 -0400
Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.60])
 by Forcepoint Email with ESMTP id 2696B6668135339376EC;
 Sat, 30 May 2020 17:18:01 +0800 (CST)
Received: from [127.0.0.1] (10.108.222.92) by DGGEMS409-HUB.china.huawei.com
 (10.3.19.209) with Microsoft SMTP Server id 14.3.487.0; Sat, 30 May 2020
 17:17:51 +0800
References: <20200526023940.1967-1-liqiang64@HIDDEN>
From: Li Qiang <liqiang64@HIDDEN>
Message-ID: <ac086349-18f9-9ead-11ea-fb0b55d15974@HIDDEN>
Date: Sat, 30 May 2020 17:17:49 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
 Thunderbird/68.8.1
MIME-Version: 1.0
In-Reply-To: <20200526023940.1967-1-liqiang64@HIDDEN>
Content-Type: text/plain; charset="gbk"
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.108.222.92]
X-CFilter-Loop: Reflected
X-Spam-Score: -2.3 (--)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)



On 2020/5/26 10:39, l00374334 wrote:
> From: liqiang <liqiang64@HIDDEN>
>
> By analyzing the compression and decompression process of gzip, I found
> that the hot spots of CRC32 and longest_match function are very high.
>
> On the aarch64 architecture, we can optimize the efficiency of crc32
> through the interface provided by the neon instruction set (12x faster
> in aarch64), and optimize the performance of random access code through
> prefetch instructions (about 5%~8% improvement). In some compression
> scenarios, loop expansion can also get a certain performance improvement
> (about 10%).
>
> Modify by Li Qiang.

Please allow me to show a set of actual test data for this patch.

First, I built an unmodified binary, "gzip-1.10", from the gzip-1.10
source code, and then built an optimized binary, "gzip-optimized",
after applying my optimization patch.

Next, I used the gzip-1.10 binary to time compression and decompression
of some **xml** files:
[XML]# time ./gzip-1.10 *.xml

real    0m5.099s
user    0m4.384s
sys     0m0.176s
[XML]# time ./gzip-1.10 -d *.gz

real    0m2.173s
user    0m1.821s
sys     0m0.348s

Then I ran the optimized version for comparison:
[XML]# time ./gzip-optimized *.xml

real    0m2.785s
user    0m2.576s
sys     0m0.204s
[XML]# time ./gzip-optimized -d *.gz

real    0m0.497s
user    0m0.176s
sys     0m0.320s


The next test input is a large **log** file:
[LOG]# time ./gzip-1.10 *.log

real    0m8.883s
user    0m8.652s
sys     0m0.217s
[LOG]# time ./gzip-1.10 -d *.gz

real    0m3.049s
user    0m2.604s
sys     0m0.439s

Again, I ran the optimized version for comparison:
[LOG]# time ./gzip-optimized *.log

real    0m6.882s
user    0m6.607s
sys     0m0.264s
[LOG]# time ./gzip-optimized -d *.gz

real    0m1.054s
user    0m0.622s
sys     0m0.431s

All of the measurements above were taken on an aarch64 platform.
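
To make the timings above easier to compare, the wall-clock ("real") figures can be turned into speedup ratios. The snippet below is only a convenience sketch: the `run` struct and the `speedup` and `print_speedups` helpers are hypothetical, and the numbers are copied from the transcripts above.

```c
#include <stdio.h>

struct run {
    const char *name;
    double before_s;   /* "real" time with gzip-1.10, in seconds */
    double after_s;    /* "real" time with gzip-optimized, in seconds */
};

/* "real" times taken from the four pairs of measurements above. */
static const struct run runs[] = {
    { "xml compress",   5.099, 2.785 },
    { "xml decompress", 2.173, 0.497 },
    { "log compress",   8.883, 6.882 },
    { "log decompress", 3.049, 1.054 },
};

/* Speedup factor: how many times faster the optimized run is. */
double speedup(double before_s, double after_s)
{
    return before_s / after_s;
}

/* Print one line per measurement pair. */
void print_speedups(void)
{
    for (size_t i = 0; i < sizeof runs / sizeof runs[0]; i++)
        printf("%-16s %.2fx\n", runs[i].name,
               speedup(runs[i].before_s, runs[i].after_s));
}
```

By this arithmetic the optimized build is roughly 1.83x faster at compressing and 4.37x faster at decompressing the xml files, and roughly 1.29x and 2.89x faster on the log file. These are single runs, so the ratios are indicative only.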

-- 
Best regards,
Li Qiang





Message sent to bug-gzip@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#41535: [PATCH] performance optimization for aarch64
Resent-From: Li Qiang <liqiang64@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gzip@HIDDEN
Resent-Date: Thu, 20 Aug 2020 08:56:01 +0000
Resent-Message-ID: <handler.41535.B41535.159791374216029 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 41535
X-GNU-PR-Package: gzip
X-GNU-PR-Keywords: patch
To: <41535 <at> debbugs.gnu.org>
Cc: meyering@HIDDEN, eggert@HIDDEN
Received: via spool by 41535-submit <at> debbugs.gnu.org id=B41535.159791374216029
          (code B ref 41535); Thu, 20 Aug 2020 08:56:01 +0000
Received: (at 41535) by debbugs.gnu.org; 20 Aug 2020 08:55:42 +0000
Received: from localhost ([127.0.0.1]:41602 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1k8gMH-0004AS-Qx
	for submit <at> debbugs.gnu.org; Thu, 20 Aug 2020 04:55:42 -0400
Received: from szxga06-in.huawei.com ([45.249.212.32]:56900 helo=huawei.com)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <liqiang64@HIDDEN>) id 1k8gMC-0004AA-RR
 for 41535 <at> debbugs.gnu.org; Thu, 20 Aug 2020 04:55:40 -0400
Received: from DGGEMS404-HUB.china.huawei.com (unknown [172.30.72.58])
 by Forcepoint Email with ESMTP id 8B77EE9DAA5AC3580D5D;
 Thu, 20 Aug 2020 16:55:28 +0800 (CST)
Received: from [127.0.0.1] (10.108.234.107) by DGGEMS404-HUB.china.huawei.com
 (10.3.19.204) with Microsoft SMTP Server id 14.3.487.0;
 Thu, 20 Aug 2020 16:55:27 +0800
From: Li Qiang <liqiang64@HIDDEN>
References: <20200526023940.1967-1-liqiang64@HIDDEN>
 <ac086349-18f9-9ead-11ea-fb0b55d15974@HIDDEN>
Message-ID: <5e6aa834-6dc0-4465-5807-088eb53abd05@HIDDEN>
Date: Thu, 20 Aug 2020 16:55:26 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
 Thunderbird/68.8.1
MIME-Version: 1.0
In-Reply-To: <ac086349-18f9-9ead-11ea-fb0b55d15974@HIDDEN>
Content-Type: text/plain; charset="gbk"
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.108.234.107]
X-CFilter-Loop: Reflected
X-Spam-Score: -3.7 (---)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -4.7 (----)



On 2020/5/30 17:17, Li Qiang wrote:
> 
> 
> On 2020/5/26 10:39, l00374334 wrote:
>> From: liqiang <liqiang64@HIDDEN>
>>
>> By analyzing the compression and decompression process of gzip, I found 
>>
>> that the hot spots of CRC32 and longest_match function are very high.
>>
>>
>>
>> On the aarch64 architecture, we can optimize the efficiency of crc32 
>>
>> through the interface provided by the neon instruction set (12x faster 
>>
>> in aarch64), and optimize the performance of random access code through 
>>
>> prefetch instructions (about 5%~8% improvement). In some compression 
>>
>> scenarios, loop expansion can also get a certain performance improvement 
>>
>> (about 10%).
>>
>>
>>
>> Modify by Li Qiang.
>>
>> ---
>>  configure | 14 ++++++++++++++
>>  deflate.c | 30 +++++++++++++++++++++++++++++-
>>  util.c    | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 88 insertions(+), 1 deletion(-)
>>
>> diff --git a/configure b/configure
>> index cab3daf..dc80cb6 100644
>> --- a/configure
>> +++ b/configure
>> @@ -14555,6 +14555,20 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
>>             ;;
>>  
>>           arm* | aarch64 )
>> +           cat confdefs.h - <<_ACEOF >conftest.$ac_ext
>> +/* end confdefs.h.  */
>> +#if defined __ARM_NEON__ || defined __ARM_NEON
>> +                   int ok;
>> +                  #else
>> +                   error fail
>> +                  #endif
>> +
>> +_ACEOF
>> +if ac_fn_c_try_compile "$LINENO"
>> +then :
>> +  CFLAGS="$CFLAGS -march=armv8-a+crc"
>> +fi
>> +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
>>             # Assume arm with EABI.
>>             # On arm64 systems, the C compiler may be generating code in one of
>>             # these ABIs:
>> diff --git a/deflate.c b/deflate.c
>> index 9d379e9..ee77ffd 100644
>> --- a/deflate.c
>> +++ b/deflate.c
>> @@ -378,6 +378,9 @@ longest_match(IPos cur_match)
>>      register int len;                           /* length of current match */
>>
>>      int best_len = prev_length;                 /* best match length so far */
>>
>>      IPos limit = strstart > (IPos)MAX_DIST ? strstart - (IPos)MAX_DIST : NIL;
>>
>> +#ifdef __aarch64__
>>
>> +    IPos next_match;
>>
>> +#endif
>>
>>      /* Stop when cur_match becomes <= limit. To simplify the code,
>>
>>       * we prevent matches with the string of window index 0.
>>
>>       */
>>
>> @@ -411,6 +414,10 @@ longest_match(IPos cur_match)
>>      do {
>>
>>          Assert(cur_match < strstart, "no future");
>>
>>          match = window + cur_match;
>>
>> +#ifdef __aarch64__
>>
>> +        next_match = prev[cur_match & WMASK];
>>
>> +        __asm__("PRFM PLDL1STRM, [%0]"::"r"(&(prev[next_match & WMASK])));
>>
>> +#endif
>>
>>  
>>
>>          /* Skip to next match if the match length cannot increase
>>
>>           * or if the match length is less than 2:
>>
>> @@ -488,8 +495,14 @@ longest_match(IPos cur_match)
>>              scan_end   = scan[best_len];
>>
>>  #endif
>>
>>          }
>>
>> -    } while ((cur_match = prev[cur_match & WMASK]) > limit
>>
>> +    }
>>
>> +#ifdef __aarch64__
>>
>> +    while ((cur_match = next_match) > limit
>>
>> +             && --chain_length != 0);
>>
>> +#else
>>
>> +    while ((cur_match = prev[cur_match & WMASK]) > limit
>>
>>               && --chain_length != 0);
>>
>> +#endif
>>
>>  
>>
>>      return best_len;
>>
>>  }
>>
>> @@ -777,7 +790,22 @@ deflate (int pack_level)
>>              lookahead -= prev_length-1;
>>
>>              prev_length -= 2;
>>
>>              RSYNC_ROLL(strstart, prev_length+1);
>>
>> +            while (prev_length >= 4) {
>>
>> +                /* After actual verification, expanding this loop
>>
>> +                 * can improve its performance in certain scenarios.
>>
>> +                 */
>>
>> +                prev_length -= 4;
>>
>> +                strstart++;
>>
>> +                INSERT_STRING(strstart, hash_head);
>>
>> +                strstart++;
>>
>> +                INSERT_STRING(strstart, hash_head);
>>
>> +                strstart++;
>>
>> +                INSERT_STRING(strstart, hash_head);
>>
>> +                strstart++;
>>
>> +                INSERT_STRING(strstart, hash_head);
>>
>> +            }
>>
>>              do {
>>
>> +                if (prev_length == 0) break;
>>
>>                  strstart++;
>>
>>                  INSERT_STRING(strstart, hash_head);
>>
>>                  /* strstart never exceeds WSIZE-MAX_MATCH, so there are
>>
>> diff --git a/util.c b/util.c
>> index 0a0fc21..c9f0e52 100644
>> --- a/util.c
>> +++ b/util.c
>> @@ -38,6 +38,12 @@
>>  
>>  static int write_buffer (int, voidp, unsigned int);
>>  
>> +#if defined __ARM_NEON__ || defined __ARM_NEON
>> +#define CRC32D(crc, val) __asm__("crc32x %w[c], %w[c], %x[v]":[c]"+r"(crc):[v]"r"(val))
>> +#define CRC32W(crc, val) __asm__("crc32w %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(val))
>> +#define CRC32H(crc, val) __asm__("crc32h %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(val))
>> +#define CRC32B(crc, val) __asm__("crc32b %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(val))
>> +#else
>>  /* ========================================================================
>>   * Table of CRC-32's of all single-byte values (made by makecrc.c)
>>   */
>> @@ -95,6 +101,7 @@ static const ulg crc_32_tab[] = {
>>    0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL,
>>    0x2d02ef8dL
>>  };
>> +#endif
>>  
>>  /* Shift register contents.  */
>>  static ulg crc = 0xffffffffL;
>> @@ -132,6 +139,43 @@ ulg updcrc(s, n)
>>      const uch *s;           /* pointer to bytes to pump through */
>>      unsigned n;             /* number of bytes in s[] */
>>  {
>> +#if defined __ARM_NEON__ || defined __ARM_NEON
>> +    register ulg c;
>> +    static ulg crc = (ulg)0xffffffffL;
>> +    register const uint8_t  *buf1;
>> +    register const uint16_t *buf2;
>> +    register const uint32_t *buf4;
>> +    register const uint64_t *buf8;
>> +    int64_t length = (int64_t)n;
>> +    buf8 = (const  uint64_t *)(const void *)s;
>> +
>> +    if (s == NULL) {
>> +        c = 0xffffffffL;
>> +    } else {
>> +        c = crc;
>> +        while(length >= sizeof(uint64_t)) {
>> +            CRC32D(c, *buf8++);
>> +            length -= sizeof(uint64_t);
>> +        }
>> +        buf4 = (const uint32_t *)(const void *)buf8;
>> +        if (length >= sizeof(uint32_t)) {
>> +            CRC32W(c, *buf4++);
>> +            length -= sizeof(uint32_t);
>> +        }
>> +        buf2 = (const uint16_t *)(const void *)buf4;
>> +        if(length >= sizeof(uint16_t)) {
>> +            CRC32H(c, *buf2++);
>> +            length -= sizeof(uint16_t);
>> +        }
>> +        buf1 = (const uint8_t *)(const void *)buf2;
>> +        if (length >= sizeof(uint8_t)) {
>> +            CRC32B(c, *buf1);
>> +            length -= sizeof(uint8_t);
>> +        }
>> +    }
>> +    crc = c;
>> +    return (c ^ 0xffffffffL);
>> +#else
>>      register ulg c;         /* temporary variable */
>>  
>>      if (s == NULL) {
>> @@ -144,6 +188,7 @@ ulg updcrc(s, n)
>>      }
>>      crc = c;
>>      return c ^ 0xffffffffL;       /* (instead of ~c for 64-bit machines) */
>> +#endif
>>  }
>>  
>>  /* Return a current CRC value.  */
> 
> Here is a set of measured test results for this patch.
> 
> First, I built a baseline binary, "gzip-1.10", from the unmodified
> gzip-1.10 source code, and then built "gzip-optimized" from the same
> source with my optimization patch applied.
> 
> Next, I used the baseline gzip-1.10 to time compression and
> decompression of some **xml** files:
> [XML]# time ./gzip-1.10 *.xml
> 
> real    0m5.099s
> user    0m4.384s
> sys     0m0.176s
> [XML]# time ./gzip-1.10 -d *.gz
> 
> real    0m2.173s
> user    0m1.821s
> sys     0m0.348s
> 
> Then I ran the optimized version for comparison:
> [XML]# time ./gzip-optimized *.xml
> 
> real    0m2.785s
> user    0m2.576s
> sys     0m0.204s
> [XML]# time ./gzip-optimized -d *.gz
> 
> real    0m0.497s
> user    0m0.176s
> sys     0m0.320s
> 
> 
> The next test input is a large **log** file:
> [LOG]# time ./gzip-1.10 *.log
> 
> real    0m8.883s
> user    0m8.652s
> sys     0m0.217s
> [LOG]# time ./gzip-1.10 -d *.gz
> 
> real    0m3.049s
> user    0m2.604s
> sys     0m0.439s
> 
> Again, I ran the optimized version for comparison:
> [LOG]# time ./gzip-optimized *.log
> 
> real    0m6.882s
> user    0m6.607s
> sys     0m0.264s
> [LOG]# time ./gzip-optimized -d *.gz
> 
> real    0m1.054s
> user    0m0.622s
> sys     0m0.431s
> 
> All of the measurements above were taken on an aarch64 platform.
> 

Gentle ping.
: )

-- 
Best regards,
Li Qiang
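
For reference, the wall-clock ("real") timings quoted above work out to
roughly 1.83x (xml) and 1.29x (log) for compression, and roughly 4.37x
(xml) and 2.89x (log) for decompression. The sketch below only does that
arithmetic. The helper names speedup() and print_speedups() are
hypothetical, and the inputs are the single-run figures from the message,
so nothing is known about run-to-run variance.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: ratio of baseline to optimized wall-clock time.
   A value above 1.0 means the optimized binary was faster. */
static double speedup(double baseline_s, double optimized_s)
{
    assert(optimized_s > 0.0);
    return baseline_s / optimized_s;
}

/* Prints the four ratios, using the "real" times quoted in the message. */
static void print_speedups(void)
{
    printf("xml compress:   %.2fx\n", speedup(5.099, 2.785));
    printf("xml decompress: %.2fx\n", speedup(2.173, 0.497));
    printf("log compress:   %.2fx\n", speedup(8.883, 6.882));
    printf("log decompress: %.2fx\n", speedup(3.049, 1.054));
}
```

Calling print_speedups() from a main() prints the four ratios. The
decompression gain is the larger one in both data sets.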





Message sent to bug-gzip@HIDDEN:


X-Loop: help-debbugs@HIDDEN
Subject: bug#41535: [PATCH] performance optimization for aarch64
Resent-From: Jim Meyering <jim@HIDDEN>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
Resent-CC: bug-gzip@HIDDEN
Resent-Date: Sun, 29 Aug 2021 08:49:01 +0000
Resent-Message-ID: <handler.41535.B41535.163022689221828 <at> debbugs.gnu.org>
Resent-Sender: help-debbugs@HIDDEN
X-GNU-PR-Message: followup 41535
X-GNU-PR-Package: gzip
X-GNU-PR-Keywords: patch
To: Li Qiang <liqiang64@HIDDEN>
Cc: luanjianhai@HIDDEN, Paul Eggert <eggert@HIDDEN>, sangyan@HIDDEN, colordev.jiang@HIDDEN, luchunhua@HIDDEN, 41535 <at> debbugs.gnu.org, huxinwei@HIDDEN, Jim Meyering <meyering@HIDDEN>
Received: via spool by 41535-submit <at> debbugs.gnu.org id=B41535.163022689221828
          (code B ref 41535); Sun, 29 Aug 2021 08:49:01 +0000
Received: (at 41535) by debbugs.gnu.org; 29 Aug 2021 08:48:12 +0000
Received: from localhost ([127.0.0.1]:55481 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1mKGU4-0005fw-Se
	for submit <at> debbugs.gnu.org; Sun, 29 Aug 2021 04:48:12 -0400
Received: from mail-wr1-f44.google.com ([209.85.221.44]:34382)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <meyering@HIDDEN>) id 1mKGTz-0005fN-T6
 for 41535 <at> debbugs.gnu.org; Sun, 29 Aug 2021 04:48:08 -0400
Received: by mail-wr1-f44.google.com with SMTP id h13so17611928wrp.1
 for <41535 <at> debbugs.gnu.org>; Sun, 29 Aug 2021 01:48:03 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc:content-transfer-encoding;
 bh=pTgvO80U3R8U7a3GILKksocEMsWiqzF8wn7g3b4PYu8=;
 b=N5HWVSnYafnu4EEL0ViUOsdwXV8iGgrPI7sx9+to7JkqikIQnq0hEf8lT61tV0QexU
 /AH4M/5owNZifSUasl5wORNYOJmRY9ZSGRJP5SAfVt06H2AC4yF9PeIx3ugl0BgISu1r
 jhcgB1JwuFziSeVM10VUVKYSpZjRg74lWyxmdDYpnAvJLwXg3yqrNHi6Vxf/9qIyPG7f
 GxQeDxhtnVzdiDVwhBqxAzGLymFgb3eZjntGvERljTbbSzS9CtGhXvGtOMxdjqidSuWh
 4z5tEbxpBowVS7NQ9keqgQztsPLdDKt5SXJRMT+rOcWCg1YANFb7iOGjXOSoCSDUt0cU
 TShA==
X-Gm-Message-State: AOAM533NbaopG1k/LTKz1hzOEh2K3ShcxiXLKZtPTqdlrElvJPW9Piv/
 86vVwM15Iu1+B8KtVsKMIVAUucO3y5oTqLCToz0=
X-Google-Smtp-Source: ABdhPJz6wPyKp69/1OOVWpO3ykD3/y6G+fPMn+PonG01u68jvhfRV31xyTiNy/5n9LUYaAwb7qt9+vvlMVUtg+oJy1A=
X-Received: by 2002:a5d:6cab:: with SMTP id a11mr16752642wra.287.1630226877943; 
 Sun, 29 Aug 2021 01:47:57 -0700 (PDT)
MIME-Version: 1.0
References: <20200526023940.1967-1-liqiang64@HIDDEN>
 <ac086349-18f9-9ead-11ea-fb0b55d15974@HIDDEN>
In-Reply-To: <ac086349-18f9-9ead-11ea-fb0b55d15974@HIDDEN>
From: Jim Meyering <jim@HIDDEN>
Date: Sun, 29 Aug 2021 10:47:45 +0200
Message-ID: <CA+8g5KGOQ-r4o+mV11DuJqcfYcaYPcvQNgoP7P2poWv2k5K90Q@HIDDEN>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: 0.5 (/)
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.5 (/)

On Sat, May 30, 2020 at 11:19 AM Li Qiang <liqiang64@HIDDEN> wrote:
> On 2020/5/26 10:39, l00374334 wrote:
> > From: liqiang <liqiang64@HIDDEN>
> >
> > By analyzing the compression and decompression process of gzip, I found
> > that CRC32 and the longest_match function are the main hot spots.
> >
> > On the aarch64 architecture, we can speed up crc32 through the
> > interface provided by the neon instruction set (12x faster in
> > aarch64), and speed up random-access code through prefetch
> > instructions (about 5%~8% improvement). In some compression
> > scenarios, loop unrolling also yields a performance improvement
> > (about 10%).
> >
> > Modified by Li Qiang.
> >
> > ---
> >  configure | 14 ++++++++++++++
> >  deflate.c | 30 +++++++++++++++++++++++++++++-
> >  util.c    | 45 +++++++++++++++++++++++++++++++++++++++++++++

Thank you for that work and sorry for the delay in responding.
However, for now I prefer not to apply it.
I'd prefer to see arch-specific optimizations added to libz in the
hope (perhaps naive) that someone will find time to make gzip use
libz.
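
As a point of comparison for the CRC32 part of the patch, here is a
minimal sketch of the same idea written with the ACLE intrinsics from
<arm_acle.h> instead of inline assembly. It is not the patch's code and
not gzip's updcrc(): the names crc32_update, crc32_buf and USE_HW_CRC32
are hypothetical. It assumes the compiler defines __ARM_FEATURE_CRC32
when the CRC32 extension is enabled (for example with
-march=armv8-a+crc), which is a separate feature from NEON, and it falls
back to a plain bitwise loop everywhere else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: use the hardware path only on little-endian targets that
   advertise the CRC32 extension; otherwise use the portable loop. */
#if defined(__ARM_FEATURE_CRC32) && defined(__BYTE_ORDER__) && \
    __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
#include <arm_acle.h>   /* __crc32b, __crc32h, __crc32w, __crc32d */
#define USE_HW_CRC32 1
#else
#define USE_HW_CRC32 0
#endif

/* Hypothetical helper: feeds n bytes into the running CRC register c
   (0xffffffff at the start of a stream) and returns the new register. */
static uint32_t crc32_update(uint32_t c, const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
#if USE_HW_CRC32
    while (n >= 8) {
        uint64_t v;
        memcpy(&v, s, 8);       /* no unaligned or aliasing pointer casts */
        c = __crc32d(c, v);
        s += 8; n -= 8;
    }
    if (n >= 4) { uint32_t v; memcpy(&v, s, 4); c = __crc32w(c, v); s += 4; n -= 4; }
    if (n >= 2) { uint16_t v; memcpy(&v, s, 2); c = __crc32h(c, v); s += 2; n -= 2; }
    if (n >= 1) { c = __crc32b(c, *s); }
#else
    /* Bitwise CRC-32 with the reflected polynomial 0xEDB88320. */
    while (n--) {
        c ^= *s++;
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
    }
#endif
    return c;
}

/* One-shot CRC-32 of a buffer, with the usual initial and final XOR. */
static uint32_t crc32_buf(const void *buf, size_t n)
{
    return crc32_update(0xffffffffu, buf, n) ^ 0xffffffffu;
}
```

With either path, crc32_buf("123456789", 9) should give the standard
CRC-32 check value 0xCBF43926. Keeping the register as an explicit
argument, rather than a function-local static, is a design choice made
for this sketch so that the running value stays visible to the caller.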




Message received at control <at> debbugs.gnu.org:


Received: (at control) by debbugs.gnu.org; 5 Apr 2022 01:36:35 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Mon Apr 04 21:36:35 2022
Received: from localhost ([127.0.0.1]:53428 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1nbY7X-0004FO-0I
	for submit <at> debbugs.gnu.org; Mon, 04 Apr 2022 21:36:35 -0400
Received: from zimbra.cs.ucla.edu ([131.179.128.68]:46330)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <eggert@HIDDEN>) id 1nbY7U-0004F4-VE
 for control <at> debbugs.gnu.org; Mon, 04 Apr 2022 21:36:33 -0400
Received: from localhost (localhost [127.0.0.1])
 by zimbra.cs.ucla.edu (Postfix) with ESMTP id CB30716009A
 for <control <at> debbugs.gnu.org>; Mon,  4 Apr 2022 18:36:26 -0700 (PDT)
Received: from zimbra.cs.ucla.edu ([127.0.0.1])
 by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10032)
 with ESMTP id 5oYvPCcQLN8w for <control <at> debbugs.gnu.org>;
 Mon,  4 Apr 2022 18:36:26 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1])
 by zimbra.cs.ucla.edu (Postfix) with ESMTP id 3B52F160130
 for <control <at> debbugs.gnu.org>; Mon,  4 Apr 2022 18:36:26 -0700 (PDT)
X-Virus-Scanned: amavisd-new at zimbra.cs.ucla.edu
Received: from zimbra.cs.ucla.edu ([127.0.0.1])
 by localhost (zimbra.cs.ucla.edu [127.0.0.1]) (amavisd-new, port 10026)
 with ESMTP id ZEoBnrxCVF8D for <control <at> debbugs.gnu.org>;
 Mon,  4 Apr 2022 18:36:26 -0700 (PDT)
Received: from [131.179.64.200] (Penguin.CS.UCLA.EDU [131.179.64.200])
 by zimbra.cs.ucla.edu (Postfix) with ESMTPSA id 1C87816009A
 for <control <at> debbugs.gnu.org>; Mon,  4 Apr 2022 18:36:26 -0700 (PDT)
Message-ID: <ddb4b521-92e0-48d0-2157-eb6ccb8ca9ac@HIDDEN>
Date: Mon, 4 Apr 2022 18:36:25 -0700
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.7.0
Content-Language: en-US
To: GNU bug control <control <at> debbugs.gnu.org>
From: Paul Eggert <eggert@HIDDEN>
Subject: gzip bug report maintenance
Organization: UCLA Computer Science Department
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: control
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)

tags 41535 wontfix
tags 39832 wontfix
tags 39831 wontfix





Last modified: Tue, 5 Apr 2022 01:45:01 UTC
