GNU bug report logs - #39832
[PATCH] Optimized the deflate in aarch64

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: gzip; Reported by: Yikun Jiang <yikunkero@HIDDEN>; Keywords: wontfix patch; dated Sat, 29 Feb 2020 10:10:02 UTC; Maintainer for gzip is bug-gzip@HIDDEN.
Added tag(s) wontfix. Request was from Paul Eggert <eggert@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 29 Feb 2020 10:09:53 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Feb 29 05:09:53 2020
Received: from localhost ([127.0.0.1]:34272 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1j7z4C-0001zx-W8
	for submit <at> debbugs.gnu.org; Sat, 29 Feb 2020 05:09:53 -0500
Received: from lists.gnu.org ([209.51.188.17]:53632)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <yikunkero@HIDDEN>) id 1j7yYG-0007Hi-G1
 for submit <at> debbugs.gnu.org; Sat, 29 Feb 2020 04:36:52 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:43883)
 by lists.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from <yikunkero@HIDDEN>) id 1j7yYF-0001S4-5h
 for bug-gzip@HIDDEN; Sat, 29 Feb 2020 04:36:52 -0500
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on eggs.gnu.org
X-Spam-Level: 
X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_50,FREEMAIL_FROM,
 HTML_MESSAGE autolearn=disabled version=3.3.2
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from <yikunkero@HIDDEN>) id 1j7yYD-0006cA-Ti
 for bug-gzip@HIDDEN; Sat, 29 Feb 2020 04:36:51 -0500
Received: from mail-lj1-x243.google.com ([2a00:1450:4864:20::243]:40041)
 by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
 (Exim 4.71) (envelope-from <yikunkero@HIDDEN>) id 1j7yYD-0006Yd-LK
 for bug-gzip@HIDDEN; Sat, 29 Feb 2020 04:36:49 -0500
Received: by mail-lj1-x243.google.com with SMTP id 143so6101467ljj.7
 for <bug-gzip@HIDDEN>; Sat, 29 Feb 2020 01:36:49 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025;
 h=mime-version:from:date:message-id:subject:to;
 bh=RjTYvL6S15GHdhO2UzO4B1a4/fbB7aVRofsxUa5IiXY=;
 b=l39wt1krDAfRRHcGE/cyNtVQlMwwlmcuIAOhyXWGko+3Ro9FUKTomXxau9PQLchHPk
 HXneW5mAUjhrHM/yPC94ygK9MBPddaVKzChjCYFJeuarNaOIlhle56SttdUG2lyLfNsK
 IJLpPHbIZf9QT2uHc6raA85SiD8ovTwIJ/cgeAC7zosNaG9l3A6n1J00KH5EvjnO/qnO
 3lEphIN4OKbHK+nb5ZOXiv+NHUbdugqrq5E5NkZIRX2lYmmBvXXig0AC0I05IXow8Ifv
 rb003NWZqB0dGk+h+dJwSBXOYOn++4VvJ2tXiU54S9RYfN3cX+ZII9Y9UPO1/fk43iNc
 3Hgg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:from:date:message-id:subject:to;
 bh=RjTYvL6S15GHdhO2UzO4B1a4/fbB7aVRofsxUa5IiXY=;
 b=YlYPnZpQsuVs0XhcaMuU4qAfi5k63s1bqfWM0Tysd0H9VSBCwkJH4lbDpTeFAbTURx
 tkz/3jySiE21bt4JEMaUspzFLLyRPUJsxPDWOXufU9h0xPzEMbAQGwAqUJJoBDa7i+nR
 mUWWPJX9zEJ/LNlIjK+TaY/QssyOR0dETvqaJP7NQt/ReDMITsEvqptauTBxo0d9xzOp
 weuBm0aijo9LtxQ+gdxzRX9mCClOYZc846rVi0S5s33A6gToGv9KBb9pUeMFy8y3W/Km
 T6yYJCy4KwFxDAWaVqCwNFJ8cSMBwQDIwr4Fc9NmSLwyVM/fXwZu2eOEfUKDXg531Ij3
 eTMg==
X-Gm-Message-State: ANhLgQ3B/zRiM09omoSgNOvx6iqYTn19v78R+eIJ8OcQZZowlU0g+TgW
 N3NHpBg+/sHtJDOhjuKgpn3uFqCNLuR217CxHilKadD3ulU=
X-Google-Smtp-Source: ADFU+vvx0Cn/UwxOAXk9ponjemoGfq1y9N7+eB8dm/jq+cjrzNhHdbgSpUi+JLVIQqmwZUsZx+lVawm9C5IXf1vQaEo=
X-Received: by 2002:a2e:80cc:: with SMTP id r12mr5241620ljg.154.1582969008369; 
 Sat, 29 Feb 2020 01:36:48 -0800 (PST)
MIME-Version: 1.0
From: Yikun Jiang <yikunkero@HIDDEN>
Date: Sat, 29 Feb 2020 17:36:37 +0800
Message-ID: <CAArz_dAeJ8FfE4ksbEHnb-W9Be-M1WsuRhLahFK0QQ35HF8V9g@HIDDEN>
Subject: [PATCH] Optimized the deflate in aarch64
To: bug-gzip@HIDDEN
Content-Type: multipart/alternative; boundary="0000000000006f347c059fb3b106"
X-detected-operating-system: by eggs.gnu.org: Genre and OS details not
 recognized.
X-Received-From: 2a00:1450:4864:20::243
X-Spam-Score: 0.3 (/)
X-Debbugs-Envelope-To: submit
X-Mailman-Approved-At: Sat, 29 Feb 2020 05:09:52 -0500
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.7 (/)

--0000000000006f347c059fb3b106
Content-Type: text/plain; charset="UTF-8"

From: Yikun Jiang <yikunkero@HIDDEN>

This patch uses the prefetch instruction to pre-load the
next_match into cache to improve the performance, also makes
an unrolling change to decrease the number of if branch usage.
---
 deflate.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/deflate.c b/deflate.c
index 5ed2a9b..008c032 100644
--- a/deflate.c
+++ b/deflate.c
@@ -378,6 +378,9 @@ longest_match(IPos cur_match)
     register int len;                           /* length of current match
*/
     int best_len = prev_length;                 /* best match length so
far */
     IPos limit = strstart > (IPos)MAX_DIST ? strstart - (IPos)MAX_DIST :
NIL;
+#ifdef __aarch64__
+    IPos next_match;
+#endif
     /* Stop when cur_match becomes <= limit. To simplify the code,
      * we prevent matches with the string of window index 0.
      */
@@ -411,6 +414,10 @@ longest_match(IPos cur_match)
     do {
         Assert(cur_match < strstart, "no future");
         match = window + cur_match;
+#ifdef __aarch64__
+        next_match = prev[cur_match & WMASK];
+        __asm__("PRFM   PLDL1STRM, [%0]"::"r"(&(prev[next_match &
WMASK])));
+#endif

         /* Skip to next match if the match length cannot increase
          * or if the match length is less than 2:
@@ -488,8 +495,14 @@ longest_match(IPos cur_match)
             scan_end   = scan[best_len];
 #endif
         }
-    } while ((cur_match = prev[cur_match & WMASK]) > limit
-             && --chain_length != 0);
+    }
+#ifdef __aarch64__
+    while ((cur_match = next_match) > limit
+            && --chain_length != 0);
+#else
+    while ((cur_match = prev[cur_match & WMASK]) > limit
+            && --chain_length != 0);
+#endif

     return best_len;
 }
@@ -777,7 +790,20 @@ deflate (int pack_level)
             lookahead -= prev_length-1;
             prev_length -= 2;
             RSYNC_ROLL(strstart, prev_length+1);
+
+            while (prev_length >= 4) {
+                prev_length -= 4;
+                strstart++;
+                INSERT_STRING(strstart, hash_head);
+                strstart++;
+                INSERT_STRING(strstart, hash_head);
+                strstart++;
+                INSERT_STRING(strstart, hash_head);
+                strstart++;
+                INSERT_STRING(strstart, hash_head);
+            }
             do {
+                if (prev_length == 0) break;
                 strstart++;
                 INSERT_STRING(strstart, hash_head);
                 /* strstart never exceeds WSIZE-MAX_MATCH, so there are
--
2.17.1

--0000000000006f347c059fb3b106
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">From: Yikun Jiang &lt;<a href=3D"mailto:yikunkero@HIDDEN=
m" target=3D"_blank">yikunkero@HIDDEN</a>&gt;<br><br>This patch uses the=
 prefetch instruction to pre-load the<br>next_match into cache to improve t=
he performance, also makes<br>an unrolling change to decrease the number of=
 if branch usage.<br>---<br>=C2=A0deflate.c | 30 ++++++++++++++++++++++++++=
++--<br>=C2=A01 file changed, 28 insertions(+), 2 deletions(-)<br><br>diff =
--git a/deflate.c b/deflate.c<br>index 5ed2a9b..008c032 100644<br>--- a/def=
late.c<br>+++ b/deflate.c<br>@@ -378,6 +378,9 @@ longest_match(IPos cur_mat=
ch)<br>=C2=A0 =C2=A0 =C2=A0register int len;=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* length=
 of current match */<br>=C2=A0 =C2=A0 =C2=A0int best_len =3D prev_length;=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* best match=
 length so far */<br>=C2=A0 =C2=A0 =C2=A0IPos limit =3D strstart &gt; (IPos=
)MAX_DIST ? strstart - (IPos)MAX_DIST : NIL;<br>+#ifdef __aarch64__<br>+=C2=
=A0 =C2=A0 IPos next_match;<br>+#endif<br>=C2=A0 =C2=A0 =C2=A0/* Stop when =
cur_match becomes &lt;=3D limit. To simplify the code,<br>=C2=A0 =C2=A0 =C2=
=A0 * we prevent matches with the string of window index 0.<br>=C2=A0 =C2=
=A0 =C2=A0 */<br>@@ -411,6 +414,10 @@ longest_match(IPos cur_match)<br>=C2=
=A0 =C2=A0 =C2=A0do {<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Assert(cur_match=
 &lt; strstart, &quot;no future&quot;);<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0match =3D window + cur_match;<br>+#ifdef __aarch64__<br>+=C2=A0 =C2=A0 =
=C2=A0 =C2=A0 next_match =3D prev[cur_match &amp; WMASK];<br>+=C2=A0 =C2=A0=
 =C2=A0 =C2=A0 __asm__(&quot;PRFM=C2=A0 =C2=A0PLDL1STRM, [%0]&quot;::&quot;=
r&quot;(&amp;(prev[next_match &amp; WMASK])));<br>+#endif<br><br>=C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0/* Skip to next match if the match length cannot in=
crease<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 * or if the match length is le=
ss than 2:<br>@@ -488,8 +495,14 @@ longest_match(IPos cur_match)<br>=C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0scan_end=C2=A0 =C2=A0=3D scan[best=
_len];<br>=C2=A0#endif<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0}<br>-=C2=A0 =
=C2=A0 } while ((cur_match =3D prev[cur_match &amp; WMASK]) &gt; limit<br>-=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0&amp;&amp; --chain_length !=
=3D 0);<br>+=C2=A0 =C2=A0 }<br>+#ifdef __aarch64__<br>+=C2=A0 =C2=A0 while =
((cur_match =3D next_match) &gt; limit<br>+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 &amp;&amp; --chain_length !=3D 0);<br>+#else<br>+=C2=A0 =C2=A0 w=
hile ((cur_match =3D prev[cur_match &amp; WMASK]) &gt; limit<br>+=C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 &amp;&amp; --chain_length !=3D 0);<br>+#end=
if<br><br>=C2=A0 =C2=A0 =C2=A0return best_len;<br>=C2=A0}<br>@@ -777,7 +790=
,20 @@ deflate (int pack_level)<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0lookahead -=3D prev_length-1;<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0prev_length -=3D 2;<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0RSYNC_ROLL(strstart, prev_length+1);<br>+<br>+=C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 while (prev_length &gt;=3D 4) {<br>+=C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 prev_length -=3D 4;<br>+=C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 strstart++;<br>+=C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 INSERT_STRING(strstart, h=
ash_head);<br>+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 strs=
tart++;<br>+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 INSERT_=
STRING(strstart, hash_head);<br>+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 strstart++;<br>+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 INSERT_STRING(strstart, hash_head);<br>+=C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 strstart++;<br>+=C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 INSERT_STRING(strstart, hash_head);<br>+=C2=A0=
 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 }<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=
 =C2=A0 =C2=A0do {<br>+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 if (prev_length =3D=3D 0) break;<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0strstart++;<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0INSERT_STRING(strstart, hash_head);<br>=C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0/* strstart neve=
r exceeds WSIZE-MAX_MATCH, so there are<font color=3D"#888888"><br>--<br>2.=
17.1</font>=C2=A0=C2=A0<br></div>

--0000000000006f347c059fb3b106--




Acknowledgement sent to Yikun Jiang <yikunkero@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-gzip@HIDDEN. Full text available.
Report forwarded to bug-gzip@HIDDEN:
bug#39832; Package gzip. Full text available.
Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.
Last modified: Tue, 5 Apr 2022 01:45:01 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.