Paul Eggert <eggert@HIDDEN>
to control <at> debbugs.gnu.org.
Received: (at submit) by debbugs.gnu.org; 29 Feb 2020 10:09:53 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat Feb 29 05:09:53 2020
Received: from localhost ([127.0.0.1]:34272 helo=debbugs.gnu.org)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
id 1j7z4C-0001zx-W8
for submit <at> debbugs.gnu.org; Sat, 29 Feb 2020 05:09:53 -0500
Received: from lists.gnu.org ([209.51.188.17]:53632)
by debbugs.gnu.org with esmtp (Exim 4.84_2)
(envelope-from <yikunkero@HIDDEN>) id 1j7yYG-0007Hi-G1
for submit <at> debbugs.gnu.org; Sat, 29 Feb 2020 04:36:52 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10]:43883)
by lists.gnu.org with esmtp (Exim 4.90_1)
(envelope-from <yikunkero@HIDDEN>) id 1j7yYF-0001S4-5h
for bug-gzip@HIDDEN; Sat, 29 Feb 2020 04:36:52 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
(envelope-from <yikunkero@HIDDEN>) id 1j7yYD-0006cA-Ti
for bug-gzip@HIDDEN; Sat, 29 Feb 2020 04:36:51 -0500
Received: from mail-lj1-x243.google.com ([2a00:1450:4864:20::243]:40041)
by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
(Exim 4.71) (envelope-from <yikunkero@HIDDEN>) id 1j7yYD-0006Yd-LK
for bug-gzip@HIDDEN; Sat, 29 Feb 2020 04:36:49 -0500
Received: by mail-lj1-x243.google.com with SMTP id 143so6101467ljj.7
for <bug-gzip@HIDDEN>; Sat, 29 Feb 2020 01:36:49 -0800 (PST)
MIME-Version: 1.0
From: Yikun Jiang <yikunkero@HIDDEN>
Date: Sat, 29 Feb 2020 17:36:37 +0800
Message-ID: <CAArz_dAeJ8FfE4ksbEHnb-W9Be-M1WsuRhLahFK0QQ35HF8V9g@HIDDEN>
Subject: [PATCH] Optimized the deflate in aarch64
To: bug-gzip@HIDDEN
Content-Type: multipart/alternative; boundary="0000000000006f347c059fb3b106"
--0000000000006f347c059fb3b106
Content-Type: text/plain; charset="UTF-8"
From: Yikun Jiang <yikunkero@HIDDEN>
This patch uses the aarch64 PRFM (prefetch) instruction in longest_match()
to pre-load the hash-chain entry for next_match into cache and improve
performance. It also unrolls the string-insertion loop in deflate() by a
factor of four to reduce the number of loop branches.
---
deflate.c | 30 ++++++++++++++++++++++++++++--
1 file changed, 28 insertions(+), 2 deletions(-)
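The prefetch hunks should not change which matches are examined: longest_match() only reads prev[], so loading prev[cur_match & WMASK] into next_match at the top of the loop and reusing it in the loop condition walks the same chain as before. The sketch below models only that chain walk, with the string comparison elided. WSIZE, WMASK, NIL, Pos and IPos mirror gzip's names, but the values here are assumptions for illustration, and walk_chain_orig()/walk_chain_prefetch() are hypothetical helpers. It swaps the inline PRFM for GCC/Clang's __builtin_prefetch(addr, 0, 0) (read access, no temporal locality); whether that emits exactly PLDL1STRM on aarch64 should be confirmed in the generated assembly.

```c
#include <assert.h>

/* Names mirror gzip's deflate.c; the values are illustrative assumptions. */
#define WSIZE 0x8000u
#define WMASK (WSIZE - 1)
#define NIL   0u

typedef unsigned short Pos;  /* one hash-chain link */
typedef unsigned IPos;       /* Pos widened for arithmetic */

/* Loop shape before the patch: the next link is read in the loop
   condition.  Returns how many chain entries were visited. */
static unsigned walk_chain_orig(const Pos *prev, IPos cur_match,
                                IPos limit, unsigned chain_length)
{
    unsigned visited = 0;
    assert(chain_length > 0);
    do {
        visited++;               /* the string comparison would go here */
    } while ((cur_match = prev[cur_match & WMASK]) > limit
             && --chain_length != 0);
    return visited;
}

/* Loop shape after the patch: read next_match first, then hint the cache
   about the link the following iteration will read, so that load can
   overlap with the comparison work. */
static unsigned walk_chain_prefetch(const Pos *prev, IPos cur_match,
                                    IPos limit, unsigned chain_length)
{
    unsigned visited = 0;
    IPos next_match;
    assert(chain_length > 0);
    do {
        next_match = prev[cur_match & WMASK];
        __builtin_prefetch(&prev[next_match & WMASK], 0, 0);
        visited++;               /* the string comparison would go here */
    } while ((cur_match = next_match) > limit && --chain_length != 0);
    return visited;
}
```

Using the builtin rather than inline assembly would also make the #ifdef __aarch64__ guard optional: GCC documents that on targets without data prefetch no code is generated for __builtin_prefetch.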
diff --git a/deflate.c b/deflate.c
index 5ed2a9b..008c032 100644
--- a/deflate.c
+++ b/deflate.c
@@ -378,6 +378,9 @@ longest_match(IPos cur_match)
     register int len;                           /* length of current match */
     int best_len = prev_length;                 /* best match length so far */
     IPos limit = strstart > (IPos)MAX_DIST ? strstart - (IPos)MAX_DIST : NIL;
+#ifdef __aarch64__
+    IPos next_match;
+#endif
     /* Stop when cur_match becomes <= limit. To simplify the code,
      * we prevent matches with the string of window index 0.
      */
@@ -411,6 +414,10 @@ longest_match(IPos cur_match)
     do {
         Assert(cur_match < strstart, "no future");
         match = window + cur_match;
+#ifdef __aarch64__
+        next_match = prev[cur_match & WMASK];
+        __asm__("PRFM PLDL1STRM, [%0]"::"r"(&(prev[next_match & WMASK])));
+#endif
 
         /* Skip to next match if the match length cannot increase
          * or if the match length is less than 2:
@@ -488,8 +495,14 @@ longest_match(IPos cur_match)
             scan_end   = scan[best_len];
 #endif
         }
-    } while ((cur_match = prev[cur_match & WMASK]) > limit
-             && --chain_length != 0);
+    }
+#ifdef __aarch64__
+    while ((cur_match = next_match) > limit
+            && --chain_length != 0);
+#else
+    while ((cur_match = prev[cur_match & WMASK]) > limit
+            && --chain_length != 0);
+#endif
 
     return best_len;
 }
@@ -777,7 +790,20 @@ deflate (int pack_level)
             lookahead -= prev_length-1;
             prev_length -= 2;
             RSYNC_ROLL(strstart, prev_length+1);
+
+            while (prev_length >= 4) {
+                prev_length -= 4;
+                strstart++;
+                INSERT_STRING(strstart, hash_head);
+                strstart++;
+                INSERT_STRING(strstart, hash_head);
+                strstart++;
+                INSERT_STRING(strstart, hash_head);
+                strstart++;
+                INSERT_STRING(strstart, hash_head);
+            }
             do {
+                if (prev_length == 0) break;
                 strstart++;
                 INSERT_STRING(strstart, hash_head);
                 /* strstart never exceeds WSIZE-MAX_MATCH, so there are
--
2.17.1
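The unrolling in the last hunk is not inside #ifdef __aarch64__, so it would apply on every architecture. It should perform exactly prev_length insertions, as the original do-while did, assuming (as in gzip's lazy-match branch) that prev_length is between 1 and MAX_MATCH-2 when the loop is reached. The sketch below checks that equivalence under those assumptions; insert_string() is a hypothetical stand-in for the INSERT_STRING macro that only counts calls, and insert_orig()/insert_unrolled() are illustrative names.

```c
#include <assert.h>

/* Hypothetical stand-in for gzip's INSERT_STRING macro: it only counts
   calls, whereas the real macro updates ins_h, prev[] and head[]. */
static unsigned inserted;

static void insert_string(unsigned pos)
{
    (void)pos;
    inserted++;
}

/* Loop before the patch; assumes prev_length >= 1 on entry. */
static unsigned insert_orig(unsigned strstart, unsigned prev_length)
{
    assert(prev_length >= 1);
    do {
        strstart++;
        insert_string(strstart);
    } while (--prev_length != 0);
    return strstart;
}

/* Loop after the patch: groups of four, then the remaining 0 to 3. */
static unsigned insert_unrolled(unsigned strstart, unsigned prev_length)
{
    assert(prev_length >= 1);
    while (prev_length >= 4) {
        prev_length -= 4;
        strstart++; insert_string(strstart);
        strstart++; insert_string(strstart);
        strstart++; insert_string(strstart);
        strstart++; insert_string(strstart);
    }
    do {
        if (prev_length == 0) break;  /* count was a multiple of four */
        strstart++;
        insert_string(strstart);
    } while (--prev_length != 0);
    return strstart;
}
```

The added `if (prev_length == 0) break;` is what makes the tail safe: when the count is a multiple of four, the unrolled loop leaves prev_length at 0, and without the guard the do-while would insert once more and then decrement past zero, so `--prev_length != 0` would keep the loop running.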
--0000000000006f347c059fb3b106--
Yikun Jiang <yikunkero@HIDDEN>:
bug-gzip@HIDDEN.
bug-gzip@HIDDEN:
bug#39832; Package gzip.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.