Paul Eggert <eggert@HIDDEN> to control <at> debbugs.gnu.org.
Received: (at submit) by debbugs.gnu.org; 29 Feb 2020 10:09:53 +0000
From: Yikun Jiang <yikunkero@HIDDEN>
Date: Sat, 29 Feb 2020 17:36:37 +0800
Message-ID: <CAArz_dAeJ8FfE4ksbEHnb-W9Be-M1WsuRhLahFK0QQ35HF8V9g@HIDDEN>
Subject: [PATCH] Optimized the deflate in aarch64
To: bug-gzip@HIDDEN

From: Yikun Jiang <yikunkero@HIDDEN>

This patch uses the prefetch instruction to pre-load the
next_match into cache to improve the performance, also makes
an unrolling change to decrease the number of if branch usage.
---
 deflate.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/deflate.c b/deflate.c
index 5ed2a9b..008c032 100644
--- a/deflate.c
+++ b/deflate.c
@@ -378,6 +378,9 @@ longest_match(IPos cur_match)
     register int len;                           /* length of current match */
     int best_len = prev_length;                 /* best match length so far */
     IPos limit = strstart > (IPos)MAX_DIST ? strstart - (IPos)MAX_DIST : NIL;
+#ifdef __aarch64__
+    IPos next_match;
+#endif
     /* Stop when cur_match becomes <= limit. To simplify the code,
      * we prevent matches with the string of window index 0.
      */
@@ -411,6 +414,10 @@ longest_match(IPos cur_match)
     do {
         Assert(cur_match < strstart, "no future");
         match = window + cur_match;
+#ifdef __aarch64__
+        next_match = prev[cur_match & WMASK];
+        __asm__("PRFM   PLDL1STRM, [%0]"::"r"(&(prev[next_match & WMASK])));
+#endif
 
         /* Skip to next match if the match length cannot increase
          * or if the match length is less than 2:
@@ -488,8 +495,14 @@ longest_match(IPos cur_match)
             scan_end   = scan[best_len];
 #endif
         }
-    } while ((cur_match = prev[cur_match & WMASK]) > limit
-             && --chain_length != 0);
+    }
+#ifdef __aarch64__
+    while ((cur_match = next_match) > limit
+            && --chain_length != 0);
+#else
+    while ((cur_match = prev[cur_match & WMASK]) > limit
+            && --chain_length != 0);
+#endif
 
     return best_len;
 }
@@ -777,7 +790,20 @@ deflate (int pack_level)
             lookahead -= prev_length-1;
             prev_length -= 2;
             RSYNC_ROLL(strstart, prev_length+1);
+
+            while (prev_length >= 4) {
+                prev_length -= 4;
+                strstart++;
+                INSERT_STRING(strstart, hash_head);
+                strstart++;
+                INSERT_STRING(strstart, hash_head);
+                strstart++;
+                INSERT_STRING(strstart, hash_head);
+                strstart++;
+                INSERT_STRING(strstart, hash_head);
+            }
             do {
+                if (prev_length == 0) break;
                 strstart++;
                 INSERT_STRING(strstart, hash_head);
                 /* strstart never exceeds WSIZE-MAX_MATCH, so there are
--
2.17.1
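
For readers who want to try the two ideas from the patch outside of gzip, here is a minimal sketch. It is not the submitted code. It assumes GCC or Clang, and it swaps the inline PRFM assembly for the portable __builtin_prefetch builtin, whose third argument 0 asks for low temporal locality and so loosely mirrors the PLDL1STRM hint. The names walk_chain and insert_run, the toy WSIZE of 8, and the seen[] array standing in for INSERT_STRING are all hypothetical.

```c
#include <assert.h>

#define WSIZE 8u             /* toy window size (assumption); power of two */
#define WMASK (WSIZE - 1u)

/* Walk a hash chain stored in prev[], prefetching one link ahead.
 * Returns the number of candidates visited. The visited++ line is a
 * stand-in for the string comparison done in longest_match(). */
static unsigned walk_chain(const unsigned short *prev, unsigned cur,
                           unsigned limit, unsigned chain_length)
{
    unsigned visited = 0;
    unsigned next;

    do {
        assert(cur < WSIZE);
        /* Load the next link now so the loop condition can reuse it. */
        next = prev[cur & WMASK];
        /* Hint for the entry the NEXT iteration will read
         * (read access, low temporal locality). */
        __builtin_prefetch(&prev[next & WMASK], 0, 0);
        visited++;
    } while ((cur = next) > limit && --chain_length != 0);

    return visited;
}

/* Mark positions strstart+1 .. strstart+n, four per loop iteration,
 * then handle the 0..3 leftover positions one at a time.
 * Returns the final value of strstart. */
static unsigned insert_run(unsigned char *seen, unsigned strstart, unsigned n)
{
    while (n >= 4) {
        n -= 4;
        seen[++strstart] = 1;
        seen[++strstart] = 1;
        seen[++strstart] = 1;
        seen[++strstart] = 1;
    }
    while (n != 0) {
        n--;
        seen[++strstart] = 1;
    }
    return strstart;
}
```

The remainder here is handled with a plain while loop instead of the patch's do loop with an early break; both perform the same number of insertions. Whether the prefetch helps at all depends on the CPU and the data, so it is worth measuring before relying on it.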
Yikun Jiang <yikunkero@HIDDEN>: bug-gzip@HIDDEN.
bug-gzip@HIDDEN: bug#39832; Package gzip.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.