GNU bug report logs - #41535
[PATCH] performance optimization for aarch64

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: gzip; Reported by: l00374334 <liqiang64@HIDDEN>; Keywords: patch wontfix; dated Tue, 26 May 2020 05:18:02 UTC; Maintainer for gzip is bug-gzip@HIDDEN.

Added tag(s) wontfix. Request was from Paul Eggert <eggert@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at 41535 <at> debbugs.gnu.org:

Received: (at 41535) by debbugs.gnu.org; 29 Aug 2021 08:48:12 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sun Aug 29 04:48:12 2021
Received: from localhost ([127.0.0.1]:55481 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1mKGU4-0005fw-Se
	for submit <at> debbugs.gnu.org; Sun, 29 Aug 2021 04:48:12 -0400
Received: from mail-wr1-f44.google.com ([209.85.221.44]:34382)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <meyering@HIDDEN>) id 1mKGTz-0005fN-T6
 for 41535 <at> debbugs.gnu.org; Sun, 29 Aug 2021 04:48:08 -0400
Received: by mail-wr1-f44.google.com with SMTP id h13so17611928wrp.1
 for <41535 <at> debbugs.gnu.org>; Sun, 29 Aug 2021 01:48:03 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc:content-transfer-encoding;
 bh=pTgvO80U3R8U7a3GILKksocEMsWiqzF8wn7g3b4PYu8=;
 b=N5HWVSnYafnu4EEL0ViUOsdwXV8iGgrPI7sx9+to7JkqikIQnq0hEf8lT61tV0QexU
 /AH4M/5owNZifSUasl5wORNYOJmRY9ZSGRJP5SAfVt06H2AC4yF9PeIx3ugl0BgISu1r
 jhcgB1JwuFziSeVM10VUVKYSpZjRg74lWyxmdDYpnAvJLwXg3yqrNHi6Vxf/9qIyPG7f
 GxQeDxhtnVzdiDVwhBqxAzGLymFgb3eZjntGvERljTbbSzS9CtGhXvGtOMxdjqidSuWh
 4z5tEbxpBowVS7NQ9keqgQztsPLdDKt5SXJRMT+rOcWCg1YANFb7iOGjXOSoCSDUt0cU
 TShA==
X-Gm-Message-State: AOAM533NbaopG1k/LTKz1hzOEh2K3ShcxiXLKZtPTqdlrElvJPW9Piv/
 86vVwM15Iu1+B8KtVsKMIVAUucO3y5oTqLCToz0=
X-Google-Smtp-Source: ABdhPJz6wPyKp69/1OOVWpO3ykD3/y6G+fPMn+PonG01u68jvhfRV31xyTiNy/5n9LUYaAwb7qt9+vvlMVUtg+oJy1A=
X-Received: by 2002:a5d:6cab:: with SMTP id a11mr16752642wra.287.1630226877943; 
 Sun, 29 Aug 2021 01:47:57 -0700 (PDT)
MIME-Version: 1.0
References: <20200526023940.1967-1-liqiang64@HIDDEN>
 <ac086349-18f9-9ead-11ea-fb0b55d15974@HIDDEN>
In-Reply-To: <ac086349-18f9-9ead-11ea-fb0b55d15974@HIDDEN>
From: Jim Meyering <jim@HIDDEN>
Date: Sun, 29 Aug 2021 10:47:45 +0200
Message-ID: <CA+8g5KGOQ-r4o+mV11DuJqcfYcaYPcvQNgoP7P2poWv2k5K90Q@HIDDEN>
Subject: Re: bug#41535: [PATCH] performance optimization for aarch64
To: Li Qiang <liqiang64@HIDDEN>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Score: 0.5 (/)
X-Debbugs-Envelope-To: 41535
Cc: luanjianhai@HIDDEN, Paul Eggert <eggert@HIDDEN>,
 sangyan@HIDDEN, colordev.jiang@HIDDEN, luchunhua@HIDDEN,
 41535 <at> debbugs.gnu.org, huxinwei@HIDDEN, Jim Meyering <meyering@HIDDEN>
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.5 (/)

On Sat, May 30, 2020 at 11:19 AM Li Qiang <liqiang64@HIDDEN> wrote:
> =E5=9C=A8 2020/5/26 10:39, l00374334 =E5=86=99=E9=81=93:
> > From: liqiang <liqiang64@HIDDEN>
> >
> > By analyzing the compression and decompression process of gzip, I found
> >
> > that the hot spots of CRC32 and longest_match function are very high.
> >
> >
> >
> > On the aarch64 architecture, we can optimize the efficiency of crc32
> >
> > through the interface provided by the neon instruction set (12x faster
> >
> > in aarch64), and optimize the performance of random access code through
> >
> > prefetch instructions (about 5%~8% improvement). In some compression
> >
> > scenarios, loop expansion can also get a certain performance improvemen=
t
> >
> > (about 10%).
> >
> >
> >
> > Modify by Li Qiang.
> >
> > ---
> >  configure | 14 ++++++++++++++
> >  deflate.c | 30 +++++++++++++++++++++++++++++-
> >  util.c    | 45 +++++++++++++++++++++++++++++++++++++++++++++

Thank you for that work and sorry for the delay in responding.
However, for now I prefer not to apply it.
I'd prefer to see arch-specific optimizations added to libz in the
hope (perhaps naive) that someone will find time to make gzip use
libz.

Information forwarded to bug-gzip@HIDDEN:
bug#41535; Package gzip. Full text available.

Message received at 41535 <at> debbugs.gnu.org:

Received: (at 41535) by debbugs.gnu.org; 20 Aug 2020 08:55:42 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Thu Aug 20 04:55:42 2020
Received: from localhost ([127.0.0.1]:41602 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1k8gMH-0004AS-Qx
	for submit <at> debbugs.gnu.org; Thu, 20 Aug 2020 04:55:42 -0400
Received: from szxga06-in.huawei.com ([45.249.212.32]:56900 helo=huawei.com)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <liqiang64@HIDDEN>) id 1k8gMC-0004AA-RR
 for 41535 <at> debbugs.gnu.org; Thu, 20 Aug 2020 04:55:40 -0400
Received: from DGGEMS404-HUB.china.huawei.com (unknown [172.30.72.58])
 by Forcepoint Email with ESMTP id 8B77EE9DAA5AC3580D5D;
 Thu, 20 Aug 2020 16:55:28 +0800 (CST)
Received: from [127.0.0.1] (10.108.234.107) by DGGEMS404-HUB.china.huawei.com
 (10.3.19.204) with Microsoft SMTP Server id 14.3.487.0;
 Thu, 20 Aug 2020 16:55:27 +0800
Subject: Re: bug#41535: [PATCH] performance optimization for aarch64
From: Li Qiang <liqiang64@HIDDEN>
To: <41535 <at> debbugs.gnu.org>
References: <20200526023940.1967-1-liqiang64@HIDDEN>
 <ac086349-18f9-9ead-11ea-fb0b55d15974@HIDDEN>
Message-ID: <5e6aa834-6dc0-4465-5807-088eb53abd05@HIDDEN>
Date: Thu, 20 Aug 2020 16:55:26 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
 Thunderbird/68.8.1
MIME-Version: 1.0
In-Reply-To: <ac086349-18f9-9ead-11ea-fb0b55d15974@HIDDEN>
Content-Type: text/plain; charset="gbk"
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.108.234.107]
X-CFilter-Loop: Reflected
X-Spam-Score: -3.7 (---)
X-Debbugs-Envelope-To: 41535
Cc: meyering@HIDDEN, eggert@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -4.7 (----)



�� 2020/5/30 17:17, Li Qiang д��:
> 
> 
> �� 2020/5/26 10:39, l00374334 д��:
>> From: liqiang <liqiang64@HIDDEN>
>>
>> By analyzing the compression and decompression process of gzip, I found 
>>
>> that the hot spots of CRC32 and longest_match function are very high.
>>
>>
>>
>> On the aarch64 architecture, we can optimize the efficiency of crc32 
>>
>> through the interface provided by the neon instruction set (12x faster 
>>
>> in aarch64), and optimize the performance of random access code through 
>>
>> prefetch instructions (about 5%~8% improvement). In some compression 
>>
>> scenarios, loop expansion can also get a certain performance improvement 
>>
>> (about 10%).
>>
>>
>>
>> Modify by Li Qiang.
>>
>> ---
>>  configure | 14 ++++++++++++++
>>  deflate.c | 30 +++++++++++++++++++++++++++++-
>>  util.c    | 45 +++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 88 insertions(+), 1 deletion(-)
>>
>> diff --git a/configure b/configure
>> index cab3daf..dc80cb6 100644
>> --- a/configure
>> +++ b/configure
>> @@ -14555,6 +14555,20 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
>>             ;;
>>  
>>           arm* | aarch64 )
>> +           cat confdefs.h - <<_ACEOF >conftest.$ac_ext
>> +/* end confdefs.h.  */
>> +#if defined __ARM_NEON__ || defined __ARM_NEON
>> +                   int ok;
>> +                  #else
>> +                   error fail
>> +                  #endif
>> +
>> +_ACEOF
>> +if ac_fn_c_try_compile "$LINENO"
>> +then :
>> +  CFLAGS="$CFLAGS -march=armv8-a+crc"
>> +fi
>> +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
>>             # Assume arm with EABI.
>>             # On arm64 systems, the C compiler may be generating code in one of
>>             # these ABIs:
>> diff --git a/deflate.c b/deflate.c
>> index 9d379e9..ee77ffd 100644
>> --- a/deflate.c
>> +++ b/deflate.c
>> @@ -378,6 +378,9 @@ longest_match(IPos cur_match)
>>      register int len;                           /* length of current match */
>>
>>      int best_len = prev_length;                 /* best match length so far */
>>
>>      IPos limit = strstart > (IPos)MAX_DIST ? strstart - (IPos)MAX_DIST : NIL;
>>
>> +#ifdef __aarch64__
>>
>> +    IPos next_match;
>>
>> +#endif
>>
>>      /* Stop when cur_match becomes <= limit. To simplify the code,
>>
>>       * we prevent matches with the string of window index 0.
>>
>>       */
>>
>> @@ -411,6 +414,10 @@ longest_match(IPos cur_match)
>>      do {
>>
>>          Assert(cur_match < strstart, "no future");
>>
>>          match = window + cur_match;
>>
>> +#ifdef __aarch64__
>>
>> +        next_match = prev[cur_match & WMASK];
>>
>> +        __asm__("PRFM PLDL1STRM, [%0]"::"r"(&(prev[next_match & WMASK])));
>>
>> +#endif
>>
>>  
>>
>>          /* Skip to next match if the match length cannot increase
>>
>>           * or if the match length is less than 2:
>>
>> @@ -488,8 +495,14 @@ longest_match(IPos cur_match)
>>              scan_end   = scan[best_len];
>>
>>  #endif
>>
>>          }
>>
>> -    } while ((cur_match = prev[cur_match & WMASK]) > limit
>>
>> +    }
>>
>> +#ifdef __aarch64__
>>
>> +    while ((cur_match = next_match) > limit
>>
>> +             && --chain_length != 0);
>>
>> +#else
>>
>> +    while ((cur_match = prev[cur_match & WMASK]) > limit
>>
>>               && --chain_length != 0);
>>
>> +#endif
>>
>>  
>>
>>      return best_len;
>>
>>  }
>>
>> @@ -777,7 +790,22 @@ deflate (int pack_level)
>>              lookahead -= prev_length-1;
>>
>>              prev_length -= 2;
>>
>>              RSYNC_ROLL(strstart, prev_length+1);
>>
>> +            while (prev_length >= 4) {
>>
>> +                /* After actual verification, expanding this loop
>>
>> +                 * can improve its performance in certain scenarios.
>>
>> +                 */
>>
>> +                prev_length -= 4;
>>
>> +                strstart++;
>>
>> +                INSERT_STRING(strstart, hash_head);
>>
>> +                strstart++;
>>
>> +                INSERT_STRING(strstart, hash_head);
>>
>> +                strstart++;
>>
>> +                INSERT_STRING(strstart, hash_head);
>>
>> +                strstart++;
>>
>> +                INSERT_STRING(strstart, hash_head);
>>
>> +            }
>>
>>              do {
>>
>> +                if (prev_length == 0) break;
>>
>>                  strstart++;
>>
>>                  INSERT_STRING(strstart, hash_head);
>>
>>                  /* strstart never exceeds WSIZE-MAX_MATCH, so there are
>>
>> diff --git a/util.c b/util.c
>> index 0a0fc21..c9f0e52 100644
>> --- a/util.c
>> +++ b/util.c
>> @@ -38,6 +38,12 @@
>>  
>>
>>  static int write_buffer (int, voidp, unsigned int);
>>
>>  
>>
>> +#if defined __ARM_NEON__ || defined __ARM_NEON
>>
>> +#define CRC32D(crc, val) __asm__("crc32x %w[c], %w[c], %x[v]":[c]"+r"(crc):[v]"r"(val))
>>
>> +#define CRC32W(crc, val) __asm__("crc32w %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(val))
>>
>> +#define CRC32H(crc, val) __asm__("crc32h %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(val))
>>
>> +#define CRC32B(crc, val) __asm__("crc32b %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(val))
>>
>> +#else
>>
>>  /* ========================================================================
>>
>>   * Table of CRC-32's of all single-byte values (made by makecrc.c)
>>
>>   */
>>
>> @@ -95,6 +101,7 @@ static const ulg crc_32_tab[] = {
>>    0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL,
>>
>>    0x2d02ef8dL
>>
>>  };
>>
>> +#endif
>>
>>  
>>
>>  /* Shift register contents.  */
>>
>>  static ulg crc = 0xffffffffL;
>>
>> @@ -132,6 +139,43 @@ ulg updcrc(s, n)
>>      const uch *s;           /* pointer to bytes to pump through */
>>
>>      unsigned n;             /* number of bytes in s[] */
>>
>>  {
>>
>> +#if defined __ARM_NEON__ || defined __ARM_NEON
>>
>> +    register ulg c;
>>
>> +    static ulg crc = (ulg)0xffffffffL;
>>
>> +    register const uint8_t  *buf1;
>>
>> +    register const uint16_t *buf2;
>>
>> +    register const uint32_t *buf4;
>>
>> +    register const uint64_t *buf8;
>>
>> +    int64_t length = (int64_t)n;
>>
>> +    buf8 = (const  uint64_t *)(const void *)s;
>>
>> +
>>
>> +    if (s == NULL) {
>>
>> +        c = 0xffffffffL;
>>
>> +    } else {
>>
>> +        c = crc;
>>
>> +        while(length >= sizeof(uint64_t)) {
>>
>> +            CRC32D(c, *buf8++);
>>
>> +            length -= sizeof(uint64_t);
>>
>> +        }
>>
>> +        buf4 = (const uint32_t *)(const void *)buf8;
>>
>> +        if (length >= sizeof(uint32_t)) {
>>
>> +            CRC32W(c, *buf4++);
>>
>> +            length -= sizeof(uint32_t);
>>
>> +        }
>>
>> +        buf2 = (const uint16_t *)(const void *)buf4;
>>
>> +        if(length >= sizeof(uint16_t)) {
>>
>> +            CRC32H(c, *buf2++);
>>
>> +            length -= sizeof(uint16_t);
>>
>> +        }
>>
>> +        buf1 = (const uint8_t *)(const void *)buf2;
>>
>> +        if (length >= sizeof(uint8_t)) {
>>
>> +            CRC32B(c, *buf1);
>>
>> +            length -= sizeof(uint8_t);
>>
>> +        }
>>
>> +    }
>>
>> +    crc = c;
>>
>> +    return (c ^ 0xffffffffL);
>>
>> +#else
>>
>>      register ulg c;         /* temporary variable */
>>
>>  
>>
>>      if (s == NULL) {
>>
>> @@ -144,6 +188,7 @@ ulg updcrc(s, n)
>>      }
>>
>>      crc = c;
>>
>>      return c ^ 0xffffffffL;       /* (instead of ~c for 64-bit machines) */
>>
>> +#endif
>>
>>  }
>>
>>  
>>
>>  /* Return a current CRC value.  */
>>
> 
> Please allow me to show a set of actual test data for this patch.
> 
> First, I made an original version of the program "gzip-1.10" based
> on the gzip-1.10 source code, and then made an optimized version of
> the program "gzip-optimized" after applying my optimization patch.
> 
> Next I use gzip-1.10 version to test the compression and decompression
> time on some **xml** files:
> [XML]# time ./gzip-1.10 *.xml
> 
> real    0m5.099s
> user    0m4.384s
> sys     0m0.176s
> [XML]# time ./gzip-1.10 -d *.gz
> 
> real    0m2.173s
> user    0m1.821s
> sys     0m0.348s
> 
> Then use the optimized version to compare:
> [XML]# time ./gzip-optimized *.xml
> 
> real    0m2.785s
> user    0m2.576s
> sys     0m0.204s
> [XML]# time ./gzip-optimized -d *.gz
> 
> real    0m0.497s
> user    0m0.176s
> sys     0m0.320s
> 
> 
> The next test object is a large **log** file:
> [LOG]# time ./gzip-1.10 *.log
> 
> real    0m8.883s
> user    0m8.652s
> sys     0m0.217s
> [LOG]# time ./gzip-1.10 -d *.gz
> 
> real    0m3.049s
> user    0m2.604s
> sys     0m0.439s
> 
> Also use the optimized version to compare:
> [LOG]# time ./gzip-optimized *.log
> 
> real    0m6.882s
> user    0m6.607s
> sys     0m0.264s
> [LOG]# time ./gzip-optimized -d *.gz
> 
> real    0m1.054s
> user    0m0.622s
> sys     0m0.431s
> 
> The above experimental data are from the aarch64 platform.
> 

Gentle ping.
: )

-- 
Best regards,
Li Qiang

Information forwarded to bug-gzip@HIDDEN:
bug#41535; Package gzip. Full text available.

Message received at 41535 <at> debbugs.gnu.org:

Received: (at 41535) by debbugs.gnu.org; 30 May 2020 09:18:12 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Sat May 30 05:18:12 2020
Received: from localhost ([127.0.0.1]:56954 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1jexd5-0001Z7-Hd
	for submit <at> debbugs.gnu.org; Sat, 30 May 2020 05:18:11 -0400
Received: from szxga05-in.huawei.com ([45.249.212.191]:2296 helo=huawei.com)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <liqiang64@HIDDEN>) id 1jexd2-0001Yf-RA
 for 41535 <at> debbugs.gnu.org; Sat, 30 May 2020 05:18:09 -0400
Received: from DGGEMS409-HUB.china.huawei.com (unknown [172.30.72.60])
 by Forcepoint Email with ESMTP id 2696B6668135339376EC;
 Sat, 30 May 2020 17:18:01 +0800 (CST)
Received: from [127.0.0.1] (10.108.222.92) by DGGEMS409-HUB.china.huawei.com
 (10.3.19.209) with Microsoft SMTP Server id 14.3.487.0; Sat, 30 May 2020
 17:17:51 +0800
Subject: bug#41535: [PATCH] performance optimization for aarch64
To: <41535 <at> debbugs.gnu.org>
References: <20200526023940.1967-1-liqiang64@HIDDEN>
From: Li Qiang <liqiang64@HIDDEN>
Message-ID: <ac086349-18f9-9ead-11ea-fb0b55d15974@HIDDEN>
Date: Sat, 30 May 2020 17:17:49 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101
 Thunderbird/68.8.1
MIME-Version: 1.0
In-Reply-To: <20200526023940.1967-1-liqiang64@HIDDEN>
Content-Type: text/plain; charset="gbk"
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.108.222.92]
X-CFilter-Loop: Reflected
X-Spam-Score: -2.3 (--)
X-Debbugs-Envelope-To: 41535
Cc: luanjianhai@HIDDEN, eggert@HIDDEN, sangyan@HIDDEN,
 colordev.jiang@HIDDEN, luchunhua@HIDDEN, huxinwei@HIDDEN,
 meyering@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -3.3 (---)



�� 2020/5/26 10:39, l00374334 д��:
> From: liqiang <liqiang64@HIDDEN>
> 
> By analyzing the compression and decompression process of gzip, I found 
> 
> that the hot spots of CRC32 and longest_match function are very high.
> 
> 
> 
> On the aarch64 architecture, we can optimize the efficiency of crc32 
> 
> through the interface provided by the neon instruction set (12x faster 
> 
> in aarch64), and optimize the performance of random access code through 
> 
> prefetch instructions (about 5%~8% improvement). In some compression 
> 
> scenarios, loop expansion can also get a certain performance improvement 
> 
> (about 10%).
> 
> 
> 
> Modify by Li Qiang.
> 
> ---
>  configure | 14 ++++++++++++++
>  deflate.c | 30 +++++++++++++++++++++++++++++-
>  util.c    | 45 +++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 88 insertions(+), 1 deletion(-)
> 
> diff --git a/configure b/configure
> index cab3daf..dc80cb6 100644
> --- a/configure
> +++ b/configure
> @@ -14555,6 +14555,20 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
>             ;;
>  
>           arm* | aarch64 )
> +           cat confdefs.h - <<_ACEOF >conftest.$ac_ext
> +/* end confdefs.h.  */
> +#if defined __ARM_NEON__ || defined __ARM_NEON
> +                   int ok;
> +                  #else
> +                   error fail
> +                  #endif
> +
> +_ACEOF
> +if ac_fn_c_try_compile "$LINENO"
> +then :
> +  CFLAGS="$CFLAGS -march=armv8-a+crc"
> +fi
> +rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
>             # Assume arm with EABI.
>             # On arm64 systems, the C compiler may be generating code in one of
>             # these ABIs:
> diff --git a/deflate.c b/deflate.c
> index 9d379e9..ee77ffd 100644
> --- a/deflate.c
> +++ b/deflate.c
> @@ -378,6 +378,9 @@ longest_match(IPos cur_match)
>      register int len;                           /* length of current match */
> 
>      int best_len = prev_length;                 /* best match length so far */
> 
>      IPos limit = strstart > (IPos)MAX_DIST ? strstart - (IPos)MAX_DIST : NIL;
> 
> +#ifdef __aarch64__
> 
> +    IPos next_match;
> 
> +#endif
> 
>      /* Stop when cur_match becomes <= limit. To simplify the code,
> 
>       * we prevent matches with the string of window index 0.
> 
>       */
> 
> @@ -411,6 +414,10 @@ longest_match(IPos cur_match)
>      do {
> 
>          Assert(cur_match < strstart, "no future");
> 
>          match = window + cur_match;
> 
> +#ifdef __aarch64__
> 
> +        next_match = prev[cur_match & WMASK];
> 
> +        __asm__("PRFM PLDL1STRM, [%0]"::"r"(&(prev[next_match & WMASK])));
> 
> +#endif
> 
>  
> 
>          /* Skip to next match if the match length cannot increase
> 
>           * or if the match length is less than 2:
> 
> @@ -488,8 +495,14 @@ longest_match(IPos cur_match)
>              scan_end   = scan[best_len];
> 
>  #endif
> 
>          }
> 
> -    } while ((cur_match = prev[cur_match & WMASK]) > limit
> 
> +    }
> 
> +#ifdef __aarch64__
> 
> +    while ((cur_match = next_match) > limit
> 
> +             && --chain_length != 0);
> 
> +#else
> 
> +    while ((cur_match = prev[cur_match & WMASK]) > limit
> 
>               && --chain_length != 0);
> 
> +#endif
> 
>  
> 
>      return best_len;
> 
>  }
> 
> @@ -777,7 +790,22 @@ deflate (int pack_level)
>              lookahead -= prev_length-1;
> 
>              prev_length -= 2;
> 
>              RSYNC_ROLL(strstart, prev_length+1);
> 
> +            while (prev_length >= 4) {
> 
> +                /* After actual verification, expanding this loop
> 
> +                 * can improve its performance in certain scenarios.
> 
> +                 */
> 
> +                prev_length -= 4;
> 
> +                strstart++;
> 
> +                INSERT_STRING(strstart, hash_head);
> 
> +                strstart++;
> 
> +                INSERT_STRING(strstart, hash_head);
> 
> +                strstart++;
> 
> +                INSERT_STRING(strstart, hash_head);
> 
> +                strstart++;
> 
> +                INSERT_STRING(strstart, hash_head);
> 
> +            }
> 
>              do {
> 
> +                if (prev_length == 0) break;
> 
>                  strstart++;
> 
>                  INSERT_STRING(strstart, hash_head);
> 
>                  /* strstart never exceeds WSIZE-MAX_MATCH, so there are
> 
> diff --git a/util.c b/util.c
> index 0a0fc21..c9f0e52 100644
> --- a/util.c
> +++ b/util.c
> @@ -38,6 +38,12 @@
>  
> 
>  static int write_buffer (int, voidp, unsigned int);
> 
>  
> 
> +#if defined __ARM_NEON__ || defined __ARM_NEON
> 
> +#define CRC32D(crc, val) __asm__("crc32x %w[c], %w[c], %x[v]":[c]"+r"(crc):[v]"r"(val))
> 
> +#define CRC32W(crc, val) __asm__("crc32w %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(val))
> 
> +#define CRC32H(crc, val) __asm__("crc32h %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(val))
> 
> +#define CRC32B(crc, val) __asm__("crc32b %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(val))
> 
> +#else
> 
>  /* ========================================================================
> 
>   * Table of CRC-32's of all single-byte values (made by makecrc.c)
> 
>   */
> 
> @@ -95,6 +101,7 @@ static const ulg crc_32_tab[] = {
>    0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL,
> 
>    0x2d02ef8dL
> 
>  };
> 
> +#endif
> 
>  
> 
>  /* Shift register contents.  */
> 
>  static ulg crc = 0xffffffffL;
> 
> @@ -132,6 +139,43 @@ ulg updcrc(s, n)
>      const uch *s;           /* pointer to bytes to pump through */
> 
>      unsigned n;             /* number of bytes in s[] */
> 
>  {
> 
> +#if defined __ARM_NEON__ || defined __ARM_NEON
> 
> +    register ulg c;
> 
> +    static ulg crc = (ulg)0xffffffffL;
> 
> +    register const uint8_t  *buf1;
> 
> +    register const uint16_t *buf2;
> 
> +    register const uint32_t *buf4;
> 
> +    register const uint64_t *buf8;
> 
> +    int64_t length = (int64_t)n;
> 
> +    buf8 = (const  uint64_t *)(const void *)s;
> 
> +
> 
> +    if (s == NULL) {
> 
> +        c = 0xffffffffL;
> 
> +    } else {
> 
> +        c = crc;
> 
> +        while(length >= sizeof(uint64_t)) {
> 
> +            CRC32D(c, *buf8++);
> 
> +            length -= sizeof(uint64_t);
> 
> +        }
> 
> +        buf4 = (const uint32_t *)(const void *)buf8;
> 
> +        if (length >= sizeof(uint32_t)) {
> 
> +            CRC32W(c, *buf4++);
> 
> +            length -= sizeof(uint32_t);
> 
> +        }
> 
> +        buf2 = (const uint16_t *)(const void *)buf4;
> 
> +        if(length >= sizeof(uint16_t)) {
> 
> +            CRC32H(c, *buf2++);
> 
> +            length -= sizeof(uint16_t);
> 
> +        }
> 
> +        buf1 = (const uint8_t *)(const void *)buf2;
> 
> +        if (length >= sizeof(uint8_t)) {
> 
> +            CRC32B(c, *buf1);
> 
> +            length -= sizeof(uint8_t);
> 
> +        }
> 
> +    }
> 
> +    crc = c;
> 
> +    return (c ^ 0xffffffffL);
> 
> +#else
> 
>      register ulg c;         /* temporary variable */
> 
>  
> 
>      if (s == NULL) {
> 
> @@ -144,6 +188,7 @@ ulg updcrc(s, n)
>      }
> 
>      crc = c;
> 
>      return c ^ 0xffffffffL;       /* (instead of ~c for 64-bit machines) */
> 
> +#endif
> 
>  }
> 
>  
> 
>  /* Return a current CRC value.  */
> 

Please allow me to show a set of actual test data for this patch.

First, I made an original version of the program "gzip-1.10" based
on the gzip-1.10 source code, and then made an optimized version of
the program "gzip-optimized" after applying my optimization patch.

Next I use gzip-1.10 version to test the compression and decompression
time on some **xml** files:
[XML]# time ./gzip-1.10 *.xml

real    0m5.099s
user    0m4.384s
sys     0m0.176s
[XML]# time ./gzip-1.10 -d *.gz

real    0m2.173s
user    0m1.821s
sys     0m0.348s

Then use the optimized version to compare:
[XML]# time ./gzip-optimized *.xml

real    0m2.785s
user    0m2.576s
sys     0m0.204s
[XML]# time ./gzip-optimized -d *.gz

real    0m0.497s
user    0m0.176s
sys     0m0.320s


The next test object is a large **log** file:
[LOG]# time ./gzip-1.10 *.log

real    0m8.883s
user    0m8.652s
sys     0m0.217s
[LOG]# time ./gzip-1.10 -d *.gz

real    0m3.049s
user    0m2.604s
sys     0m0.439s

Also use the optimized version to compare:
[LOG]# time ./gzip-optimized *.log

real    0m6.882s
user    0m6.607s
sys     0m0.264s
[LOG]# time ./gzip-optimized -d *.gz

real    0m1.054s
user    0m0.622s
sys     0m0.431s

The above experimental data are from the aarch64 platform.

-- 
Best regards,
Li Qiang

Information forwarded to bug-gzip@HIDDEN:
bug#41535; Package gzip. Full text available.

Message received at submit <at> debbugs.gnu.org:

Received: (at submit) by debbugs.gnu.org; 26 May 2020 05:17:59 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue May 26 01:17:59 2020
Received: from localhost ([127.0.0.1]:43369 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1jdRyQ-0002db-Be
	for submit <at> debbugs.gnu.org; Tue, 26 May 2020 01:17:58 -0400
Received: from lists.gnu.org ([209.51.188.17]:56748)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <liqiang64@HIDDEN>) id 1jdPVi-0002uY-He
 for submit <at> debbugs.gnu.org; Mon, 25 May 2020 22:40:11 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:41510)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <liqiang64@HIDDEN>)
 id 1jdPVi-0005rV-BR
 for bug-gzip@HIDDEN; Mon, 25 May 2020 22:40:10 -0400
Received: from szxga06-in.huawei.com ([45.249.212.32]:33558 helo=huawei.com)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <liqiang64@HIDDEN>)
 id 1jdPVg-0000f4-DY
 for bug-gzip@HIDDEN; Mon, 25 May 2020 22:40:09 -0400
Received: from DGGEMS402-HUB.china.huawei.com (unknown [172.30.72.60])
 by Forcepoint Email with ESMTP id 8CC6788B7A8D9320AB1D;
 Tue, 26 May 2020 10:39:52 +0800 (CST)
Received: from huawei.com (10.108.222.92) by DGGEMS402-HUB.china.huawei.com
 (10.3.19.202) with Microsoft SMTP Server id 14.3.487.0; Tue, 26 May 2020
 10:39:42 +0800
From: l00374334 <liqiang64@HIDDEN>
To: <bug-gzip@HIDDEN>, <eggert@HIDDEN>
Subject: [PATCH] performance optimization for aarch64
Date: Tue, 26 May 2020 10:39:40 +0800
Message-ID: <20200526023940.1967-1-liqiang64@HIDDEN>
X-Mailer: git-send-email 2.23.0.windows.1
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain
X-Originating-IP: [10.108.222.92]
X-CFilter-Loop: Reflected
Received-SPF: pass client-ip=45.249.212.32; envelope-from=liqiang64@HIDDEN;
 helo=huawei.com
X-detected-operating-system: by eggs.gnu.org: First seen = 2020/05/25 22:39:53
X-ACL-Warn: Detected OS   = Linux 3.11 and newer [fuzzy]
X-Spam_score_int: -41
X-Spam_score: -4.2
X-Spam_bar: ----
X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3,
 RCVD_IN_MSPIKE_H4=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_PASS=-0.001,
 SPF_PASS=-0.001, URIBL_BLOCKED=0.001 autolearn=_AUTOLEARN
X-Spam_action: no action
X-Spam-Score: -1.4 (-)
X-Debbugs-Envelope-To: submit
X-Mailman-Approved-At: Tue, 26 May 2020 01:17:56 -0400
Cc: luanjianhai@HIDDEN, liqiang64@HIDDEN, sangyan@HIDDEN,
 luchunhua@HIDDEN
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -2.4 (--)

From: liqiang <liqiang64@HIDDEN>

By analyzing the compression and decompression process of gzip, I found =0D
that the hot spots of CRC32 and longest_match function are very high.=0D
=0D
On the aarch64 architecture, we can optimize the efficiency of crc32 =0D
through the interface provided by the neon instruction set (12x faster =0D
in aarch64), and optimize the performance of random access code through =0D
prefetch instructions (about 5%~8% improvement). In some compression =0D
scenarios, loop expansion can also get a certain performance improvement =0D
(about 10%).=0D
=0D
Modify by Li Qiang.

---
 configure | 14 ++++++++++++++
 deflate.c | 30 +++++++++++++++++++++++++++++-
 util.c    | 45 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index cab3daf..dc80cb6 100644
--- a/configure
+++ b/configure
@@ -14555,6 +14555,20 @@ rm -f core conftest.err conftest.$ac_objext confte=
st.$ac_ext
            ;;
=20
          arm* | aarch64 )
+           cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#if defined __ARM_NEON__ || defined __ARM_NEON
+                   int ok;
+                  #else
+                   error fail
+                  #endif
+
+_ACEOF
+if ac_fn_c_try_compile "$LINENO"
+then :
+  CFLAGS=3D"$CFLAGS -march=3Darmv8-a+crc"
+fi
+rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
            # Assume arm with EABI.
            # On arm64 systems, the C compiler may be generating code in on=
e of
            # these ABIs:
diff --git a/deflate.c b/deflate.c
index 9d379e9..ee77ffd 100644
--- a/deflate.c
+++ b/deflate.c
@@ -378,6 +378,9 @@ longest_match(IPos cur_match)
     register int len;                           /* length of current match=
 */=0D
     int best_len =3D prev_length;                 /* best match length so =
far */=0D
     IPos limit =3D strstart > (IPos)MAX_DIST ? strstart - (IPos)MAX_DIST :=
 NIL;=0D
+#ifdef __aarch64__=0D
+    IPos next_match;=0D
+#endif=0D
     /* Stop when cur_match becomes <=3D limit. To simplify the code,=0D
      * we prevent matches with the string of window index 0.=0D
      */=0D
@@ -411,6 +414,10 @@ longest_match(IPos cur_match)
     do {=0D
         Assert(cur_match < strstart, "no future");=0D
         match =3D window + cur_match;=0D
+#ifdef __aarch64__=0D
+        next_match =3D prev[cur_match & WMASK];=0D
+        __asm__("PRFM PLDL1STRM, [%0]"::"r"(&(prev[next_match & WMASK])));=
=0D
+#endif=0D
 =0D
         /* Skip to next match if the match length cannot increase=0D
          * or if the match length is less than 2:=0D
@@ -488,8 +495,14 @@ longest_match(IPos cur_match)
             scan_end   =3D scan[best_len];=0D
 #endif=0D
         }=0D
-    } while ((cur_match =3D prev[cur_match & WMASK]) > limit=0D
+    }=0D
+#ifdef __aarch64__=0D
+    while ((cur_match =3D next_match) > limit=0D
+             && --chain_length !=3D 0);=0D
+#else=0D
+    while ((cur_match =3D prev[cur_match & WMASK]) > limit=0D
              && --chain_length !=3D 0);=0D
+#endif=0D
 =0D
     return best_len;=0D
 }=0D
@@ -777,7 +790,22 @@ deflate (int pack_level)
             lookahead -=3D prev_length-1;=0D
             prev_length -=3D 2;=0D
             RSYNC_ROLL(strstart, prev_length+1);=0D
+            while (prev_length >=3D 4) {=0D
+                /* After actual verification, expanding this loop=0D
+                 * can improve its performance in certain scenarios.=0D
+                 */=0D
+                prev_length -=3D 4;=0D
+                strstart++;=0D
+                INSERT_STRING(strstart, hash_head);=0D
+                strstart++;=0D
+                INSERT_STRING(strstart, hash_head);=0D
+                strstart++;=0D
+                INSERT_STRING(strstart, hash_head);=0D
+                strstart++;=0D
+                INSERT_STRING(strstart, hash_head);=0D
+            }=0D
             do {=0D
+                if (prev_length =3D=3D 0) break;=0D
                 strstart++;=0D
                 INSERT_STRING(strstart, hash_head);=0D
                 /* strstart never exceeds WSIZE-MAX_MATCH, so there are=0D
diff --git a/util.c b/util.c
index 0a0fc21..c9f0e52 100644
--- a/util.c
+++ b/util.c
@@ -38,6 +38,12 @@
 =0D
 static int write_buffer (int, voidp, unsigned int);=0D
 =0D
+#if defined __ARM_NEON__ || defined __ARM_NEON=0D
+#define CRC32D(crc, val) __asm__("crc32x %w[c], %w[c], %x[v]":[c]"+r"(crc)=
:[v]"r"(val))=0D
+#define CRC32W(crc, val) __asm__("crc32w %w[c], %w[c], %w[v]":[c]"+r"(crc)=
:[v]"r"(val))=0D
+#define CRC32H(crc, val) __asm__("crc32h %w[c], %w[c], %w[v]":[c]"+r"(crc)=
:[v]"r"(val))=0D
+#define CRC32B(crc, val) __asm__("crc32b %w[c], %w[c], %w[v]":[c]"+r"(crc)=
:[v]"r"(val))=0D
+#else=0D
 /* =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=0D
  * Table of CRC-32's of all single-byte values (made by makecrc.c)=0D
  */=0D
@@ -95,6 +101,7 @@ static const ulg crc_32_tab[] =3D {
   0x5d681b02L, 0x2a6f2b94L, 0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL,=0D
   0x2d02ef8dL=0D
 };=0D
+#endif=0D
 =0D
 /* Shift register contents.  */=0D
 static ulg crc =3D 0xffffffffL;=0D
@@ -132,6 +139,43 @@ ulg updcrc(s, n)
     const uch *s;           /* pointer to bytes to pump through */=0D
     unsigned n;             /* number of bytes in s[] */=0D
 {=0D
+#if defined __ARM_NEON__ || defined __ARM_NEON=0D
+    register ulg c;=0D
+    static ulg crc =3D (ulg)0xffffffffL;=0D
+    register const uint8_t  *buf1;=0D
+    register const uint16_t *buf2;=0D
+    register const uint32_t *buf4;=0D
+    register const uint64_t *buf8;=0D
+    int64_t length =3D (int64_t)n;=0D
+    buf8 =3D (const  uint64_t *)(const void *)s;=0D
+=0D
+    if (s =3D=3D NULL) {=0D
+        c =3D 0xffffffffL;=0D
+    } else {=0D
+        c =3D crc;=0D
+        while(length >=3D sizeof(uint64_t)) {=0D
+            CRC32D(c, *buf8++);=0D
+            length -=3D sizeof(uint64_t);=0D
+        }=0D
+        buf4 =3D (const uint32_t *)(const void *)buf8;=0D
+        if (length >=3D sizeof(uint32_t)) {=0D
+            CRC32W(c, *buf4++);=0D
+            length -=3D sizeof(uint32_t);=0D
+        }=0D
+        buf2 =3D (const uint16_t *)(const void *)buf4;=0D
+        if(length >=3D sizeof(uint16_t)) {=0D
+            CRC32H(c, *buf2++);=0D
+            length -=3D sizeof(uint16_t);=0D
+        }=0D
+        buf1 =3D (const uint8_t *)(const void *)buf2;=0D
+        if (length >=3D sizeof(uint8_t)) {=0D
+            CRC32B(c, *buf1);=0D
+            length -=3D sizeof(uint8_t);=0D
+        }=0D
+    }=0D
+    crc =3D c;=0D
+    return (c ^ 0xffffffffL);=0D
+#else=0D
     register ulg c;         /* temporary variable */=0D
 =0D
     if (s =3D=3D NULL) {=0D
@@ -144,6 +188,7 @@ ulg updcrc(s, n)
     }=0D
     crc =3D c;=0D
     return c ^ 0xffffffffL;       /* (instead of ~c for 64-bit machines) *=
/=0D
+#endif=0D
 }=0D
 =0D
 /* Return a current CRC value.  */=0D
--=20
2.17.1

Acknowledgement sent to l00374334 <liqiang64@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-gzip@HIDDEN. Full text available.

Report forwarded to bug-gzip@HIDDEN:
bug#41535; Package gzip. Full text available.

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Last modified: Tue, 5 Apr 2022 01:45:01 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #41535 [PATCH] performance optimization for aarch64

Message received at 41535 <at> debbugs.gnu.org:

Message received at 41535 <at> debbugs.gnu.org:

Message received at 41535 <at> debbugs.gnu.org:

Message received at submit <at> debbugs.gnu.org:

GNU bug report logs - #41535
[PATCH] performance optimization for aarch64