GNU bug report logs - #67022
Gzip decompression can be 60% faster using zlib's CRC32

Please note: This is a static page, with minimal formatting, updated once a day.

Package: gzip; Reported by: Young Mo Kang <kym327@HIDDEN>; dated Thu, 9 Nov 2023 17:42:01 UTC; Maintainer for gzip is bug-gzip@HIDDEN.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 9 Nov 2023 17:41:17 +0000
Message-ID: <95870740-74fc-4416-aad0-640c0eaf8832@HIDDEN>
Date: Thu, 9 Nov 2023 12:40:24 -0500
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Content-Language: en-US
To: bug-gzip@HIDDEN
From: Young Mo Kang <kym327@HIDDEN>
Subject: Gzip decompression can be 60% faster using zlib's CRC32

Hello,


I have noticed that GNU Gzip's CRC32 calculation is the main bottleneck 
in decompression, and that decompression can run more than 60% faster if 
we replace it with the crc32 function from zlib.
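
For context on why the CRC32 matters during decompression: per RFC 1952, 
a gzip member ends with an 8-byte trailer holding the CRC-32 of the 
uncompressed data followed by its length modulo 2^32, both little-endian, 
and the decompressor recomputes the CRC over its output to check it 
against that stored value. A minimal sketch for reading the stored value, 
assuming a single-member file with no trailing padding; the function name 
gz_stored_crc is made up for illustration:

```shell
# gz_stored_crc FILE
# Print the CRC-32 stored in the trailer of a gzip file as 8 hex digits.
# Assumes FILE is a single-member gzip file with no trailing padding.
gz_stored_crc() {
    # Last 8 bytes: CRC-32 (4 bytes, little-endian), then ISIZE (4 bytes).
    tail -c 8 "$1" | od -An -tx1 |
        awk '{ printf "%s%s%s%s\n", $4, $3, $2, $1 }'
}
```

For example, a file made with `printf '123456789' | gzip` should print 
cbf43926, the standard CRC-32 check value for that input.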


I tested the decompression speed of a Linux source code tar.gz file 
before and after replacing the CRC32 computation. On an AMD 7735HS 
system, I get:

GNU Gzip unmodified
     Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.11
GNU Gzip with CRC32 from zlib
     Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.16


I saw an even larger improvement when testing on an Apple Silicon M1 
system:

GNU Gzip unmodified
     Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.83
GNU Gzip with CRC32 from zlib
     Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.72
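
As a sanity check on the headline figure, the elapsed times above can be 
turned into a throughput gain: old time divided by new time, minus one. 
A small sketch; the helper names to_seconds and speedup_pct are 
hypothetical, and awk does the floating-point arithmetic:

```shell
# to_seconds TIME
# Convert "m:ss.ff" or "h:mm:ss" (the format printed by time -v) to seconds.
to_seconds() {
    LC_ALL=C awk -v t="$1" 'BEGIN {
        n = split(t, part, ":")
        s = 0
        for (i = 1; i <= n; i++) s = s * 60 + part[i]
        print s
    }'
}

# speedup_pct OLD NEW
# Print the percent throughput gain when elapsed time drops from OLD to NEW.
speedup_pct() {
    LC_ALL=C awk -v t_old="$(to_seconds "$1")" -v t_new="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", (t_old / t_new - 1) * 100 }'
}

speedup_pct 0:05.11 0:03.16   # AMD 7735HS timings from this report
speedup_pct 0:06.83 0:03.72   # Apple M1 timings from this report
```

With the numbers reported here this gives about 61.7% on the AMD system 
and 83.6% on the M1.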


Since both GNU Gzip and zlib are written by the same authors, I was 
wondering whether GNU Gzip could share zlib's CRC32 calculation and obtain 
this performance gain. I am not sure whether there would be a license 
issue, though.


The following bash script should reproduce the results:

```

# download GNU Gzip and zlib
wget -O- https://ftp.gnu.org/gnu/gzip/gzip-1.13.tar.gz | tar xzf -
wget -O- https://zlib.net/zlib-1.3.tar.gz | tar xzf -

# download linux source code as a test file for decompression speed
wget -O- https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.6.1.tar.xz \
  | xz -d | gzip > linux.tar.gz

# compile zlib
cd zlib-1.3
CFLAGS="-O2 -g" ./configure --static && make -j
cd ..

# compile GNU Gzip
cd gzip-1.13
CFLAGS="-O2 -g" ./configure && make -j

# measure decompression speed
/usr/bin/time -v ./gzip -d < ../linux.tar.gz > linux.tar 2> ../gzip1.time

# use crc32 from zlib
cat > util.diff << EOF
@@ -27,6 +27,7 @@
 #include <stdlib.h>
 #include <errno.h>

+#include "crc32.h"
 #include "tailor.h"
 #include "gzip.h"
 #include <dirname.h>
@@ -136,25 +137,14 @@ copy (int in, int out)
 ulg
 updcrc (uch const *s, unsigned n)
 {
-    register ulg c;         /* temporary variable */
-
-    if (s == NULL) {
-        c = 0xffffffffL;
-    } else {
-        c = crc;
-        if (n) do {
-            c = crc_32_tab[((int)c ^ (*s++)) & 0xff] ^ (c >> 8);
-        } while (--n);
-    }
-    crc = c;
-    return c ^ 0xffffffffL;       /* (instead of ~c for 64-bit machines) */
+    return crc = crc32(crc, s, n);
 }

 /* Return a current CRC value.  */
 ulg
 getcrc ()
 {
-  return crc ^ 0xffffffffL;
+  return crc;
 }

 #ifdef IBM_Z_DFLTCC
EOF
patch < util.diff util.c

# create header file
cat > crc32.h << EOF
#pragma once

unsigned long  crc32(unsigned long crc, const unsigned char  *buf,
                             unsigned int len);
EOF

# copy crc32 object file from zlib
cp ../zlib-1.3/crc32.o .

# re-compile GNU Gzip
gcc -O2 -g -c util.c -Ilib
gcc -O2 -g *.o lib/libgzip.a -o gzip

# measure decompression speed
/usr/bin/time -v ./gzip -d < ../linux.tar.gz > linux.tar 2> ../gzip2.time

# print out time difference
cd ..
echo
echo "GNU Gzip unmodified"
grep Elapsed gzip1.time
echo "GNU Gzip with CRC32 from zlib"
grep Elapsed gzip2.time
```
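
The script above only measures time, and the second run overwrites 
linux.tar, so it does not by itself show that both builds produce the 
same data. A possible follow-up check, sketched under the assumption that 
the unmodified binary was saved before relinking (the name gzip.orig and 
the function check_same_output are illustrative and not part of the 
script):

```shell
# check_same_output REF_GZIP NEW_GZIP FILE
# Succeed only if NEW_GZIP accepts FILE (its own CRC check passes) and both
# binaries decompress FILE to data with the same checksum and length.
check_same_output() {
    ref_gzip=$1
    new_gzip=$2
    file=$3

    # -t decompresses without writing output and fails on a CRC mismatch.
    "$new_gzip" -t < "$file" || return 1

    # cksum prints a checksum and byte count of the decompressed stream.
    ref_sum=$("$ref_gzip" -dc < "$file" | cksum) || return 1
    new_sum=$("$new_gzip" -dc < "$file" | cksum) || return 1
    [ "$ref_sum" = "$new_sum" ]
}

# Example (paths are assumptions):
#   check_same_output ./gzip.orig ./gzip ../linux.tar.gz && echo "outputs match"
```

Because gzip compares its computed CRC with the value stored in the file 
at the end of decompression, a patched binary that finishes without a CRC 
error is already some evidence that the two CRC32 implementations agree.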





Acknowledgement sent to Young Mo Kang <kym327@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-gzip@HIDDEN. Full text available.
Report forwarded to bug-gzip@HIDDEN:
bug#67022; Package gzip. Full text available.
Last modified: Thu, 9 Nov 2023 17:45:01 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.