From: Young Mo Kang <kym327@HIDDEN>
To: bug-gzip@HIDDEN
Date: Thu, 9 Nov 2023 12:40:24 -0500
Subject: Gzip decompression can be 60% faster using zlib's CRC32
Hello,

I have noticed that GNU Gzip's CRC32 calculation is the main bottleneck in decompression, and that decompression can run significantly faster (more than 60%) if we replace it with the crc32 function from zlib.

I tested the decompression speed of a Linux source code tar.gz file before and after replacing the CRC32 computation. On an AMD 7735HS system, I get:

GNU Gzip unmodified
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:05.11
GNU Gzip with CRC32 from zlib
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.16

And I saw an even better performance improvement when testing on an Apple Silicon M1 system:
GNU Gzip unmodified
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:06.83
GNU Gzip with CRC32 from zlib
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.72

Since GNU Gzip and zlib were written by the same authors, I was wondering whether GNU Gzip could share zlib's CRC32 calculation and obtain this performance gain. I am not sure whether there would be a license issue, though.

The following bash script should reproduce the result:

```
# download GNU Gzip and zlib
wget -O- https://ftp.gnu.org/gnu/gzip/gzip-1.13.tar.gz | tar xzf -
wget -O- https://zlib.net/zlib-1.3.tar.gz | tar xzf -

# download linux source code as a test file for decompression speed
wget -O- https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.6.1.tar.xz | xz -d | gzip > linux.tar.gz

# compile zlib
cd zlib-1.3
CFLAGS="-O2 -g" ./configure --static && make -j
cd ..

# compile GNU Gzip
cd gzip-1.13
CFLAGS="-O2 -g" ./configure && make -j

# measure decompression speed
/usr/bin/time -v ./gzip -d < ../linux.tar.gz > linux.tar 2> ../gzip1.time

# use crc32 from zlib (quoted 'EOF' stops the shell from expanding the diff)
cat > util.diff << 'EOF'
@@ -27,6 +27,7 @@
 #include <stdlib.h>
 #include <errno.h>

+#include "crc32.h"
 #include "tailor.h"
 #include "gzip.h"
 #include <dirname.h>
@@ -136,25 +137,15 @@ copy (int in, int out)
 ulg
 updcrc (uch const *s, unsigned n)
 {
-    register ulg c;         /* temporary variable */
-
-    if (s == NULL) {
-        c = 0xffffffffL;
-    } else {
-        c = crc;
-        if (n) do {
-            c = crc_32_tab[((int)c ^ (*s++)) & 0xff] ^ (c >> 8);
-        } while (--n);
-    }
-    crc = c;
-    return c ^ 0xffffffffL;       /* (instead of ~c for 64-bit machines) */
+  crc = crc32(crc, s, n);
+  return crc;
 }

 /* Return a current CRC value.  */
 ulg
 getcrc ()
 {
-  return crc ^ 0xffffffffL;
+  return crc;
 }

 #ifdef IBM_Z_DFLTCC
EOF
# -l: tolerate whitespace differences in the context lines
patch -l < util.diff util.c

# create header file
cat > crc32.h << 'EOF'
#pragma once
unsigned long crc32(unsigned long crc, const unsigned char *buf, unsigned int len);
EOF

# copy crc32 object file from zlib
cp ../zlib-1.3/crc32.o .

# re-compile GNU Gzip
gcc -O2 -g -c util.c -Ilib
gcc -O2 -g *.o lib/libgzip.a -o gzip

# measure decompression speed
/usr/bin/time -v ./gzip -d < ../linux.tar.gz > linux.tar 2> ../gzip2.time

# print out time difference
cd ..
echo
echo "GNU Gzip unmodified"
grep Elapsed gzip1.time
echo "GNU Gzip with CRC32 from zlib"
grep Elapsed gzip2.time
```
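The patch makes getcrc() return crc without a final XOR. That works because the two functions use different conventions, sketched below under the assumption that the caller resets the state with updcrc(NULL, 0) before use: gzip's original updcrc() keeps the complemented shift register in crc and applies the final XOR only on output, whereas zlib's crc32() takes and returns the already-finalized CRC and returns the initial value 0 when passed a NULL buffer. All names here (gz_updcrc, z_crc32, patched_updcrc, and so on) are hypothetical stand-ins, and z_crc32 is a plain bytewise reimplementation of the zlib calling convention, not zlib's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32 polynomial used by gzip and zlib. */
#define CRC32_POLY 0xEDB88320u

static uint32_t crc_table[256];

/* Build the usual 256-entry byte table. */
static void make_crc_table(void)
{
  for (uint32_t n = 0; n < 256; n++) {
    uint32_t c = n;
    for (int k = 0; k < 8; k++)
      c = (c & 1) ? (c >> 1) ^ CRC32_POLY : c >> 1;
    crc_table[n] = c;
  }
  assert(crc_table[1] == 0x77073096u); /* well-known first nonzero entry */
}

/* Original gzip convention: the global holds the complemented register. */
static uint32_t gz_crc = 0xffffffffu;

static uint32_t gz_updcrc(const unsigned char *s, size_t n)
{
  uint32_t c;
  if (s == NULL) {
    c = 0xffffffffu;              /* reset the register */
  } else {
    c = gz_crc;
    while (n--)
      c = crc_table[(c ^ *s++) & 0xff] ^ (c >> 8);
  }
  gz_crc = c;
  return c ^ 0xffffffffu;         /* finalize only on output */
}

static uint32_t gz_getcrc(void) { return gz_crc ^ 0xffffffffu; }

/* zlib-style convention: finalized CRC in, finalized CRC out;
   a NULL buffer yields the initial value 0. */
static uint32_t z_crc32(uint32_t crc, const unsigned char *buf, size_t len)
{
  if (buf == NULL)
    return 0;
  crc = ~crc;                     /* undo the previous finalization */
  while (len--)
    crc = crc_table[(crc ^ *buf++) & 0xff] ^ (crc >> 8);
  return ~crc;                    /* finalize again */
}

/* Patched convention: the global holds the finalized CRC,
   so getcrc() can return it unchanged. */
static uint32_t patched_crc;

static uint32_t patched_updcrc(const unsigned char *s, size_t n)
{
  patched_crc = z_crc32(patched_crc, s, n);
  return patched_crc;
}

static uint32_t patched_getcrc(void) { return patched_crc; }
```

After the reset, both versions report the same CRC for the same sequence of chunks; for the standard check string "123456789" that value is 0xCBF43926.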
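As for why the CRC step can dominate decompression time: gzip's original updcrc() loop folds in one byte per iteration with a single 256-entry table, so every lookup has to wait for the previous result. zlib 1.2.12 and later use a different "braided" word-at-a-time method (and can use the CPU's CRC32 instructions on suitable ARMv8 builds). The sketch below instead uses the simpler, related slicing-by-4 technique, so it illustrates the general idea rather than zlib's actual code; the names init_tables, crc32_bytewise and crc32_slice4 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32 polynomial used by gzip and zlib. */
#define CRC32_POLY 0xEDB88320u

/* table[0] is the classic byte table; table[t] advances a byte's
   contribution through t further zero bytes (used for slicing). */
static uint32_t table[4][256];

static void init_tables(void)
{
  for (uint32_t n = 0; n < 256; n++) {
    uint32_t c = n;
    for (int k = 0; k < 8; k++)
      c = (c & 1) ? (c >> 1) ^ CRC32_POLY : c >> 1;
    table[0][n] = c;
  }
  for (uint32_t n = 0; n < 256; n++) {
    uint32_t c = table[0][n];
    for (int t = 1; t < 4; t++) {
      c = table[0][c & 0xff] ^ (c >> 8);
      table[t][n] = c;
    }
  }
}

/* One dependent table lookup per byte (same shape as gzip's old loop). */
static uint32_t crc32_bytewise(uint32_t crc, const unsigned char *p, size_t len)
{
  assert(p != NULL || len == 0);
  crc = ~crc;
  while (len--)
    crc = table[0][(crc ^ *p++) & 0xff] ^ (crc >> 8);
  return ~crc;
}

/* Slicing-by-4: fold in 4 bytes, then do 4 independent lookups. */
static uint32_t crc32_slice4(uint32_t crc, const unsigned char *p, size_t len)
{
  assert(p != NULL || len == 0);
  crc = ~crc;
  while (len >= 4) {
    /* Assemble the word byte by byte so the code is endian-neutral. */
    crc ^= (uint32_t)p[0] | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    crc = table[3][crc & 0xff] ^ table[2][(crc >> 8) & 0xff]
        ^ table[1][(crc >> 16) & 0xff] ^ table[0][crc >> 24];
    p += 4;
    len -= 4;
  }
  while (len--)                   /* leftover 0-3 bytes */
    crc = table[0][(crc ^ *p++) & 0xff] ^ (crc >> 8);
  return ~crc;
}
```

Call init_tables() once before use. Both functions take and return the finalized CRC, like zlib's crc32(), so a buffer can be processed in chunks: crc32_slice4(crc32_slice4(0, a, n1), a + n1, n2) gives the same result as one call over all n1 + n2 bytes.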
bug#67022; Package gzip.