GNU bug report logs - #48424
bug in "gzip -lv gzip-file"


Package: gzip;

Reported by: Robert Urban <robert.urban <at> stromasys.com>

Date: Fri, 14 May 2021 19:27:01 UTC

Severity: normal

Merged with 17804, 29089, 30935, 30936, 38766, 42965, 52227

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 48424 in the body.
You can then email your comments to 48424 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gzip <at> gnu.org:
bug#48424; Package gzip. (Fri, 14 May 2021 19:27:01 GMT)

Acknowledgement sent to Robert Urban <robert.urban <at> stromasys.com>:
New bug report received and forwarded. Copy sent to bug-gzip <at> gnu.org. (Fri, 14 May 2021 19:27:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Robert Urban <robert.urban <at> stromasys.com>
To: bug-gzip <at> gnu.org
Subject: bug in "gzip -lv gzip-file"
Date: Fri, 14 May 2021 20:01:21 +0200
Hello,

gzip (at least my version, v1.10 running on Fedora 33) apparently uses an
unsigned 32-bit value when displaying the uncompressed size of a gzipped file.

This demonstrates the problem:

Create a 5GiB test file:

    $ fallocate -l $((5*1024*1024*1024)) fatfile

Compress it:

    $ gzip -c fatfile > fatfile.gz

List the contents:

    $ gzip -lv fatfile.gz
    method  crc     date  time           compressed        uncompressed  ratio uncompressed_name
    defla 193838c3 May 14 19:53             5857306          1073741824  99.5% fatfile

As you can see, the value in the "uncompressed" column is exactly 1GiB.

Regards,
Robert Urban

Please cc me in replies, as I'm not a subscriber of the list


Information forwarded to bug-gzip <at> gnu.org:
bug#48424; Package gzip. (Fri, 14 May 2021 19:54:02 GMT)

Message #8 received at 48424 <at> debbugs.gnu.org:

From: "Adler, Mark" <madler <at> alumni.caltech.edu>
To: Robert Urban <robert.urban <at> stromasys.com>
Cc: "48424 <at> debbugs.gnu.org" <48424 <at> debbugs.gnu.org>
Subject: Re: bug#48424: bug in "gzip -lv gzip-file"
Date: Fri, 14 May 2021 19:53:22 +0000
Robert,

No, it’s not that the gzip utility implementation is using the wrong size integer. This is because the gzip utility is using the gzip-format trailer to guess at the uncompressed length. That trailer has a four-byte length, which is the uncompressed length of the last member modulo 2^32. Sometimes the guess is wrong.
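As a rough illustration of the trailer described above (an editorial sketch, not code from gzip itself): RFC 1952 ends each member with a CRC-32 followed by ISIZE, both little-endian 32-bit values. The helper name `trailer_fields` and the 1000-byte payload are assumptions made for the example.

```python
import gzip
import struct

def trailer_fields(gz_bytes):
    """Return (crc32, isize) taken from the last 8 bytes of a gzip stream."""
    # The member trailer is CRC32 then ISIZE, each a little-endian uint32.
    crc32, isize = struct.unpack("<II", gz_bytes[-8:])
    return crc32, isize

# Small in-memory stream: here ISIZE matches the real length.
payload = b"x" * 1000
crc32, isize = trailer_fields(gzip.compress(payload))
print(isize)

# The 5 GiB file from the report: ISIZE can only hold the length modulo 2**32.
five_gib = 5 * 1024**3
wrapped = five_gib % 2**32
print(wrapped)  # 1073741824, i.e. the 1 GiB shown in the "uncompressed" column
```

Reading the trailer costs one seek and eight bytes, which is why it is fast, and also why it cannot tell 1 GiB apart from 5 GiB.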

The only way around this limitation, which is built into the gzip format, would be to decode the entire file to determine the actual uncompressed length. pigz will do this on request with the -lt option.
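A minimal sketch of the "decode the entire file" approach, assuming Python's standard `gzip` module rather than anything pigz does internally. The function name `decoded_length`, the chunk size, and the 3,000,000-byte demo file are illustrative choices.

```python
import gzip
import os
import tempfile

def decoded_length(path, chunk_size=1 << 20):
    """Count uncompressed bytes by streaming the whole file through the decoder."""
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)  # Python ints do not wrap at 2**32
    return total

# Demo on a small temporary file so the sketch runs quickly.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.gz")
    with gzip.open(path, "wb") as f:
        f.write(b"\0" * 3_000_000)
    demo_length = decoded_length(path)

print(demo_length)
```

The cost is proportional to the uncompressed size, which is the trade-off between speed and reliability that the message describes.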

There is no way to both rapidly and reliably get the uncompressed length.

What’s more, an uncompressed length of 4 GiB or more is not the only way for gzip -l to be wrong. A gzip stream can consist of multiple members, in which case gzip -l reports the length from only the last member. Here is an example, first correctly enumerated by pigz -ltv:

% pigz -ltv mult.gz
method    check    timestamp    compressed   original reduced  name
gzip 8  66007dba  Mar 21  2005       54405     152089   64.2%  alice
gzip 8  b56c3f9d  Mar 21  2005          13         14    7.1%  <...>
gzip 8  8efc3b00  Mar 21  2005       71667     296960   75.9%  <...>

gzip -lv will give information only from the last member:

% gzip -lv mult.gz
method  crc     date  time           compressed        uncompressed  ratio uncompressed_name
defla 8efc3b00 Feb  2 09:30              126145              296960  57.5% mult

pigz -lv, like gzip, looks only at the trailer for the CRC and length, and also gets it wrong:

% pigz -lv mult.gz
method    check    timestamp    compressed   original reduced  name
gzip 8  8efc3b00  Mar 21  2005      126121     296960   57.5%  alice
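The multi-member case can be reproduced with a short sketch. It assumes synthetic content and borrows only the three member sizes from the listing above; `member_lengths` is a hypothetical helper, not part of gzip or pigz.

```python
import gzip
import struct
import zlib

def member_lengths(gz_bytes):
    """Decode each gzip member in turn and return its uncompressed length."""
    lengths = []
    remaining = gz_bytes
    while remaining:
        d = zlib.decompressobj(wbits=31)  # 31 = expect a gzip header and trailer
        lengths.append(len(d.decompress(remaining)))
        remaining = d.unused_data  # bytes after this member's trailer
    return lengths

# Three members concatenated into one stream (the content is made up).
sizes = [152089, 14, 296960]
stream = b"".join(gzip.compress(b"a" * n) for n in sizes)

# The final four bytes are the ISIZE of the last member only.
trailer_isize = struct.unpack("<I", stream[-4:])[0]
print(trailer_isize)           # 296960

per_member = member_lengths(stream)
print(per_member)              # [152089, 14, 296960]
print(sum(per_member))         # 449063
```

The trailer reports only 296960, while walking every member gives the true total of 449063 bytes.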

Mark



Merged 17804 29089 30935 30936 38766 42965 48424 52227. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Wed, 01 Dec 2021 23:34:01 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 13 Jan 2022 12:24:04 GMT)

This bug report was last modified 2 years and 75 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.