GNU bug report logs - #29089
Truncated size of big file

Package: gzip

Reported by: Alex Peshkoff <peshkoff <at> mail.ru>

Date: Tue, 31 Oct 2017 18:05:01 UTC

Severity: normal

Merged with 17804, 30935, 30936, 38766, 42965, 48424, 52227

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 29089 in the body.
You can then email your comments to 29089 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gzip <at> gnu.org:
bug#29089; Package gzip. (Tue, 31 Oct 2017 18:05:01 GMT)

Acknowledgement sent to Alex Peshkoff <peshkoff <at> mail.ru>:
New bug report received and forwarded. Copy sent to bug-gzip <at> gnu.org. (Tue, 31 Oct 2017 18:05:03 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Alex Peshkoff <peshkoff <at> mail.ru>
To: bug-gzip <at> gnu.org
Subject: Truncated size of big file
Date: Tue, 31 Oct 2017 20:59:33 +0300

Before decompressing a copy of a database, I decided to take a look at
its size:

localhost stg # gunzip -l SWHTOROLT_20171019.GBK.gz
         compressed        uncompressed  ratio uncompressed_name
         3645968323          1782666240 -104.5% SWHTOROLT_20171019.GBK

The uncompressed size is reported as about 1.7 GB, which is clearly wrong,
as is the -104.5% compression ratio.

The actual size after decompression is:

localhost stg # gunzip SWHTOROLT_20171019.GBK.gz
localhost stg # ls -l SWHTOROLT_20171019.GBK
-rw-r--r-- 1 root root 18962535424 Oct 19 15:59 SWHTOROLT_20171019.GBK

Luckily I had enough disk space. I would rather not attach the problematic
archive to this email; I suppose it is easier to reproduce this locally ;)

Alex.






Information forwarded to bug-gzip <at> gnu.org:
bug#29089; Package gzip. (Tue, 31 Oct 2017 18:21:01 GMT)

Message #8 received at 29089 <at> debbugs.gnu.org:

From: Mark Adler <madler <at> alumni.caltech.edu>
To: Alex Peshkoff <peshkoff <at> mail.ru>
Cc: 29089 <at> debbugs.gnu.org
Subject: Re: bug#29089: Truncated size of big file
Date: Tue, 31 Oct 2017 11:20:29 -0700

Alex,

This is inherent in the gzip format, and is not really a bug in gzip. (Though gzip could notice the problem and not display a large negative compression ratio.)

The gzip format stores the uncompressed length at the end of the file in a four-byte field, which can only represent values up to 2^32-1. So what you are seeing is the low 32 bits of 18962535424, that is, 18962535424 modulo 2^32, which is in fact 1782666240. When gzip uses that truncated value to compute a compression ratio, it gets a nonsensical result.
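
The arithmetic above can be checked with a short sketch. This is an illustration, not gzip's own code: the function names are hypothetical, and the ratio formula (1 - compressed/uncompressed) is an assumption that ignores any header bytes gzip may subtract before dividing.

```python
# Sketch only: models the wraparound of a four-byte length field.
MASK_32 = 0xFFFFFFFF  # largest value four bytes can hold, 2^32 - 1


def stored_length(actual_size: int) -> int:
    """Return the low 32 bits of the uncompressed size (hypothetical helper)."""
    return actual_size & MASK_32


def listed_ratio(compressed: int, uncompressed: int) -> float:
    """Percentage saved, assuming the formula 1 - compressed/uncompressed."""
    return 100.0 * (1.0 - compressed / uncompressed)


actual = 18962535424      # size reported by ls -l after decompression
compressed = 3645968323   # compressed size reported by gunzip -l

wrapped = stored_length(actual)
print(wrapped)                                      # 1782666240
print(f"{listed_ratio(compressed, wrapped):.1f}%")  # -104.5%
print(f"{listed_ratio(compressed, actual):.1f}%")   # 80.8%
```

With the truncated length the ratio comes out negative, matching the listing in the report; with the real length it is a plausible positive value.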

Unfortunately, the only way to get the real uncompressed length and compute a real ratio is to decompress the entire file. In fact, pigz will do this with "pigz -lt", which tests the entire file without storing the result and reports the correct uncompressed size and compression ratio. Plain "pigz -l" will do the same bad thing that "gzip -l" does for uncompressed sizes over 4 GB, though it will report “unk” for questionable ratios, i.e. expansions of the data beyond what would be expected for incompressible data.
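
As a rough sketch of the two approaches described above, the following uses Python's standard gzip module rather than pigz. The function names are hypothetical, and the first helper assumes a single-member gzip file whose last four bytes are the little-endian length field.

```python
import gzip
import struct


def trailer_length(path: str) -> int:
    """Read the final four bytes as a little-endian 32-bit length.

    This is the fast lookup; the value wraps around at 2^32.
    """
    with open(path, "rb") as f:
        f.seek(-4, 2)  # 2 means "relative to end of file"
        return struct.unpack("<I", f.read(4))[0]


def true_length(path: str, chunk_size: int = 1 << 20) -> int:
    """Decompress the whole file, discard the data, and count the bytes."""
    total = 0
    with gzip.open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total
```

For files under 4 GB uncompressed the two functions should agree; for larger files only true_length gives the real size, at the cost of reading and decompressing everything.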

Mark







Merged 17804 29089 30935 30936 38766 42965 48424 52227. Request was from Paul Eggert <eggert <at> cs.ucla.edu> to control <at> debbugs.gnu.org. (Wed, 01 Dec 2021 23:34:01 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 13 Jan 2022 12:24:04 GMT)

This bug report was last modified 2 years and 104 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.