GNU bug report logs - #29089: Truncated size of big file
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 29089 in the body.
You can then email your comments to 29089 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gzip <at> gnu.org: bug#29089; Package gzip. (Tue, 31 Oct 2017 18:05:01 GMT)
Acknowledgement sent to Alex Peshkoff <peshkoff <at> mail.ru>: New bug report received and forwarded. Copy sent to bug-gzip <at> gnu.org. (Tue, 31 Oct 2017 18:05:03 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Before decompressing a copy of a database, I decided to take a look at its size:
localhost stg # gunzip -l SWHTOROLT_20171019.GBK.gz
         compressed        uncompressed   ratio uncompressed_name
         3645968323          1782666240 -104.5% SWHTOROLT_20171019.GBK
The uncompressed size is reported as about 1.7 GB, which is clearly wrong, as is the -104.5% compression ratio.
The actual size after decompression is:
localhost stg # gunzip SWHTOROLT_20171019.GBK.gz
localhost stg # ls -l SWHTOROLT_20171019.GBK
-rw-r--r-- 1 root root 18962535424 Oct 19 15:59 SWHTOROLT_20171019.GBK
Luckily I had enough disk space. I won't attach the problematic archive to this email; I suppose it's easier to reproduce this locally ;)
Alex.
Information forwarded to bug-gzip <at> gnu.org: bug#29089; Package gzip. (Tue, 31 Oct 2017 18:21:01 GMT)
Message #8 received at 29089 <at> debbugs.gnu.org (full text, mbox):
Alex,
This is inherent in the gzip format, and is not really a bug in gzip. (Though gzip could notice the problem and not display a large negative compression ratio.)
The gzip format stores the uncompressed length at the end using four bytes, which can only represent up to 2^32-1. So what you are seeing is the low 32 bits of 18962535424, which is in fact 1782666240. When gzip uses that truncated value to compute a compression ratio, it gets a nonsensical result.
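The truncation and the resulting ratio can be checked with a few lines of Python. This is a minimal sketch, not gzip's code: `read_isize`, `ratio` and `WRAP` are hypothetical names, and `ratio` assumes the displayed figure is roughly 1 - compressed/uncompressed, which approximates what `gzip -l` prints (gzip's own calculation also accounts for header bytes).

```python
import gzip
import struct

WRAP = 2 ** 32  # ISIZE is 4 bytes, so it can only hold values up to 2^32 - 1

def read_isize(gz_bytes: bytes) -> int:
    """Return ISIZE: the last 4 bytes of a gzip member, little-endian (RFC 1952).

    For a multi-member file this is only the final member's length.
    """
    if len(gz_bytes) < 18:  # 10-byte header + 8-byte trailer, at minimum
        raise ValueError("too short to be a gzip member")
    (isize,) = struct.unpack("<I", gz_bytes[-4:])
    return isize

def ratio(compressed: int, uncompressed: int) -> float:
    """Approximate `gzip -l` ratio: fraction of space saved (assumed formula)."""
    return 1.0 - compressed / uncompressed

if __name__ == "__main__":
    # Small input: ISIZE equals the true length because nothing wraps.
    data = b"example line\n" * 1000
    print(read_isize(gzip.compress(data)) == len(data))  # True

    # Numbers from the report: the stored value is the true size mod 2^32.
    true_size, compressed = 18962535424, 3645968323
    stored = true_size % WRAP
    print(stored)                                 # 1782666240
    print(f"{ratio(compressed, stored):.1%}")     # -104.5%, as gzip -l showed
    print(f"{ratio(compressed, true_size):.1%}")  # 80.8%, the real ratio
```

Put differently, 18962535424 = 4 × 4294967296 + 1782666240: the file wrapped past 2^32 four times, and only the remainder survived in the trailer.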
Unfortunately the only way to get the real uncompressed length and compute a real ratio is to decompress the entire file. (In fact, pigz will do this with "pigz -lt", which tests the entire file without storing the result, and reports the correct uncompressed size and compression ratio. "pigz -l" will do the same bad thing that "gzip -l" does on > 4 GB uncompressed sizes, though it will report “unk” for questionable ratios, i.e. expansions of the data beyond what would be expected for incompressible data.)
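The two approaches described above can be sketched as follows. This is a hypothetical Python illustration, not pigz's implementation. By default `list_entry` reads only the trailer and prints "unk" when the compressed size exceeds an assumed worst-case expansion bound. That bound relies on deflate's stored blocks costing at most 5 bytes per 65535-byte block, plus the 18-byte gzip header and trailer and a made-up `HEADER_SLACK` for optional header fields. With `full_scan=True` it decompresses everything and discards the output, in the spirit of `pigz -lt`. Python's `gzip` module handles concatenated members and checks each member's CRC-32 and length as it reads.

```python
import gzip
import os
import struct
import sys

CHUNK = 1 << 20       # read 1 MiB at a time; decompressed data is never stored
HEADER_SLACK = 4096   # hypothetical allowance for optional FNAME/FCOMMENT/FEXTRA

def isize_plausible(compressed: int, isize: int) -> bool:
    """Assumed bound: could the data really have expanded from only isize bytes?"""
    # Incompressible data falls back to stored blocks: <= 5 bytes per 65535.
    bound = isize + 5 * (isize // 65535 + 1) + 18 + HEADER_SLACK
    return compressed <= bound

def trailer_isize(path: str) -> int:
    """Read ISIZE from the last 4 bytes of the file (last member only)."""
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)
        (isize,) = struct.unpack("<I", f.read(4))
    return isize

def true_uncompressed_size(path: str) -> int:
    """Decompress the whole file and count the bytes (needs Python 3.8+)."""
    total = 0
    with gzip.open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            total += len(chunk)
    return total

def list_entry(path: str, full_scan: bool = False) -> str:
    compressed = os.path.getsize(path)
    if full_scan:
        size = true_uncompressed_size(path)
    else:
        size = trailer_isize(path)
        if not isize_plausible(compressed, size):
            # ISIZE has almost certainly wrapped; don't print a bogus ratio.
            return f"{compressed:>12} {size:>14} {'unk':>7} {path}"
    ratio = 1.0 - compressed / size if size else 0.0
    return f"{compressed:>12} {size:>14} {ratio:7.1%} {path}"

if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(list_entry(p, full_scan=True))
```

The plausibility check cannot catch every wrap: a file slightly over 4 GiB of highly compressible data yields a small, plausible-looking ISIZE. Only the full scan gives the true size, and it is also the only way to get the correct total for a multi-member file, whose trailer describes just the last member.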
Mark
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 13 Jan 2022 12:24:04 GMT)
This bug report was last modified 2 years and 104 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997, 2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.