GNU bug report logs - #29089
Truncated size of big file

Please note: This is a static page, with minimal formatting, updated once a day.

Package: gzip; Reported by: Alex Peshkoff <peshkoff@HIDDEN>; dated Tue, 31 Oct 2017 18:05:01 UTC; Maintainer for gzip is bug-gzip@HIDDEN.

Message received at 29089 <at> debbugs.gnu.org:


Received: (at 29089) by debbugs.gnu.org; 31 Oct 2017 18:20:39 +0000
Subject: Re: bug#29089: Truncated size of big file
From: Mark Adler <madler@HIDDEN>
In-Reply-To: <dd163c27-4387-c7c0-6be2-28da766baf5c@HIDDEN>
Date: Tue, 31 Oct 2017 11:20:29 -0700
Message-Id: <16601711-B48B-4FBC-A7E7-14EC227388CE@HIDDEN>
References: <dd163c27-4387-c7c0-6be2-28da766baf5c@HIDDEN>
To: Alex Peshkoff <peshkoff@HIDDEN>
Cc: 29089 <at> debbugs.gnu.org

Alex,

This is inherent in the gzip format, and is not really a bug in gzip.
(Though gzip could notice the problem and not display a large negative
compression ratio.)

The gzip format stores the uncompressed length at the end using four
bytes, which can only represent up to 2^32-1. So what you are seeing is
the low 32 bits of 18962535424, which is in fact 1782666240. When gzip
uses that truncated value to compute a compression ratio, it gets a
nonsensical result.
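
The arithmetic above can be checked with a short Python sketch. The two
sizes are taken from the report; the variable names and the
ratio_percent helper are illustrative assumptions, and the ratio formula
(space saved relative to the uncompressed size, ignoring header bytes)
is an approximation of what "gzip -l" prints, not its actual code.

```python
# Sizes taken from the bug report; names here are illustrative only.
actual_size = 18962535424      # bytes, from "ls -l" after decompression
compressed_size = 3645968323   # bytes, from "gunzip -l"

# A four-byte length field keeps only the low 32 bits of the real length.
stored_size = actual_size % 2**32


def ratio_percent(uncompressed, compressed):
    """Space saved as a percentage of the uncompressed size (assumed formula)."""
    return 100.0 * (uncompressed - compressed) / uncompressed


print(stored_size)                                              # 1782666240
print(f"{ratio_percent(stored_size, compressed_size):.1f}%")    # -104.5%
print(f"{ratio_percent(actual_size, compressed_size):.1f}%")    # 80.8%
```

With the truncated length the ratio comes out negative, because the
compressed file is larger than the wrapped-around length; with the real
length it is an ordinary positive value.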

Unfortunately the only way to get the real uncompressed length and
compute a real ratio is to decompress the entire file. (In fact, pigz
will do this with "pigz -lt", which tests the entire file without
storing the result, and reports the correct uncompressed size and
compression ratio. "pigz -l" will do the same bad thing that "gzip -l"
does on > 4 GB uncompressed sizes, though it will report “unk”
for questionable ratios, i.e. expansions of the data beyond what
would be expected for incompressible data.)
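
The difference between the two approaches can be sketched in Python
using only the standard gzip and struct modules. The function names
stored_isize and true_uncompressed_size are hypothetical helpers, not
part of gzip or pigz, and the sketch assumes a single-member .gz file
(with concatenated members, the last four bytes describe only the last
member).

```python
import gzip
import struct


def stored_isize(path):
    """Read the 32-bit little-endian length stored in the last four bytes."""
    with open(path, "rb") as f:
        f.seek(-4, 2)  # 2 means "relative to the end of the file"
        (isize,) = struct.unpack("<I", f.read(4))
    return isize


def true_uncompressed_size(path, chunk_size=1 << 20):
    """Decompress the whole file, discard the data, and count the bytes."""
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

For data below 2^32 bytes the two functions should agree. Above that,
stored_isize is expected to equal true_uncompressed_size modulo 2^32,
and only the second function, which reads the whole file, gives the
real length.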

Mark


> On Oct 31, 2017, at 10:59 AM, Alex Peshkoff <peshkoff@HIDDEN> wrote:
>
> Before decompressing a copy of a database, I decided to take a look at
> its size:
>
> localhost stg # gunzip -l SWHTOROLT_20171019.GBK.gz
>          compressed        uncompressed  ratio uncompressed_name
>          3645968323          1782666240 -104.5% SWHTOROLT_20171019.GBK
>
> The uncompressed size is reported as 1.7 GB, which is clearly wrong,
> as is the -104.5% compression ratio.
>
> The actual size after decompression is:
>
> localhost stg # gunzip SWHTOROLT_20171019.GBK.gz
> localhost stg # ls -l SWHTOROLT_20171019.GBK
> -rw-r--r-- 1 root root 18962535424 Oct 19 15:59 SWHTOROLT_20171019.GBK
>
> Luckily I had enough disk space. I won't attach the problematic
> archive to this email; I suppose it's easier to reproduce this locally ;)
>
> Alex.





Information forwarded to bug-gzip@HIDDEN:
bug#29089; Package gzip. Full text available.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 31 Oct 2017 18:04:23 +0000
To: bug-gzip@HIDDEN
From: Alex Peshkoff <peshkoff@HIDDEN>
Subject: Truncated size of big file
Message-ID: <dd163c27-4387-c7c0-6be2-28da766baf5c@HIDDEN>
Date: Tue, 31 Oct 2017 20:59:33 +0300

Before decompressing a copy of a database, I decided to take a look at
its size:

localhost stg # gunzip -l SWHTOROLT_20171019.GBK.gz
          compressed        uncompressed  ratio uncompressed_name
          3645968323          1782666240 -104.5% SWHTOROLT_20171019.GBK

The uncompressed size is reported as 1.7 GB, which is clearly wrong,
as is the -104.5% compression ratio.

The actual size after decompression is:

localhost stg # gunzip SWHTOROLT_20171019.GBK.gz
localhost stg # ls -l SWHTOROLT_20171019.GBK
-rw-r--r-- 1 root root 18962535424 Oct 19 15:59 SWHTOROLT_20171019.GBK

Luckily I had enough disk space. I won't attach the problematic
archive to this email; I suppose it's easier to reproduce this locally ;)

Alex.






Acknowledgement sent to Alex Peshkoff <peshkoff@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-gzip@HIDDEN. Full text available.
Report forwarded to bug-gzip@HIDDEN:
bug#29089; Package gzip. Full text available.
Last modified: Mon, 25 Nov 2019 12:00:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.