GNU bug report logs - #30719
Progressively compressing piped input

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: gzip; Severity: wishlist; Reported by: "Garreau\, Alexandre" <galex-713@HIDDEN>; dated Mon, 5 Mar 2018 21:20:02 UTC; Maintainer for gzip is bug-gzip@HIDDEN.
Severity set to 'wishlist' from 'normal' Request was from Paul Eggert <eggert@HIDDEN> to control <at> debbugs.gnu.org. Full text available.

Message received at 30719 <at> debbugs.gnu.org:


Subject: Re: bug#30719: Progressively compressing piped input
From: Mark Adler <madler@HIDDEN>
In-Reply-To: <3hbuveh2vhln.fbs.xxuns.g6.gal@HIDDEN>
Date: Tue, 6 Mar 2018 18:11:51 -0800
Message-Id: <00FE6CBA-74BC-43E3-A120-F44951F87AF7@HIDDEN>
References: <ve1y9f9vsiln.46t.xxuns.g6.gal@HIDDEN>
 <54783A3B-7CB5-4CCB-BD3A-1828894750D4@HIDDEN>
 <3hbuveh2vhln.fbs.xxuns.g6.gal@HIDDEN>
To: "Garreau, Alexandre" <galex-713@HIDDEN>
Cc: 30719 <at> debbugs.gnu.org

> On Mar 6, 2018, at 1:58 PM, Garreau, Alexandre <galex-713@HIDDEN> wrote:
>
> On 05/03/2018 at 14:54, Mark Adler wrote:
>> deflate has an inherent latency that accumulates enough data in order
>> to efficiently emit each deflate block. You can deliberately flush
>> (with zlib, not gzip), but if you do that too frequently, e.g. each
>> line, then you will get lousy compression or even expansion.
>
> Even if the main repetition is between the lines? Like if 80% of
> half the line, and 70% of the other half lines, are the same? Like in a
> while loop with only ping and date? I thought of it as a very lazy way
> of not having to remove all the redundant output caused by the use of
> ASCII, the repetition of words or similar patterns occurring over and
> over.


Alexandre,

It has nothing to do with how much or how little or how often there is
repetition. It has to do with the overhead of the header of a dynamic
block that is required to describe the Huffman codes used therein. You
need several thousand symbols in order to pay for the bits required for
the header.

Mark
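
The cost of flushing can be measured directly. The following is a minimal sketch, not taken from the thread: it assumes Python's standard zlib module and made-up ping-like lines, and compares one gzip stream written without intermediate flushes against the same data with a Z_SYNC_FLUSH after every line. Each flush ends the current deflate block, so every line pays per-block overhead instead of sharing one block header with thousands of other symbols.

```python
import zlib

GZIP = 31  # wbits value that makes zlib emit the gzip container


def compressed_size(lines, flush_each_line):
    """Return the size of one gzip stream built from `lines`."""
    comp = zlib.compressobj(9, zlib.DEFLATED, GZIP)
    total = 0
    for line in lines:
        total += len(comp.compress(line))
        if flush_each_line:
            # Ends the current deflate block and pads to a byte boundary.
            total += len(comp.flush(zlib.Z_SYNC_FLUSH))
    total += len(comp.flush())  # Z_FINISH: last block plus gzip trailer
    return total


# Made-up, highly repetitive sample data (an assumption for illustration).
lines = [
    b"64 bytes from gnu.org: icmp_seq=1 ttl=52 time=%d.%d ms\n"
    % (90 + i % 7, i % 10)
    for i in range(2000)
]

raw = sum(len(line) for line in lines)
batched = compressed_size(lines, flush_each_line=False)
flushed = compressed_size(lines, flush_each_line=True)
print("raw:", raw, "no flush:", batched, "flush per line:", flushed)
```

The exact sizes depend on the data and the zlib version, so none are quoted here; the point is only that the per-line-flush figure comes out larger than the unflushed one even though the input is identical.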





Information forwarded to bug-gzip@HIDDEN:
bug#30719; Package gzip. Full text available.

Message received at 30719 <at> debbugs.gnu.org:


From: "Garreau\, Alexandre" <galex-713@HIDDEN>
To: Mark Adler <madler@HIDDEN>
Subject: Re: bug#30719: Progressively compressing piped input
References: <ve1y9f9vsiln.46t.xxuns.g6.gal@HIDDEN>
 <54783A3B-7CB5-4CCB-BD3A-1828894750D4@HIDDEN>
Date: Tue, 06 Mar 2018 22:58:56 +0100
In-Reply-To: <54783A3B-7CB5-4CCB-BD3A-1828894750D4@HIDDEN> (Mark
 Adler's message of "Mon, 5 Mar 2018 14:54:21 -0800")
Message-ID: <3hbuveh2vhln.fbs.xxuns.g6.gal@HIDDEN>
Cc: 30719 <at> debbugs.gnu.org

On 05/03/2018 at 14:54, Mark Adler wrote:
> deflate has an inherent latency that accumulates enough data in order
> to efficiently emit each deflate block. You can deliberately flush
> (with zlib, not gzip), but if you do that too frequently, e.g. each
> line, then you will get lousy compression or even expansion.

Even if the main repetition is between the lines? Like if 80% of
half the line, and 70% of the other half lines, are the same? Like in a
while loop with only ping and date? I thought of it as a very lazy way
of not having to remove all the redundant output caused by the use of
ASCII, the repetition of words or similar patterns occurring over and
over.

> I wrote something called gzlog
> (https://github.com/madler/zlib/blob/master/examples/gzlog.h),
> intended to solve this problem. It can take a small amount of input,
> e.g. a line, and update the output gzip file to be complete and valid
> after each line, yet also get good compression in the long run. It
> does this by writing the lines to the log.gz file effectively
> uncompressed (deflate has a “stored” block type), until it has
> accumulated, say, 1 MB of data. Then it goes back and compresses that
> uncompressed 1 MB, again always leaving the gzip file in a valid
> state. gzlog also maintains something like a journal, which allows
> gzlog to repair the gzip file if the last operation was interrupted,
> e.g. by a power failure.

I was rather looking for a tool that could be used as a utility (since
that’s for a dirty high-level low-frequency medium-term task) rather
than a C thing, yet that’s quite interesting, at least in demonstrating
the flexibility of gzip…

>> #!/bin/bash
>> while ping -c1 gnu.org ; do
>>    date --rfc-3339=seconds
>>    sleep 30
>> done | gzip -9 -f | tee sample.log | zcat

Maybe the only way to go is just gzipping everything each time a log is
rotated, the standard way, if that pipe thing cannot be done even
with each line being almost the same…




Information forwarded to bug-gzip@HIDDEN:
bug#30719; Package gzip. Full text available.

Message received at 30719 <at> debbugs.gnu.org:


Content-Type: multipart/alternative;
 boundary="Apple-Mail=_4E2713BB-B797-4685-9CB3-962C21B3388F"
Subject: Re: bug#30719: Progressively compressing piped input
From: Mark Adler <madler@HIDDEN>
In-Reply-To: <ve1y9f9vsiln.46t.xxuns.g6.gal@HIDDEN>
Date: Mon, 5 Mar 2018 14:54:21 -0800
Message-Id: <54783A3B-7CB5-4CCB-BD3A-1828894750D4@HIDDEN>
References: <ve1y9f9vsiln.46t.xxuns.g6.gal@HIDDEN>
To: "Garreau, Alexandre" <galex-713@HIDDEN>
Cc: 30719 <at> debbugs.gnu.org


--Apple-Mail=_4E2713BB-B797-4685-9CB3-962C21B3388F
Content-Type: text/plain; charset=utf-8

deflate has an inherent latency that accumulates enough data in order to
efficiently emit each deflate block. You can deliberately flush (with
zlib, not gzip), but if you do that too frequently, e.g. each line, then
you will get lousy compression or even expansion.

I wrote something called gzlog
(https://github.com/madler/zlib/blob/master/examples/gzlog.h), intended
to solve this problem. It can take a small amount of input, e.g. a line,
and update the output gzip file to be complete and valid after each
line, yet also get good compression in the long run. It does this by
writing the lines to the log.gz file effectively uncompressed (deflate
has a “stored” block type), until it has accumulated, say, 1 MB of data.
Then it goes back and compresses that uncompressed 1 MB, again always
leaving the gzip file in a valid state. gzlog also maintains something
like a journal, which allows gzlog to repair the gzip file if the last
operation was interrupted, e.g. by a power failure.
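
The "store first, compress later" idea can be illustrated in a few lines. This is only a rough sketch under stated assumptions, and it is not gzlog: it uses Python's standard zlib module, keeps everything in memory, and invents the sample lines. It does not rewrite a file in place, does not keep the file a complete gzip member after every append, and has no journal. It shows just the two phases: lines appended as stored (level 0) blocks that are decodable as soon as they are flushed, and a later single-pass compression of the accumulated data.

```python
import zlib

GZIP = 31  # wbits value that makes zlib emit the gzip container

# Made-up log lines (an assumption for illustration).
lines = [b"2018-03-05 22:18:%02d+01:00 ping ok\n" % (i % 60) for i in range(500)]

# Phase 1: append every line as a stored (uncompressed) deflate block and
# flush, so the bytes written so far can always be decoded.
stored = zlib.compressobj(0, zlib.DEFLATED, GZIP)
log = b""
for line in lines:
    log += stored.compress(line) + stored.flush(zlib.Z_SYNC_FLUSH)

# A reader can recover everything flushed so far from the unfinished stream.
readable_so_far = zlib.decompressobj(GZIP).decompress(log)

# Phase 2: once enough data has accumulated, compress it in one pass.
packed = zlib.compressobj(9, zlib.DEFLATED, GZIP)
archive = packed.compress(readable_so_far) + packed.flush()

print("raw:", len(readable_so_far), "stored:", len(log), "packed:", len(archive))
```

The stored form is slightly larger than the raw text because each stored block carries a few bytes of framing; the saving only appears in the second phase.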

> On Mar 5, 2018, at 1:18 PM, Garreau, Alexandre <galex-713@HIDDEN> wrote:
>
> Hi,
>
> I have a script which logs very repetitive textual output
> (mostly output of ping and date). To minimize disk usage, I thought to
> pipe it to gzip -9. Then I realized that the log, unlike before,
> remained empty, and recalled the GNU policy of “reading all input and
> only then outputting” to maximize overall speed at the expense of the
> decreasingly expensive memory.
>
> Yet I want to run that script all the time and be able to dirtily
> kill it or just shut down, without losing all its output (nor am I
> sure anyway that it is good practice to keep everything in RAM until
> shutdown, considering I suppose gzip only keeps the compressed output in
> memory anyway, discarding the then useless input), and to “tail -f”
> the files it writes.
>
> I guess piping the whole output is the way to go to achieve optimal
> compression, since otherwise just gzipping each line/command output
> wouldn’t compress as much (since anyway the repetition occurs among the
> lines, not inside them). Yet would there be a way to obtain this maximal
> compression, while having gzip output each time I stop giving it
> input (as I do every 30 seconds or so), without having to save the
> uncompressed file, nor recompress the whole file several times?
>
> I mean, it seems to me a good thing to wait until everything is compressed
> before outputting, rather than outputting as soon as possible, but isn’t
> there a way to trigger the output each time it has been processed and
> there’s no more input for a certain amount of time (that is ~30s)?
>
> Am I looking at something like this:
> #!/bin/bash
> while ping -c1 gnu.org ; do
>    date --rfc-3339=seconds
>    sleep 30
> done | gzip -9 -f | tee sample.log | zcat


--Apple-Mail=_4E2713BB-B797-4685-9CB3-962C21B3388F--




Information forwarded to bug-gzip@HIDDEN:
bug#30719; Package gzip. Full text available.

Message received at submit <at> debbugs.gnu.org:


From: "Garreau\, Alexandre" <galex-713@HIDDEN>
To: bug-gzip@HIDDEN
Subject: Progressively compressing piped input
Date: Mon, 05 Mar 2018 22:18:53 +0100
Message-ID: <ve1y9f9vsiln.46t.xxuns.g6.gal@HIDDEN>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="

--=-=-=
Content-Type: text/plain; charset=utf-8

Hi,

I have a script which logs very repetitive textual output
(mostly output of ping and date). To minimize disk usage, I thought to
pipe it to gzip -9. Then I realized that the log, unlike before,
remained empty, and recalled the GNU policy of “reading all input and
only then outputting” to maximize overall speed at the expense of the
decreasingly expensive memory.

Yet I want to run that script all the time and be able to dirtily
kill it or just shut down, without losing all its output (nor am I
sure anyway that it is good practice to keep everything in RAM until
shutdown, considering I suppose gzip only keeps the compressed output in
memory anyway, discarding the then useless input), and to “tail -f”
the files it writes.

I guess piping the whole output is the way to go to achieve optimal
compression, since otherwise just gzipping each line/command output
wouldn’t compress as much (since anyway the repetition occurs among the
lines, not inside them). Yet would there be a way to obtain this maximal
compression, while having gzip output each time I stop giving it
input (as I do every 30 seconds or so), without having to save the
uncompressed file, nor recompress the whole file several times?

I mean, it seems to me a good thing to wait until everything is compressed
before outputting, rather than outputting as soon as possible, but isn’t
there a way to trigger the output each time it has been processed and
there’s no more input for a certain amount of time (that is ~30s)?

Am I looking at something like this:

--=-=-=
Content-Type: text/x-sh
Content-Disposition: inline; filename=sample.sh
Content-Description: An example of what I am trying to do, where
 I’d like regular output

#!/bin/bash
while ping -c1 gnu.org ; do
    date --rfc-3339=seconds
    sleep 30
done | gzip -9 -f | tee sample.log | zcat

--=-=-=--
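
One way to get output after every line, as the report asks, is a small filter that flushes the deflate stream at each line boundary. The sketch below is an illustration under assumptions, not something proposed in the thread: it uses Python's standard zlib module, the function name gzip_flush_per_line is invented, and flushing once per line is just the simplest trigger (an idle-time trigger would need extra logic). Frequent flushing costs compression ratio.

```python
import gzip
import io
import zlib


def gzip_flush_per_line(src, dst, level=9):
    """Copy binary stream `src` to `dst` as one gzip stream.

    The compressor is flushed after every line, so whatever has been
    written to `dst` so far can already be decoded by a reader.
    """
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)  # 31 = gzip container
    for line in src:
        dst.write(comp.compress(line))
        # Z_SYNC_FLUSH emits all pending output, aligned to a byte boundary.
        dst.write(comp.flush(zlib.Z_SYNC_FLUSH))
        dst.flush()
    dst.write(comp.flush())  # Z_FINISH: final block plus gzip trailer
    dst.flush()


# Hypothetical use as a pipeline stage in place of "gzip -9 -f":
#   import sys
#   gzip_flush_per_line(sys.stdin.buffer, sys.stdout.buffer)

# Self-contained check with in-memory streams and made-up input.
source = io.BytesIO(b"PING gnu.org\n2018-03-05 22:18:53+01:00\n" * 3)
sink = io.BytesIO()
gzip_flush_per_line(source, sink)
assert gzip.decompress(sink.getvalue()) == source.getvalue()
```

If the process is killed before the final flush, the stream has no gzip trailer; the data flushed up to that point is still present in the file, but a decoder is likely to report the stream as truncated.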




Acknowledgement sent to "Garreau\, Alexandre" <galex-713@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-gzip@HIDDEN. Full text available.
Report forwarded to bug-gzip@HIDDEN:
bug#30719; Package gzip. Full text available.
Last modified: Wed, 30 Mar 2022 18:45:01 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.