GNU bug report logs - #33878
zcat vs zcat -f -- different output

Package: gzip;

Reported by: Namikaze Minato <lloydsensei <at> gmail.com>

Date: Wed, 26 Dec 2018 16:41:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 33878 in the body.
You can then email your comments to 33878 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gzip <at> gnu.org:
bug#33878; Package gzip. (Wed, 26 Dec 2018 16:41:01 GMT)

Acknowledgement sent to Namikaze Minato <lloydsensei <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gzip <at> gnu.org. (Wed, 26 Dec 2018 16:41:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Namikaze Minato <lloydsensei <at> gmail.com>
To: bug-gzip <at> gnu.org
Subject: zcat vs zcat -f -- different output
Date: Wed, 26 Dec 2018 17:24:14 +0100

Hello guys.

I have a large amount of confidential gzip-compressed binary data.
These files _all_ share a very specific property: their decompressed
output differs depending on whether I pass the "-f" flag to zcat (or
gzip -d -c). With -f, one additional line appears.

- I don't have the uncompressed versions of these files, nor the
actual tool used to compress them
- I am trying to create a reproducible example but have not yet succeeded

Here is what it looks like, with null bytes replaced by dots for
readability: (sorry for gmail's automatic line wrap, there are of
course only two lines per output)
$ file p.gz
p.gz: gzip compressed data, was "20181218.TXT", last modified:
Wed Dec 19 08:59:07 2018, from NTFS filesystem (NT)
$ wc -c p.gz
9099264 p.gz
$ zcat p.gz | wc -c
48085600
$ zcat -f p.gz | wc -c
48085955
$ gzip -d -c p.gz | tail -2 | sed -n 'l 0' | sed 's/\\000/./g'
20010101AAAAAAAA 010120010101Q   AA....00A0000000AA0AA0AAA 0A AA 0101
         0012001010101:01T2001012001:0101:01AAAAAAD/S\r$
T000378625.....................
...............................................\r$
$ gzip -d -c -f p.gz | tail -2 | sed -n 'l 0' | sed 's/\\000/./g'
T000378625.....................
...............................................\r$
...................................................................................................................................................................................................................................................................................................................................................................$

That additional line containing only null bytes is not supposed to
appear. Is it some kind of padding that gzip did not handle correctly?

If this is not yet an identified bug, here are my questions:

Do you know what could be happening?
Do you know how I could try to reproduce the problem on
non-confidential data for you to be able to debug?
(I already tried re-compressing both versions of the decompressed
files with this binary from 2007:
http://gnuwin32.sourceforge.net/packages/gzip.htm but the problem does
not happen)
I can contact the guys who created the files and ask them anything,
but I'd like to be sure of what to ask them because contacting them
repeatedly would be considered very rude. What should I ask them?

Thank you very much in advance for any reply that could help me
understand what is happening :)

Minato
PS: I am not subscribed to the mailing list yet




Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Wed, 26 Dec 2018 18:05:01 GMT)

Notification sent to Namikaze Minato <lloydsensei <at> gmail.com>:
bug acknowledged by developer. (Wed, 26 Dec 2018 18:05:02 GMT)

Message #10 received at 33878-done <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Namikaze Minato <lloydsensei <at> gmail.com>, 33878-done <at> debbugs.gnu.org
Subject: Re: bug#33878: zcat vs zcat -f -- different output
Date: Wed, 26 Dec 2018 10:03:57 -0800

Namikaze Minato wrote:

> Do you know what could be happening?

When gzip -cdf sees junk input data, it simply copies it to standard output; 
this behavior is documented in the gzip manual (look for --force). Your input 
files have NUL-byte padding at the end, contrary to Internet RFC 1952.

> Do you know how I could try to reproduce the problem on
> non-confidential data for you to be able to debug?

$ (gzip </dev/null; printf '\0') >t.gz
$ gzip -cd <t.gz | od -c
0000000
$ gzip -cdf <t.gz | od -c
0000000  \0
0000001

Though it's not a bug....
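
To check a file for this kind of padding without relying on the --force behavior, the bytes that follow the compressed stream can be separated from it. The following is a minimal Python sketch, not part of the original thread, assuming the whole file fits in memory. The function name split_gzip_trailer is hypothetical; it relies on the standard zlib.decompressobj with wbits set to 31, which selects the gzip container.

```python
import gzip
import zlib


def split_gzip_trailer(data: bytes):
    """Return (payload, trailer): the decompressed data of all leading gzip
    members, and whatever raw bytes follow the last complete member."""
    chunks = []
    # Each gzip member starts with the magic bytes 1f 8b (RFC 1952).
    while data[:2] == b"\x1f\x8b":
        member = zlib.decompressobj(31)  # wbits=31: expect gzip header and trailer
        chunks.append(member.decompress(data))
        if not member.eof:
            raise ValueError("truncated gzip member")
        data = member.unused_data  # bytes located after this member
    return b"".join(chunks), data


if __name__ == "__main__":
    # Same shape as the reproducer above: an empty gzip member plus one NUL byte.
    blob = gzip.compress(b"") + b"\0"
    payload, trailer = split_gzip_trailer(blob)
    print(len(payload), trailer)  # prints: 0 b'\x00'
```

For the files in the original report, the two byte counts differ by 355 (48085955 - 48085600), so the trailer would be expected to hold 355 NUL bytes if gzip copied the padding verbatim.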




Information forwarded to bug-gzip <at> gnu.org:
bug#33878; Package gzip. (Thu, 27 Dec 2018 13:11:01 GMT)

Message #13 received at 33878-done <at> debbugs.gnu.org:

From: Namikaze Minato <lloydsensei <at> gmail.com>
To: 33878-done <at> debbugs.gnu.org
Subject: Re: bug#33878: zcat vs zcat -f -- different output
Date: Thu, 27 Dec 2018 14:09:37 +0100

Thanks a lot for the explanation!
I checked, and my files do contain unexpected trailing NUL bytes!

Have a nice day.
Minato





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 25 Jan 2019 12:24:04 GMT)


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.