GNU bug report logs - #33878: zcat vs zcat -f -- different output
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 33878 in the body.
You can then email your comments to 33878 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-gzip <at> gnu.org: bug#33878; Package gzip. (Wed, 26 Dec 2018 16:41:01 GMT)
Acknowledgement sent to Namikaze Minato <lloydsensei <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-gzip <at> gnu.org. (Wed, 26 Dec 2018 16:41:01 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Hello guys.
I have a large amount of confidential gzip-compressed binary data.
These files _all_ share one specific property: the output differs
depending on whether I use the "-f" flag of zcat (or gzip -d -c). One
additional line appears when I use the -f flag.
- I don't have the uncompressed versions of these files, nor the
actual tool used to compress them
- I am trying to create a reproducible example but have not yet succeeded
Here is what it looks like, with null bytes replaced by dots for
readability: (sorry for gmail's automatic line wrap, there are of
course only two lines per output)
$ file p.gz
p.gz: gzip compressed data, was "20181218.TXT", last modified: Wed Dec 19 08:59:07 2018, from NTFS filesystem (NT)
$ wc -c p.gz
9099264 p.gz
$ zcat p.gz | wc -c
48085600
$ zcat -f p.gz | wc -c
48085955
$ gzip -d -c p.gz | tail -2 | sed -n 'l 0' | sed 's/\\000/./g'
20010101AAAAAAAA 010120010101Q AA....00A0000000AA0AA0AAA 0A AA 0101
0012001010101:01T2001012001:0101:01AAAAAAD/S\r$
T000378625.....................
...............................................\r$
$ gzip -d -c -f p.gz | tail -2 | sed -n 'l 0' | sed 's/\\000/./g'
T000378625.....................
...............................................\r$
...................................................................................................................................................................................................................................................................................................................................................................$
That additional line containing only null bytes is not supposed to
appear. Is it some kind of padding that gzip did not handle
correctly?
If this is not yet an identified bug, here are my questions:
Do you know what could be happening?
Do you know how I could try to reproduce the problem on
non-confidential data for you to be able to debug?
(I already tried re-compressing both versions of the decompressed
files with this binary from 2007:
http://gnuwin32.sourceforge.net/packages/gzip.htm but the problem does
not happen)
I can contact the guys who created the files and ask them anything,
but I'd like to be sure of what to ask them because contacting them
repeatedly would be considered very rude. What should I ask them?
Thank you very much in advance for any reply that could help me
understand what is happening :)
Minato
PS: I am not subscribed to the mailing list yet
Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>: You have taken responsibility. (Wed, 26 Dec 2018 18:05:01 GMT)
Notification sent to Namikaze Minato <lloydsensei <at> gmail.com>: bug acknowledged by developer. (Wed, 26 Dec 2018 18:05:02 GMT)
Message #10 received at 33878-done <at> debbugs.gnu.org:
Namikaze Minato wrote:
> Do you know what could be happening?
When gzip -cdf sees junk input data, it simply copies it to standard output;
this behavior is documented in the gzip manual (look for --force). Your input
files have NUL-byte padding at the end, contrary to Internet RFC 1952.
> Do you know how I could try to reproduce the problem on
> non-confidential data for you to be able to debug?
$ (gzip </dev/null; printf '\0') >t.gz
$ gzip -cd <t.gz | od -c
0000000
$ gzip -cdf <t.gz | od -c
0000000 \0
0000001
Though it's not a bug....
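The same reproduction can be extended to measure how many trailing bytes a file carries, by comparing the decompressed size with and without -f. This is only a sketch: the file name padded.gz and the choice of five NUL bytes are made up for illustration, and it assumes GNU gzip behaves as described above.

```shell
# Hypothetical test file: an empty gzip member followed by 5 NUL bytes.
tmp=$(mktemp -d)
(gzip </dev/null; printf '\0\0\0\0\0') >"$tmp/padded.gz"

# Without -f the trailing NUL bytes are ignored;
# with -f they are copied to standard output.
plain=$(gzip -cd <"$tmp/padded.gz" | wc -c)
forced=$(gzip -cdf <"$tmp/padded.gz" | wc -c)

# The difference is the number of trailing bytes passed through by -f.
extra=$((forced - plain))
echo "extra bytes with -f: $extra"

rm -rf "$tmp"
```

Applied to the numbers in the report, 48085955 - 48085600 gives 355 extra bytes. Looking only at the last byte of the file is not a reliable check, since the final four bytes of a gzip member (the ISIZE field in RFC 1952) can legitimately contain zero bytes.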
Information forwarded to bug-gzip <at> gnu.org: bug#33878; Package gzip. (Thu, 27 Dec 2018 13:11:01 GMT)
Message #13 received at 33878-done <at> debbugs.gnu.org:
Thanks a lot for the explanation!
I checked, and my files do contain unexpected trailing NUL bytes!
Have a nice day.
Minato
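One way to get files that decompress identically with and without -f is to recompress them without -f, since plain gzip -cd ignores the trailing NUL bytes. A minimal sketch follows; padded.gz and clean.gz are hypothetical names, and the test file is built with printf rather than taken from the real data.

```shell
tmp=$(mktemp -d)
# Hypothetical padded file: gzip data for "hello" plus 3 trailing NUL bytes.
(printf 'hello\n' | gzip; printf '\0\0\0') >"$tmp/padded.gz"

# Decompress without -f (padding ignored) and recompress.
gzip -cd <"$tmp/padded.gz" | gzip >"$tmp/clean.gz"

# For the cleaned file, -cd and -cdf should now produce identical output.
gzip -cd <"$tmp/clean.gz" >"$tmp/plain.out"
gzip -cdf <"$tmp/clean.gz" >"$tmp/forced.out"
if cmp -s "$tmp/plain.out" "$tmp/forced.out"; then
    result=same
else
    result=different
fi
echo "clean.gz with and without -f: $result"

rm -rf "$tmp"
```

Recompressing through a pipe does not carry over the original file name stored in the gzip header (20181218.TXT in the report), so this only suits cases where that metadata is not needed.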
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 25 Jan 2019 12:24:04 GMT)
This bug report was last modified 5 years and 92 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.