Received: (at 45148) by debbugs.gnu.org; 9 Dec 2020 20:39:53 +0000
From: Stefano Marsili <efanomars@HIDDEN>
To: 45148 <at> debbugs.gnu.org
Subject: Piping into gzip --list - new version
Date: Wed, 9 Dec 2020 21:34:48 +0100
Message-ID: <trinity-b531dd53-8e00-4afb-b424-b2c40be0de83-1607546088783@3c-app-gmx-bap30>

After posting the patch I noticed some statements are superfluous.

The modified patch is attached.

[Attachment: gnubug45148.diff (text/x-patch)]

diff --git a/gzip.c b/gzip.c
index cceb420..43cb9ea 100644
--- a/gzip.c
+++ b/gzip.c
@@ -1514,6 +1514,7 @@ local void do_list(ifd, method)
     int ifd; /* input file descriptor */
     int method; /* compression method */
 {
+    const off_t save_bytes_in = bytes_in;
     ulg crc; /* original crc */
     static int first_time = 1;
     static char const *const methods[MAX_METHODS] = {
@@ -1564,11 +1565,14 @@ local void do_list(ifd, method)
 
     if (!RECORD_IO && method == DEFLATED && !last_member) {
         /* Get the crc and uncompressed size for gzip'ed (not zip'ed) files.
-         * If the lseek fails, we could use read() to get to the end, but
-         * --list is used to get quick results.
-         * Use "gunzip < foo.gz | wc -c" to get the uncompressed size if
-         * you are not concerned about speed.
          */
+        if (insize != INBUFSIZ) {
+            /* eof: no need to lseek */
+            /* assert( insize >= 8 ) */
+            bytes_in = save_bytes_in;
+            crc = LG(inbuf + insize - 8);
+            bytes_out = LG(inbuf + insize - 4);
+        } else {
         bytes_in = lseek(ifd, (off_t)(-8), SEEK_END);
         if (bytes_in != -1L) {
             uch buf[8];
@@ -1578,6 +1582,62 @@ local void do_list(ifd, method)
             }
             crc = LG(buf);
             bytes_out = LG(buf+4);
+        } else {
+            /* assert(insize == INBUFSIZ) */
+            /* assert((INBUFSIZ % 2) == 0) */
+            bytes_in = save_bytes_in;
+            const int half_buf_size = INBUFSIZ / 2;
+            /* If present (possibly partially), the last 8 bytes can only
+             * be in the second half of the inbuf buffer,
+             * so the next block to read is the first half. */
+            ssize_t nread;
+            uch *buf;
+            size_t buf_to_read = half_buf_size;
+            int half_idx = 0;
+            errno = 0; /* reset lseek error */
+            while (1) {
+                nread = read_buffer(ifd, inbuf + half_idx * half_buf_size, buf_to_read);
+                if (nread == 0) {
+                    break;
+                }
+                if (nread < 0) {
+                    read_error();
+                }
+                bytes_in += nread;
+                buf_to_read -= nread;
+                if (buf_to_read == 0) {
+                    buf_to_read = half_buf_size;
+                    half_idx = 1 - half_idx;
+                }
+            }
+            insize = half_buf_size - buf_to_read;
+            if (insize >= 8) {
+                /* All 8 bytes fit in the current half buffer */
+                buf = inbuf + half_idx * half_buf_size + insize - 8;
+            } else if (insize == 0) {
+                /* All 8 bytes are in the other half buffer */
+                buf = inbuf + (1 - half_idx) * half_buf_size + half_buf_size - 8;
+            } else {
+                /* The 8 bytes are partially on the other half buffer */
+                if (half_idx == 1) {
+                    /* The 8 bytes are contiguous */
+                    buf = inbuf + half_buf_size + insize - 8;
+                } else {
+                    /* The end of the 8 bytes is at the beginning of the first half,
+                     * the start of the 8 bytes is at the end of the second half.
+                     * Let's move them both at the start of the first half. */
+                    const size_t start_size = 8 - insize;
+                    memmove(inbuf + start_size, inbuf, insize);
+                    memcpy(inbuf, inbuf + half_buf_size + half_buf_size - start_size, start_size);
+                    buf = inbuf;
+                }
+            }
+            crc = LG(buf);
+            bytes_out = LG(buf+4);
+        }
         }
     }
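
The case analysis described in the patch comments (the last 8 bytes lie entirely in the current half, entirely in the other half, or are split across the two) can be illustrated outside gzip. The following is a minimal sketch and not the patch itself: `last8_from_halves` and all of its parameter names are hypothetical, and it copies the bytes into a separate 8-byte array instead of moving them inside `inbuf`, which lets the "other half" and "split" cases share one code path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of gzip.
 * buf holds 2 * half bytes, viewed as two halves.  The half selected by
 * half_idx contains `filled` freshly read bytes (0 <= filled < half); the
 * other half is assumed to be completely filled with the bytes that
 * immediately precede them in the stream.
 * Copies the last 8 bytes of the stream into out[]. */
static void last8_from_halves(const unsigned char *buf, size_t half,
                              int half_idx, size_t filled,
                              unsigned char out[8])
{
    const unsigned char *cur;
    const unsigned char *other;

    assert(half >= 8 && filled < half);
    assert(half_idx == 0 || half_idx == 1);
    cur = buf + (size_t)half_idx * half;
    other = buf + (size_t)(1 - half_idx) * half;

    if (filled >= 8) {
        /* All 8 bytes are in the current half. */
        memcpy(out, cur + filled - 8, 8);
    } else {
        /* The first (8 - filled) bytes are the end of the other half,
         * the remaining `filled` bytes are the start of the current one.
         * With filled == 0 this takes all 8 bytes from the other half. */
        size_t from_other = 8 - filled;
        memcpy(out, other + half - from_other, from_other);
        memcpy(out + from_other, cur, filled);
    }
}
```

Copying into a small output array costs one extra 8-byte copy but avoids the separate "contiguous" and "wrapped" branches; whether that trade-off suits gzip's `inbuf` handling is for the maintainers to judge.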
Received: (at submit) by debbugs.gnu.org; 9 Dec 2020 19:37:34 +0000
From: Stefano Marsili <efanomars@HIDDEN>
To: bug-gzip@HIDDEN
Subject: Piping into gzip --list -
Date: Wed, 9 Dec 2020 20:05:50 +0100
Message-ID: <trinity-45881c75-a670-400e-96e8-38d931a82af3-1607540750674@3c-app-gmx-bs13>

Hi,

I wanted to help the Debian project and found the gzip bug report

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=575884

Despite not being a C programmer I decided to give it a try.

Since this is the first time I have a patch for an open source project,
I thought that maybe I should first ask upstream if it makes sense.

In the code of gzip.c there is the comment:

 * If the lseek fails, we could use read() to get to the end, but
 * --list is used to get quick results.
 * Use "gunzip < foo.gz | wc -c" to get the uncompressed size if
 * you are not concerned about speed.

Assuming it is correct, the patch does just that, use read() to get to the end.

After applying the patch and running

  $ time cat rnd0.bin.gz | gzip -l -

on a gzipped 3GB file created with /dev/urandom, the result is

   compressed        uncompressed  ratio uncompressed_name
   3000485948          3000000000  -0.0% stdout

   real   0m0.740s
   user   0m0.013s
   sys    0m1.134s

To me it seems quite fast, but maybe gzip is used with much bigger files
and one second is too slow.

The patch is attached to this e-mail.

I'd like to know what you think about it.

Thanks

[Attachment: pipetolist.diff (text/x-patch)]

Index: b/gzip.c
===================================================================
--- a/gzip.c
+++ b/gzip.c
@@ -1727,6 +1727,7 @@ local void do_list(ifd, method)
     int ifd; /* input file descriptor */
     int method; /* compression method */
 {
+    const off_t save_bytes_in = bytes_in;
     ulg crc; /* original crc */
     static int first_time = 1;
     static char const *const methods[MAX_METHODS] = {
@@ -1772,11 +1773,14 @@ local void do_list(ifd, method)
 
     if (method == DEFLATED && !last_member) {
         /* Get the crc and uncompressed size for gzip'ed (not zip'ed) files.
-         * If the lseek fails, we could use read() to get to the end, but
-         * --list is used to get quick results.
-         * Use "gunzip < foo.gz | wc -c" to get the uncompressed size if
-         * you are not concerned about speed.
          */
+        if (insize != INBUFSIZ) {
+            /* eof: no need to lseek */
+            /* assert( insize >= 8 ) */
+            bytes_in = save_bytes_in;
+            crc = LG(inbuf + insize - 8);
+            bytes_out = LG(inbuf + insize - 4);
+        } else {
         bytes_in = lseek(ifd, (off_t)(-8), SEEK_END);
         if (bytes_in != -1L) {
             uch buf[8];
@@ -1786,6 +1790,62 @@ local void do_list(ifd, method)
             }
             crc = LG(buf);
             bytes_out = LG(buf+4);
+        } else {
+            /* assert(insize == INBUFSIZ) */
+            /* assert((INBUFSIZ % 2) == 0) */
+            bytes_in = save_bytes_in;
+            const int half_buf_size = INBUFSIZ / 2;
+            /* If present (possibly partially), the last 8 bytes can only
+             * be in the second half of the inbuf buffer,
+             * so the next block to read is the first half. */
+            ssize_t nread;
+            uch *buf;
+            size_t buf_to_read = half_buf_size;
+            int half_idx = 0;
+            errno = 0; /* reset lseek error */
+            insize = 0;
+            while (1) {
+                nread = read_buffer(ifd, inbuf + half_idx * half_buf_size, buf_to_read);
+                if (nread == 0) {
+                    break;
+                }
+                if (nread < 0) {
+                    read_error();
+                }
+                bytes_in += nread;
+                insize += nread;
+                buf_to_read -= nread;
+                if (buf_to_read == 0) {
+                    buf_to_read = half_buf_size;
+                    insize = 0;
+                    half_idx = 1 - half_idx;
+                }
+            }
+            insize = half_buf_size - buf_to_read;
+            if (insize >= 8) {
+                /* All 8 bytes fit in the current half buffer */
+                buf = inbuf + half_idx * half_buf_size + insize - 8;
+            } else if (insize == 0) {
+                /* All 8 bytes are in the other half buffer */
+                buf = inbuf + (1 - half_idx) * half_buf_size + half_buf_size - 8;
+            } else {
+                /* The 8 bytes are partially on the other half buffer */
+                if (half_idx == 1) {
+                    /* The 8 bytes are contiguous */
+                    buf = inbuf + half_buf_size + insize - 8;
+                } else {
+                    /* The end of the 8 bytes is at the beginning of the first half,
+                     * the start of the 8 bytes is at the end of the second half.
+                     * Let's move them both at the start of the first half. */
+                    const size_t start_size = 8 - insize;
+                    memmove(inbuf + start_size, inbuf, insize);
+                    memcpy(inbuf, inbuf + half_buf_size + half_buf_size - start_size, start_size);
+                    buf = inbuf;
+                }
+            }
+            crc = LG(buf);
+            bytes_out = LG(buf+4);
+        }
         }
     }
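
As background to the "use read() to get to the end" idea: a gzip member ends with an 8-byte trailer holding the CRC-32 followed by the uncompressed size modulo 2^32, both little-endian (RFC 1952). On a pipe `lseek()` cannot reach that trailer, so the stream has to be read through. Below is a minimal standalone sketch, assuming a plain POSIX file descriptor. `read_gzip_trailer` and `le32` are hypothetical names that do not exist in gzip, and the sliding 8-byte tail used here is a simpler technique than the two-half buffer in the attached patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: decode a 32-bit little-endian value. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: read fd until EOF, remembering only the final
 * 8 bytes seen.  Returns 0 and fills *crc and *isize on success, or -1
 * on a read error or if the stream is shorter than 8 bytes. */
static int read_gzip_trailer(int fd, uint32_t *crc, uint32_t *isize)
{
    unsigned char chunk[4096];
    unsigned char tail[8];
    size_t have = 0;                 /* valid bytes in tail[], at most 8 */

    for (;;) {
        ssize_t n = read(fd, chunk, sizeof chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted, retry */
            return -1;
        }
        if (n == 0)
            break;                   /* end of stream */
        if ((size_t)n >= sizeof tail) {
            /* The chunk alone supplies the whole tail. */
            memcpy(tail, chunk + n - sizeof tail, sizeof tail);
            have = sizeof tail;
        } else {
            /* Short chunk: keep the newest old bytes, then append. */
            size_t keep = sizeof tail - (size_t)n;
            if (keep > have)
                keep = have;
            memmove(tail, tail + have - keep, keep);
            memcpy(tail + keep, chunk, (size_t)n);
            have = keep + (size_t)n;
        }
        assert(have <= sizeof tail);
    }
    if (have < sizeof tail)
        return -1;
    *crc = le32(tail);
    *isize = le32(tail + 4);
    return 0;
}
```

Called with descriptor 0 in a pipeline such as `cat rnd0.bin.gz | ./a.out`, a wrapper around this function would report the same two trailer fields that `gzip -l` prints, at the cost of reading the whole input.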
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997 nCipher Corporation Ltd,
1994-97 Ian Jackson.