GNU bug report logs - #61722
(guix cpio) produces corrupted archives when there are non-ASCII filenames


Package: guix

Reported by: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>

Date: Thu, 23 Feb 2023 03:15:02 UTC

Severity: normal

Tags: patch

Done: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 61722 in the body.
You can then email your comments to 61722 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-guix <at> gnu.org:
bug#61722; Package guix. (Thu, 23 Feb 2023 03:15:02 GMT)

Acknowledgement sent to Maxim Cournoyer <maxim.cournoyer <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Thu, 23 Feb 2023 03:15:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: bug-guix <bug-guix <at> gnu.org>
Subject: (guix cpio) produces corrupted archives when there are non-ASCII
 filenames
Date: Wed, 22 Feb 2023 22:14:26 -0500
Hi,

It appears that the code we have to generate CPIO archives mishandles
non-ASCII characters in the names of the files to be archived:

First, to make rpm usable on a Guix System:

--8<---------------cut here---------------start------------->8---
# mkdir /var/lib/rpm
# chown root:users /var/lib/rpm
# chmod g+rw /var/lib/rpm
--8<---------------cut here---------------end--------------->8---

Then, produce a problematic CPIO via 'guix pack -f rpm', which uses
(guix cpio):

--8<---------------cut here---------------start------------->8---
$ rpm_archive=$(guix pack -R -C none -f rpm nss-certs)
--8<---------------cut here---------------end--------------->8---

Notice that it cannot be installed:
--8<---------------cut here---------------start------------->8---
$ mkdir /tmp/nss-certs
# rpm --prefix=/tmp/nss-certs -i $rpm_archive
error: unpacking of archive failed: cpio: Bad magic
error: nss-certs-3.81-0.x86_64: install failed
--8<---------------cut here---------------end--------------->8---

Let's now inspect the cpio itself.

--8<---------------cut here---------------start------------->8---
$ guix shell rpm cpio
[env]$ rpm2cpio $rpm_archive > nss-certs.cpio
[env]$ cpio -t < nss-certs.cpio |& grep -B3 junk
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/9482e63a.0
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/9846683b.0
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/988a38cb.0
cpio: warning: skipped 248 bytes of junk
--
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/Microsoft_RSA_Root_Certificate_Authority_2017.pem
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/NAVER_Global_Root_Certification_Authority.pem
./gnu/store/1klwvqm3njp070h982ydcix1gzf2zmdl-nss-certs-3.81/etc/ssl/certs/NetLock_Arany_=Class_Gold=_Főtanúsítvány.
cpio: warning: skipped 4 bytes of junk
--8<---------------cut here---------------end--------------->8---

I haven't yet pinpointed what the problem is.

I could do with extra eyes :-).

-- 
Thanks,
Maxim




Information forwarded to bug-guix <at> gnu.org:
bug#61722; Package guix. (Fri, 24 Feb 2023 04:55:02 GMT)

Message #8 received at 61722 <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: 61722 <at> debbugs.gnu.org
Cc: Josselin Poiret <dev <at> jpoiret.xyz>, Tobias Geerinckx-Rice <me <at> tobias.gr>,
 Maxim Cournoyer <maxim.cournoyer <at> gmail.com>,
 Simon Tournier <zimon.toutoune <at> gmail.com>, Mathieu Othacehe <othacehe <at> gnu.org>,
 Ludovic Courtès <ludo <at> gnu.org>,
 Christopher Baines <mail <at> cbaines.net>, Ricardo Wurmus <rekado <at> elephly.net>
Subject: [PATCH] cpio: Properly handle Unicode characters in file names.
Date: Thu, 23 Feb 2023 23:54:01 -0500
Fixes <https://issues.guix.gnu.org/61722>.

* guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
characters.
(file->cpio-header*, special-file->cpio-header*): Likewise.
(write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
textually, to avoid encoding it as ISO-8859-1.

---

 guix/cpio.scm | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/guix/cpio.scm b/guix/cpio.scm
index d4a7d5f1e0..8fd7552450 100644
--- a/guix/cpio.scm
+++ b/guix/cpio.scm
@@ -170,7 +170,8 @@ (define* (file->cpio-header file #:optional (file-name file)
                       #:size (stat:size st)
                       #:dev (stat:dev st)
                       #:rdev (stat:rdev st)
-                      #:name-size (string-length file-name))))
+                      #:name-size (bytevector-length
+                                   (string->utf8 file-name)))))
 
 (define* (file->cpio-header* file
                              #:optional (file-name file)
@@ -182,7 +183,8 @@ (define* (file->cpio-header* file
     (make-cpio-header #:mode (stat:mode st)
                       #:nlink (stat:nlink st)
                       #:size (stat:size st)
-                      #:name-size (string-length file-name))))
+                      #:name-size (bytevector-length
+                                   (string->utf8 file-name)))))
 
 (define* (special-file->cpio-header* file
                                      device-type
@@ -201,7 +203,8 @@ (define* (special-file->cpio-header* file
                                     permission-bits)
                     #:nlink 1
                     #:rdev (device-number device-major device-minor)
-                    #:name-size (string-length file-name)))
+                    #:name-size (bytevector-length
+                                 (string->utf8 file-name))))
 
 (define %trailer
   "TRAILER!!!")
@@ -237,7 +240,7 @@ (define (dump-file file)
 
       ;; We're padding the header + following file name + trailing zero, and
       ;; the header is 110 byte long.
-      (write-padding (+ 110 1 (string-length file)) port)
+      (write-padding (+ 110 (bytevector-length (string->utf8 file)) 1) port)
 
       (case (mode->type (cpio-header-mode header))
         ((regular)
@@ -246,7 +249,7 @@ (define (dump-file file)
              (dump-port input port))))
         ((symlink)
          (let ((target (readlink file)))
-           (put-string port target)))
+           (put-bytevector port (string->utf8 target))))
         ((directory)
          #t)
         ((block-special)

base-commit: c756c62cfdba8d4079be1ba9e370779b850f16b6
-- 
2.39.1
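
To illustrate the distinction the commit log above draws, here is a minimal
Guile sketch, not part of the patch, that compares a name's length in
characters with its length in UTF-8 bytes.  The sample string is the
non-ASCII part of the certificate name from the cpio listing in the report;
`sample', `char-count' and `byte-count' are names made up for this example.

```scheme
(use-modules (rnrs bytevectors))

;; Non-ASCII portion of the problematic file name shown in the report.
(define sample "Főtanúsítvány")

;; What the unpatched code stored in #:name-size: a count of characters.
(define char-count (string-length sample))

;; What the patch stores instead: a count of UTF-8 bytes.
(define byte-count (bytevector-length (string->utf8 sample)))

(format #t "characters: ~a, UTF-8 bytes: ~a~%" char-count byte-count)
```

With the precomposed spelling used here, this should report 13 characters
and 17 bytes, since each of the four accented letters takes two bytes in
UTF-8.  That four-byte gap is at least consistent with the "skipped 4 bytes
of junk" warning in the report, although the thread itself does not state
that connection.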





Information forwarded to bug-guix <at> gnu.org:
bug#61722; Package guix. (Fri, 24 Feb 2023 11:48:01 GMT)

Message #11 received at 61722 <at> debbugs.gnu.org:

From: Mark H Weaver <mhw <at> netris.org>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>, 61722 <at> debbugs.gnu.org
Cc: Josselin Poiret <dev <at> jpoiret.xyz>, Christopher Baines <mail <at> cbaines.net>,
 Maxim Cournoyer <maxim.cournoyer <at> gmail.com>,
 Simon Tournier <zimon.toutoune <at> gmail.com>, Mathieu Othacehe <othacehe <at> gnu.org>,
 Ludovic Courtès <ludo <at> gnu.org>,
 Tobias Geerinckx-Rice <me <at> tobias.gr>, Ricardo Wurmus <rekado <at> elephly.net>
Subject: Re: bug#61722: [PATCH] cpio: Properly handle Unicode characters in
 file names.
Date: Fri, 24 Feb 2023 06:46:21 -0500
Hi Maxim,

Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:

> Fixes <https://issues.guix.gnu.org/61722>.
>
> * guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
> characters.
> (file->cpio-header*, special-file->cpio-header*): Likewise.
> (write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
> textually, to avoid encoding it as ISO-8859-1.
>
> ---
>
>  guix/cpio.scm | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/guix/cpio.scm b/guix/cpio.scm
> index d4a7d5f1e0..8fd7552450 100644
> --- a/guix/cpio.scm
> +++ b/guix/cpio.scm
> @@ -170,7 +170,8 @@ (define* (file->cpio-header file #:optional (file-name file)
>                        #:size (stat:size st)
>                        #:dev (stat:dev st)
>                        #:rdev (stat:rdev st)
> -                      #:name-size (string-length file-name))))
> +                      #:name-size (bytevector-length
> +                                   (string->utf8 file-name)))))

(string-utf8-length file-name) would produce the same result more
efficiently.

      Regards,
        Mark
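
The equivalence suggested above can be checked with a short Guile sketch.
It assumes a Guile version that provides `string-utf8-length' (the thread
implies the one used by Guix does); `utf8-length/bytevector', `samples' and
`all-agree?' are hypothetical names, and the sample strings other than
"TRAILER!!!" and "Főtanúsítvány" do not come from the thread.

```scheme
(use-modules (rnrs bytevectors)
             (srfi srfi-1))

;; Byte length obtained by first building the UTF-8 bytevector, as in the
;; first version of the patch.
(define (utf8-length/bytevector str)
  (bytevector-length (string->utf8 str)))

;; Sample inputs: empty, plain ASCII, and strings with multi-byte characters.
(define samples '("" "TRAILER!!!" "Főtanúsítvány" "日本語" "λ.scm"))

;; #t when both approaches agree on every sample.
(define all-agree?
  (every (lambda (str)
           (= (utf8-length/bytevector str)
              (string-utf8-length str)))
         samples))

(format #t "both approaches agree: ~a~%" all-agree?)
```

The results are the same either way; `string-utf8-length' simply avoids
allocating a bytevector that is discarded right after its length is read.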




Added tag(s) patch. Request was from Simon Tournier <zimon.toutoune <at> gmail.com> to control <at> debbugs.gnu.org. (Fri, 24 Feb 2023 12:11:02 GMT)

Information forwarded to bug-guix <at> gnu.org:
bug#61722; Package guix. (Fri, 24 Feb 2023 13:28:02 GMT)

Message #16 received at 61722 <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: 61722 <at> debbugs.gnu.org
Cc: Josselin Poiret <dev <at> jpoiret.xyz>, Tobias Geerinckx-Rice <me <at> tobias.gr>,
 Maxim Cournoyer <maxim.cournoyer <at> gmail.com>,
 Simon Tournier <zimon.toutoune <at> gmail.com>, mhw <at> netris.org,
 Ludovic Courtès <ludo <at> gnu.org>,
 Christopher Baines <mail <at> cbaines.net>, Ricardo Wurmus <rekado <at> elephly.net>,
 Mathieu Othacehe <othacehe <at> gnu.org>
Subject: [PATCH v2] cpio: Properly handle Unicode characters in file names.
Date: Fri, 24 Feb 2023 08:26:51 -0500
Fixes <https://issues.guix.gnu.org/61722>.

* guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
characters.
(file->cpio-header*, special-file->cpio-header*): Likewise.
(write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
textually, to avoid encoding it as ISO-8859-1.

---

Changes in v2:
- Use string-utf8-length

 guix/cpio.scm | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/guix/cpio.scm b/guix/cpio.scm
index d4a7d5f1e0..876f61ea3c 100644
--- a/guix/cpio.scm
+++ b/guix/cpio.scm
@@ -170,7 +170,7 @@ (define* (file->cpio-header file #:optional (file-name file)
                       #:size (stat:size st)
                       #:dev (stat:dev st)
                       #:rdev (stat:rdev st)
-                      #:name-size (string-length file-name))))
+                      #:name-size (string-utf8-length file-name))))
 
 (define* (file->cpio-header* file
                              #:optional (file-name file)
@@ -182,7 +182,7 @@ (define* (file->cpio-header* file
     (make-cpio-header #:mode (stat:mode st)
                       #:nlink (stat:nlink st)
                       #:size (stat:size st)
-                      #:name-size (string-length file-name))))
+                      #:name-size (string-utf8-length file-name))))
 
 (define* (special-file->cpio-header* file
                                      device-type
@@ -201,7 +201,7 @@ (define* (special-file->cpio-header* file
                                     permission-bits)
                     #:nlink 1
                     #:rdev (device-number device-major device-minor)
-                    #:name-size (string-length file-name)))
+                    #:name-size (string-utf8-length file-name)))
 
 (define %trailer
   "TRAILER!!!")
@@ -237,7 +237,7 @@ (define (dump-file file)
 
       ;; We're padding the header + following file name + trailing zero, and
       ;; the header is 110 byte long.
-      (write-padding (+ 110 1 (string-length file)) port)
+      (write-padding (+ 110 (string-utf8-length file) 1) port)
 
       (case (mode->type (cpio-header-mode header))
         ((regular)
@@ -246,7 +246,7 @@ (define (dump-file file)
              (dump-port input port))))
         ((symlink)
          (let ((target (readlink file)))
-           (put-string port target)))
+           (put-bytevector port (string->utf8 target))))
         ((directory)
          #t)
         ((block-special)

base-commit: c756c62cfdba8d4079be1ba9e370779b850f16b6
-- 
2.39.1
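
The `write-padding' hunk above matters because, in the "newc" cpio format,
the header plus the NUL-terminated file name is padded with zero bytes to a
multiple of four.  The following Guile sketch shows how a character-based
length can yield the wrong amount of padding.  It is only an illustration:
`padding-for' and `name-padding' are hypothetical helpers, not the actual
procedures in guix/cpio.scm, and "café.txt" is a made-up file name.

```scheme
;; Hypothetical helper: number of zero bytes needed to reach the next
;; multiple of four from OFFSET.
(define (padding-for offset)
  (modulo (- 4 (modulo offset 4)) 4))

;; Size of the fixed header, as stated in the comment in the patch.
(define header-size 110)

;; Padding after header + file name + terminating NUL, mirroring the
;; arithmetic of the patched `write-padding' call.
(define (name-padding name-length)
  (padding-for (+ header-size name-length 1)))

;; Made-up file name: 8 characters, 9 bytes once encoded as UTF-8.
(define sample "café.txt")

(define old-padding (name-padding (string-length sample)))       ;characters
(define new-padding (name-padding (string-utf8-length sample)))  ;bytes

(format #t "padding from characters: ~a, from bytes: ~a~%"
        old-padding new-padding)
```

Here the character count gives 110 + 8 + 1 = 119, hence one byte of padding,
while the byte count gives 110 + 9 + 1 = 120, hence none.  An archive written
with the first figure would leave the next header off by one byte, which is
the kind of misalignment a reader reports as junk or a bad magic number.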





Reply sent to Maxim Cournoyer <maxim.cournoyer <at> gmail.com>:
You have taken responsibility. (Sat, 25 Feb 2023 19:53:02 GMT)

Notification sent to Maxim Cournoyer <maxim.cournoyer <at> gmail.com>:
bug acknowledged by developer. (Sat, 25 Feb 2023 19:53:02 GMT)

Message #21 received at 61722-done <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: 61722-done <at> debbugs.gnu.org
Cc: Josselin Poiret <dev <at> jpoiret.xyz>, Christopher Baines <mail <at> cbaines.net>,
 Simon Tournier <zimon.toutoune <at> gmail.com>, mhw <at> netris.org,
 Ludovic Courtès <ludo <at> gnu.org>,
 Tobias Geerinckx-Rice <me <at> tobias.gr>, Ricardo Wurmus <rekado <at> elephly.net>,
 Mathieu Othacehe <othacehe <at> gnu.org>
Subject: Re: bug#61722: (guix cpio) produces corrupted archives when there
 are non-ASCII filenames
Date: Sat, 25 Feb 2023 14:52:15 -0500
Hi,

Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:

> Fixes <https://issues.guix.gnu.org/61722>.
>
> * guix/cpio.scm (file->cpio-header): Compute the file name length in bytes rather than in
> characters.
> (file->cpio-header*, special-file->cpio-header*): Likewise.
> (write-cpio-archive): Likewise, and write the file name as UTF-8 bytes, not
> textually, to avoid encoding it as ISO-8859-1.

Pushed to master.

Closing.

-- 
Thanks,
Maxim




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 26 Mar 2023 11:24:06 GMT)



