GNU bug report logs - #62290
Error when handling invalid unicode with suspendable ports


Package: guile;

Reported by: Christopher Baines <mail <at> cbaines.net>

Date: Mon, 20 Mar 2023 09:13:01 UTC

Severity: normal

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.



Report forwarded to bug-guile <at> gnu.org:
bug#62290; Package guile. (Mon, 20 Mar 2023 09:13:02 GMT)

Acknowledgement sent to Christopher Baines <mail <at> cbaines.net>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Mon, 20 Mar 2023 09:13:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Christopher Baines <mail <at> cbaines.net>
To: bug-guile <at> gnu.org
Subject: Error when handling invalid unicode with suspendable ports
Date: Mon, 20 Mar 2023 09:09:14 +0000

Here's a simple reproducer:

  (use-modules (ice-9 binary-ports)
               (ice-9 suspendable-ports)
               (rnrs bytevectors))

  (define (test)
    (let* ((sequence
             '(#xf4 #xa4 #xbd #xa4))
           (p (open-bytevector-input-port
               (u8-list->bytevector sequence))))
      (set-port-encoding! p "UTF-8")
      (set-port-conversion-strategy! p 'substitute)
      (peek (read-char p))))

  (test)

  (install-suspendable-ports!)

  (test)


If you run it, the first call outputs #\� as expected, but once
suspendable ports are installed the second call raises an exception.
The behaviour should be the same in both cases.


;;; (#\�)
Backtrace:
In ice-9/boot-9.scm:
  1752:10  8 (with-exception-handler _ _ #:unwind? _ # _)
In unknown file:
           7 (apply-smob/0 #<thunk 7f3f09cbe300>)
In ice-9/boot-9.scm:
    724:2  6 (call-with-prompt ("prompt") #<procedure 7f3f09ccb320 …> …)
In ice-9/eval.scm:
    619:8  5 (_ #(#(#<directory (guile-user) 7f3f09cc1c80>)))
In ice-9/boot-9.scm:
   2836:4  4 (save-module-excursion #<procedure 7f3f09cb2300 at ice-…>)
  4388:12  3 (_)
In /home/chris/Projects/Guile/guile/bad-unicode.scm:
    12:10  2 (test)
In ice-9/suspendable-ports.scm:
   591:33  1 (read-char _)
   499:12  0 (peek-char-and-next-cur/utf8 _ _ _ _)

ice-9/suspendable-ports.scm:499:12: In procedure peek-char-and-next-cur/utf8:
In procedure integer->char: Argument 1 out of range: 1199972
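
A note on the number in the error message: 1199972 is what the four bytes
of the reproducer yield when their payload bits are combined as in ordinary
4-byte UTF-8 decoding, with no range check on the result. The sketch below
redoes that arithmetic. The procedure name combine-4-bytes is hypothetical
and is not part of Guile; only the bit layout is taken from the standard
UTF-8 encoding.

```scheme
;; Hypothetical helper, not part of Guile: combine the payload bits of four
;; UTF-8 bytes (3 + 6 + 6 + 6 bits) without validating the result.
(define (combine-4-bytes u8_0 u8_1 u8_2 u8_3)
  (logior (ash (logand u8_0 #x07) 18)
          (ash (logand u8_1 #x3f) 12)
          (ash (logand u8_2 #x3f) 6)
          (logand u8_3 #x3f)))

;; The byte sequence from the reproducer.
(define n (combine-4-bytes #xf4 #xa4 #xbd #xa4))

(display n) (newline)                ; prints 1199972
(display (> n #x10ffff)) (newline)   ; prints #t: past the last code point
```

Since #x10ffff (1114111) is the largest Unicode code point, integer->char
cannot accept 1199972, which is consistent with the "out of range" error
above.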




Information forwarded to bug-guile <at> gnu.org:
bug#62290; Package guile. (Mon, 20 Mar 2023 09:16:02 GMT)

Message #8 received at 62290 <at> debbugs.gnu.org:

From: Christopher Baines <mail <at> cbaines.net>
To: 62290 <at> debbugs.gnu.org
Subject: [PATCH] Fix some invalid unicode handling issues with suspendable ports.
Date: Mon, 20 Mar 2023 09:15:13 +0000

Based on the implementation in ports.c.  I don't understand what this
code is really doing, but the suspendable ports implementation differs
from the similar C code for a couple of inequalities.

* module/ice-9/suspendable-ports.scm (decode-utf8, bad-utf8-len): Flip a
couple of inequalities.
* test-suite/tests/ports.test ("string ports"): Add additional invalid
UTF-8 test case.
---
 module/ice-9/suspendable-ports.scm | 8 ++++----
 test-suite/tests/ports.test        | 7 +++++++
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/module/ice-9/suspendable-ports.scm b/module/ice-9/suspendable-ports.scm
index a823f1d37..9fac1df62 100644
--- a/module/ice-9/suspendable-ports.scm
+++ b/module/ice-9/suspendable-ports.scm
@@ -419,7 +419,7 @@
                (= (logand u8_2 #xc0) #x80)
                (case u8_0
                  ((#xe0) (>= u8_1 #xa0))
-                 ((#xed) (>= u8_1 #x9f))
+                 ((#xed) (<= u8_1 #x9f))
                  (else #t)))
           (kt (integer->char
                (logior (ash (logand u8_0 #x0f) 12)
@@ -436,7 +436,7 @@
                (= (logand u8_3 #xc0) #x80)
                (case u8_0
                  ((#xf0) (>= u8_1 #x90))
-                 ((#xf4) (>= u8_1 #x8f))
+                 ((#xf4) (<= u8_1 #x8f))
                  (else #t)))
           (kt (integer->char
                (logior (ash (logand u8_0 #x07) 18)
@@ -462,7 +462,7 @@
      ((< buffering 2) 1)
      ((not (= (logand (ref 1) #xc0) #x80)) 1)
      ((and (eq? first-byte #xe0) (< (ref 1) #xa0)) 1)
-     ((and (eq? first-byte #xed) (< (ref 1) #x9f)) 1)
+     ((and (eq? first-byte #xed) (> (ref 1) #x9f)) 1)
      ((< buffering 3) 2)
      ((not (= (logand (ref 2) #xc0) #x80)) 2)
      (else 0)))
@@ -471,7 +471,7 @@
      ((< buffering 2) 1)
      ((not (= (logand (ref 1) #xc0) #x80)) 1)
      ((and (eq? first-byte #xf0) (< (ref 1) #x90)) 1)
-     ((and (eq? first-byte #xf4) (< (ref 1) #x8f)) 1)
+     ((and (eq? first-byte #xf4) (> (ref 1) #x8f)) 1)
      ((< buffering 3) 2)
      ((not (= (logand (ref 2) #xc0) #x80)) 2)
      ((< buffering 4) 3)
diff --git a/test-suite/tests/ports.test b/test-suite/tests/ports.test
index 66e10e3dd..1b30e1a68 100644
--- a/test-suite/tests/ports.test
+++ b/test-suite/tests/ports.test
@@ -1059,6 +1059,13 @@
        eof))
 
     (test-decoding-error (#xf0 #x88 #x88 #x88) "UTF-8"
+      (error                ;; 2nd byte should be in the 90..BF range
+       error                ;; 88: not a valid starting byte
+       error                ;; 88: not a valid starting byte
+       error                ;; 88: not a valid starting byte
+       eof))
+
+    (test-decoding-error (#xf4 #xa4 #xbd #xa4) "UTF-8"
       (error                ;; 2nd byte should be in the 90..BF range
        error                ;; 88: not a valid starting byte
        error                ;; 88: not a valid starting byte
-- 
2.39.1
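
As a reading aid for the patch above, the constraints it enforces on the
second byte can be collected into a single predicate. This is only a sketch
under the usual rules for well-formed UTF-8. It is not the code in
suspendable-ports.scm, and the name valid-second-byte? is hypothetical.

```scheme
;; Hypothetical predicate, not the code in suspendable-ports.scm: is U8_1
;; an acceptable second byte after lead byte U8_0 in well-formed UTF-8?
(define (valid-second-byte? u8_0 u8_1)
  (and (= (logand u8_1 #xc0) #x80)   ; must be a continuation byte, 80..BF
       (case u8_0
         ((#xe0) (>= u8_1 #xa0))     ; below A0 would be an overlong form
         ((#xed) (<= u8_1 #x9f))     ; above 9F would be a surrogate
         ((#xf0) (>= u8_1 #x90))     ; below 90 would be an overlong form
         ((#xf4) (<= u8_1 #x8f))     ; above 8F would exceed U+10FFFF
         (else #t))))

(valid-second-byte? #xf4 #xa4)   ; #f, the sequence from the reproducer
(valid-second-byte? #xf4 #x8f)   ; #t, F4 8F BF BF encodes U+10FFFF
(valid-second-byte? #xed #xa0)   ; #f, ED A0 80 would be surrogate U+D800
```

The lower bounds for E0 and F0 and the upper bounds for ED and F4 point in
opposite directions, which matches the direction of the four inequalities
the patch flips.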





Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Mon, 20 Mar 2023 22:28:02 GMT)

Notification sent to Christopher Baines <mail <at> cbaines.net>:
bug acknowledged by developer. (Mon, 20 Mar 2023 22:28:02 GMT)

Message #13 received at 62290-done <at> debbugs.gnu.org:

From: Ludovic Courtès <ludo <at> gnu.org>
To: Christopher Baines <mail <at> cbaines.net>
Cc: 62290-done <at> debbugs.gnu.org
Subject: Re: bug#62290: Error when handling invalid unicode with suspendable ports
Date: Mon, 20 Mar 2023 23:27:32 +0100

Hello,

Christopher Baines <mail <at> cbaines.net> skribis:

> Based on the implementation in ports.c.  I don't understand what this
> code is really doing, but the suspendable ports implementation differs
> from the similar C code for a couple of inequalities.
>
> * module/ice-9/suspendable-ports.scm (decode-utf8, bad-utf8-len): Flip a
> couple of inequalities.
> * test-suite/tests/ports.test ("string ports"): Add additional invalid
> UTF-8 test case.

Pushed as cba2e7e3fec3c781230570f5d1ef070625eeeda8.

Thanks for documenting the problem and providing a perfect patch!

Ludo’.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 18 Apr 2023 11:24:08 GMT)
