GNU bug report logs - #77392
‘regexp-exec’ gets match boundaries wrong for multibyte strings
To reply to this bug, email your comments to 77392 AT debbugs.gnu.org.
Report forwarded to bug-guile <at> gnu.org: bug#77392; Package guile. (Sun, 30 Mar 2025 20:55:02 GMT)
Acknowledgement sent to Ludovic Courtès <ludo <at> gnu.org>: New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Sun, 30 Mar 2025 20:55:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
[Message part 1 (text/plain, inline)]
‘regexp-exec’ sometimes gets match boundaries wrong when operating on a
string containing non-ASCII characters in a C locale (this is with commit
af96820e072d18c49ac03e80c6f3466d568dc77d):
--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(ice-9 regex)
scheme@(guile-user)> (setlocale LC_ALL "C")
$52 = "C"
scheme@(guile-user)> (string-match "start (.*)"
                                   (string-append "start "
                                                  (string (integer->char 1002))))
$53 = #("start \u03ea" (0 . 8) (6 . 8))
scheme@(guile-user)> (match:substring $53 1)
ice-9/boot-9.scm:1683:22: In procedure raise-exception:
Value out of range 6 to< 7: 8
Entering a new prompt. Type `,bt' for a backtrace or `,q' to continue.
--8<---------------cut here---------------end--------------->8---
The attached program triggers more such failures with randomly chosen
characters.  (The example above works correctly under a UTF-8 locale.)
So I believe ‘fixup_multibyte_match’ isn’t quite correct.
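The reported offsets look like byte offsets rather than character offsets: "start " is 6 bytes and U+03EA (code point 1002) takes 2 bytes in UTF-8, so (0 . 8) and (6 . 8) are exactly the UTF-8 byte positions for a 7-character string. A minimal sketch of the byte-to-character mapping a fixup step has to perform, assuming the matcher was handed the UTF-8 encoding of the string, could look like this; `utf8-byte-offset->char-index` is a hypothetical helper for illustration, not Guile's actual `fixup_multibyte_match`.

```scheme
(use-modules (rnrs bytevectors))   ; string->utf8, bytevector-length

;; Hypothetical helper (not part of Guile).  Assumption: the offset was
;; computed on the UTF-8 encoding of STR.  Returns the character index
;; of STR corresponding to BYTE-OFFSET; an offset that falls inside a
;; multibyte sequence rounds up to the next character boundary.
(define (utf8-byte-offset->char-index str byte-offset)
  (let loop ((index 0) (bytes 0))
    (cond ((>= bytes byte-offset) index)
          ((>= index (string-length str))
           (error "byte offset past end of string" byte-offset))
          (else
           ;; Advance by the UTF-8 length of the character at INDEX.
           (loop (+ index 1)
                 (+ bytes (bytevector-length
                           (string->utf8 (string (string-ref str index))))))))))

(define str (string-append "start " (string (integer->char 1002))))

(string-length str)                       ; 7 characters
(bytevector-length (string->utf8 str))    ; 8 bytes: U+03EA takes 2
(utf8-byte-offset->char-index str 6)      ; 6, start of group 1
(utf8-byte-offset->char-index str 8)      ; 7, the end match:substring expects
```

Mapped this way, the match from the REPL session would be (0 . 7) and (6 . 7), which is within range for ‘match:substring’.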
Ludo’.
PS: This originates in <https://issues.guix.gnu.org/77283>.
[regexp-unicode-ascii.scm (text/plain, inline)]
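(use-modules (ice-9 regex))

(define rx
  (make-regexp "^start (.*)"))

;; Force the C locale, where the bug shows up.
(setlocale LC_ALL "C")

;; Repeatedly match "start X", where X is a random character outside
;; Latin-1 (code point 256 to 1279), and stop at the first failure.
(let loop ()
  (let* ((i   (+ 256 (random (expt 2 10))))
         (str (string-append "start " (string (integer->char i)))))
    (with-exception-handler
      (lambda (exc)
        ;; Report the offending code point and a backtrace, then exit.
        (pk 'exc exc '<-- i)
        (display-backtrace (make-stack #t) (current-error-port))
        (exit 1))
      (lambda ()
        ;; Raises an out-of-range error when the match boundaries are wrong.
        (match:substring (regexp-exec rx str) 1)))
    (loop)))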
This bug report was last modified 5 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.