GNU bug report logs - #78690
31.0.50; split string: args out of range with TRIM
To reply to this bug, email your comments to 78690 AT debbugs.gnu.org.
There is no need to reopen the bug first.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#78690; Package emacs. (Wed, 04 Jun 2025 02:35:02 GMT)
Acknowledgement sent to Michael Heerdegen <michael_heerdegen <at> web.de>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 04 Jun 2025 02:35:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org:
Hello,
I stumbled across this:
#+begin_src emacs-lisp
(split-string
"-*- lexical-binding: t; -*-"
"-\\*-" nil "[ \t\n\r-]+")
#+end_src
~~>
| Debugger entered--Lisp error: (args-out-of-range "-*- lexical-binding: t; -*-" 1 0)
| (substring "-*- lexical-binding: t; -*-" 1 0)
| (let ((this (substring string this-start this-end))) (if trim (progn (let ((tem (string-match (concat trim "\\'") this 0))) (and tem (< tem (length this)) (setq this (substring this 0 tem)))))) (if (or keep-nulls (> (length this) 0)) (progn (setq list (cons this list)))))
| (progn (let ((this (substring string this-start this-end))) (if trim (progn (let ((tem (string-match ... this 0))) (and tem (< tem (length this)) (setq this (substring this 0 tem)))))) (if (or keep-nulls (> (length this) 0)) (progn (setq list (cons this list))))))
| (if (or keep-nulls (< this-start this-end)) (progn (let ((this (substring string this-start this-end))) (if trim (progn (let ((tem ...)) (and tem (< tem ...) (setq this ...))))) (if (or keep-nulls (> (length this) 0)) (progn (setq list (cons this list)))))))
| (#f(lambda () [(list nil) (this-end 0) (this-start 1) (keep-nulls t) (trim "[ \11\n\15-]+") (string "-*- lexical-binding: t; -*-")] (if trim (progn (let ((tem (string-match trim string this-start))) (and (eq tem this-start) (setq this-start (match-end 0)))))) (if (or keep-nulls (< this-start this-end)) (progn (let ((this (substring string this-start this-end))) (if trim (progn (let ... ...))) (if (or keep-nulls (> ... 0)) (progn (setq list ...))))))))
| (funcall #f(lambda () [(list nil) (this-end 0) (this-start 1) (keep-nulls t) (trim "[ \11\n\15-]+") (string "-*- lexical-binding: t; -*-")] (if trim (progn (let ((tem (string-match trim string this-start))) (and (eq tem this-start) (setq this-start (match-end 0)))))) (if (or keep-nulls (< this-start this-end)) (progn (let ((this (substring string this-start this-end))) (if trim (progn (let ... ...))) (if (or keep-nulls (> ... 0)) (progn (setq list ...))))))))
| (while (and (string-match rexp string (if (and notfirst (= start (match-beginning 0)) (< start (length string))) (1+ start) start)) (< start (length string))) (setq notfirst t) (progn (setq this-start start) (setq this-end (match-beginning 0)) (setq start (match-end 0))) (funcall push-one))
| (let* ((keep-nulls (not (if separators omit-nulls t))) (rexp (or separators split-string-default-separators)) (start 0) this-start this-end notfirst (list nil) (push-one #'(lambda nil (if trim (progn (let ... ...))) (if (or keep-nulls (< this-start this-end)) (progn (let ... ... ...)))))) (while (and (string-match rexp string (if (and notfirst (= start (match-beginning 0)) (< start (length string))) (1+ start) start)) (< start (length string))) (setq notfirst t) (progn (setq this-start start) (setq this-end (match-beginning 0)) (setq start (match-end 0))) (funcall push-one)) (progn (setq this-start start) (setq this-end (length string))) (funcall push-one) (nreverse list))
| (split-string "-*- lexical-binding: t; -*-" "-\\*-" nil "[ \11\n\15-]+")
There is no problem without the TRIM argument in the call.
TIA,
Michael.
In GNU Emacs 31.0.50 (build 23, x86_64-pc-linux-gnu, cairo version
1.16.0) of 2025-06-03 built on drachen
Repository revision: 8e4a0ea35908e08c2220bafee33a05c33f24bbc3
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12101007
System Description: Debian GNU/Linux 12 (bookworm)
Configured using:
'configure --with-x-toolkit=no --with-native-compilation=no'
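Because the call succeeds without TRIM, one possible workaround on affected builds is to split first and trim each piece afterwards. This is only a sketch under that assumption: `my/split-string-then-trim` is a hypothetical name, not an Emacs function, and since it trims every substring on its own it may not reproduce the built-in TRIM handling in every edge case.

```emacs-lisp
;; `string-trim' is defined in subr-x.el in some Emacs versions;
;; requiring it is harmless where it is already available.
(require 'subr-x)

(defun my/split-string-then-trim (string separators &optional omit-nulls trim)
  "Split STRING at SEPARATORS, then strip matches of TRIM from each piece.
Hypothetical workaround for bug#78690, not part of Emacs: it calls
`split-string' without TRIM, so the failing code path is never reached."
  (let ((parts (split-string string separators)))
    (when trim
      ;; Remove TRIM matches from both ends of every substring.
      (setq parts (mapcar (lambda (s) (string-trim s trim trim)) parts)))
    ;; Drop substrings that are (or became) empty, as OMIT-NULLS would.
    (if omit-nulls
        (delete "" parts)
      parts)))
```

For example, calling it on " -*- lexical-binding: t; -*-" with SEPARATORS "-\\*-", OMIT-NULLS nil and TRIM "[ \t\n\r-]+" gives ("" "lexical-binding: t;" ""), the same value the fixed `split-string' is expected to return.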
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78690; Package emacs. (Thu, 05 Jun 2025 15:49:02 GMT)
Message #8 received at 78690 <at> debbugs.gnu.org:
> Date: Wed, 04 Jun 2025 04:36:26 +0200
> From: Michael Heerdegen via "Bug reports for GNU Emacs,
> the Swiss army knife of text editors" <bug-gnu-emacs <at> gnu.org>
>
>
> Hello,
>
> I stumbled across this:
>
> #+begin_src emacs-lisp
> (split-string
> "-*- lexical-binding: t; -*-"
> "-\\*-" nil "[ \t\n\r-]+")
> #+end_src
>
> ~~>
>
> | Debugger entered--Lisp error: (args-out-of-range "-*- lexical-binding: t; -*-" 1 0)
> | (substring "-*- lexical-binding: t; -*-" 1 0)
> | (let ((this (substring string this-start this-end))) (if trim (progn (let ((tem (string-match (concat trim "\\'") this 0))) (and tem (< tem (length this)) (setq this (substring this 0 tem)))))) (if (or keep-nulls (> (length this) 0)) (progn (setq list (cons this list)))))
It is quite obvious that split-string is not prepared to deal with a
situation where the argument STRING begins with a match for
SEPARATORS. The breakage here happens because the match for
SEPARATORS at the very beginning of STRING also matches TRIM, but even
if that is not so, a match for SEPARATORS at the beginning of STRING
sets THIS-START incorrectly for the first call to push-one inside the
while-loop.
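A quick way to check whether a given Emacs build still has this problem is to run the reproducer under `condition-case`. The helper name `my/split-string-trim-bug-p` is hypothetical; it only reports whether the call from the report signals `args-out-of-range`.

```emacs-lisp
(defun my/split-string-trim-bug-p ()
  "Return non-nil if `split-string' fails on the bug#78690 reproducer.
Hypothetical helper for illustration only; not part of Emacs."
  (condition-case nil
      (progn
        ;; SEPARATORS matches at position 0, and TRIM matches there too.
        (split-string "-*- lexical-binding: t; -*-" "-\\*-" nil "[ \t\n\r-]+")
        nil)
    ;; Unfixed versions end up calling (substring STRING 1 0).
    (args-out-of-range t)))

(message "bug#78690 present: %s" (my/split-string-trim-bug-p))
```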
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78690; Package emacs. (Fri, 06 Jun 2025 01:53:02 GMT)
Message #11 received at 78690 <at> debbugs.gnu.org:
Eli Zaretskii <eliz <at> gnu.org> writes:
> It is quite obvious that split-string is not prepared to deal with a
> situation where the argument STRING begins with a match for
> SEPARATORS. The breakage here happens because the match for
> SEPARATORS at the very beginning of STRING also matches TRIM, but even
> if that is not so, a match for SEPARATORS at the beginning of STRING
> sets THIS-START incorrectly for the first call to push-one inside the
> while-loop.
I read that as "confirmed, a bug". Ok, thanks for the analysis.
The original use case is in Helm, btw, which makes a call like this. I
made the example a bit shorter for this report; in the original issue
the string starts with whitespace, like
" -*- lexical-binding: t; -*-"
but, as you already mentioned, the issue is the same.
Michael.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78690; Package emacs. (Fri, 06 Jun 2025 07:11:03 GMT)
Message #14 received at 78690 <at> debbugs.gnu.org:
> From: Michael Heerdegen <michael_heerdegen <at> web.de>
> Cc: 78690 <at> debbugs.gnu.org
> Date: Fri, 06 Jun 2025 03:53:47 +0200
>
> I read that as "confirmed, a bug". Ok, thanks for the analysis.
It's more than that: I'm working on this bug. It just takes time to
unlock all the subtleties of the implementation and understand how to
fix it in the most economical and safe way. I had just managed to
understand the root cause when I ran out of time.
The interim analysis was intended to attract others to the problem and
perhaps nudge someone to work out a solution, and also to serve as a
reminder to myself when I get to look at this next time.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78690; Package emacs. (Sat, 07 Jun 2025 19:33:02 GMT)
Message #17 received at 78690 <at> debbugs.gnu.org:
> Cc: 78690 <at> debbugs.gnu.org
> Date: Fri, 06 Jun 2025 10:10:25 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
>
> It's more than that: I'm working on this bug. It just takes time to
> unlock all the subtleties of the implementation and understand how to
> fix it in the most economical and safe way. I had just managed to
> understand the root cause when I ran out of time.
>
> The interim analysis was intended to attract others to the problem and
> perhaps nudge someone to work out a solution, and also to serve as a
> reminder to myself when I get to look at this next time.
The patch below seems to fix the problem, and passes all the tests.
Does it look reasonable?
I took the opportunity to also fix the (dangerous, IMO) calls to
match-beginning in the 'while' condition, since I couldn't be certain
they cannot be affected by the calls to string-match inside push-one.
diff --git a/lisp/subr.el b/lisp/subr.el
index 729f8b3..41aaadf 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -5785,7 +5785,9 @@ split-string
(start 0)
this-start this-end
notfirst
+ match-beg
(list nil)
+ (strlen (length string))
(push-one
;; Push the substring in range THIS-START to THIS-END
;; onto LIST, trimming it and perhaps discarding it.
@@ -5794,6 +5796,7 @@ split-string
;; Discard the trim from start of this substring.
(let ((tem (string-match trim string this-start)))
(and (eq tem this-start)
+ (not (eq this-start this-end))
(setq this-start (match-end 0)))))
(when (or keep-nulls (< this-start this-end))
@@ -5811,18 +5814,22 @@ split-string
(while (and (string-match rexp string
(if (and notfirst
- (= start (match-beginning 0))
- (< start (length string)))
+ (= start match-beg)
+ (< start strlen))
(1+ start) start))
- (< start (length string)))
- (setq notfirst t)
- (setq this-start start this-end (match-beginning 0)
- start (match-end 0))
+ (< start strlen))
+ (setq notfirst t
+ match-beg (match-beginning 0))
+ (if (= start match-beg)
+ (setq this-start (match-end 0)
+ this-end this-start)
+ (setq this-start start this-end match-beg))
+ (setq start (match-end 0))
(funcall push-one))
;; Handle the substring at the end of STRING.
- (setq this-start start this-end (length string))
+ (setq this-start start this-end strlen)
(funcall push-one)
(nreverse list)))
diff --git a/test/lisp/subr-tests.el b/test/lisp/subr-tests.el
index 024cbe8..2e8cbec 100644
--- a/test/lisp/subr-tests.el
+++ b/test/lisp/subr-tests.el
@@ -1505,5 +1505,14 @@ hash-table-contains-p
(should (hash-table-contains-p 'cookie h))
(should (hash-table-contains-p 'milk h))))
+(ert-deftest subr-test-split-string ()
+ (let ((text "-*- lexical-binding: t; -*-")
+ (seps "-\\*-")
+ (trim "[ \t\n\r-]+"))
+ (should (equal (split-string text seps nil trim)
+ '("" "lexical-binding: t;" "")))
+ (should (equal (split-string text seps t trim)
+ '("lexical-binding: t;")))))
+
(provide 'subr-tests)
;;; subr-tests.el ends here
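The concern about calling `match-beginning` in the `while` condition comes from the fact that match data is global: any later `string-match`, such as the ones inside `push-one`, replaces it. The small demonstration below uses an arbitrary string and a hypothetical function name; it is not code from subr.el.

```emacs-lisp
(defun my/match-data-clobbered ()
  "Return (SAVED LATER) match positions for a comma and a semicolon.
Hypothetical demonstration; not taken from subr.el."
  (let* ((s "a,b;c")
         ;; Search for the comma and record where it starts right away,
         ;; the way the patch caches `match-beginning' in MATCH-BEG.
         (saved (progn (string-match "," s) (match-beginning 0))))
    ;; Any intervening search (e.g. inside a helper) replaces the match data.
    (string-match ";" s)
    (list saved (match-beginning 0))))
```

Evaluating (my/match-data-clobbered) returns (1 3): the cached value still points at the comma, while `match-beginning` now describes the semicolon. Wrapping the inner search in `save-match-data` is the other common way to protect an outer match; the patch instead keeps the value in the local variable `match-beg`.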
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78690; Package emacs. (Mon, 09 Jun 2025 01:21:02 GMT)
Message #20 received at 78690 <at> debbugs.gnu.org:
Eli Zaretskii <eliz <at> gnu.org> writes:
> The patch below seems to fix the problem, and passes all the tests.
> Does it look reasonable?
Thank you very much. I did not look at the details of your change yet;
however, I see that a string starting with whitespace still makes the
function error:
#+begin_src emacs-lisp
(let ((text " -*- lexical-binding: t; -*-")
;; ^^^ see here
(seps "-\\*-")
(trim "[ \t\n\r-]+"))
(split-string text seps nil trim))
#+end_src
~~> split-string: Args out of range: " -*- lexical-binding: t; -*-", 2, 1
Could you please have a look?
Michael.
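The numbers in both error messages can be reproduced by hand. In each failing call, the end of the first substring is where the first separator match begins, but its start is then advanced past a TRIM match anchored at position 0 that ends later, so `substring` receives a start greater than its end. The helper below is hypothetical and only computes those two positions.

```emacs-lisp
(defun my/first-piece-bounds (text seps trim)
  "Return (END TRIM-END) for the first substring of TEXT.
END is where the first match of SEPS begins.  TRIM-END is where a
match of TRIM starting exactly at position 0 ends, or nil if there is
none.  Hypothetical helper for illustration only."
  (let* ((end (string-match seps text))
         (trim-end (and (eql (string-match trim text 0) 0)
                        (match-end 0))))
    (list end trim-end)))
```

For "-*- lexical-binding: t; -*-" it returns (0 1), matching the (substring ... 1 0) in the original report; with the leading space it returns (1 2), matching the "2, 1" above.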
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78690; Package emacs. (Thu, 12 Jun 2025 07:50:02 GMT)
Message #23 received at 78690 <at> debbugs.gnu.org:
> From: Michael Heerdegen <michael_heerdegen <at> web.de>
> Cc: 78690 <at> debbugs.gnu.org
> Date: Mon, 09 Jun 2025 03:22:16 +0200
>
> Thank you very much. I did not look at the details of your change yet;
> however, I see that a string starting with whitespace still makes the
> function error:
>
> #+begin_src emacs-lisp
> (let ((text " -*- lexical-binding: t; -*-")
> ;; ^^^ see here
> (seps "-\\*-")
> (trim "[ \t\n\r-]+"))
> (split-string text seps nil trim))
> #+end_src
>
> ~~> split-string: Args out of range: " -*- lexical-binding: t; -*-", 2, 1
>
> Could you please have a look?
Thanks. I guess I've found something similar independently, because
the changes I have stashed (reproduced below) don't signal an error in
this case. With those changes, I get what I think is the expected
value
("" "lexical-binding: t;" "")
Here's the up-to-date version of the patch:
diff --git a/lisp/subr.el b/lisp/subr.el
index 729f8b3..f674e51 100644
--- a/lisp/subr.el
+++ b/lisp/subr.el
@@ -5785,7 +5785,9 @@ split-string
(start 0)
this-start this-end
notfirst
+ match-beg
(list nil)
+ (strlen (length string))
(push-one
;; Push the substring in range THIS-START to THIS-END
;; onto LIST, trimming it and perhaps discarding it.
@@ -5794,6 +5796,7 @@ split-string
;; Discard the trim from start of this substring.
(let ((tem (string-match trim string this-start)))
(and (eq tem this-start)
+ (<= (match-end 0) this-end)
(setq this-start (match-end 0)))))
(when (or keep-nulls (< this-start this-end))
@@ -5811,18 +5814,25 @@ split-string
(while (and (string-match rexp string
(if (and notfirst
- (= start (match-beginning 0))
- (< start (length string)))
+ (= start match-beg) ; empty match
+ (< start strlen))
(1+ start) start))
- (< start (length string)))
- (setq notfirst t)
- (setq this-start start this-end (match-beginning 0)
- start (match-end 0))
+ (< start strlen))
+ (setq notfirst t
+ match-beg (match-beginning 0))
+ ;; If the separator is right at the beginning, produce an empty
+ ;; substring in the result list.
+ (if (= start match-beg)
+ (setq this-start (match-end 0)
+ this-end this-start)
+ ;; Otherwise produce a substring from start to the separator.
+ (setq this-start start this-end match-beg))
+ (setq start (match-end 0))
(funcall push-one))
;; Handle the substring at the end of STRING.
- (setq this-start start this-end (length string))
+ (setq this-start start this-end strlen)
(funcall push-one)
(nreverse list)))
diff --git a/test/lisp/subr-tests.el b/test/lisp/subr-tests.el
index 024cbe8..2e8cbec 100644
--- a/test/lisp/subr-tests.el
+++ b/test/lisp/subr-tests.el
@@ -1505,5 +1505,14 @@ hash-table-contains-p
(should (hash-table-contains-p 'cookie h))
(should (hash-table-contains-p 'milk h))))
+(ert-deftest subr-test-split-string ()
+ (let ((text "-*- lexical-binding: t; -*-")
+ (seps "-\\*-")
+ (trim "[ \t\n\r-]+"))
+ (should (equal (split-string text seps nil trim)
+ '("" "lexical-binding: t;" "")))
+ (should (equal (split-string text seps t trim)
+ '("lexical-binding: t;")))))
+
(provide 'subr-tests)
;;; subr-tests.el ends here
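The key change in this version of the patch is the extra condition `(<= (match-end 0) this-end)`: a leading TRIM match is skipped only if it ends inside the current substring. Extracted into a standalone function (the name `my/trim-start` and its argument list are hypothetical, not part of subr.el), the guarded logic looks roughly like this:

```emacs-lisp
(defun my/trim-start (string trim beg end)
  "Return the start of STRING[BEG, END) after skipping a leading TRIM match.
The match is skipped only if it begins at BEG and ends no later than
END, mirroring the guard added to `push-one'.  Hypothetical helper."
  (if (and (eql (string-match trim string beg) beg)
           (<= (match-end 0) end))
      (match-end 0)
    beg))
```

With STRING " -*- lexical-binding: t; -*-" and TRIM "[ \t\n\r-]+", the first substring spans [0, 1) and the TRIM match there ends at 2, so the start stays at 0 instead of jumping past the end. For the second substring, [4, 25), the leading space is skipped and the result is 5.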
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78690; Package emacs. (Thu, 12 Jun 2025 08:42:01 GMT)
Message #26 received at 78690 <at> debbugs.gnu.org:
Eli Zaretskii <eliz <at> gnu.org> writes:
> Thanks. I guess I've found something similar independently, because
> the changes I have stashed (reproduced below) don't signal an error in
> this case. With those changes, I get what I think is the expected
> value
>
> ("" "lexical-binding: t;" "")
> Here's the up-to-date version of the patch:
> [...]
Confirmed - works for me, thank you.
BTW, I tried to read the code and found it a bit hard to follow. If you
want to give some local variables more meaningful names where possible,
or add a few very short comments, I would not be opposed to that.
Michael.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78690; Package emacs. (Thu, 12 Jun 2025 11:06:02 GMT)
Message #29 received at 78690 <at> debbugs.gnu.org:
> From: Michael Heerdegen <michael_heerdegen <at> web.de>
> Cc: 78690 <at> debbugs.gnu.org
> Date: Thu, 12 Jun 2025 10:42:42 +0200
>
> Confirmed - works for me, thank you.
Thanks, will install soon.
> BTW, I tried to read the code and found it a bit hard to follow. If you
> want to give some local variables more meaningful names where possible,
> or add a few very short comments, I would not be opposed to that.
If you tell me what is confusing or unclear, I will try to clarify
that. I guess after staring at the code as much as I did, I've lost
the ability to see the confusing parts, beyond the comments I added
that you saw in the patch.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#78690; Package emacs. (Thu, 19 Jun 2025 17:43:01 GMT)
Message #32 received at 78690 <at> debbugs.gnu.org:
Eli Zaretskii <eliz <at> gnu.org> writes:
> If you tell me what is confusing or unclear, I will try to clarify
> that. I guess after staring at the code as much as I did, I've lost
> the ability to see the confusing parts, beyond the comments I added
> that you saw in the patch.
The comments are actually good. I got a bit confused because of the
non-functional programming style (the helper lambda `push-one' changes
state that is not local to that helper function), but it's OK: the
semantics being implemented are just complicated, so it takes some time
to see what is going on where. All good. With the comments, one can
find one's way through the code.
Thanks,
Michael.
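The remark about `push-one` refers to a helper closure that updates variables of the enclosing `let*` rather than returning a value. A toy function in the same style (hypothetical, and much simpler than `split-string`; written for lexical-binding, as subr.el is) shows the pattern:

```emacs-lisp
(defun my/split-on-spaces (string)
  "Split STRING at single spaces using a `push-one'-style closure.
Hypothetical toy example; much simpler than `split-string'."
  (let* ((list nil)
         (this-start 0)
         (push-one
          (lambda (this-end)
            ;; Side effects on LIST and THIS-START, which belong to the
            ;; enclosing `let*' rather than to the lambda itself.
            (push (substring string this-start this-end) list)
            (setq this-start (1+ this-end)))))
    (while (string-match " " string this-start)
      (funcall push-one (match-beginning 0)))
    ;; The substring after the last space.
    (funcall push-one (length string))
    (nreverse list)))
```

For instance, (my/split-on-spaces "a bc d") returns ("a" "bc" "d"). Each call to `push-one` both records a substring and moves `this-start`, which is why the loop reads correctly only once you know what the closure mutates.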
Reply sent to Eli Zaretskii <eliz <at> gnu.org>: You have taken responsibility. (Sat, 21 Jun 2025 08:14:02 GMT)
Notification sent to Michael Heerdegen <michael_heerdegen <at> web.de>: bug acknowledged by developer. (Sat, 21 Jun 2025 08:14:02 GMT)
Message #37 received at 78690-done <at> debbugs.gnu.org:
> From: Michael Heerdegen <michael_heerdegen <at> web.de>
> Cc: 78690 <at> debbugs.gnu.org
> Date: Thu, 19 Jun 2025 19:43:31 +0200
>
> The comments are actually good. I got a bit confused because of the
> non-functional programming style (the helper lambda `push-one' changes
> state that is not local to that helper function), but it's OK: the
> semantics being implemented are just complicated, so it takes some time
> to see what is going on where. All good. With the comments, one can
> find one's way through the code.
Thanks, now installed on the master branch, and closing the bug.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.