GNU bug report logs - #78263
[PATCH 1/3] Ignore OSC sequences in term instead of printing them

Previous Next

Package: emacs;

Reported by: Stephane Zermatten <szermatt <at> gmail.com>

Date: Mon, 5 May 2025 15:21:02 UTC

Severity: normal

Tags: patch

To reply to this bug, email your comments to 78263 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to dann <at> ics.uci.edu, per <at> bothner.com, bug-gnu-emacs <at> gnu.org:
bug#78263; Package emacs. (Mon, 05 May 2025 15:21:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Stephane Zermatten <szermatt <at> gmail.com>:
New bug report received and forwarded. Copy sent to dann <at> ics.uci.edu, per <at> bothner.com, bug-gnu-emacs <at> gnu.org. (Mon, 05 May 2025 15:21:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Stephane Zermatten <szermatt <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: [PATCH 1/3] Ignore OSC sequences in term instead of printing them
Date: Mon, 05 May 2025 17:30:35 +0300
[Message part 1 (text/plain, inline)]
Tags: patch


This change detects OSC sequences and ignores them. This is what's
normally expected and avoids strange outputs.

Starting with version 4, the fish shell has started sending out OSC
sequences by default, which looks pretty bad on term. This change avoids
this particular problem.

This is PATCH 1/3, because I'd like to propose making it possible to
handle OSC sequences in term in follow-up changes. Emacs already has
handlers for OSC sequences in ansi-osc, why not make use of them?

In GNU Emacs 31.0.50 (build 5, x86_64-apple-darwin23.6.0, NS
 appkit-2487.70 Version 14.7.5 (Build 23H527)) of 2025-05-02 built on
 boomer.zia
Repository revision: 99ca41b6ef300653a0d15b73a0c0d446a9a9e059
Repository branch: master
Windowing system distributor 'Apple', version 10.3.2487
System Description:  macOS 14.7.5

Configured using:
 'configure --program-suffix=-head --with-tree-sitter
 --with-native-compilation=aot'

[0001-Ignore-OSC-sequences-in-term-instead-of-printing-the.patch (text/patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78263; Package emacs. (Mon, 05 May 2025 15:56:02 GMT) Full text and rfc822 format available.

Message #8 received at 78263 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stephane Zermatten <szermatt <at> gmail.com>
Cc: per <at> bothner.com, dann <at> ics.uci.edu, 78263 <at> debbugs.gnu.org
Subject: Re: bug#78263: [PATCH 1/3] Ignore OSC sequences in term instead of
 printing them
Date: Mon, 05 May 2025 18:55:06 +0300
> Cc: Dan Nicolaescu <dann <at> ics.uci.edu>, Per Bothner <per <at> bothner.com>
> From: Stephane Zermatten <szermatt <at> gmail.com>
> Date: Mon, 05 May 2025 17:30:35 +0300
> 
> This change detects OSC sequences and ignores them. This is what's
> normally expected and avoids strange outputs.
> 
> Starting with version 4, the fish shell has started sending out OSC
> sequences by default, which looks pretty bad on term. This change avoids
> this particular problem.
> 
> This is PATCH 1/3, because I'd like to propose making it possible to
> handle OSC sequences in term in follow-up changes. Emacs already has
> handlers for OSC sequences in ansi-osc, why not make use of them?

I indeed think that teaching Emacs about these sequences is better
than filtering them out.  Since you seem to think the same, and will
follow up with patches to that effect, why do we need this patch at
all?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78263; Package emacs. (Mon, 05 May 2025 16:38:02 GMT) Full text and rfc822 format available.

Message #11 received at 78263 <at> debbugs.gnu.org (full text, mbox):

From: Per Bothner <per <at> bothner.com>
To: Stephane Zermatten <szermatt <at> gmail.com>, 78263 <at> debbugs.gnu.org
Cc: Eli Zaretskii <eliz <at> gnu.org>, Dan Nicolaescu <dann <at> ics.uci.edu>
Subject: Re: bug#78263: [PATCH 1/3] Ignore OSC sequences in term instead of
 printing them
Date: Mon, 5 May 2025 09:36:45 -0700
The whole term-control-seq-regexp is fundamentally broken, and does not appear
in any version of term.el I have worked on or blessed.  It assumes all relevant sequences
are complete in a single call to term-emulate-terminal. That is not a safe assumption.

The correct way to parse terminal escape sequences is using a state machine, as in
my original implementation. Alternatively, if you really want to use pattern matching,
it is possible to recognize "incomplete escape sequence", and save that until the
next call.  However, I don't see any code to handle that; it is possible I'm missing something.
(That approach can in theory lead to O(n^2) behavior, which the state machine approach avoids,
but I don't think that is a major problem.)

I agree with Eli that term should when possible handle OSC sequences rather than ignoring them.
(Ones it can't handle should of course be ignored.)

On 5/5/25 7:30 AM, Stephane Zermatten wrote:
> Tags: patch
> 
> 
> This change detects OSC sequences and ignores them. This is what's
> normally expected and avoids strange outputs.
> 
> Starting with version 4, the fish shell has started sending out OSC
> sequences by default, which looks pretty bad on term. This change avoids
> this particular problem.
> 
> This is PATCH 1/3, because I'd like to propose making it possible to
> handle OSC sequences in term in follow-up changes. Emacs already has
> handlers for OSC sequences in ansi-osc, why not make use of them?
> 
> In GNU Emacs 31.0.50 (build 5, x86_64-apple-darwin23.6.0, NS
>   appkit-2487.70 Version 14.7.5 (Build 23H527)) of 2025-05-02 built on
>   boomer.zia
> Repository revision: 99ca41b6ef300653a0d15b73a0c0d446a9a9e059
> Repository branch: master
> Windowing system distributor 'Apple', version 10.3.2487
> System Description:  macOS 14.7.5
> 
> Configured using:
>   'configure --program-suffix=-head --with-tree-sitter
>   --with-native-compilation=aot'
> 

-- 
	--Per Bothner
per <at> bothner.com   http://per.bothner.com/





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78263; Package emacs. (Mon, 05 May 2025 16:59:02 GMT) Full text and rfc822 format available.

Message #14 received at 78263 <at> debbugs.gnu.org (full text, mbox):

From: Stéphane Zermatten <szermatt <at> gmail.com>
To: Per Bothner <per <at> bothner.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Dan Nicolaescu <dann <at> ics.uci.edu>,
 78263 <at> debbugs.gnu.org
Subject: Re: bug#78263: [PATCH 1/3] Ignore OSC sequences in term instead of
 printing them
Date: Mon, 5 May 2025 19:57:45 +0300
[Message part 1 (text/plain, inline)]
I attached a merged version of all 3 patches to this e-mail that handles OSC sequences, so they’re handled right away.  I’m happy to do it that way, if it’s more convenient. 



[0001-Handle-OSC-sequences-in-term.patch (application/octet-stream, attachment)]
[Message part 3 (text/plain, inline)]

> On 5 May 2025, at 19:36, Per Bothner <per <at> bothner.com> wrote:
> 
> The whole term-control-seq-regexp is fundamentally broken, and does not appear
> in any version of term.el I have worked on or blessed.  It assumes all relevant sequences
> are complete in a single call to term-emulate-terminal. That is not a safe assumption.
> 
> The correct way to parse terminal escape sequences is using a state machine, as in
> my original implementation. Alternatively, if you really want to use pattern matching,
> it is possible to recognize "incomplete escape sequence", and save that until the
> next call.  However, I don't see any code to handle that; it is possible I'm missing something.
> (That approach can in theory lead to O(n^2) behavior, which the state machine approach avoids,
> but I don't think that is a major problem.)

A state machine is indeed the right way of handling terminal escape sequences, and it would be a great idea to refactor term.el to do it that way. I don’t really want to use pattern matching. I’m using it here because it integrates with the current code. 

The current implementation does indeed recognize incomplete escape sequence, See term-control-seq-prefix-regexp and its use in term-emulate-terminal. It’s a bit, well, funny, but it seems to work. It keeps incomplete sequences in term-terminal-undecoded-bytes for the next call. 

In the patch attached to this e-mail, you’ll notice that I had to add some code that detects whether a sequence is terminated and buffers it for next time, in term-terminal-undecoded-bytes to be consistent with the current implementation. This is covered by the tests. 

If you guys think it’s time to refactor term.el to use a state machine and have time to spend on it, I’d be happy to help as much as I can. It’s a bit scary, because, unless I missed something, test coverage of the existing code isn’t very high, so adding more tests would be, I think, a good first step before engaging in larger changes. 
 
> 
> I agree with Eli that term should when possible handle OSC sequences rather than ignoring them.
> (Ones it can't handle should of course be ignored.)
> 
> On 5/5/25 7:30 AM, Stephane Zermatten wrote:
>> Tags: patch
>> This change detects OSC sequences and ignores them. This is what's
>> normally expected and avoids strange outputs.
>> Starting with version 4, the fish shell has started sending out OSC
>> sequences by default, which looks pretty bad on term. This change avoids
>> this particular problem.
>> This is PATCH 1/3, because I'd like to propose making it possible to
>> handle OSC sequences in term in follow-up changes. Emacs already has
>> handlers for OSC sequences in ansi-osc, why not make use of them?
>> In GNU Emacs 31.0.50 (build 5, x86_64-apple-darwin23.6.0, NS
>>  appkit-2487.70 Version 14.7.5 (Build 23H527)) of 2025-05-02 built on
>>  boomer.zia
>> Repository revision: 99ca41b6ef300653a0d15b73a0c0d446a9a9e059
>> Repository branch: master
>> Windowing system distributor 'Apple', version 10.3.2487
>> System Description:  macOS 14.7.5
>> Configured using:
>>  'configure --program-suffix=-head --with-tree-sitter
>>  --with-native-compilation=aot'
> 
> -- 
> 	--Per Bothner
> per <at> bothner.com   http://per.bothner.com/
> 


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78263; Package emacs. (Mon, 05 May 2025 18:47:02 GMT) Full text and rfc822 format available.

Message #17 received at 78263 <at> debbugs.gnu.org (full text, mbox):

From: Per Bothner <per <at> bothner.com>
To: Stéphane Zermatten <szermatt <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, Dan Nicolaescu <dann <at> ics.uci.edu>,
 78263 <at> debbugs.gnu.org
Subject: Re: bug#78263: [PATCH 1/3] Ignore OSC sequences in term instead of
 printing them
Date: Mon, 5 May 2025 11:46:24 -0700
[Message part 1 (text/plain, inline)]

On 5/5/25 9:57 AM, Stéphane Zermatten wrote:
> A state machine is indeed the right way of handling terminal escape sequences, and it would be a great idea to refactor term.el to do it that way. I don’t really want to use pattern matching. I’m using it here because it integrates with the current code.
> 
> The current implementation does indeed recognize incomplete escape sequence, See term-control-seq-prefix-regexp and its use in term-emulate-terminal. It’s a bit, well, funny, but it seems to work. It keeps incomplete sequences in term-terminal-undecoded-bytes for the next call.
> 
> In the patch attached to this e-mail, you’ll notice that I had to add some code that detects whether a sequence is terminated and buffers it for next time, in term-terminal-undecoded-bytes to be consistent with the current implementation. This is covered by the tests.
> 
> If you guys think it’s time to refactor term.el to use a state machine and have time to spend on it, I’d be happy to help as much as I can. It’s a bit scary, because, unless I missed something, test coverage of the existing code isn’t very high, so adding more tests would be, I think, a good first step before engaging in larger changes.

I don't feel stronly about state machines - I think it would be more robust, but I don't know
if that justifies the effort. In principle using the term-terminal-undecoded-bytes logic is fine;
however I'm not sure how "complete" the logic is in terms of unusual unhandled escape sequences or garbled inputs.

If somebody does re-write term to use a state machine, it might save a little time and thought
to re-use the code from my original term.el (attached).
-- 
	--Per Bothner
per <at> bothner.com   http://per.bothner.com/
[term.el (text/x-emacs-lisp, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78263; Package emacs. (Thu, 08 May 2025 10:04:01 GMT) Full text and rfc822 format available.

Message #20 received at 78263 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stéphane Zermatten <szermatt <at> gmail.com>,
 Jared Finder <jared <at> finder.org>
Cc: per <at> bothner.com, 78263 <at> debbugs.gnu.org
Subject: Re: bug#78263: [PATCH 1/3] Ignore OSC sequences in term instead of
 printing them
Date: Thu, 08 May 2025 13:03:26 +0300
> From: Stéphane Zermatten <szermatt <at> gmail.com>
> Date: Mon, 5 May 2025 19:57:45 +0300
> Cc: 78263 <at> debbugs.gnu.org,
>  Dan Nicolaescu <dann <at> ics.uci.edu>,
>  Eli Zaretskii <eliz <at> gnu.org>
> 
> I attached a merged version of all 3 patches to this e-mail that handles OSC sequences, so they’re handled right away.  I’m happy to do it that way, if it’s more convenient. 

Thanks, see some minor comments below.

Jared, any further comments about the issue or the patch?

> * lisp/term.el (term-emulate-terminal) handle OSC sequences

The log message should mention all the functions/variables where you
make changes, with a short description of each change.  Please also
mention the bug number.

> +(defcustom term-osc-handlers '(("0" . ansi-osc-window-title-handler)
> +                               ("2" . ansi-osc-window-title-handler)
> +                               ("7" . ansi-osc-directory-tracker)
> +                               ("8" . ansi-osc-hyperlink-handler))
> +  "OSC sequence handler function alist.
> +
> +OSC (Operating System Command) is a category of ANSI escape sequence
> +used in terminal application to introduce custom commands. Terminals
> +ignore unknown OSC sequences by default. Handlers can be registered here
> +to add support for new OSC sequences to `term'.
> +
> +Functions in this alist are passed matching valid OSC sequences as
> +they're sent to the terminal.
> +
> +Valid OSC sequences are of the form
> +  ESC ] code ; text BEL
> +  ESC ] code ; text ESC \
> +
> +Each entry has the form (CODE . FUNCTION), where CODE is the string that
> +appears before the semicolon.
> +
> +FUNCTION is called with two arguments CODE and TEXT with the term buffer
> +active and its point and state active at the time the OSC sequence
> +appeared in the stream."
> +  :type '(alist :key-type string :value-type function)
> +  :group 'term)

New defcustoms should have a :version tag.

> +(defconst term-osc--max-bytes (* 32 1024 1024)
> +  "Limit the length of OSC sequences to keep in memory.")

This begs the question: what happens with longer sequences?  Can the
doc string answer that question?

> +                              (with-demoted-errors "term OSC error: %S"
> +                                (funcall
> +                                 func code
> +                                 (decode-coding-string
> +                                  (substring seq-str text-start end-mark)
> +                                  locale-coding-system t)))))

Why locale-coding-system?  Wouldn't the result of calling
process-coding-system with the process running in term.el be a better
candidate?

Last, but not least: you don't seem to have a copyright assignment on
file, without which we cannot accept such a large contribution.  Would
you like to start the paperwork of copyright assignment at this time?
If yes, I will send you the form to fill and the instructions.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#78263; Package emacs. (Tue, 13 May 2025 14:25:01 GMT) Full text and rfc822 format available.

Message #23 received at 78263 <at> debbugs.gnu.org (full text, mbox):

From: Stéphane Zermatten <szermatt <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Per Bothner <per <at> bothner.com>, Jared Finder <jared <at> finder.org>,
 78263 <at> debbugs.gnu.org
Subject: Re: bug#78263: [PATCH 1/3] Ignore OSC sequences in term instead of
 printing them
Date: Tue, 13 May 2025 17:24:29 +0300
[Message part 1 (text/plain, inline)]
Thank you for the review! I addressed the points you mentioned. See the updated patch attached to this e-mail. 

> On 8 May 2025, at 13:03, Eli Zaretskii <eliz <at> gnu.org> wrote:
> 
>> From: Stéphane Zermatten <szermatt <at> gmail.com>
>> Date: Mon, 5 May 2025 19:57:45 +0300
>> Cc: 78263 <at> debbugs.gnu.org,
>> Dan Nicolaescu <dann <at> ics.uci.edu>,
>> Eli Zaretskii <eliz <at> gnu.org>
>> 
>> I attached a merged version of all 3 patches to this e-mail that handles OSC sequences, so they’re handled right away.  I’m happy to do it that way, if it’s more convenient.
> 
> Thanks, see some minor comments below.
> 
> Jared, any further comments about the issue or the patch?
> 
>> * lisp/term.el (term-emulate-terminal) handle OSC sequences
> 
> The log message should mention all the functions/variables where you
> make changes, with a short description of each change.  Please also
> mention the bug number.

I updated the description, trying to follow the format of older commits.

> 
>> +(defcustom term-osc-handlers '(("0" . ansi-osc-window-title-handler)
>> +                               ("2" . ansi-osc-window-title-handler)
>> +                               ("7" . ansi-osc-directory-tracker)
>> +                               ("8" . ansi-osc-hyperlink-handler))
>> +  "OSC sequence handler function alist.
>> +
>> +OSC (Operating System Command) is a category of ANSI escape sequence
>> +used in terminal application to introduce custom commands. Terminals
>> +ignore unknown OSC sequences by default. Handlers can be registered here
>> +to add support for new OSC sequences to `term'.
>> +
>> +Functions in this alist are passed matching valid OSC sequences as
>> +they're sent to the terminal.
>> +
>> +Valid OSC sequences are of the form
>> +  ESC ] code ; text BEL
>> +  ESC ] code ; text ESC \
>> +
>> +Each entry has the form (CODE . FUNCTION), where CODE is the string that
>> +appears before the semicolon.
>> +
>> +FUNCTION is called with two arguments CODE and TEXT with the term buffer
>> +active and its point and state active at the time the OSC sequence
>> +appeared in the stream."
>> +  :type '(alist :key-type string :value-type function)
>> +  :group 'term)
> 
> New defcustoms should have a :version tag.

I added one. I guess we’re at 31.1, now. 

> 
>> +(defconst term-osc--max-bytes (* 32 1024 1024)
>> +  "Limit the length of OSC sequences to keep in memory.")
> 
> This begs the question: what happens with longer sequences?  Can the
> doc string answer that question?

emulate-terminal goes back into its normal state, because presumably the end tag was lost.
I tried to describe that.  

> 
>> +                              (with-demoted-errors "term OSC error: %S"
>> +                                (funcall
>> +                                 func code
>> +                                 (decode-coding-string
>> +                                  (substring seq-str text-start end-mark)
>> +                                  locale-coding-system t)))))
> 
> Why locale-coding-system?  Wouldn't the result of calling
> process-coding-system with the process running in term.el be a better
> candidate?

process-coding-system is just 'binary because emulate-terminal want to process the escape sequences before the string is decoded. Using locale-coding-system is the way strings are decoded everywhere in emulate-terminal.

> 
> Last, but not least: you don't seem to have a copyright assignment on
> file, without which we cannot accept such a large contribution.  Would
> you like to start the paperwork of copyright assignment at this time?
> If yes, I will send you the form to fill and the instructions.

Yes, I still need to do that. I sent out a request to assign <at> gnu.org <mailto:assign <at> gnu.org> following the instructions in etc/copyright-assign.txt but have yet to hear back from them. 


[Message part 2 (text/html, inline)]
[0001-Handle-OSC-sequences-in-term-bug-78263.patch (application/octet-stream, attachment)]
[Message part 4 (text/html, inline)]

This bug report was last modified 1 day ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.