GNU bug report logs -
#79670
31.0.50; [markdown-ts-mode] Errors and incorrect highlighting for fenced code blocks with unknown languages
To reply to this bug, email your comments to 79670 AT debbugs.gnu.org.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#79670; Package emacs. (Tue, 21 Oct 2025 23:19:02 GMT)
Acknowledgement sent to Rahul Martim Juliato <rahuljuliato <at> gmail.com>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 21 Oct 2025 23:19:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello there!
I'm writing to report an issue in `markdown-ts-mode.el`. When a fenced
code block uses a language identifier for which no Tree-sitter grammar
is installed, e.g.:
```txt
This is some text
```
or
```some_invalid_lang
This is some other text
```
Emacs raises a treesit-load-language-error.
This makes for a poor user experience: a common identifier that has no
grammar, such as `txt`, or even a simple typo, causes disruptive errors.
* The Problem
The current implementation of `markdown-ts--convert-code-block-language`
resolves the language identifier to a symbol and passes it to the
Tree-sitter injection mechanism, regardless of whether a grammar for
that language is actually available. This causes the
treesit-load-language-error.
An initial attempt to fix this by returning `nil` for unknown languages
revealed another issue: the injection mechanism interprets nil as a
language named "nil" and tries to load libtree-sitter-nil.so, which also
results in an error.
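One way to see which fence identifiers would run into this is to ask Tree-sitter whether a grammar can be loaded for each of them. The following is only a sketch: `my/grammar-status` is a hypothetical helper (not part of Emacs or of the attached patch), and it assumes an Emacs built with Tree-sitter support. On a build without it, every identifier is reported as unavailable.

```elisp
;; Hypothetical helper for illustration; not part of markdown-ts-mode.
(require 'treesit nil t)

(defun my/grammar-status (identifiers)
  "Return an alist mapping each string in IDENTIFIERS to t or nil.
The value is t when a Tree-sitter grammar named after the downcased
identifier can be loaded, and nil otherwise."
  (mapcar (lambda (id)
            (cons id
                  ;; Guard with `fboundp' so the sketch also runs on an
                  ;; Emacs built without Tree-sitter support.
                  (and (fboundp 'treesit-language-available-p)
                       (treesit-language-available-p
                        (intern (downcase id)))
                       t)))
          identifiers))
```

For example, `(my/grammar-status '("txt" "some_invalid_lang" "python"))` lists the identifiers from this report next to whatever happens to be installed locally.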
* Proposed Fix
I propose the attached patch to `markdown-ts--convert-code-block-language`
to make it more robust by checking for the grammar's existence and
providing a safe fallback.
[0001-lisp-textmodes-markdown-ts-mode.el-Handle-unknown-co.patch (text/x-patch, inline)]
From 88b18089a3574508299e574c96c867ee973645ef Mon Sep 17 00:00:00 2001
From: Rahul Martim Juliato <rahul.juliato <at> gmail.com>
Date: Tue, 21 Oct 2025 20:01:39 -0300
Subject: [PATCH] * lisp/textmodes/markdown-ts-mode.el: Handle unknown code
block langs
(markdown-ts--convert-code-block-language): Check if a language
is available before using it. If the grammar for the language in
a code block is not installed, default to 'markdown' to prevent
errors. This is safer than trying to use a language that doesn't
exist.
(markdown-ts-mode-maybe): Move declaration for
treesit-language-available-p to the top of the file.
---
lisp/textmodes/markdown-ts-mode.el | 38 +++++++++++++++---------------
1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/lisp/textmodes/markdown-ts-mode.el b/lisp/textmodes/markdown-ts-mode.el
index 7e579f41628..c1b81d87540 100644
--- a/lisp/textmodes/markdown-ts-mode.el
+++ b/lisp/textmodes/markdown-ts-mode.el
@@ -44,6 +44,7 @@
(declare-function treesit-node-child "treesit.c")
(declare-function treesit-node-type "treesit.c")
(declare-function treesit-parser-create "treesit.c")
+(declare-function treesit-language-available-p "treesit.c")
(add-to-list
'treesit-language-source-alist
@@ -228,7 +229,7 @@ markdown-ts-imenu-name-function
"Return an imenu entry if NODE is a valid header."
(let ((name (treesit-node-text node)))
(if (markdown-ts-imenu-node-p node)
- (thread-first (treesit-node-parent node) (treesit-node-text))
+ (thread-first (treesit-node-parent node) (treesit-node-text))
name)))
(defun markdown-ts-outline-predicate (node)
@@ -284,20 +285,20 @@ markdown-ts--add-config-for-mode
(defun markdown-ts--convert-code-block-language (node)
"Convert NODE to a language for the code block."
- (let* ((lang-string (alist-get (treesit-node-text node)
- markdown-ts--code-block-language-map
- (treesit-node-text node) nil #'equal))
- (lang (if (symbolp lang-string)
- lang-string
- (intern (downcase lang-string)))))
- ;; FIXME: Kind of a hack here: we use this function as a hook for
- ;; loading up configs for the language for the code block on-demand.
- (unless (memq lang markdown-ts--configured-languages)
- (let ((mode (alist-get lang markdown-ts-code-block-source-mode-map)))
- (when (fboundp mode)
- (markdown-ts--add-config-for-mode lang mode)
- (push lang markdown-ts--configured-languages))))
- lang))
+ (let* ((lang-str (downcase (treesit-node-text node)))
+ (lang (or (alist-get lang-str markdown-ts--code-block-language-map nil nil #'equal)
+ (intern-soft lang-str))))
+ (if (and lang (treesit-language-available-p lang))
+ (progn
+ ;; FIXME: Kind of a hack here: we use this function as a hook for
+ ;; loading up configs for the language for the code block on-demand.
+ (when (not (memq lang markdown-ts--configured-languages))
+ (when-let ((mode (alist-get lang markdown-ts-code-block-source-mode-map)))
+ (when (fboundp mode)
+ (markdown-ts--add-config-for-mode lang mode)
+ (push lang markdown-ts--configured-languages))))
+ lang)
+ 'markdown)))
(defun markdown-ts--range-settings ()
"Return range settings for `markdown-ts-mode'."
@@ -392,9 +393,9 @@ markdown-ts-mode
(setq-local comment-end " -->")
(setq-local font-lock-defaults nil
- treesit-font-lock-feature-list '((delimiter heading)
- (paragraph)
- (paragraph-inline)))
+ treesit-font-lock-feature-list '((delimiter heading)
+ (paragraph)
+ (paragraph-inline)))
(setq-local treesit-simple-imenu-settings
`(("Headings" ,#'markdown-ts-imenu-node-p
@@ -414,7 +415,6 @@ markdown-ts-mode-maybe
"Enable `markdown-ts-mode' when its grammar is available.
Also propose to install the grammar when `treesit-enabled-modes'
is t or contains the mode name."
- (declare-function treesit-language-available-p "treesit.c")
(if (or (treesit-language-available-p 'markdown)
(eq treesit-enabled-modes t)
(memq 'markdown-ts-mode treesit-enabled-modes))
--
2.51.1
[Message part 3 (text/plain, inline)]
* How the Fix Works
1. It checks if a grammar for the resolved language symbol is available
using treesit-language-available-p.
2. If the language exists, it proceeds as normal.
3. If the language does not exist, it returns 'markdown as a fallback
language for the injection. This prevents the crash by always providing
a valid language that is guaranteed to exist in this mode.
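The three steps above can also be sketched outside the patch. This is a simplified illustration rather than the patched function: `my/resolve-fence-language` and its parameters are hypothetical names, the alias map is a plain alist of strings to symbols, and the availability check is passed in as a function so the logic can be tried without any grammar installed.

```elisp
;; Hypothetical, simplified model of the fallback logic described above.
(defun my/resolve-fence-language (name language-map available-p fallback)
  "Resolve fence identifier NAME to a language symbol.
LANGUAGE-MAP is an alist of (STRING . SYMBOL) aliases.  AVAILABLE-P is
called with the candidate symbol and should return non-nil when a
grammar for it can be loaded.  Return FALLBACK when it cannot."
  (let* ((key (downcase name))
         ;; Step 1: resolve an alias, or use the identifier itself.
         (lang (or (cdr (assoc key language-map))
                   (intern key))))
    ;; Steps 2 and 3: keep the language if usable, else fall back.
    (if (funcall available-p lang)
        lang
      fallback)))
```

In a real session the predicate would be `#'treesit-language-available-p` and the fallback `'markdown`, as in `(my/resolve-fence-language "txt" '(("c++" . cpp)) #'treesit-language-available-p 'markdown)`.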
* Limitations
This is a pragmatic workaround, not a perfect solution. The main
limitation is that for an unknown language, the content of the code
block will be parsed by the markdown grammar. This can lead to
undesirable highlighting, such as *text* inside the code block being
rendered as emphasized text. Ideally, the block should be treated as
plain, un-highlighted text, but I could not find a way to do that.
This workaround is necessary due to a constraint in the treesit.el API,
which seems to require that the :embed function in an injection rule
always return a valid language symbol. There appears to be no mechanism
to conditionally cancel an injection.
While not perfect, this solution prioritizes stability and prevents
user-facing errors, trading ideal highlighting for a crash-free
experience.
A more complete, long-term solution might involve enhancing treesit.el
to provide a way for an injection to be gracefully declined.
I hope this patch and explanation are useful and welcome any discussion
or suggestions for a better approach.
Thank you,
Rahul Martim Juliato
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79670; Package emacs. (Wed, 22 Oct 2025 16:15:04 GMT)
Message #8 received at 79670 <at> debbugs.gnu.org (full text, mbox):
Cc-ing Yuan.
> I propose the attached patch to `markdown-ts--convert-code-block-language`
> to make it more robust by checking for the grammar's existence and
> providing a safe fallback.
Thanks, this problem also occurs while typing a language name,
when it is still incomplete.
> Ideally, the block should be treated as plain,
> un-highlighted text, but I could not find "how".
Maybe it could fall back to fundamental-mode or text-mode?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79670; Package emacs. (Fri, 24 Oct 2025 05:57:01 GMT)
Message #11 received at 79670 <at> debbugs.gnu.org (full text, mbox):
> On Oct 22, 2025, at 9:10 AM, Juri Linkov <juri <at> linkov.net> wrote:
>
> Cc-ing Yuan.
>
>> I propose the attached patch to `markdown-ts--convert-code-block-language`
>> to make it more robust by checking for the grammar's existence and
>> providing a safe fallback.
>
> Thanks, this problem occurs also when typing a lang name,
> and while it is still incomplete.
>
>> Ideally, the block should be treated as plain,
>> un-highlighted text, but I could not find "how".
>
> Maybe could fall back to fundamental-mode or text-mode?
Or, if the function returns nil, we don’t create an embedded parser for that range. That feels cleaner to me. WDYT? I can make a patch for it.
Yuan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79670; Package emacs. (Sat, 25 Oct 2025 17:00:02 GMT)
Message #14 received at 79670 <at> debbugs.gnu.org (full text, mbox):
> Or, if the function returns nil, we don’t create an embedded parser
> for that range. That feels cleaner to me. WDYT?
Not creating a parser looks like the right thing to do.
> I can make a patch for it.
This would be nice, then we could see how it works.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79670; Package emacs. (Sun, 26 Oct 2025 02:11:02 GMT)
Message #17 received at 79670 <at> debbugs.gnu.org (full text, mbox):
Juri Linkov <juri <at> linkov.net> writes:
>> Or, if the function returns nil, we don’t create an embedded parser
>> for that range. That feels cleaner to me. WDYT?
>
> Not creating a parser looks like the right thing to do.
Agreed.
>
>> I can make a patch for it.
>
> This would be nice, then we could see how it works.
Sure thing!
Regarding my initial patch, no worries if you decide to drop it and
start from scratch.
--
Rahul Martim Juliato
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79670; Package emacs. (Thu, 30 Oct 2025 04:31:02 GMT)
Message #20 received at 79670 <at> debbugs.gnu.org (full text, mbox):
> On Oct 25, 2025, at 7:10 PM, Rahul Martim Juliato <rahuljuliato <at> gmail.com> wrote:
>
>>> I can make a patch for it.
>>
>> This would be nice, then we could see how it works.
>
> Sure thing!
>
> Regarding my initial patch, no worries if you decide to drop it and
> start from scratch.
Thanks, I implemented the fix in 9f468fd6eb9 and 9e8557fe855. Now code blocks with unknown languages are simply ignored.
Yuan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79670; Package emacs. (Thu, 30 Oct 2025 07:13:02 GMT)
Message #23 received at 79670 <at> debbugs.gnu.org (full text, mbox):
> Cc: 79670 <at> debbugs.gnu.org, Juri Linkov <juri <at> linkov.net>
> From: Yuan Fu <casouri <at> gmail.com>
> Date: Wed, 29 Oct 2025 21:30:08 -0700
> Thanks, I implemented the fix in 9f468fd6eb9 and 9e8557fe855. Now code blocks with unknown languages are simply ignored.
Thanks, should this bug be closed now?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79670; Package emacs. (Thu, 30 Oct 2025 07:25:02 GMT)
Message #26 received at 79670 <at> debbugs.gnu.org (full text, mbox):
> On Oct 30, 2025, at 12:12 AM, Eli Zaretskii <eliz <at> gnu.org> wrote:
>
>> Thanks, I implemented the fix in 9f468fd6eb9 and 9e8557fe855. Now code blocks with unknown languages are simply ignored.
>
> Thanks, should this bug be closed now?
I tested locally and it worked fine. But let’s hear from Rahul and Juri.
Yuan
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79670; Package emacs. (Thu, 30 Oct 2025 07:52:02 GMT)
Message #29 received at 79670 <at> debbugs.gnu.org (full text, mbox):
>>>>>> I can make a patch for it.
>>>>>
>>>>> This would be nice, then we could see how it works.
>>>>
>>>> Sure thing!
>>>>
>>>> Regarding my initial patch, no worries if you decide to drop it and
>>>> start from scratch.
>>>
>>> Thanks, I implemented the fix in 9f468fd6eb9 and 9e8557fe855. Now code blocks with unknown languages are simply ignored.
>>
>> Thanks, should this bug be closed now?
>
> I tested locally and it worked fine. But let’s hear from Rahul and Juri.
Let's wait to hear from Rahul. I see that it works nicely,
and 'treesit-explore' shows local parsers. One strange thing is that
only the ```c ...``` block creates both global and local parsers,
unlike other languages, which create only local parsers.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#79670; Package emacs. (Thu, 30 Oct 2025 21:22:02 GMT)
Message #32 received at 79670 <at> debbugs.gnu.org (full text, mbox):
Juri Linkov <juri <at> linkov.net> writes:
>>>>>>> I can make a patch for it.
>>>>>>
>>>>>> This would be nice, then we could see how it works.
>>>>>
>>>>> Sure thing!
>>>>>
>>>>> Regarding my initial patch, no worries if you decide to drop it and
>>>>> start from scratch.
>>>>
>>>> Thanks, I implemented the fix in 9f468fd6eb9 and 9e8557fe855. Now
>>>> code blocks with unknown languages are simply ignored.
>>>
>>> Thanks, should this bug be closed now?
>>
>> I tested locally and it worked fine. But let’s hear from Rahul and Juri.
>
> Let's wait for what Rahul says since I see that it works nicely,
> and 'treesit-explore' shows local parsers. One strange thing is that
> only the ```c ...``` block creates both global and local parsers
> unlike other languages that create only local parsers.
Hello everyone,
It works wonderfully!
(I think) I tested it thoroughly: typing code blocks, pasting them, and even
integrating my experimental setup where I plug `markdown-ts-mode` into
`eldoc/eglot` and everything behaves perfectly.
Thank you for the fix!
---
Rahul Martim Juliato
This bug report was last modified 5 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.