GNU bug report logs - #79682
31.0.50; Eglot does not support buffers containing special characters.


Package: emacs;

Reported by: ISouthRain <isouthrain <at> gmail.com>

Date: Thu, 23 Oct 2025 07:09:02 UTC

Severity: normal

Found in version 31.0.50

To reply to this bug, email your comments to 79682 AT debbugs.gnu.org.




Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#79682; Package emacs. (Thu, 23 Oct 2025 07:09:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to ISouthRain <isouthrain <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 23 Oct 2025 07:09:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: ISouthRain <isouthrain <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 31.0.50; Eglot does not support buffers containing special characters.
Date: Thu, 23 Oct 2025 14:51:00 +0800
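Hello maintainers!!
Please forgive me for reporting this issue in my native language rather than English.
I have found a problem where eglot fails to start.
If the buffer contents include a special character, eglot fails to start.
This happens in cc-mode when using eglot to start clangd.
For example, when the buffer contains: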
```C
// Return:  true--OK   false--Error
```
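I am not sure whether this special character survives in the email sent to you; it is:
\5414622

Because I also edit files in editors outside Emacs, this special character may get introduced.
So I would like to know whether this behavior is reasonable.

Thank you for your work, really!!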


In GNU Emacs 31.0.50 (build 1, x86_64-w64-mingw32) of 2025-09-05 built
 on runnervmp943j
Repository revision: 8589735be509190060f82398da69cf6d9d659434
Repository branch: HEAD
Windowing system distributor 'Microsoft Corp.', version 10.0.19045
System Description: Microsoft Windows 10 Enterprise (v10.0.2009.19045.3324)

Configured using:
 'configure
 --prefix=/d/a/emacs-build/emacs-build/pkg/8589735-ucrt-x86_64
 'CFLAGS=-O2 -fno-semantic-interposition -floop-parallelize-all
 -ftree-parallelize-loops=4 -g -pipe ' --disable-build-details
 --without-dbus --enable-build-details --with-compress-install
 --with-cairo --with-gif --with-gnutls --with-harfbuzz --with-jpeg
 --with-json --with-lcms2 --with-mps --with-native-compilation
 --with-png --with-rsvg --with-tiff --with-tree-sitter --with-xml2
 --with-xpm --with-zlib'

Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG LCMS2 LIBXML2 MODULES MPS NATIVE_COMP
NOTIFY W32NOTIFY PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF
TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XPM ZLIB

Important settings:
  value of $LANG: CHS
  locale-coding-system: cp936

Major mode: C//

Minor modes in effect:
  treesit-fold-mode: t
  eglot-tempel-mode: t
  eglot-inactive-regions-mode: t
  eglot-booster-mode: t
  eglot--managed-mode: t
  flymake-mode: t
  gptel-watch-mode: t
  annotate-mode: t
  subword-mode: t
  which-function-mode: t
  server-mode: t
  auto-image-file-mode: t
  global-dict-line-mode: t
  dict-line-mode: t
  which-key-mode: t
  winner-mode: t
  indent-bars-mode: t
  rainbow-delimiters-mode: t
  eldoc-box-hover-mode: t
  display-time-mode: t
  global-auto-revert-mode: t
  save-place-mode: t
  recentf-mode: t
  global-hl-line-mode: t
  global-whitespace-mode: t
  whitespace-mode: t
  savehist-mode: t
  electric-pair-mode: t
  pixel-scroll-precision-mode: t
  empx-mode: t
  vertico-mode: t
  marginalia-mode: t
  undo-fu-session-global-mode: t
  undo-fu-session-mode: t
  super-save-mode: t
  global-hl-todo-mode: t
  hl-todo-mode: t
  global-page-break-lines-mode: t
  page-break-lines-mode: t
  dape-breakpoint-global-mode: t
  dape-many-windows: t
  hexl-follow-ascii: t
  repeat-mode: t
  term-keys-mode: t
  corfu-history-mode: t
  corfu-popupinfo-mode: t
  global-corfu-mode: t
  corfu-mode: t
  global-display-fill-column-indicator-mode: t
  display-fill-column-indicator-mode: t
  elpaca-use-package-mode: t
  override-global-mode: t
  elpaca-no-symlink-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  show-paren-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  window-divider-mode: t
  minibuffer-regexp-mode: t
  column-number-mode: t
  line-number-mode: t
  global-visual-line-mode: t
  visual-line-mode: t
  transient-mark-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t

Load-path shadows:
c:/Users/adminuirs/AppData/Roaming/.emacs.d/elpaca/builds/transient/transient hides f:/App/emacs/share/emacs/31.0.50/lisp/transient

Features:
(shadow sort mail-extr consult-imenu emacsbug lisp-mnt vertico-directory
consult-xref markdown-mode treesit-fold treesit-fold-summary
treesit-fold-parsers treesit-fold-util c++-ts-mode c-ts-mode c-ts-common
eglot-tempel peg eglot-inactive-regions eglot-booster eglot
external-completion diff ert ewoc flymake gptel-watch gptel-rewrite
gptel-transient gptel-gemini gptel-org gptel-prompter gptel gptel-openai
annotate ibuffer ibuffer-loaddefs consult bookmark undo-fu
string-inflection tabify ffap ispell vc-git diff-mode vc-dispatcher
org-appear oc-basic bibtex expreg cap-words superword subword
tempel-collection tempel ace-window avy posframe jka-compr vertico-sort
helpful cc-langs trace cl-print edebug debug backtrace info-look info f
help-fns radix-tree elisp-refs dash add-log which-func imenu server
org-protocol-capture-html eww track-changes vtable mule-util url-queue
mm-url gnus-demon image-file image-converter gnus-agent gnus-srvr
gnus-score score-mode nnvirtual gnus-msg gnus-art mm-uu mml2015 mm-view
mml-smime smime dig nntp gnus-cache gnus-sum shr pixel-fill kinsoku
url-file svg dom gnus-group gnus-undo gnus-start gnus-dbus dbus xml
gnus-cloud nnimap nnmail browse-url mail-source utf7 nnoo gnus-spec
gnus-int gnus-range message sendmail yank-media dired dired-loaddefs
rfc822 mml mml-sec epa epg rfc6068 epg-config mm-decode mm-bodies
mm-encode mailabbrev gmm-utils mailheader gnus-win gnus nnheader
gnus-util range s org-protocol org-capture dict-line
org-limit-image-size which-key winner indent-bars rainbow-delimiters
eldoc-box time autorevert filenotify saveplace tramp-cache time-stamp
tramp-sh recentf tree-widget hl-line whitespace savehist elec-pair
pixel-scroll cua-base empx xref vertico marginalia undo-fu-session
super-save hl-todo diminish page-break-lines dape jsonrpc tramp trampver
tramp-integration files-x tramp-message tramp-compat tramp-loaddefs hexl
gdb-mi bindat gud compile text-property-search repeat pulse face-remap
color term-keys transient corfu-history corfu-popupinfo corfu pinyinlib
orderless modus-operandi-tinted-theme modus-themes derived cal-julian
theme-changer solar cal-dst term-keys-autoloads copilot-autoloads
eca-autoloads gptel-autoloads emms-autoloads ement-autoloads
persist-autoloads plz-autoloads taxy-magit-section-autoloads
taxy-autoloads svg-lib-autoloads visual-fill-column-autoloads
crdt-autoloads string-inflection-autoloads markdown-toc-autoloads
annotate-autoloads eglot-tempel-autoloads tempel-collection-autoloads
tempel-autoloads cargo-autoloads markdown-mode-autoloads
rust-mode-autoloads eglot-inactive-regions-autoloads
eglot-booster-autoloads eldoc-box-autoloads indent-bars-autoloads
treesit-fold-autoloads dape-autoloads page-break-lines-autoloads
pyim-basedict-autoloads pyim-autoloads async-autoloads xr-autoloads
posframe-autoloads magit-autoloads with-editor-autoloads
transient-autoloads flyspell-correct-avy-menu-autoloads
avy-menu-autoloads flyspell-correct-autoloads google-translate-autoloads
popup-autoloads corfu-english-helper-autoloads corfu-autoloads
ace-window-autoloads avy-autoloads expreg-autoloads
rainbow-delimiters-autoloads super-save-autoloads
undo-fu-session-autoloads undo-fu-autoloads deadgrep-autoloads
spinner-autoloads org-appear-autoloads ox-pandoc-autoloads ht-autoloads
org-cliplink-autoloads org-roam-autoloads emacsql-autoloads
magit-section-autoloads cond-let-autoloads llama-autoloads
ox-hugo-autoloads tomelr-autoloads org-protocol-capture-html-autoloads
embark-consult-autoloads embark-autoloads marginalia-autoloads
consult-todo-autoloads hl-todo-autoloads consult-autoloads
pinyinlib-autoloads orderless-autoloads vertico-autoloads
helpful-autoloads f-autoloads elisp-refs-autoloads dash-autoloads
s-autoloads theme-changer-autoloads diminish-autoloads parse-time
iso8601 mail-utils gnutls network-stream url-http mail-parse rfc2231
rfc2047 rfc2045 mm-util ietf-drums mail-prsvr url-gw nsm puny url-cache
url-auth elpaca-menu-melpa elpaca-menu-org ob-latex ob-python python
compat pcase ob-shell shell ob-C cc-mode cc-fonts cc-guess cc-menus
cc-cmds cc-styles cc-align cc-engine cc-vars cc-defs ox-odt rng-loc
rng-uri rng-parse rng-match rng-dt rng-util rng-pttrn nxml-parse nxml-ns
nxml-enc xmltok nxml-util ox-latex ox-icalendar org-agenda ox-html table
ox-ascii ox-publish ox org-attach org-element org-persist xdg org-id
org-refile org-element-ast inline avl-tree generator org ob ob-tangle
ob-ref ob-lob ob-table ob-exp org-macro org-src sh-script smie treesit
executable ob-comint org-pcomplete pcomplete comint ansi-osc ansi-color
ring org-list org-footnote org-faces org-entities time-date noutline
outline ob-emacs-lisp ob-core ob-eval org-cycle org-table ol org-fold
org-fold-core org-keys oc org-loaddefs thingatpt find-func cal-menu
calendar cal-loaddefs org-version org-compat org-macs format-spec
project edmacro kmacro display-fill-column-indicator comp comp-cstr
warnings comp-run comp-common rx cus-edit pp cus-start cus-load wid-edit
cl-extra help-mode elpaca-use-package use-package use-package-ensure
use-package-delight use-package-diminish use-package-bind-key bind-key
easy-mmode use-package-core elpaca-use-package-autoloads elpaca-log
elpaca-ui url url-proxy url-privacy url-expand url-methods url-history
url-cookie generate-lisp-file url-domsuf url-util url-parse auth-source
eieio eieio-core cl-macs icons password-cache json subr-x map byte-opt
gv bytecomp byte-compile url-vars mailcap cl-seq elpaca elpaca-process
cl-loaddefs cl-lib elpaca-autoloads china-util rmc iso-transl tooltip
cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type
elisp-mode mwheel touch-screen dos-w32 ls-lisp term/w32-nt disp-table
term/w32-win w32-win w32-vars term/common-win tool-bar dnd fontset image
regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode
prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu
timer select scroll-bar mouse jit-lock font-lock syntax font-core
term/tty-colors frame minibuffer nadvice seq simple cl-generic
indonesian philippine cham georgian utf-8-lang misc-lang vietnamese
tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek
romanian slovak czech european ethiopic indian cyrillic chinese
composite emoji-zwj charscript charprop case-table epa-hook
jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs
theme-loaddefs faces cus-face macroexp files window text-properties
overlay sha1 md5 base64 format env code-pages mule custom widget keymap
hashtable-print-readable backquote threads w32notify w32 lcms2 multi-tty
move-toolbar make-network-process tty-child-frames native-compile mps
emacs)

Memory information:
((conses 24 0 0) (symbols 56 0 0) (strings 40 0 0) (string-bytes 1 0)
 (vectors 24 0) (vector-slots 8 0 0) (floats 24 0 0) (intervals 64 0 0)
 (buffers 1072 0))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79682; Package emacs. (Thu, 23 Oct 2025 08:32:02 GMT) Full text and rfc822 format available.

Message #8 received at 79682 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: ISouthRain <isouthrain <at> gmail.com>, joaotavora <at> gmail.com
Cc: 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50;
 Eglot does not support buffers containing special characters.
Date: Thu, 23 Oct 2025 11:31:19 +0300
> From: ISouthRain <isouthrain <at> gmail.com>
> Date: Thu, 23 Oct 2025 14:51:00 +0800
> 
> 
> 维护者, 你们好!!
> 请原谅我使用我的母语而不是 English 来反馈这个问题.
> 我发现了一个问题, 有关于 eglot 启动失败的问题.
> 如果 buffer 内容存在 特殊字符, 那么 eglot 启动失败.
> 在 cc-mode 使用 eglot 启动 clangd.
> 比如 buffer 内容存在:
> ```C
> // Return:  true--OK   false--Error
> ```
> 我不确定发送到你那边的邮件这个特殊编码是否存在, 它是:
> \5414622
> 
> 因为我会在 Emacs 外部编辑器编辑文件, 所以有可能产生这个 特殊字符.
> 所以我想看看这是否合理.
> 
> 感谢你们的工作, 真的!!

Translation to English:

> Hello maintainers!!
> Please forgive me for reporting this issue in my native language instead of English.
> I've discovered an issue with eglot failing to start.
> If the buffer contains special characters, eglot will fail to start.
> Use eglot in cc-mode to start clangd.
> For example, if the buffer contains:
> ```C
> // Return:  true--OK false--Error
> ```
> I'm not sure whether this special character survives in the email sent to you; it is:
> \5414622
> 
> Because I edit files in an editor outside of Emacs, this special character might be present.
> So I wanted to see if this is reasonable.
> 
> Thanks for your work, really!

That character (0x161992) is beyond the last character supported by
Unicode, which is 0x10ffff.  I suspect that the LSP you are using
cannot cope with non-Unicode characters.  But maybe Eglot could
somehow ignore it or translate it to some acceptable string?
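
The arithmetic behind that statement can be checked with a short sketch. It assumes the reporter's `\5414622` is a backslash followed by the character code in octal; the helper names `parse_octal_escape` and `is_unicode_code_point` are made up for illustration and are not part of Emacs or Eglot.

```python
# Largest code point defined by Unicode.
UNICODE_MAX = 0x10FFFF


def parse_octal_escape(escape: str) -> int:
    """Return the character code for a backslash followed by octal digits."""
    if not escape.startswith("\\") or not escape[1:]:
        raise ValueError(f"not a backslash-octal escape: {escape!r}")
    return int(escape[1:], 8)


def is_unicode_code_point(code: int) -> bool:
    """True if the code lies within the Unicode code space (0 through 0x10FFFF)."""
    return 0 <= code <= UNICODE_MAX


code = parse_octal_escape(r"\5414622")
print(hex(code), is_unicode_code_point(code))  # prints "0x161992 False"
```

Python itself refuses such a value (`chr(0x161992)` raises `ValueError`), whereas Emacs's internal character space extends past the Unicode range, which is how a buffer can hold a character like this in the first place.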

Can you show what "eglot will fail to start" really means?  Do you
see an error message or something similar?

Joao, any suggestions?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79682; Package emacs. (Thu, 23 Oct 2025 13:30:02 GMT) Full text and rfc822 format available.

Message #11 received at 79682 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: ISouthRain <isouthrain <at> gmail.com>
Cc: joaotavora <at> gmail.com, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing
 special characters.
Date: Thu, 23 Oct 2025 16:29:15 +0300
[Please always use Reply to All to reply, to keep everyone CC'ed.]

> From: ISouthRain <isouthrain <at> gmail.com>
> Date: Thu, 23 Oct 2025 21:16:32 +0800
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > That character (0x161992) is beyond the last character supported by
> > Unicode, which is 0x10ffff.  I suspect that the LSP you are using
> > cannot cope with non-Unicode characters.  But maybe Eglot could
> > somehow ignore it or translate it to some acceptable string?
> >
> > Can you show what "eglot will fail to start" really means?  Do you
> > see an error message or something similar?
> >
> > Joao, any suggestions?
> 
> Start Test: 
> 
> emacs -Q
> 
> Open test.c:
> ```C
> // -*- coding: utf-8; -*-
> 
> // Return:  true--OK false--Error
> ```
> The *Messages* buffer then shows:
> ```
> Error running timer: (wrong-type-argument json-value-p "// -*- coding: utf-8; -*-

This is expected: JSON cannot handle non-Unicode characters.

> 
> // Return:  true--OK false--Error
> ")
> ```
> 
> The mode line shows [eglot:MyProject/error]
> 
> Here's the content of the `eglot-events-buffer`:
> ```
> [jsonrpc] D[21:06:15.844] Running language server: f:/App/Scoop/apps/mingw-winlibs-llvm-ucrt/current/bin/clangd.exe
> [jsonrpc] e[21:06:15.846] --> initialize[1] {"jsonrpc":"2.0","id":1,"method":"initialize","params":{"processId":13204,"clientInfo":{"name":"Eglot","version":"1.18"},"rootPath":"c:/Users/Jack/Desktop/C/","rootUri":"file:///c%3A/Users/Jack/Desktop/C","initializationOptions":{},"capabilities":{"workspace":{"applyEdit":true,"executeCommand":{"dynamicRegistration":false},"workspaceEdit":{"documentChanges":true},"didChangeWatchedFiles":{"dynamicRegistration":true},"symbol":{"dynamicRegistration":false},"configuration":true,"workspaceFolders":true},"textDocument":{"synchronization":{"dynamicRegistration":false,"willSave":true,"willSaveWaitUntil":true,"didSave":true},"completion":{"dynamicRegistration":false,"completionItem":{"snippetSupport":false,"deprecatedSupport":true,"resolveSupport":{"properties":["documentation","details","additionalTextEdits"]},"tagSupport":{"valueSet":[1]},"insertReplaceSupport":true},"contextSupport":true},"hover":{"dynamicRegistration":false,"contentFormat":["plaintext"]},"signatureHelp":{"dynamicRegistration":false,"signatureInformation":{"parameterInformation":{"labelOffsetSupport":true},"documentationFormat":["plaintext"],"activeParameterSupport":true}},"references":{"dynamicRegistration":false},"definition":{"dynamicRegistration":false,"linkSupport":true},"declaration":{"dynamicRegistration":false,"linkSupport":true},"implementation":{"dynamicRegistration":false,"linkSupport":true},"typeDefinition":{"dynamicRegistration":false,"linkSupport":true},"documentSymbol":{"dynamicRegistration":false,"hierarchicalDocumentSymbolSupport":true,"symbolKind":{"valueSet":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]}},"documentHighlight":{"dynamicRegistration":false},"codeAction":{"dynamicRegistration":false,"resolveSupport":{"properties":["edit","command"]},"dataSupport":true,"codeActionLiteralSupport":{"codeActionKind":{"valueSet":["quickfix","refactor","refactor.extract","refactor.inline","refactor.rewrite","source","source.or
ganizeImports"]}},"isPreferredSupport":true},"formatting":{"dynamicRegistration":false},"rangeFormatting":{"dynamicRegistration":false},"rename":{"dynamicRegistration":false},"inlayHint":{"dynamicRegistration":false},"callHierarchy":{"dynamicRegistration":false},"typeHierarchy":{"dynamicRegistration":false},"publishDiagnostics":{"relatedInformation":false,"versionSupport":true,"codeDescriptionSupport":false,"tagSupport":{"valueSet":[1,2]}}},"window":{"showDocument":{"support":true},"showMessage":{"messageActionItem":{"additionalPropertiesSupport":true}},"workDoneProgress":true},"general":{"positionEncodings":["utf-32","utf-8","utf-16"]},"experimental":{}},"workspaceFolders":[{"uri":"file:///c%3A/Users/Jack/Desktop/C","name":"c:/Users/Jack/Desktop/C/"}]}}
> [stderr]  I[21:06:15.884] (built by Brecht Sanders, r1) clangd version 18.1.8
> [stderr]  I[21:06:15.884] Features: windows
> [stderr]  I[21:06:15.884] PID: 18176
> [stderr]  I[21:06:15.884] Working directory: c:/Users/Jack/Desktop/C
> [stderr]  I[21:06:15.884] argv[0]: f:/App/Scoop/apps/mingw-winlibs-llvm-ucrt/current/bin/clangd.exe
> [stderr]  I[21:06:15.886] Starting LSP over stdin/stdout
> [stderr]  I[21:06:15.886] <-- initialize(1)
> [stderr]  I[21:06:15.888] --> reply:initialize(1) 1 ms
> [jsonrpc] e[21:06:15.888] <-- initialize[1] {"id":1,"jsonrpc":"2.0","result":{"capabilities":{"astProvider":true,"callHierarchyProvider":true,"clangdInlayHintsProvider":true,"codeActionProvider":{"codeActionKinds":["quickfix","refactor","info"]},"compilationDatabase":{"automaticReload":true},"completionProvider":{"resolveProvider":false,"triggerCharacters":[".","<",">",":","\"","/","*"]},"declarationProvider":true,"definitionProvider":true,"documentFormattingProvider":true,"documentHighlightProvider":true,"documentLinkProvider":{"resolveProvider":false},"documentOnTypeFormattingProvider":{"firstTriggerCharacter":"\n","moreTriggerCharacter":[]},"documentRangeFormattingProvider":true,"documentSymbolProvider":true,"executeCommandProvider":{"commands":["clangd.applyFix","clangd.applyTweak"]},"foldingRangeProvider":true,"hoverProvider":true,"implementationProvider":true,"inactiveRegionsProvider":true,"inlayHintProvider":true,"memoryUsageProvider":true,"referencesProvider":true,"renameProvider":true,"selectionRangeProvider":true,"semanticTokensProvider":{"full":{"delta":true},"legend":{"tokenModifiers":["declaration","definition","deprecated","deduced","readonly","static","abstract","virtual","dependentName","defaultLibrary","usedAsMutableReference","usedAsMutablePointer","constructorOrDestructor","userDefined","functionScope","classScope","fileScope","globalScope"],"tokenTypes":["variable","variable","parameter","function","method","function","property","variable","class","interface","enum","enumMember","type","type","unknown","namespace","typeParameter","concept","type","macro","modifier","operator","bracket","label","comment"]},"range":false},"signatureHelpProvider":{"triggerCharacters":["(",")","{","}","<",">",","]},"standardTypeHierarchyProvider":true,"textDocumentSync":{"change":2,"openClose":true,"save":true},"typeDefinitionProvider":true,"typeHierarchyProvider":true,"workspaceSymbolProvider":true},"serverInfo":{"name":"clangd","version":"(built by Brecht 
Sanders, r1) clangd version 18.1.8 windows x86_64-w64-windows-gnu; target=x86_64-w64-mingw32"}}}
> [jsonrpc] e[21:06:15.888] --> initialized {"jsonrpc":"2.0","method":"initialized","params":{}}
> [stderr]  I[21:06:15.888] <-- initialized
> [jsonrpc] e[21:06:16.438] --> textDocument/codeAction[2] {"jsonrpc":"2.0","id":2,"method":"textDocument/codeAction","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"range":{"start":{"line":0,"character":0},"end":{"line":3,"character":0}},"context":{"diagnostics":[],"triggerKind":2}}}
> [jsonrpc] e[21:06:16.438] --> textDocument/documentHighlight[3] {"jsonrpc":"2.0","id":3,"method":"textDocument/documentHighlight","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":0,"character":3}}}
> [jsonrpc] e[21:06:16.438] --> textDocument/hover[4] {"jsonrpc":"2.0","id":4,"method":"textDocument/hover","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":0,"character":3}}}
> [jsonrpc] e[21:06:16.438] --> textDocument/signatureHelp[5] {"jsonrpc":"2.0","id":5,"method":"textDocument/signatureHelp","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":0,"character":3}}}
> [jsonrpc] e[21:06:16.440] <-- textDocument/codeAction[2] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":2,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:16.440] <-- textDocument/documentHighlight[3] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":3,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:16.440] <-- textDocument/hover[4] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":4,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:16.440] <-- textDocument/signatureHelp[5] {"error":{"code":-32602,"message":"trying to get preamble for non-added document"},"id":5,"jsonrpc":"2.0"}
> [stderr]  I[21:06:16.438] <-- textDocument/codeAction(2)
> [stderr]  I[21:06:16.438] --> reply:textDocument/codeAction(2) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr]  I[21:06:16.438] <-- textDocument/documentHighlight(3)
> [stderr]  I[21:06:16.438] --> reply:textDocument/documentHighlight(3) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr]  I[21:06:16.438] <-- textDocument/hover(4)
> [stderr]  I[21:06:16.438] --> reply:textDocument/hover(4) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr]  I[21:06:16.438] <-- textDocument/signatureHelp(5)
> [stderr]  I[21:06:16.438] --> reply:textDocument/signatureHelp(5) 0 ms, error: -32602: trying to get preamble for non-added document
> [jsonrpc] e[21:06:24.039] --> textDocument/codeAction[6] {"jsonrpc":"2.0","id":6,"method":"textDocument/codeAction","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"range":{"start":{"line":0,"character":0},"end":{"line":3,"character":0}},"context":{"diagnostics":[],"triggerKind":2}}}
> [jsonrpc] e[21:06:24.039] --> textDocument/documentHighlight[7] {"jsonrpc":"2.0","id":7,"method":"textDocument/documentHighlight","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":0,"character":3}}}
> [jsonrpc] e[21:06:24.039] --> textDocument/hover[8] {"jsonrpc":"2.0","id":8,"method":"textDocument/hover","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":0,"character":3}}}
> [jsonrpc] e[21:06:24.039] --> textDocument/signatureHelp[9] {"jsonrpc":"2.0","id":9,"method":"textDocument/signatureHelp","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":0,"character":3}}}
> [jsonrpc] e[21:06:24.060] <-- textDocument/codeAction[6] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":6,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:24.060] <-- textDocument/documentHighlight[7] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":7,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:24.060] <-- textDocument/hover[8] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":8,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:24.060] <-- textDocument/signatureHelp[9] {"error":{"code":-32602,"message":"trying to get preamble for non-added document"},"id":9,"jsonrpc":"2.0"}
> [stderr]  I[21:06:24.039] <-- textDocument/codeAction(6)
> [stderr]  I[21:06:24.039] --> reply:textDocument/codeAction(6) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr]  I[21:06:24.039] <-- textDocument/documentHighlight(7)
> [stderr]  I[21:06:24.039] --> reply:textDocument/documentHighlight(7) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr]  I[21:06:24.039] <-- textDocument/hover(8)
> [stderr]  I[21:06:24.039] --> reply:textDocument/hover(8) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr]  I[21:06:24.039] <-- textDocument/signatureHelp(9)
> [stderr]  I[21:06:24.039] --> reply:textDocument/signatureHelp(9) 0 ms, error: -32602: trying to get preamble for non-added document
> [jsonrpc] e[21:06:33.885] --> textDocument/codeAction[10] {"jsonrpc":"2.0","id":10,"method":"textDocument/codeAction","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"range":{"start":{"line":0,"character":0},"end":{"line":3,"character":0}},"context":{"diagnostics":[],"triggerKind":2}}}
> [jsonrpc] e[21:06:33.885] --> textDocument/documentHighlight[11] {"jsonrpc":"2.0","id":11,"method":"textDocument/documentHighlight","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":3,"character":0}}}
> [jsonrpc] e[21:06:33.885] --> textDocument/hover[12] {"jsonrpc":"2.0","id":12,"method":"textDocument/hover","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":3,"character":0}}}
> [jsonrpc] e[21:06:33.885] --> textDocument/signatureHelp[13] {"jsonrpc":"2.0","id":13,"method":"textDocument/signatureHelp","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":3,"character":0}}}
> [jsonrpc] e[21:06:33.923] <-- textDocument/codeAction[10] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":10,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:33.923] <-- textDocument/documentHighlight[11] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":11,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:33.923] <-- textDocument/hover[12] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":12,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:33.923] <-- textDocument/signatureHelp[13] {"error":{"code":-32602,"message":"trying to get preamble for non-added document"},"id":13,"jsonrpc":"2.0"}
> [stderr]  I[21:06:33.885] <-- textDocument/codeAction(10)
> [stderr]  I[21:06:33.885] --> reply:textDocument/codeAction(10) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr]  I[21:06:33.885] <-- textDocument/documentHighlight(11)
> [stderr]  I[21:06:33.885] --> reply:textDocument/documentHighlight(11) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr]  I[21:06:33.885] <-- textDocument/hover(12)
> [stderr]  I[21:06:33.885] --> reply:textDocument/hover(12) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr]  I[21:06:33.885] <-- textDocument/signatureHelp(13)
> [stderr]  I[21:06:33.885] --> reply:textDocument/signatureHelp(13) 0 ms, error: -32602: trying to get preamble for non-added document
> 
> ```
> 
> So, I think the problem might be caused by jsonrpc.el????

I think Eglot should filter the stuff it sends to remove non-Unicode
characters.  jsonrpc.el is too low-level to do that.  Let's hear what
Joao thinks about this.
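
A minimal sketch of what such filtering could look like, written in Python rather than Emacs Lisp and not taken from Eglot. It models the buffer as a list of integer character codes, because a Python string cannot hold a code above 0x10FFFF; `sanitize` and `is_json_safe` are hypothetical names, and substituting U+FFFD is only one possible policy.

```python
import json

UNICODE_MAX = 0x10FFFF
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def is_json_safe(code: int) -> bool:
    """True if the code is a Unicode scalar value (in range, not a surrogate)."""
    return 0 <= code <= UNICODE_MAX and not 0xD800 <= code <= 0xDFFF


def sanitize(codes: list[int]) -> str:
    """Replace every non-Unicode character code with U+FFFD, one for one."""
    return "".join(chr(c if is_json_safe(c) else REPLACEMENT) for c in codes)


# Hypothetical buffer contents: "// OK" followed by the reported character.
buffer_codes = [ord(ch) for ch in "// OK"] + [0x161992]
text = sanitize(buffer_codes)
payload = json.dumps({"text": text})  # serializes without error
```

Because the substitution is one character for one character, offsets counted in characters stay aligned with the buffer. The cost is that the server then sees text that differs from the file on disk.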




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79682; Package emacs. (Thu, 23 Oct 2025 15:44:02 GMT) Full text and rfc822 format available.

Message #14 received at 79682 <at> debbugs.gnu.org (full text, mbox):

From: João Távora <joaotavora <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: ISouthRain <isouthrain <at> gmail.com>, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing
 special characters.
Date: Thu, 23 Oct 2025 16:43:25 +0100
On Thu, Oct 23, 2025, 09:31 Eli Zaretskii <eliz <at> gnu.org> wrote:

> > From: ISouthRain <isouthrain <at> gmail.com>
> > Date: Thu, 23 Oct 2025 14:51:00 +0800
> >
> >
> > 维护者, 你们好!!
> > 请原谅我使用我的母语而不是 English 来反馈这个问题.
> > 我发现了一个问题, 有关于 eglot 启动失败的问题.
> > 如果 buffer 内容存在 特殊字符, 那么 eglot 启动失败.
> > 在 cc-mode 使用 eglot 启动 clangd.
> > 比如 buffer 内容存在:
> > ```C
> > // Return:  true--OK   false--Error
> > ```
> > 我不确定发送到你那边的邮件这个特殊编码是否存在, 它是:
> > \5414622
> >
> > 因为我会在 Emacs 外部编辑器编辑文件, 所以有可能产生这个 特殊字符.
> > 所以我想看看这是否合理.
> >
> > 感谢你们的工作, 真的!!
>
> Translation to English:
>
> > Because I edit files in an editor outside of Emacs, this special
> > character might be present.
> > So I wanted to see if this is reasonable.
> >
> > Thanks for your work, really!
>
> That character (0x161992) is beyond the last character supported by
> Unicode, which is 0x10ffff.  I suspect that the LSP you are using
> cannot cope with non-Unicode characters.


If this is true, then it's a server problem.

> But maybe Eglot could
> somehow ignore it or translate it to some acceptable string?
>

Among the jobs of the LSP client, a particularly important one is to
provide the server with an accurate picture of the (saved or unsaved)
document under the client's control, so that the server can reconstruct it
perfectly as if it had direct access to the disk.

So if Eglot is doing this job correctly (is it?) then there's nothing else
to do. The picture we report of the file shall not differ from the picture
we observe, however imperfect it may be.
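
For readers unfamiliar with the protocol, the "picture" is conveyed mainly by a `textDocument/didOpen` notification that carries the full document text, framed with a `Content-Length` header. The following is a rough Python sketch of that message shape as described in the LSP specification, not Eglot's code; `frame_message` and `did_open` are illustrative names, and the URI is the one from the events log earlier in this thread.

```python
import json


def frame_message(message: dict) -> bytes:
    """Encode the message as UTF-8 JSON and prepend the Content-Length header."""
    body = json.dumps(message, ensure_ascii=False).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def did_open(uri: str, language_id: str, text: str, version: int = 0) -> dict:
    """Build a textDocument/didOpen notification carrying the whole document."""
    return {
        "jsonrpc": "2.0",
        "method": "textDocument/didOpen",
        "params": {
            "textDocument": {
                "uri": uri,
                "languageId": language_id,
                "version": version,
                "text": text,
            }
        },
    }


frame = frame_message(
    did_open(
        "file:///c%3A/Users/Jack/Desktop/C/test.c",
        "c",
        "// Return:  true--OK false--Error\n",
    )
)
```

The events log shown earlier contains no `didOpen` after `initialized`, which would be consistent with the document text failing to serialize and with clangd's "non-added document" replies, though that reading is an inference from the log rather than something confirmed in the thread.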

So please report this to the server developers and/or make them aware of
this bug thread.

What version of clangd are you using? How can I insert this bizarre
character into, say, a C++ file under c++-ts-mode?

João

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79682; Package emacs. (Fri, 24 Oct 2025 05:13:03 GMT) Full text and rfc822 format available.

Message #17 received at 79682 <at> debbugs.gnu.org (full text, mbox):

From: Rain ISouth <isouthrain <at> gmail.com>
To: João Távora <joaotavora <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing
 special characters.
Date: Fri, 24 Oct 2025 08:46:50 +0800
I am sending you the `test.c` file.
Here is the clangd version as well:
```shell
 clangd --version
(built by Brecht Sanders, r1) clangd version 18.1.8
Features: windows
Platform: x86_64-w64-windows-gnu; target=x86_64-w64-mingw32
```

The problem should be independent of the clangd version.

On Thu, 23 Oct 2025 at 23:43, João Távora <joaotavora <at> gmail.com> wrote:

>
>
> On Thu, Oct 23, 2025, 09:31 Eli Zaretskii <eliz <at> gnu.org> wrote:
>
>> > From: ISouthRain <isouthrain <at> gmail.com>
>> > Date: Thu, 23 Oct 2025 14:51:00 +0800
>> >
>> >
>> > 维护者, 你们好!!
>> > 请原谅我使用我的母语而不是 English 来反馈这个问题.
>> > 我发现了一个问题, 有关于 eglot 启动失败的问题.
>> > 如果 buffer 内容存在 特殊字符, 那么 eglot 启动失败.
>> > 在 cc-mode 使用 eglot 启动 clangd.
>> > 比如 buffer 内容存在:
>> > ```C
>> > // Return:  true--OK   false--Error
>> > ```
>> > 我不确定发送到你那边的邮件这个特殊编码是否存在, 它是:
>> > \5414622
>> >
>> > 因为我会在 Emacs 外部编辑器编辑文件, 所以有可能产生这个 特殊字符.
>> > 所以我想看看这是否合理.
>> >
>> > 感谢你们的工作, 真的!!
>>
>> Translation to English:
>>
>> > Because I edit files in an editor outside of Emacs, this special
>> > character might be present.
>> > So I wanted to see if this is reasonable.
>> >
>> > Thanks for your work, really!
>>
>> That character (0x161992) is beyond the last character supported by
>> Unicode, which is 0x10ffff.  I suspect that the LSP you are using
>> cannot cope with non-Unicode characters.
>
>
> If this is true, then it's a server problem.
>
>> But maybe Eglot could
>> somehow ignore it or translate it to some acceptable string?
>>
>
> Among the jobs of the LSP client, a particularly important one is to
> provide the server with an accurate picture of the (saved or unsaved)
> document under the client's control, so that the server can reconstruct it
> perfectly as if it had direct access to the disk.
>
> So if Eglot is doing this job correctly (is it?) then there's nothing else
> to do. The picture we report of the file shall not differ from the picture
> we observe, however imperfect it may be.
>
> So please report this to the server developers and/or make them aware of
> this bug thread.
>
>  What version of clangd are you using? How can I insert this bizarre
> character into, say, a C++ file under c++-ts-mode?
>
> João
>
[test.c (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79682; Package emacs. (Fri, 24 Oct 2025 06:26:02 GMT) Full text and rfc822 format available.

Message #20 received at 79682 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: João Távora <joaotavora <at> gmail.com>
Cc: isouthrain <at> gmail.com, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing
 special characters.
Date: Fri, 24 Oct 2025 09:25:26 +0300
> From: João Távora <joaotavora <at> gmail.com>
> Date: Thu, 23 Oct 2025 16:43:25 +0100
> Cc: ISouthRain <isouthrain <at> gmail.com>, 79682 <at> debbugs.gnu.org
> 
> On Thu, Oct 23, 2025, 09:31 Eli Zaretskii <eliz <at> gnu.org> wrote:
> 
>  That character (0x161992) is beyond the last character supported by
>  Unicode, which is 0x10ffff.  I suspect that the LSP you are using
>  cannot cope with non-Unicode characters.  
> 
> If this is true, then it's a server problem.

Sorry, I don't follow.  The error which the OP quoted:

> > And then message buffer out:
> > ```
> > Error running timer: (wrong-type-argument json-value-p "// -*- coding: utf-8; -*-

comes from Emacs.  Specifically, it comes from json.c, when it tries
to serialize the information to be sent to the LSP server as JSON
object(s).  JSON does not allow strings with characters outside of
Unicode codespace, and the offending character, 0x161992, is such a
character (Emacs supports character codepoints up to 0x3FFFFF).  See
json.c:json_out_string, where it calls string_not_unicode.  The
problematic character comes from the program source code in the
buffer.  So how can this be a server problem, when it happens entirely
on our side?
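
To make the ranges concrete, here is a minimal Python sketch (not Emacs code) of the range check described above. The constant names and the helper `is_unicode_codepoint` are made up for illustration; the limits 0x10FFFF and 0x3FFFFF are the ones quoted in this message, and \5414622 is the octal escape from the original report.

```python
# Limits quoted in the thread: Unicode ends at 0x10FFFF, while Emacs
# accepts character codes up to 0x3FFFFF.
UNICODE_MAX = 0x10FFFF
EMACS_MAX = 0x3FFFFF


def is_unicode_codepoint(code: int) -> bool:
    """Return True if CODE lies inside the Unicode codespace."""
    return 0 <= code <= UNICODE_MAX


# The report shows the character as the octal escape \5414622.
offending = 0o5414622

print(hex(offending))                    # 0x161992
print(is_unicode_codepoint(offending))   # False
print(offending <= EMACS_MAX)            # True
```

So the character is a legal Emacs character code but falls outside what a JSON string may carry.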

>  But maybe Eglot could
>  somehow ignore it or translate it to some acceptable string?
> 
> Among the jobs of the LSP client, a particularly important one is to provide the server with an accurate
> picture of the (saved or unsaved) document under the client's control, so that the server can reconstruct it
> perfectly as if it had direct access to the disk.

Perhaps we could replace such characters with a string "?", for
example, when we send the program text to the server?

If that is also impossible (i.e. will break some interaction with the
server), then I would suggest that Eglot catches json-value-p errors
and produces a more user-friendly error message, like

  Buffer text includes characters outside of Unicode codespace

Does this make sense?

>  How can I insert this bizarre character into, say, a C++ file under
> c++-ts-mode?

Easy: type "C-x 8 RET 161992 RET".




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79682; Package emacs. (Fri, 24 Oct 2025 16:51:03 GMT) Full text and rfc822 format available.

Message #23 received at 79682 <at> debbugs.gnu.org (full text, mbox):

From: João Távora <joaotavora <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: isouthrain <at> gmail.com, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing
 special characters.
Date: Fri, 24 Oct 2025 17:51:35 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: João Távora <joaotavora <at> gmail.com>
>> Date: Thu, 23 Oct 2025 16:43:25 +0100
>> Cc: ISouthRain <isouthrain <at> gmail.com>, 79682 <at> debbugs.gnu.org
>> 
>> On Thu, Oct 23, 2025, 09:31 Eli Zaretskii <eliz <at> gnu.org> wrote:
>> 
>>  That character (0x161992) is beyond the last character supported by
>>  Unicode, which is 0x10ffff.  I suspect that the LSP you are using
>>  cannot cope with non-Unicode characters.  
>> 
>> If this is true, then it's a server problem.
>
> Sorry, I don't follow.  The error which the OP quoted:
>
>> > And then message buffer out:
>> > ```
>> > Error running timer: (wrong-type-argument json-value-p "// -*- coding: utf-8; -*-

I must have missed this.  I was commenting on your "I suspect that the
LSP you are using cannot cope", the "LSP" in question meaning "the
server" to me.

> comes from Emacs.  Specifically, it comes from json.c, when it tries
> to serialize the information to be sent to the LSP server as JSON
> object(s).  JSON does not allow strings with characters outside of
> Unicode codespace, and the offending character, 0x161992, is such a
> character (Emacs supports character codepoints up to 0x3FFFFF).  See
> json.c:json_out_string, where it calls string_not_unicode.  The
> problematic character comes from the program source code in the
> buffer.  So how can this be a server problem, when it happens entirely
> on our side?

It's a problem of ours, you're right.  I had misunderstood the problem.
But, if you're right in the "JSON does not allow...outside Unicode"
sentence, it's _also_ a problem of the server, in fact it's a problem of
the LSP protocol itself, since it is based on JSONRPC, which obviously
is based on JSON.

>> Among the jobs of the LSP client, a particularly important one is to provide the server with an accurate
>> picture of the (saved or unsaved) document under the client's control, so that the server can reconstruct it
>> perfectly as if it had direct access to the disk.
>
> Perhaps we could replace such characters with a string "?", for
> example, when we send the program text to the server?

Not sure that wouldn't do more harm than good.  It's providing the server
with a wrong picture of the file.  A checksum of it would be wrong, for
example.  (some servers actually have access to the file on disk and to
the representation we make of the file.)

Also, if you remember vaguely, LSP counts columns in octets or code
points or something like that.  So if we send just one byte, it could
mess up the counting.
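
For reference, the LSP specification's default is to express the `character` part of a position in UTF-16 code units (newer protocol versions let client and server negotiate another encoding). A rough Python sketch of how a one-character replacement can shift columns follows; the helper name `utf16_units` and the sample lines are made up for illustration, and the emoji merely stands in for any character that needs two UTF-16 units.

```python
def utf16_units(text: str) -> int:
    """Number of UTF-16 code units needed to encode TEXT."""
    return len(text.encode("utf-16-le")) // 2


# Same line, once with a "?" placeholder and once with a character
# outside the Basic Multilingual Plane (two UTF-16 code units).
line_placeholder = "// Return: ? true--OK"
line_astral = "// Return: \U0001F600 true--OK"

print(utf16_units(line_placeholder))
print(utf16_units(line_astral))
```

The second count is one larger than the first, so every column after the replaced character would be off by one if client and server disagreed on the text. The 0x161992 character itself has no UTF-16 encoding at all, which is part of why the counting question has no obvious answer here.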

Even though realistically your idea probably would work, it's a very big
hack.  Also why not do this hack in json.c??  That's where the
limitation is (well, that's where we're closest to the source of the
limitation, since the limitation is in JSON itself, according to you.)
It's very awkward to sanitize this output in Eglot.

> If that is also impossible (i.e. will break some interaction with the
> server), then I would suggest that Eglot catches json-value-p errors
> and produces a more user-friendly error message, like
>
>   Buffer text includes characters outside of Unicode codespace

This makes more sense (sorry ISouthRain).  But to confidently mention
"buffer text" in that error we'd want to surround only some
eglot--request/eglot--notify calls, which is awkward and confusing.  I'd
say it doesn't even make sense to start Eglot in such a buffer, so the
check can happen much much earlier on, maybe in eglot--managed-mode.
But even this could be tricky to do because this mode is started
automatically in many situations.

> Does this make sense?

Yes, I think it's starting to make sense now. Here's another option:
can't we play outside the JSON rules and send some "illegal" byte
sequences in json.c?  Maybe some servers will grok this.  At least I
think this should be attempted and the results collected for some
servers...  I can test with 3 or 4 if you provide that json.c patch.

João






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79682; Package emacs. (Sat, 25 Oct 2025 04:41:01 GMT) Full text and rfc822 format available.

Message #26 received at 79682 <at> debbugs.gnu.org (full text, mbox):

From: ISouthRain <isouthrain <at> gmail.com>
To: João Távora <joaotavora <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing
 special characters.
Date: Sat, 25 Oct 2025 12:39:43 +0800
João Távora <joaotavora <at> gmail.com> writes:

> Eli Zaretskii <eliz <at> gnu.org> writes:
>
>>> From: João Távora <joaotavora <at> gmail.com>
>>> Date: Thu, 23 Oct 2025 16:43:25 +0100
>>> Cc: ISouthRain <isouthrain <at> gmail.com>, 79682 <at> debbugs.gnu.org
>>> 
>>> On Thu, Oct 23, 2025, 09:31 Eli Zaretskii <eliz <at> gnu.org> wrote:
>>> 
>>>  That character (0x161992) is beyond the last character supported by
>>>  Unicode, which is 0x10ffff.  I suspect that the LSP you are using
>>>  cannot cope with non-Unicode characters.  
>>> 
>>> If this is true, then it's a server problem.
>>
>> Sorry, I don't follow.  The error which the OP quoted:
>>
>>> > And then message buffer out:
>>> > ```
>>> > Error running timer: (wrong-type-argument json-value-p "// -*- coding: utf-8; -*-
>
> I must have missed this.  I was commenting on your "I suspect that the
> LSP you are using cannot cope", the "LSP" in question meaning "the
> server" to me.
>
>> comes from Emacs.  Specifically, it comes from json.c, when it tries
>> to serialize the information to be sent to the LSP server as JSON
>> object(s).  JSON does not allow strings with characters outside of
>> Unicode codespace, and the offending character, 0x161992, is such a
>> character (Emacs supports character codepoints up to 0x3FFFFF).  See
>> json.c:json_out_string, where it calls string_not_unicode.  The
>> problematic character comes from the program source code in the
>> buffer.  So how can this be a server problem, when it happens entirely
>> on our side?
>
> It's a problem of ours, you're right.  I had misunderstood the problem.
> But, if you're right in the "JSON does not allow...outside Unicode"
> sentence, it's _also_ a problem of the server, in fact it's a problem of
> the LSP protocol itself, since it is based on JSONRPC, which obviously
> is based on JSON.
>
>>> Among the jobs of the LSP client, a particularly important one is to provide the server with an accurate
>>> picture of the (saved or unsaved) document under the client's control, so that the server can reconstruct it
>>> perfectly as if it had direct access to the disk.
>>
>> Perhaps we could replace such characters with a string "?", for
>> example, when we send the program text to the server?
>
> Not sure that wouldn't do more harm than good.  It's providing the server
> with a wrong picture of the file.  A checksum of it would be wrong, for
> example.  (some servers actually have access to the file on disk and to
> the representation we make of the file.)
>
> Also, if you remember vaguely, LSP counts columns in octets or code
> points or something like that.  So if we send just one byte, it could
> mess up the counting.
>
> Even though realistically your idea probably would work, it's a very big
> hack.  Also why not do this hack in json.c??  That's where the
> limitation is (well, that's where we're closest to the source of the
> limitation, since the limitation is in JSON itself, according to you.)
> It's very awkward to sanitize this output in Eglot.
>
>> If that is also impossible (i.e. will break some interaction with the
>> server), then I would suggest that Eglot catches json-value-p errors
>> and produces a more user-friendly error message, like
>>
>>   Buffer text includes characters outside of Unicode codespace
>
> This makes more sense (sorry ISouthRain).  But to confidently mention
> "buffer text" in that error we'd want to surround only some
> eglot--request/eglot--notify calls, which is awkward and confusing.  I'd
> say it doesn't even make sense to start Eglot in such a buffer, so the
> check can happen much much earlier on, maybe in eglot--managed-mode.
> But even this could be tricky to do because this mode is started
> automatically in many situations.

The issues you're discussing are beyond my understanding, so please excuse my inability to help.

However, it makes sense to launch LSP even with an "erroneous character," even if that character is meaningless.

If I have a very large .c file with a lot of content, it would be frustrating if I couldn't use eglot properly just because of this "erroneous character."

Because I have to delete this "erroneous character" to start eglot normally, but I don't know where this "erroneous character" is, or what it looks like, so I can't find it and delete it...

There is no clear source for how the "erroneous character" is produced. I only know that the "erroneous character" comes from editing the file with "other editors."

>
>> Does this make sense?
>
> Yes, I think it's starting to make sense now. Here's another option:
> can't we play outside the JSON rules and send some "illegal" byte
> sequences in json.c?  Maybe some servers will grok this.  At least I
> think this should be attempted and the results collected for some
> servers...  I can test with 3 or 4 if you provide that json.c patch.
>
> João




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79682; Package emacs. (Sat, 25 Oct 2025 07:08:02 GMT) Full text and rfc822 format available.

Message #29 received at 79682 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: João Távora <joaotavora <at> gmail.com>,
 Philipp Stephani <p.stephani2 <at> gmail.com>
Cc: isouthrain <at> gmail.com, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing
 special characters.
Date: Sat, 25 Oct 2025 10:07:22 +0300
> From: João Távora <joaotavora <at> gmail.com>
> Cc: isouthrain <at> gmail.com,  79682 <at> debbugs.gnu.org
> Date: Fri, 24 Oct 2025 17:51:35 +0100
> 
> > Perhaps we could replace such characters with a string "?", for
> > example, when we send the program text to the server?
> 
> Not sure that wouldn't do more harm than good.  It's providing the server
> with a wrong picture of the file.  A checksum of it would be wrong, for
> example.  (some servers actually have access to the file on disk and to
> the representation we make of the file.)
> 
> Also, if you remember vaguely, LSP counts columns in octets or code
> points or something like that.  So if we send just one byte, it could
> mess up the counting.

Maybe, but only if Emacs considers the original character to take more
than one column.  So to counteract that, we could send a string of
one or 2 "?", depending on what (char-width CHARACTER) returns for the
offending CHARACTER.  In this case, the 0x161992 character is
considered by Emacs to take 1 column, so a single "?" should be okay
in this regard.  But see below.
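
As a rough illustration of the replacement idea (a Python sketch, not a proposed patch): the function below walks a sequence of integer character codes and substitutes "?" for anything above 0x10FFFF, repeating the "?" according to a width callback. The names `sanitize` and `char_width` are hypothetical; `char_width` only stands in for what Emacs's `char-width` would report, and the default of one column matches the 0x161992 case described above.

```python
from typing import Callable, Iterable

UNICODE_MAX = 0x10FFFF


def sanitize(codes: Iterable[int],
             char_width: Callable[[int], int] = lambda code: 1) -> str:
    """Replace each code above UNICODE_MAX with as many "?" as its width."""
    out = []
    for code in codes:
        if code > UNICODE_MAX:
            # Hypothetical stand-in for (char-width CHARACTER) in Emacs.
            out.append("?" * char_width(code))
        else:
            out.append(chr(code))
    return "".join(out)


codes = [ord(c) for c in "Return: "] + [0x161992] + [ord(c) for c in " true"]
print(sanitize(codes))  # Return: ? true
```

Whether a width-based count of "?" keeps server-side positions aligned depends on how the server counts columns, which this sketch does not address.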

> Even though realistically your idea probably would work, it's a very big
> hack.  Also why not do this hack in json.c??  That's where the
> limitation is (well, that's where we're closest to the source of the
> limitation, since the limitation is in JSON itself, according to you.)
> It's very awkward to sanitize this output in Eglot.

Maybe.  At the time, the people who worked on json.c (Philipp, AFAIR)
were very adamant that we should flatly reject non-Unicode characters.

I suggested to do this in Eglot, because this is about LSP use of
JSON, not about JSON in general.  So what the LSP servers do and allow
is also an important factor, unlike with JSON in general.  But perhaps
we could make this behavior optional in json.c, subject to some
variable exposed to Lisp that Eglot could set?  Philipp, any comments
or ideas?

Can someone test what other LSP clients do in these situations?  For
example, what does VSCode do with such source files?  If it does
somehow allow using LSP for such source files, it would be interesting
to know what its LSP client sends to the server in those cases.
Then perhaps we could teach json.c to do the same.

> > If that is also impossible (i.e. will break some interaction with the
> > server), then I would suggest that Eglot catches json-value-p errors
> > and produces a more user-friendly error message, like
> >
> >   Buffer text includes characters outside of Unicode codespace
> 
> This makes more sense (sorry ISouthRain).  But to confidently mention
> "buffer text" in that error we'd want to surround only some
> eglot--request/eglot--notify calls, which is awkward and confusing.  I'd
> say it doesn't even make sense to start Eglot in such a buffer, so the
> check can happen much much earlier on, maybe in eglot--managed-mode.
> But even this could be tricky to do because this mode is started
> automatically in many situations.

Well, AFAIU "not starting Eglot" is what happens already by default,
because of that error we raise.  So if we decide to do that, it means
there's nothing to do here except document the restriction better.

> > Does this make sense?
> 
> Yes, I think it's starting to make sense now. Here's another option:
> can't we play outside the JSON rules and send some "illegal" byte
> sequences in json.c?  Maybe some servers will grok this.  At least I
> think this should be attempted and the results collected for some
> servers...  I can test with 3 or 4 if you provide that json.c patch.

This would need someone who knows more about LSP servers than I do to
suggest a specific trick.  Which is why I asked above what other LSP
clients do in such cases.

In general, sending invalid sequences is a riskier solution, because
some servers might be unable to cope.  For example, I'm guessing that
servers written in Python will flatly reject such sequences because
AFAIK Python cannot support them.
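
A small Python sketch of that guess, for what it is worth. It assumes the "invalid sequence" would be the ordinary four-byte UTF-8 bit layout applied to a code above 0x10FFFF; the thread does not say what bytes json.c would actually emit, and the helper names are made up.

```python
def extended_utf8(code: int) -> bytes:
    """Four-byte UTF-8 bit layout, applied even past 0x10FFFF (assumption)."""
    return bytes([
        0xF0 | (code >> 18),
        0x80 | ((code >> 12) & 0x3F),
        0x80 | ((code >> 6) & 0x3F),
        0x80 | (code & 0x3F),
    ])


def chr_accepts(code: int) -> bool:
    """True if Python can build a str containing CODE."""
    try:
        chr(code)
        return True
    except ValueError:
        return False


def decode_accepts(data: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts DATA."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(chr_accepts(0x161992))                     # False
print(decode_accepts(extended_utf8(0x161992)))   # False
print(decode_accepts(extended_utf8(0x10FFFF)))   # True
```

Python's `str` cannot hold the character and its strict decoder refuses the bytes. A server could still keep the payload as raw `bytes` or decode with a lenient error handler, so this shows the default behavior only.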




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79682; Package emacs. (Sat, 25 Oct 2025 07:29:02 GMT) Full text and rfc822 format available.

Message #32 received at 79682 <at> debbugs.gnu.org (full text, mbox):

From: João Távora <joaotavora <at> gmail.com>
To: ISouthRain <isouthrain <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing
 special characters.
Date: Sat, 25 Oct 2025 08:28:00 +0100
On Sat, Oct 25, 2025, 05:39 ISouthRain <isouthrain <at> gmail.com> wrote:

> Because I have to delete this "erroneous character" to start eglot
> normally, but I don't know where this "erroneous character" is, or what it
> looks like, so I can't find it and delete it...


I'm moderately sure Emacs has facilities for sanitizing buffers, i.e.
finding and replacing codepoints outside certain encoding ranges in the
whole buffer. I myself use this from time to time, but I'm not an expert.
Eli can help here.

João

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79682; Package emacs. (Sat, 25 Oct 2025 09:57:01 GMT) Full text and rfc822 format available.

Message #35 received at 79682 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: ISouthRain <isouthrain <at> gmail.com>
Cc: joaotavora <at> gmail.com, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing
 special characters.
Date: Sat, 25 Oct 2025 12:56:16 +0300
> From: ISouthRain <isouthrain <at> gmail.com>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,  79682 <at> debbugs.gnu.org
> Date: Sat, 25 Oct 2025 12:39:43 +0800
> 
> However, it makes sense to launch LSP even with an "erroneous character," even if that character is meaningless.
> 
> If I have a very large .c file with a lot of content, it would be frustrating if I couldn't use eglot properly just because of this "erroneous character."

Can you see what other IDEs that support LSP do in these cases?  If
they have some way of working around the problem, we can try doing the
same.  E.g., what does VSCode do if you open such a .C file and invoke
some command that needs to send the source to an LSP server?

> Because I have to delete this "erroneous character" to start eglot normally, but I don't know where this "erroneous character" is, or what it looks like, so I can't find it and delete it...

Finding them is easy, like so:

  M-: (skip-chars-forward "\x000-\x10ffff") RET

It will stop at the first problematic character.
Keep doing this until you get to end of buffer.

You can also replace them using C-M-%.
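
For locating such bytes from outside Emacs, a rough Python sketch follows. It rests on an assumption the thread does not confirm: that the file is read as raw bytes and that the stray character shows up on disk as bytes that are not valid UTF-8. The function name `find_invalid_utf8` and the sample line are made up for illustration.

```python
def find_invalid_utf8(data: bytes) -> list[tuple[int, bytes]]:
    """Return (offset, bytes) pairs for every span that is not valid UTF-8."""
    bad = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data decodes cleanly
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice we decoded.
            start, end = pos + err.start, pos + err.end
            bad.append((start, data[start:end]))
            pos = end
    return bad


# Sample line with a four-byte sequence that strict UTF-8 rejects.
sample = b"// Return: \xf5\xa1\xa6\x92 true--OK\n"
for offset, raw in find_invalid_utf8(sample):
    print(offset, raw.hex())
```

Each reported offset is a byte position in the file, not a buffer position or column, so it only narrows down where to look.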




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#79682; Package emacs. (Sun, 26 Oct 2025 08:39:02 GMT) Full text and rfc822 format available.

Message #38 received at 79682 <at> debbugs.gnu.org (full text, mbox):

From: João Távora <joaotavora <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: isouthrain <at> gmail.com, Philipp Stephani <p.stephani2 <at> gmail.com>,
 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing
 special characters.
Date: Sun, 26 Oct 2025 08:39:57 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:

> Maybe.  At the time, the people who worked on json.c (Philipp, AFAIR)
> were very adamant that we should flatly reject non-Unicode characters.

Now that you mention it, I do remember much adamance, but I never got the point.

> I suggested to do this in Eglot, because this is about LSP use of
> JSON, not about JSON in general.

Not quite.  This is about JSON, the data interchange format, being used
to interchange buffers or snippets of Emacs buffers.  That's not
exclusive to LSP.  Possibly (probably?) every other jsonrpc.el application
in the wild does something like that (dape the DAP client, the countless
copilot/AI plugins).  So I really don't think Eglot is the place to put
this exception.

> Can someone test what other LSP clients do in these situations?  For
> example, what does VSCode do with such source files?  If it does
> somehow allow using LSP for such source files, it would be interesting
> to know what its LSP client sends to the server in those cases.
> Then perhaps we could teach json.c to do the same.

Agree.  Also look at NeoVim, seems closer to the Emacs ecosystem.

> Well, AFAIU "not starting Eglot" is what happens already by default,
> because of that error we raise.  So if we decide to do that, it means
> there's nothing to do here except document the restriction better.

If the backtrace of the error which I missed can be reiterated here we
could maybe make it a nicer message.

> This would need someone who knows more about LSP servers than I do to
> suggest a specific trick.  Which is why I asked above what other LSP
> clients do in such cases.
>
> In general, sending invalid sequences is a riskier solution, because
> some servers might be unable to cope.  For example, I'm guessing that
> servers written in Python will flatly reject such sequences because
> AFAIK Python cannot support them.

LSP has decent error reporting support.  Eglot/jsonrpc models it well
IMO.  I think the situation of a server rejecting an LSP document would
be much cleaner.

João




This bug report was last modified 12 days ago.
