Package: emacs
Reported by: ISouthRain <isouthrain <at> gmail.com>
Date: Thu, 23 Oct 2025 07:09:02 UTC
Severity: normal
Found in version 31.0.50
To reply to this bug, email your comments to 79682 AT debbugs.gnu.org.
bug-gnu-emacs <at> gnu.org:bug#79682; Package emacs.
Message #5 received at submit <at> debbugs.gnu.org (Thu, 23 Oct 2025 07:09:02 GMT):
From: ISouthRain <isouthrain <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 31.0.50; Eglot does not support buffers containing special characters.
Date: Thu, 23 Oct 2025 14:51:00 +0800
Hello, maintainers!
Please forgive me for reporting this issue in my native language rather than English.
I have found a problem where eglot fails to start.
If the buffer content contains a special character, eglot fails to start.
I am using eglot in cc-mode to start clangd.
For example, when the buffer contains:
```C
// Return: true--OK false--Error
```
I am not sure whether this special character survives in the email that reaches you; it is:
\5414622

Because I also edit files in editors outside Emacs, this special character can end up in a file.
So I would like to check whether this behavior is reasonable.

Thank you for your work, really!

In GNU Emacs 31.0.50 (build 1, x86_64-w64-mingw32) of 2025-09-05 built on runnervmp943j
Repository revision: 8589735be509190060f82398da69cf6d9d659434
Repository branch: HEAD
Windowing system distributor 'Microsoft Corp.', version 10.0.19045
System Description: Microsoft Windows 10 Enterprise (v10.0.2009.19045.3324)

Configured using:
'configure --prefix=/d/a/emacs-build/emacs-build/pkg/8589735-ucrt-x86_64 'CFLAGS=-O2 -fno-semantic-interposition -floop-parallelize-all -ftree-parallelize-loops=4 -g -pipe ' --disable-build-details --without-dbus --enable-build-details --with-compress-install --with-cairo --with-gif --with-gnutls --with-harfbuzz --with-jpeg --with-json --with-lcms2 --with-mps --with-native-compilation --with-png --with-rsvg --with-tiff --with-tree-sitter --with-xml2 --with-xpm --with-zlib'

Configured features:
ACL GIF GMP GNUTLS HARFBUZZ JPEG LCMS2 LIBXML2 MODULES MPS NATIVE_COMP NOTIFY W32NOTIFY PDUMPER PNG RSVG SOUND SQLITE3 THREADS TIFF TOOLKIT_SCROLL_BARS TREE_SITTER WEBP XPM ZLIB

Important settings:
value of $LANG: CHS
locale-coding-system: cp936

Major mode: C//

Minor modes in effect:
treesit-fold-mode: t eglot-tempel-mode: t eglot-inactive-regions-mode: t eglot-booster-mode: t eglot--managed-mode: t flymake-mode: t gptel-watch-mode: t annotate-mode: t subword-mode: t which-function-mode: t server-mode: t auto-image-file-mode: t global-dict-line-mode: t dict-line-mode: t which-key-mode: t winner-mode: t indent-bars-mode: t rainbow-delimiters-mode: t eldoc-box-hover-mode: t display-time-mode: t global-auto-revert-mode: t save-place-mode: t recentf-mode: t global-hl-line-mode: t global-whitespace-mode: t whitespace-mode: t savehist-mode: t
electric-pair-mode: t pixel-scroll-precision-mode: t empx-mode: t vertico-mode: t marginalia-mode: t undo-fu-session-global-mode: t undo-fu-session-mode: t super-save-mode: t global-hl-todo-mode: t hl-todo-mode: t global-page-break-lines-mode: t page-break-lines-mode: t dape-breakpoint-global-mode: t dape-many-windows: t hexl-follow-ascii: t repeat-mode: t term-keys-mode: t corfu-history-mode: t corfu-popupinfo-mode: t global-corfu-mode: t corfu-mode: t global-display-fill-column-indicator-mode: t display-fill-column-indicator-mode: t elpaca-use-package-mode: t override-global-mode: t elpaca-no-symlink-mode: t global-eldoc-mode: t eldoc-mode: t show-paren-mode: t electric-indent-mode: t mouse-wheel-mode: t file-name-shadow-mode: t global-font-lock-mode: t font-lock-mode: t blink-cursor-mode: t window-divider-mode: t minibuffer-regexp-mode: t column-number-mode: t line-number-mode: t global-visual-line-mode: t visual-line-mode: t transient-mark-mode: t auto-composition-mode: t auto-encryption-mode: t auto-compression-mode: t Load-path shadows: c:/Users/adminuirs/AppData/Roaming/.emacs.d/elpaca/builds/transient/transient hides f:/App/emacs/share/emacs/31.0.50/lisp/transient Features: (shadow sort mail-extr consult-imenu emacsbug lisp-mnt vertico-directory consult-xref markdown-mode treesit-fold treesit-fold-summary treesit-fold-parsers treesit-fold-util c++-ts-mode c-ts-mode c-ts-common eglot-tempel peg eglot-inactive-regions eglot-booster eglot external-completion diff ert ewoc flymake gptel-watch gptel-rewrite gptel-transient gptel-gemini gptel-org gptel-prompter gptel gptel-openai annotate ibuffer ibuffer-loaddefs consult bookmark undo-fu string-inflection tabify ffap ispell vc-git diff-mode vc-dispatcher org-appear oc-basic bibtex expreg cap-words superword subword tempel-collection tempel ace-window avy posframe jka-compr vertico-sort helpful cc-langs trace cl-print edebug debug backtrace info-look info f help-fns radix-tree elisp-refs dash add-log which-func 
imenu server org-protocol-capture-html eww track-changes vtable mule-util url-queue mm-url gnus-demon image-file image-converter gnus-agent gnus-srvr gnus-score score-mode nnvirtual gnus-msg gnus-art mm-uu mml2015 mm-view mml-smime smime dig nntp gnus-cache gnus-sum shr pixel-fill kinsoku url-file svg dom gnus-group gnus-undo gnus-start gnus-dbus dbus xml gnus-cloud nnimap nnmail browse-url mail-source utf7 nnoo gnus-spec gnus-int gnus-range message sendmail yank-media dired dired-loaddefs rfc822 mml mml-sec epa epg rfc6068 epg-config mm-decode mm-bodies mm-encode mailabbrev gmm-utils mailheader gnus-win gnus nnheader gnus-util range s org-protocol org-capture dict-line org-limit-image-size which-key winner indent-bars rainbow-delimiters eldoc-box time autorevert filenotify saveplace tramp-cache time-stamp tramp-sh recentf tree-widget hl-line whitespace savehist elec-pair pixel-scroll cua-base empx xref vertico marginalia undo-fu-session super-save hl-todo diminish page-break-lines dape jsonrpc tramp trampver tramp-integration files-x tramp-message tramp-compat tramp-loaddefs hexl gdb-mi bindat gud compile text-property-search repeat pulse face-remap color term-keys transient corfu-history corfu-popupinfo corfu pinyinlib orderless modus-operandi-tinted-theme modus-themes derived cal-julian theme-changer solar cal-dst term-keys-autoloads copilot-autoloads eca-autoloads gptel-autoloads emms-autoloads ement-autoloads persist-autoloads plz-autoloads taxy-magit-section-autoloads taxy-autoloads svg-lib-autoloads visual-fill-column-autoloads crdt-autoloads string-inflection-autoloads markdown-toc-autoloads annotate-autoloads eglot-tempel-autoloads tempel-collection-autoloads tempel-autoloads cargo-autoloads markdown-mode-autoloads rust-mode-autoloads eglot-inactive-regions-autoloads eglot-booster-autoloads eldoc-box-autoloads indent-bars-autoloads treesit-fold-autoloads dape-autoloads page-break-lines-autoloads pyim-basedict-autoloads pyim-autoloads async-autoloads 
xr-autoloads posframe-autoloads magit-autoloads with-editor-autoloads transient-autoloads flyspell-correct-avy-menu-autoloads avy-menu-autoloads flyspell-correct-autoloads google-translate-autoloads popup-autoloads corfu-english-helper-autoloads corfu-autoloads ace-window-autoloads avy-autoloads expreg-autoloads rainbow-delimiters-autoloads super-save-autoloads undo-fu-session-autoloads undo-fu-autoloads deadgrep-autoloads spinner-autoloads org-appear-autoloads ox-pandoc-autoloads ht-autoloads org-cliplink-autoloads org-roam-autoloads emacsql-autoloads magit-section-autoloads cond-let-autoloads llama-autoloads ox-hugo-autoloads tomelr-autoloads org-protocol-capture-html-autoloads embark-consult-autoloads embark-autoloads marginalia-autoloads consult-todo-autoloads hl-todo-autoloads consult-autoloads pinyinlib-autoloads orderless-autoloads vertico-autoloads helpful-autoloads f-autoloads elisp-refs-autoloads dash-autoloads s-autoloads theme-changer-autoloads diminish-autoloads parse-time iso8601 mail-utils gnutls network-stream url-http mail-parse rfc2231 rfc2047 rfc2045 mm-util ietf-drums mail-prsvr url-gw nsm puny url-cache url-auth elpaca-menu-melpa elpaca-menu-org ob-latex ob-python python compat pcase ob-shell shell ob-C cc-mode cc-fonts cc-guess cc-menus cc-cmds cc-styles cc-align cc-engine cc-vars cc-defs ox-odt rng-loc rng-uri rng-parse rng-match rng-dt rng-util rng-pttrn nxml-parse nxml-ns nxml-enc xmltok nxml-util ox-latex ox-icalendar org-agenda ox-html table ox-ascii ox-publish ox org-attach org-element org-persist xdg org-id org-refile org-element-ast inline avl-tree generator org ob ob-tangle ob-ref ob-lob ob-table ob-exp org-macro org-src sh-script smie treesit executable ob-comint org-pcomplete pcomplete comint ansi-osc ansi-color ring org-list org-footnote org-faces org-entities time-date noutline outline ob-emacs-lisp ob-core ob-eval org-cycle org-table ol org-fold org-fold-core org-keys oc org-loaddefs thingatpt find-func cal-menu calendar 
cal-loaddefs org-version org-compat org-macs format-spec project edmacro kmacro display-fill-column-indicator comp comp-cstr warnings comp-run comp-common rx cus-edit pp cus-start cus-load wid-edit cl-extra help-mode elpaca-use-package use-package use-package-ensure use-package-delight use-package-diminish use-package-bind-key bind-key easy-mmode use-package-core elpaca-use-package-autoloads elpaca-log elpaca-ui url url-proxy url-privacy url-expand url-methods url-history url-cookie generate-lisp-file url-domsuf url-util url-parse auth-source eieio eieio-core cl-macs icons password-cache json subr-x map byte-opt gv bytecomp byte-compile url-vars mailcap cl-seq elpaca elpaca-process cl-loaddefs cl-lib elpaca-autoloads china-util rmc iso-transl tooltip cconv eldoc paren electric uniquify ediff-hook vc-hooks lisp-float-type elisp-mode mwheel touch-screen dos-w32 ls-lisp term/w32-nt disp-table term/w32-win w32-win w32-vars term/common-win tool-bar dnd fontset image regexp-opt fringe tabulated-list replace newcomment text-mode lisp-mode prog-mode register page tab-bar menu-bar rfn-eshadow isearch easymenu timer select scroll-bar mouse jit-lock font-lock syntax font-core term/tty-colors frame minibuffer nadvice seq simple cl-generic indonesian philippine cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european ethiopic indian cyrillic chinese composite emoji-zwj charscript charprop case-table epa-hook jka-cmpr-hook help abbrev obarray oclosure cl-preloaded button loaddefs theme-loaddefs faces cus-face macroexp files window text-properties overlay sha1 md5 base64 format env code-pages mule custom widget keymap hashtable-print-readable backquote threads w32notify w32 lcms2 multi-tty move-toolbar make-network-process tty-child-frames native-compile mps emacs) Memory information: ((conses 24 0 0) (symbols 56 0 0) (strings 40 0 0) (string-bytes 1 0) (vectors 24 0) (vector-slots 8 0 0) 
(floats 24 0 0) (intervals 64 0 0) (buffers 1072 0))
Message #8 received at 79682 <at> debbugs.gnu.org (Thu, 23 Oct 2025 08:32:02 GMT):
From: Eli Zaretskii <eliz <at> gnu.org>
To: ISouthRain <isouthrain <at> gmail.com>, joaotavora <at> gmail.com
Cc: 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing special characters.
Date: Thu, 23 Oct 2025 11:31:19 +0300
> From: ISouthRain <isouthrain <at> gmail.com>
> Date: Thu, 23 Oct 2025 14:51:00 +0800

Translation to English:

> Hello maintainers!!
> Please forgive me for reporting this issue in my native language instead of English.
> I've discovered an issue with eglot failing to start.
> If the buffer contains special characters, eglot will fail to start.
> Use eglot in cc-mode to start clangd.
> For example, if the buffer contains:
> ```C
> // Return: true--OK false--Error
> ```
> I'm not sure whether this special character survives in the email sent to you; it is:
> \5414622
>
> Because I edit files in an editor outside of Emacs, this special character might be present.
> So I wanted to see if this is reasonable.
>
> Thanks for your work, really!

That character (0x161992) is beyond the last character supported by
Unicode, which is 0x10ffff. I suspect that the LSP you are using
cannot cope with non-Unicode characters. But maybe Eglot could
somehow ignore it or translate it to some acceptable string?

Can you show what "eglot will fail to start" really means? Do you
see an error message or something similar?

Joao, any suggestions?
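The arithmetic behind the statement above can be checked with a short sketch. It is written in Python only for illustration and is not part of Emacs or Eglot; the helper name `parse_octal_escape` is hypothetical, and the Emacs limit 0x3FFFFF is the one quoted elsewhere in this thread.

```python
# Largest code point defined by Unicode, as stated in the message above.
UNICODE_MAX = 0x10FFFF
# Largest character code Emacs supports, as quoted elsewhere in this thread.
EMACS_MAX_CHAR = 0x3FFFFF


def parse_octal_escape(escape: str) -> int:
    """Convert an octal escape such as r'\\5414622' to an integer.

    Hypothetical helper for this illustration only.
    """
    return int(escape.lstrip("\\"), 8)


code = parse_octal_escape(r"\5414622")
print(hex(code))               # 0x161992
print(code > UNICODE_MAX)      # True: outside the Unicode codespace
print(code <= EMACS_MAX_CHAR)  # True: still a valid Emacs character code
```

The octal escape from the report and the hexadecimal value in the reply therefore name the same character, and that character lies between the Unicode limit and the Emacs limit.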
Message #11 received at 79682 <at> debbugs.gnu.org (Thu, 23 Oct 2025 13:30:02 GMT):
From: Eli Zaretskii <eliz <at> gnu.org>
To: ISouthRain <isouthrain <at> gmail.com>
Cc: joaotavora <at> gmail.com, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing special characters.
Date: Thu, 23 Oct 2025 16:29:15 +0300
[Please always use Reply to All to reply, to keep everyone CC'ed.]
> From: ISouthRain <isouthrain <at> gmail.com>
> Date: Thu, 23 Oct 2025 21:16:32 +0800
>
> Eli Zaretskii <eliz <at> gnu.org> writes:
>
> > That character (0x161992) is beyond the last character supported by
> > Unicode, which is 0x10ffff. I suspect that the LSP you are using
> > cannot cope with non-Unicode characters. But maybe Eglot could
> > somehow ignore it or translate it to some acceptable string?
> >
> > Can you show what "eglot will fail to start" really means? Do you
> > see an error message or something similar?
> >
> > Joao, any suggestions?
>
> Start Test:
>
> emacs -Q
>
> Open test.c:
> ```C
> // -*- coding: utf-8; -*-
>
> // Return: true--OK false--Error
> ```
> The *Messages* buffer then shows:
> ```
> Error running timer: (wrong-type-argument json-value-p "// -*- coding: utf-8; -*-
This is expected: JSON cannot handle non-Unicode characters.
>
> // Return: true--OK false--Error
> ")
> ```
>
> The mode line shows [eglot:MyProject/error]
>
> Here's the content of the `eglot-events-buffer`:
> ```
> [jsonrpc] D[21:06:15.844] Running language server: f:/App/Scoop/apps/mingw-winlibs-llvm-ucrt/current/bin/clangd.exe
> [jsonrpc] e[21:06:15.846] --> initialize[1] {"jsonrpc":"2.0","id":1,"method":"initialize","params":{"processId":13204,"clientInfo":{"name":"Eglot","version":"1.18"},"rootPath":"c:/Users/Jack/Desktop/C/","rootUri":"file:///c%3A/Users/Jack/Desktop/C","initializationOptions":{},"capabilities":{"workspace":{"applyEdit":true,"executeCommand":{"dynamicRegistration":false},"workspaceEdit":{"documentChanges":true},"didChangeWatchedFiles":{"dynamicRegistration":true},"symbol":{"dynamicRegistration":false},"configuration":true,"workspaceFolders":true},"textDocument":{"synchronization":{"dynamicRegistration":false,"willSave":true,"willSaveWaitUntil":true,"didSave":true},"completion":{"dynamicRegistration":false,"completionItem":{"snippetSupport":false,"deprecatedSupport":true,"resolveSupport":{"properties":["documentation","details","additionalTextEdits"]},"tagSupport":{"valueSet":[1]},"insertReplaceSupport":true},"contextSupport":true},"hover":{"dynamicRegistration":false,"contentFormat":["plaintext"]},"signatureHelp":{"dynamicRegistration":false,"signatureInformation":{"parameterInformation":{"labelOffsetSupport":true},"documentationFormat":["plaintext"],"activeParameterSupport":true}},"references":{"dynamicRegistration":false},"definition":{"dynamicRegistration":false,"linkSupport":true},"declaration":{"dynamicRegistration":false,"linkSupport":true},"implementation":{"dynamicRegistration":false,"linkSupport":true},"typeDefinition":{"dynamicRegistration":false,"linkSupport":true},"documentSymbol":{"dynamicRegistration":false,"hierarchicalDocumentSymbolSupport":true,"symbolKind":{"valueSet":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]}},"documentHighlight":{"dynamicRegistration":false},"codeAction":{"dynamicRegistration":false,"resolveSupport":{"properties":["edit","command"]},"dataSupport":true,"codeActionLiteralSupport":{"codeActionKind":{"valueSet":["quickfix","refactor","refactor.extract","refactor.inline","refactor.rewrite","source","source.or
ganizeImports"]}},"isPreferredSupport":true},"formatting":{"dynamicRegistration":false},"rangeFormatting":{"dynamicRegistration":false},"rename":{"dynamicRegistration":false},"inlayHint":{"dynamicRegistration":false},"callHierarchy":{"dynamicRegistration":false},"typeHierarchy":{"dynamicRegistration":false},"publishDiagnostics":{"relatedInformation":false,"versionSupport":true,"codeDescriptionSupport":false,"tagSupport":{"valueSet":[1,2]}}},"window":{"showDocument":{"support":true},"showMessage":{"messageActionItem":{"additionalPropertiesSupport":true}},"workDoneProgress":true},"general":{"positionEncodings":["utf-32","utf-8","utf-16"]},"experimental":{}},"workspaceFolders":[{"uri":"file:///c%3A/Users/Jack/Desktop/C","name":"c:/Users/Jack/Desktop/C/"}]}}
> [stderr] I[21:06:15.884] (built by Brecht Sanders, r1) clangd version 18.1.8
> [stderr] I[21:06:15.884] Features: windows
> [stderr] I[21:06:15.884] PID: 18176
> [stderr] I[21:06:15.884] Working directory: c:/Users/Jack/Desktop/C
> [stderr] I[21:06:15.884] argv[0]: f:/App/Scoop/apps/mingw-winlibs-llvm-ucrt/current/bin/clangd.exe
> [stderr] I[21:06:15.886] Starting LSP over stdin/stdout
> [stderr] I[21:06:15.886] <-- initialize(1)
> [stderr] I[21:06:15.888] --> reply:initialize(1) 1 ms
> [jsonrpc] e[21:06:15.888] <-- initialize[1] {"id":1,"jsonrpc":"2.0","result":{"capabilities":{"astProvider":true,"callHierarchyProvider":true,"clangdInlayHintsProvider":true,"codeActionProvider":{"codeActionKinds":["quickfix","refactor","info"]},"compilationDatabase":{"automaticReload":true},"completionProvider":{"resolveProvider":false,"triggerCharacters":[".","<",">",":","\"","/","*"]},"declarationProvider":true,"definitionProvider":true,"documentFormattingProvider":true,"documentHighlightProvider":true,"documentLinkProvider":{"resolveProvider":false},"documentOnTypeFormattingProvider":{"firstTriggerCharacter":"\n","moreTriggerCharacter":[]},"documentRangeFormattingProvider":true,"documentSymbolProvider":true,"executeCommandProvider":{"commands":["clangd.applyFix","clangd.applyTweak"]},"foldingRangeProvider":true,"hoverProvider":true,"implementationProvider":true,"inactiveRegionsProvider":true,"inlayHintProvider":true,"memoryUsageProvider":true,"referencesProvider":true,"renameProvider":true,"selectionRangeProvider":true,"semanticTokensProvider":{"full":{"delta":true},"legend":{"tokenModifiers":["declaration","definition","deprecated","deduced","readonly","static","abstract","virtual","dependentName","defaultLibrary","usedAsMutableReference","usedAsMutablePointer","constructorOrDestructor","userDefined","functionScope","classScope","fileScope","globalScope"],"tokenTypes":["variable","variable","parameter","function","method","function","property","variable","class","interface","enum","enumMember","type","type","unknown","namespace","typeParameter","concept","type","macro","modifier","operator","bracket","label","comment"]},"range":false},"signatureHelpProvider":{"triggerCharacters":["(",")","{","}","<",">",","]},"standardTypeHierarchyProvider":true,"textDocumentSync":{"change":2,"openClose":true,"save":true},"typeDefinitionProvider":true,"typeHierarchyProvider":true,"workspaceSymbolProvider":true},"serverInfo":{"name":"clangd","version":"(built by Brecht 
Sanders, r1) clangd version 18.1.8 windows x86_64-w64-windows-gnu; target=x86_64-w64-mingw32"}}}
> [jsonrpc] e[21:06:15.888] --> initialized {"jsonrpc":"2.0","method":"initialized","params":{}}
> [stderr] I[21:06:15.888] <-- initialized
> [jsonrpc] e[21:06:16.438] --> textDocument/codeAction[2] {"jsonrpc":"2.0","id":2,"method":"textDocument/codeAction","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"range":{"start":{"line":0,"character":0},"end":{"line":3,"character":0}},"context":{"diagnostics":[],"triggerKind":2}}}
> [jsonrpc] e[21:06:16.438] --> textDocument/documentHighlight[3] {"jsonrpc":"2.0","id":3,"method":"textDocument/documentHighlight","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":0,"character":3}}}
> [jsonrpc] e[21:06:16.438] --> textDocument/hover[4] {"jsonrpc":"2.0","id":4,"method":"textDocument/hover","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":0,"character":3}}}
> [jsonrpc] e[21:06:16.438] --> textDocument/signatureHelp[5] {"jsonrpc":"2.0","id":5,"method":"textDocument/signatureHelp","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":0,"character":3}}}
> [jsonrpc] e[21:06:16.440] <-- textDocument/codeAction[2] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":2,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:16.440] <-- textDocument/documentHighlight[3] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":3,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:16.440] <-- textDocument/hover[4] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":4,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:16.440] <-- textDocument/signatureHelp[5] {"error":{"code":-32602,"message":"trying to get preamble for non-added document"},"id":5,"jsonrpc":"2.0"}
> [stderr] I[21:06:16.438] <-- textDocument/codeAction(2)
> [stderr] I[21:06:16.438] --> reply:textDocument/codeAction(2) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr] I[21:06:16.438] <-- textDocument/documentHighlight(3)
> [stderr] I[21:06:16.438] --> reply:textDocument/documentHighlight(3) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr] I[21:06:16.438] <-- textDocument/hover(4)
> [stderr] I[21:06:16.438] --> reply:textDocument/hover(4) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr] I[21:06:16.438] <-- textDocument/signatureHelp(5)
> [stderr] I[21:06:16.438] --> reply:textDocument/signatureHelp(5) 0 ms, error: -32602: trying to get preamble for non-added document
> [jsonrpc] e[21:06:24.039] --> textDocument/codeAction[6] {"jsonrpc":"2.0","id":6,"method":"textDocument/codeAction","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"range":{"start":{"line":0,"character":0},"end":{"line":3,"character":0}},"context":{"diagnostics":[],"triggerKind":2}}}
> [jsonrpc] e[21:06:24.039] --> textDocument/documentHighlight[7] {"jsonrpc":"2.0","id":7,"method":"textDocument/documentHighlight","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":0,"character":3}}}
> [jsonrpc] e[21:06:24.039] --> textDocument/hover[8] {"jsonrpc":"2.0","id":8,"method":"textDocument/hover","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":0,"character":3}}}
> [jsonrpc] e[21:06:24.039] --> textDocument/signatureHelp[9] {"jsonrpc":"2.0","id":9,"method":"textDocument/signatureHelp","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":0,"character":3}}}
> [jsonrpc] e[21:06:24.060] <-- textDocument/codeAction[6] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":6,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:24.060] <-- textDocument/documentHighlight[7] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":7,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:24.060] <-- textDocument/hover[8] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":8,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:24.060] <-- textDocument/signatureHelp[9] {"error":{"code":-32602,"message":"trying to get preamble for non-added document"},"id":9,"jsonrpc":"2.0"}
> [stderr] I[21:06:24.039] <-- textDocument/codeAction(6)
> [stderr] I[21:06:24.039] --> reply:textDocument/codeAction(6) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr] I[21:06:24.039] <-- textDocument/documentHighlight(7)
> [stderr] I[21:06:24.039] --> reply:textDocument/documentHighlight(7) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr] I[21:06:24.039] <-- textDocument/hover(8)
> [stderr] I[21:06:24.039] --> reply:textDocument/hover(8) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr] I[21:06:24.039] <-- textDocument/signatureHelp(9)
> [stderr] I[21:06:24.039] --> reply:textDocument/signatureHelp(9) 0 ms, error: -32602: trying to get preamble for non-added document
> [jsonrpc] e[21:06:33.885] --> textDocument/codeAction[10] {"jsonrpc":"2.0","id":10,"method":"textDocument/codeAction","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"range":{"start":{"line":0,"character":0},"end":{"line":3,"character":0}},"context":{"diagnostics":[],"triggerKind":2}}}
> [jsonrpc] e[21:06:33.885] --> textDocument/documentHighlight[11] {"jsonrpc":"2.0","id":11,"method":"textDocument/documentHighlight","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":3,"character":0}}}
> [jsonrpc] e[21:06:33.885] --> textDocument/hover[12] {"jsonrpc":"2.0","id":12,"method":"textDocument/hover","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":3,"character":0}}}
> [jsonrpc] e[21:06:33.885] --> textDocument/signatureHelp[13] {"jsonrpc":"2.0","id":13,"method":"textDocument/signatureHelp","params":{"textDocument":{"uri":"file:///c%3A/Users/Jack/Desktop/C/test.c"},"position":{"line":3,"character":0}}}
> [jsonrpc] e[21:06:33.923] <-- textDocument/codeAction[10] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":10,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:33.923] <-- textDocument/documentHighlight[11] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":11,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:33.923] <-- textDocument/hover[12] {"error":{"code":-32602,"message":"trying to get AST for non-added document"},"id":12,"jsonrpc":"2.0"}
> [jsonrpc] e[21:06:33.923] <-- textDocument/signatureHelp[13] {"error":{"code":-32602,"message":"trying to get preamble for non-added document"},"id":13,"jsonrpc":"2.0"}
> [stderr] I[21:06:33.885] <-- textDocument/codeAction(10)
> [stderr] I[21:06:33.885] --> reply:textDocument/codeAction(10) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr] I[21:06:33.885] <-- textDocument/documentHighlight(11)
> [stderr] I[21:06:33.885] --> reply:textDocument/documentHighlight(11) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr] I[21:06:33.885] <-- textDocument/hover(12)
> [stderr] I[21:06:33.885] --> reply:textDocument/hover(12) 0 ms, error: -32602: trying to get AST for non-added document
> [stderr] I[21:06:33.885] <-- textDocument/signatureHelp(13)
> [stderr] I[21:06:33.885] --> reply:textDocument/signatureHelp(13) 0 ms, error: -32602: trying to get preamble for non-added document
>
> ```
>
> So I think the problem might be caused by jsonrpc.el?
I think Eglot should filter the stuff it sends to remove non-Unicode
characters. jsonrpc.el is too low-level to do that. Let's hear what
Joao thinks about this.
Message #14 received at 79682 <at> debbugs.gnu.org (Thu, 23 Oct 2025 15:44:02 GMT):
From: João Távora <joaotavora <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: ISouthRain <isouthrain <at> gmail.com>, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing special characters.
Date: Thu, 23 Oct 2025 16:43:25 +0100
On Thu, Oct 23, 2025, 09:31 Eli Zaretskii <eliz <at> gnu.org> wrote:

> > From: ISouthRain <isouthrain <at> gmail.com>
> > Date: Thu, 23 Oct 2025 14:51:00 +0800
>
> Translation to English:
>
> > Because I edit files in an editor outside of Emacs, this special
> > character might be present.
> > So I wanted to see if this is reasonable.
> >
> > Thanks for your work, really!
>
> That character (0x161992) is beyond the last character supported by
> Unicode, which is 0x10ffff. I suspect that the LSP you are using
> cannot cope with non-Unicode characters.

If this is true, then it's a server problem.

> But maybe Eglot could
> somehow ignore it or translate it to some acceptable string?

Among the jobs of the LSP client, a particularly important one is to
provide the server with an accurate picture of the (saved or unsaved)
document under the client's control, so that the server can reconstruct it
perfectly as if it had direct access to the disk.

So if Eglot is doing this job correctly (is it?) then there's nothing else
to do. The picture we report of the file shall not differ from the picture
we observe, however imperfect it may be.

So please report this to the server developers and/or make them aware of
this bug thread.

What version of clangd are you using? How can I insert this bizarre
character into, say, a C++ file under c++-ts-mode?

João
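One way to see why the reported picture of the document matters is to look at character offsets. The sketch below is an illustration in Python under stated assumptions, not Eglot code: the line is modelled as a list of code points (a Python `str` cannot hold values above 0x10FFFF), the placement of 0x161992 inside the comment is assumed, and columns are counted in code points, as with the "utf-32" position encoding that appears in the events log.

```python
UNICODE_MAX = 0x10FFFF


def column_of(codepoints, target):
    """Return the zero-based index of the first occurrence of target."""
    return codepoints.index(target)


# Hypothetical line: the offending character sits between "true" and "--OK".
line = [ord(c) for c in "// Return: true"] + [0x161992] + [ord(c) for c in "--OK"]

# Variant 1: silently drop the character before sending the text.
dropped = [cp for cp in line if cp <= UNICODE_MAX]
# Variant 2: substitute one placeholder character for it.
replaced = [cp if cp <= UNICODE_MAX else ord("?") for cp in line]

# Column of the letter "O" in each variant.
print(column_of(line, ord("O")))      # 18
print(column_of(dropped, ord("O")))   # 17
print(column_of(replaced, ord("O")))  # 18
```

In this model, dropping the character shifts every later column on the line, so positions exchanged with the server would no longer match the buffer, while a one-for-one substitution keeps the code-point columns aligned. With other position encodings the substitute may have a different width, so this is only a partial picture.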
Message #17 received at 79682 <at> debbugs.gnu.org (Fri, 24 Oct 2025 05:13:03 GMT):
From: Rain ISouth <isouthrain <at> gmail.com>
To: João Távora <joaotavora <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing special characters.
Date: Fri, 24 Oct 2025 08:46:50 +0800
I am sending you the `test.c` file, along with the clangd version:
```shell
clangd --version
(built by Brecht Sanders, r1) clangd version 18.1.8
Features: windows
Platform: x86_64-w64-windows-gnu; target=x86_64-w64-mingw32
```
The problem should be independent of the clangd version.
[test.c (text/plain, attachment)]
Message #20 received at 79682 <at> debbugs.gnu.org (Fri, 24 Oct 2025 06:26:02 GMT):
From: Eli Zaretskii <eliz <at> gnu.org>
To: João Távora <joaotavora <at> gmail.com>
Cc: isouthrain <at> gmail.com, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing special characters.
Date: Fri, 24 Oct 2025 09:25:26 +0300
> From: João Távora <joaotavora <at> gmail.com>
> Date: Thu, 23 Oct 2025 16:43:25 +0100
> Cc: ISouthRain <isouthrain <at> gmail.com>, 79682 <at> debbugs.gnu.org
>
> On Thu, Oct 23, 2025, 09:31 Eli Zaretskii <eliz <at> gnu.org> wrote:
>
>  That character (0x161992) is beyond the last character supported by
>  Unicode, which is 0x10ffff. I suspect that the LSP you are using
>  cannot cope with non-Unicode characters.
>
> If this is true, then it's a server problem.

Sorry, I don't follow. The error which the OP quoted:

> > And then message buffer out:
> > ```
> > Error running timer: (wrong-type-argument json-value-p "// -*- coding: utf-8; -*-

comes from Emacs. Specifically, it comes from json.c, when it tries
to serialize the information to be sent to the LSP server as JSON
object(s). JSON does not allow strings with characters outside of the
Unicode codespace, and the offending character, 0x161992, is such a
character (Emacs supports character codepoints up to 0x3FFFFF). See
json.c:json_out_string, where it calls string_not_unicode. The
problematic character comes from the program source code in the
buffer. So how can this be a server problem, when it happens entirely
on our side?

>  But maybe Eglot could
>  somehow ignore it or translate it to some acceptable string?
>
> Among the jobs of the LSP client, a particularly important one is to provide the server with an accurate
> picture of the (saved or unsaved) document under the client's control, so that the server can reconstruct it
> perfectly as if it had direct access to the disk.

Perhaps we could replace such characters with a string "?", for
example, when we send the program text to the server?

If that is also impossible (i.e. will break some interaction with the
server), then I would suggest that Eglot catches json-value-p errors
and produces a more user-friendly error message, like

  Buffer text includes characters outside of Unicode codespace

Does this make sense?

> How can I insert this bizarre character into, say, a C++ file under
> c++-ts-mode?

Easy: type "C-x 8 RET 161992 RET".
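
A minimal Emacs Lisp sketch of the "friendlier error" idea above. It assumes the signal data has the shape quoted in the report, `(wrong-type-argument json-value-p STRING)`. The helper names `my/non-unicode-string-p` and `my/json-serialize-or-explain` are hypothetical; nothing like them is confirmed to exist in Eglot, jsonrpc.el, or json.c:

```elisp
(require 'seq)

;; Hypothetical helper: detect the situation discussed in this thread.
(defun my/non-unicode-string-p (object)
  "Return non-nil if OBJECT is a string with a character above #x10FFFF."
  (and (stringp object)
       (seq-some (lambda (char) (> char #x10ffff)) object)))

;; Hypothetical wrapper: reword only the non-Unicode failure and
;; re-signal every other `wrong-type-argument' error unchanged.
(defun my/json-serialize-or-explain (object)
  "Serialize OBJECT with `json-serialize', rewording non-Unicode failures."
  (condition-case err
      (json-serialize object)
    (wrong-type-argument
     (if (and (eq (nth 1 err) 'json-value-p)
              (my/non-unicode-string-p (nth 2 err)))
         (user-error
          "Buffer text includes characters outside of Unicode codespace")
       (signal (car err) (cdr err))))))
```

Checking the offending value itself, rather than only the `json-value-p` predicate, keeps the message from firing for unrelated serialization failures.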
Message #23 received at 79682 <at> debbugs.gnu.org (Fri, 24 Oct 2025 16:51:03 GMT):
From: João Távora <joaotavora <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: isouthrain <at> gmail.com, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing special characters.
Date: Fri, 24 Oct 2025 17:51:35 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: João Távora <joaotavora <at> gmail.com>
>> Date: Thu, 23 Oct 2025 16:43:25 +0100
>> Cc: ISouthRain <isouthrain <at> gmail.com>, 79682 <at> debbugs.gnu.org
>>
>> On Thu, Oct 23, 2025, 09:31 Eli Zaretskii <eliz <at> gnu.org> wrote:
>>
>>  That character (0x161992) is beyond the last character supported by
>>  Unicode, which is 0x10ffff. I suspect that the LSP you are using
>>  cannot cope with non-Unicode characters.
>>
>> If this is true, then it's a server problem.
>
> Sorry, I don't follow. The error which the OP quoted:
>
>> > And then message buffer out:
>> > ```
>> > Error running timer: (wrong-type-argument json-value-p "// -*- coding: utf-8; -*-

I must have missed this. I was commenting on your "I suspect that the
LSP you are using cannot cope", the "LSP" in question meaning "the
server" to me.

> comes from Emacs. Specifically, it comes from json.c, when it tries
> to serialize the information to be sent to the LSP server as JSON
> object(s). JSON does not allow strings with characters outside of
> Unicode codespace, and the offending character, 0x161992, is such a
> character (Emacs supports character codepoints up to 0x3FFFFF). See
> json.c:json_out_string, where it calls string_not_unicode. The
> problematic character comes from the program source code in the
> buffer. So how can this be a server problem, when it happens entirely
> on our side?

It's a problem of ours, you're right. I had misunderstood the problem.
But, if you're right in the "JSON does not allow...outside Unicode"
sentence, it's _also_ a problem of the server; in fact it's a problem of
the LSP protocol itself, since it is based on JSONRPC, which obviously
is based on JSON.

>> Among the jobs of the LSP client, a particularly important one is to provide the server with an accurate
>> picture of the (saved or unsaved) document under the client's control, so that the server can reconstruct it
>> perfectly as if it had direct access to the disk.
>
> Perhaps we could replace such characters with a string "?", for
> example, when we send the program text to the server?

Not sure that wouldn't do more harm than good. It's providing the server
with a wrong picture of the file. A checksum of it would be wrong, for
example. (Some servers actually have access to the file on disk and to
the representation we make of the file.)

Also, if you remember vaguely, LSP counts columns in octets or code
points or something like that. So if we send just one byte, it could
mess up the counting.

Even though realistically your idea probably would work, it's a very big
hack. Also why not do this hack in json.c?? That's where the
limitation is (well, that's where we're closest to the source of the
limitation, since the limitation is in JSON itself, according to you.)
It's very awkward to sanitize this output in Eglot.

> If that is also impossible (i.e. will break some interaction with the
> server), then I would suggest that Eglot catches json-value-p errors
> and produces a more user-friendly error message, like
>
>   Buffer text includes characters outside of Unicode codespace

This makes more sense (sorry ISouthRain). But to confidently mention
"buffer text" in that error we'd want to surround only some
eglot--request/eglot--notify calls, which is awkward and confusing. I'd
say it doesn't even make sense to start Eglot in such a buffer, so the
check can happen much, much earlier on, maybe in eglot--managed-mode.
But even this could be tricky to do because this mode is started
automatically in many situations.

> Does this make sense?

Yes, I think it's starting to make sense now. Here's another option:
can't we play outside the JSON rules and send some "illegal" byte
sequences in json.c? Maybe some servers will grok this. At least I
think this should be attempted and the results collected for some
servers... I can test with 3 or 4 if you provide that json.c patch.

João
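
On the column-counting concern: LSP positions count UTF-16 code units by default (LSP 3.17 lets client and server negotiate UTF-8 or UTF-32 instead). Below is a rough Emacs Lisp sketch of that count for text that lies entirely within Unicode. `my/utf-16-units` is a hypothetical helper, not an Eglot function, and UTF-16 does not define how a code point above 0x10ffff would be counted at all:

```elisp
;; Hypothetical helper, for illustration only.
(defun my/utf-16-units (string)
  "Return the number of UTF-16 code units needed for STRING.
Assumes every character in STRING is at most #x10FFFF."
  (let ((units 0))
    (dolist (char (string-to-list string) units)
      ;; Characters above the Basic Multilingual Plane need a
      ;; surrogate pair, i.e. two code units.
      (setq units (+ units (if (> char #xffff) 2 1))))))

(my/utf-16-units "abc")               ; => 3
(my/utf-16-units (string ?a #x1F600)) ; => 3 (1 + 2)
```

So a single "?" stands in for one code unit, which matches any character up to U+FFFF but not a character that a server would have counted as two.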
Message #26 received at 79682 <at> debbugs.gnu.org (Sat, 25 Oct 2025 04:41:01 GMT):
From: ISouthRain <isouthrain <at> gmail.com>
To: João Távora <joaotavora <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing special characters.
Date: Sat, 25 Oct 2025 12:39:43 +0800
João Távora <joaotavora <at> gmail.com> writes:

> Eli Zaretskii <eliz <at> gnu.org> writes:
>
>>> From: João Távora <joaotavora <at> gmail.com>
>>> Date: Thu, 23 Oct 2025 16:43:25 +0100
>>> Cc: ISouthRain <isouthrain <at> gmail.com>, 79682 <at> debbugs.gnu.org
>>>
>>> On Thu, Oct 23, 2025, 09:31 Eli Zaretskii <eliz <at> gnu.org> wrote:
>>>
>>>  That character (0x161992) is beyond the last character supported by
>>>  Unicode, which is 0x10ffff. I suspect that the LSP you are using
>>>  cannot cope with non-Unicode characters.
>>>
>>> If this is true, then it's a server problem.
>>
>> Sorry, I don't follow. The error which the OP quoted:
>>
>>> > And then message buffer out:
>>> > ```
>>> > Error running timer: (wrong-type-argument json-value-p "// -*- coding: utf-8; -*-
>
> I must have missed this. I was commenting on your "I suspect that the
> LSP you are using cannot cope", the "LSP" in question meaning "the
> server" to me.
>
>> comes from Emacs. Specifically, it comes from json.c, when it tries
>> to serialize the information to be sent to the LSP server as JSON
>> object(s). JSON does not allow strings with characters outside of
>> Unicode codespace, and the offending character, 0x161992, is such a
>> character (Emacs supports character codepoints up to 0x3FFFFF). See
>> json.c:json_out_string, where it calls string_not_unicode. The
>> problematic character comes from the program source code in the
>> buffer. So how can this be a server problem, when it happens entirely
>> on our side?
>
> It's a problem of ours, you're right. I had misunderstood the problem.
> But, if you're right in the "JSON does not allow...outside Unicode"
> sentence, it's _also_ a problem of the server; in fact it's a problem of
> the LSP protocol itself, since it is based on JSONRPC, which obviously
> is based on JSON.
>
>>> Among the jobs of the LSP client, a particularly important one is to provide the server with an accurate
>>> picture of the (saved or unsaved) document under the client's control, so that the server can reconstruct it
>>> perfectly as if it had direct access to the disk.
>>
>> Perhaps we could replace such characters with a string "?", for
>> example, when we send the program text to the server?
>
> Not sure that wouldn't do more harm than good. It's providing the server
> with a wrong picture of the file. A checksum of it would be wrong, for
> example. (Some servers actually have access to the file on disk and to
> the representation we make of the file.)
>
> Also, if you remember vaguely, LSP counts columns in octets or code
> points or something like that. So if we send just one byte, it could
> mess up the counting.
>
> Even though realistically your idea probably would work, it's a very big
> hack. Also why not do this hack in json.c?? That's where the
> limitation is (well, that's where we're closest to the source of the
> limitation, since the limitation is in JSON itself, according to you.)
> It's very awkward to sanitize this output in Eglot.
>
>> If that is also impossible (i.e. will break some interaction with the
>> server), then I would suggest that Eglot catches json-value-p errors
>> and produces a more user-friendly error message, like
>>
>>   Buffer text includes characters outside of Unicode codespace
>
> This makes more sense (sorry ISouthRain). But to confidently mention
> "buffer text" in that error we'd want to surround only some
> eglot--request/eglot--notify calls, which is awkward and confusing. I'd
> say it doesn't even make sense to start Eglot in such a buffer, so the
> check can happen much, much earlier on, maybe in eglot--managed-mode.
> But even this could be tricky to do because this mode is started
> automatically in many situations.

The issues you're discussing are beyond my understanding, so please
excuse my inability to help.

However, it makes sense to launch the LSP server even with an
"erroneous character," even if that character is meaningless.

If I have a very large .c file with a lot of content, it would be
frustrating if I couldn't use Eglot properly just because of this
"erroneous character."

To start Eglot normally I have to delete this "erroneous character,"
but I don't know where it is or what it looks like, so I can't find it
and delete it...

There is no clear source for how the "erroneous character" is produced.
I only know that it comes from editing the file with "other editors."

>> Does this make sense?
>
> Yes, I think it's starting to make sense now. Here's another option:
> can't we play outside the JSON rules and send some "illegal" byte
> sequences in json.c? Maybe some servers will grok this. At least I
> think this should be attempted and the results collected for some
> servers... I can test with 3 or 4 if you provide that json.c patch.
>
> João
Message #29 received at 79682 <at> debbugs.gnu.org (Sat, 25 Oct 2025 07:08:02 GMT):
From: Eli Zaretskii <eliz <at> gnu.org>
To: João Távora <joaotavora <at> gmail.com>, Philipp Stephani <p.stephani2 <at> gmail.com>
Cc: isouthrain <at> gmail.com, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing special characters.
Date: Sat, 25 Oct 2025 10:07:22 +0300
> From: João Távora <joaotavora <at> gmail.com>
> Cc: isouthrain <at> gmail.com, 79682 <at> debbugs.gnu.org
> Date: Fri, 24 Oct 2025 17:51:35 +0100
>
> > Perhaps we could replace such characters with a string "?", for
> > example, when we send the program text to the server?
>
> Not sure that wouldn't do more harm than good. It's providing the server
> with a wrong picture of the file. A checksum of it would be wrong, for
> example. (Some servers actually have access to the file on disk and to
> the representation we make of the file.)
>
> Also, if you remember vaguely, LSP counts columns in octets or code
> points or something like that. So if we send just one byte, it could
> mess up the counting.

Maybe, but only if Emacs considers the original character to take more
than one column. So to countermand that, we could send a string of
one or 2 "?", depending on what (char-width CHARACTER) returns for the
offending CHARACTER. In this case, the 0x161992 character is
considered by Emacs to take 1 column, so a single "?" should be okay
in this regard. But see below.

> Even though realistically your idea probably would work, it's a very big
> hack. Also why not do this hack in json.c?? That's where the
> limitation is (well, that's where we're closest to the source of the
> limitation, since the limitation is in JSON itself, according to you.)
> It's very awkward to sanitize this output in Eglot.

Maybe. At the time, the people who worked on json.c (Philipp, AFAIR)
were very adamant that we should flatly reject non-Unicode characters.
I suggested to do this in Eglot, because this is about LSP use of
JSON, not about JSON in general. So what the LSP servers do and allow
is also an important factor, unlike with JSON in general.

But perhaps we could make this behavior optional in json.c, subject to
some variable exposed to Lisp that Eglot could set? Philipp, any
comments or ideas?

Can someone test what other LSP clients do in these situations? For
example, what does VSCode do with such source files? If it does
somehow allow using LSP for such source files, it would be interesting
to know what its LSP client sends to the server in those cases.
Then perhaps we could teach json.c to do the same.

> > If that is also impossible (i.e. will break some interaction with the
> > server), then I would suggest that Eglot catches json-value-p errors
> > and produces a more user-friendly error message, like
> >
> >   Buffer text includes characters outside of Unicode codespace
>
> This makes more sense (sorry ISouthRain). But to confidently mention
> "buffer text" in that error we'd want to surround only some
> eglot--request/eglot--notify calls, which is awkward and confusing. I'd
> say it doesn't even make sense to start Eglot in such a buffer, so the
> check can happen much, much earlier on, maybe in eglot--managed-mode.
> But even this could be tricky to do because this mode is started
> automatically in many situations.

Well, AFAIU "not starting Eglot" is what happens already by default,
because of that error we raise. So if we decide to do that, it means
there's nothing to do here except document the restriction better.

> > Does this make sense?
>
> Yes, I think it's starting to make sense now. Here's another option:
> can't we play outside the JSON rules and send some "illegal" byte
> sequences in json.c? Maybe some servers will grok this. At least I
> think this should be attempted and the results collected for some
> servers... I can test with 3 or 4 if you provide that json.c patch.

This would need someone who knows more about LSP servers than I do to
suggest a specific trick. Which is why I asked above what other LSP
clients do in such cases.

In general, sending invalid sequences is a riskier solution, because
some servers might be unable to cope. For example, I'm guessing that
servers written in Python will flatly reject such sequences because
AFAIK Python cannot support them.
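
The width-aware replacement described above could look roughly like this in Emacs Lisp. It is only a sketch: `my/replace-non-unicode` is a hypothetical name, it assumes a multibyte string, and no such option currently exists in json.c or Eglot as far as this thread shows:

```elisp
;; Hypothetical helper, for illustration only.
(defun my/replace-non-unicode (string)
  "Return a copy of STRING with characters above #x10FFFF replaced.
Each such character becomes as many \"?\" as `char-width' reports
columns for it (at least one)."
  (mapconcat (lambda (char)
               (if (> char #x10ffff)
                   ;; One "?" per display column of the original.
                   (make-string (max 1 (char-width char)) ??)
                 (char-to-string char)))
             string ""))
```

The result no longer matches the file on disk, which is the objection raised earlier in the thread, so a variant like this would presumably have to be opt-in.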
Message #32 received at 79682 <at> debbugs.gnu.org (Sat, 25 Oct 2025 07:29:02 GMT):
From: João Távora <joaotavora <at> gmail.com>
To: ISouthRain <isouthrain <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing special characters.
Date: Sat, 25 Oct 2025 08:28:00 +0100
On Sat, Oct 25, 2025, 05:39 ISouthRain <isouthrain <at> gmail.com> wrote:

> Because I have to delete this "erroneous character" to start eglot
> normally, but I don't know where this "erroneous character" is, or what it
> looks like, so I can't find it and delete it...

I'm moderately sure Emacs has facilities for sanitizing buffers, i.e.
finding and replacing codepoints outside certain encoding ranges in the
whole buffer. I myself use this from time to time, but I'm not an expert.
Eli can help here.

João
Message #35 received at 79682 <at> debbugs.gnu.org (Sat, 25 Oct 2025 09:57:01 GMT):
From: Eli Zaretskii <eliz <at> gnu.org>
To: ISouthRain <isouthrain <at> gmail.com>
Cc: joaotavora <at> gmail.com, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing special characters.
Date: Sat, 25 Oct 2025 12:56:16 +0300
> From: ISouthRain <isouthrain <at> gmail.com>
> Cc: Eli Zaretskii <eliz <at> gnu.org>, 79682 <at> debbugs.gnu.org
> Date: Sat, 25 Oct 2025 12:39:43 +0800
>
> However, it makes sense to launch LSP even with an "erroneous character," even if that character is meaningless.
>
> If I have a very large .c file with a lot of content, it would be frustrating if I couldn't use eglot properly just because of this "erroneous character."

Can you see what other IDEs that support LSP do in these cases? If
they have some way of working around the problem, we can try doing the
same. E.g., what does VSCode do if you open such a .C file and invoke
some command that needs to send the source to an LSP server?

> Because I have to delete this "erroneous character" to start eglot normally, but I don't know where this "erroneous character" is, or what it looks like, so I can't find it and delete it...

Finding them is easy, like so:

  M-: (skip-chars-forward "\x000-\x10ffff") RET

It will stop at the first problematic character. Keep doing this
until you get to end of buffer. You can also replace them using
C-M-%.
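
The recipe above can be wrapped in a small command so that repeated invocations walk through every offending character. This is only a sketch; the command name `my/next-non-unicode-char` is hypothetical:

```elisp
;; Hypothetical command built on the `skip-chars-forward' recipe.
(defun my/next-non-unicode-char ()
  "Move point to the next character above #x10FFFF, if any."
  (interactive)
  ;; If point is already on an offending character, step over it
  ;; first so that repeated calls make progress.
  (let ((char (char-after)))
    (when (and char (> char #x10ffff))
      (forward-char 1)))
  (skip-chars-forward "\x000-\x10ffff")
  (if (eobp)
      (message "No non-Unicode characters after point")
    (message "Non-Unicode character #x%X at position %d"
             (char-after) (point))))
```

Run it with `M-x my/next-non-unicode-char` from the beginning of the buffer; point stops on each character that would make serialization fail, where it can be deleted or replaced by hand.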
Message #38 received at 79682 <at> debbugs.gnu.org (Sun, 26 Oct 2025 08:39:02 GMT):
From: João Távora <joaotavora <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: isouthrain <at> gmail.com, Philipp Stephani <p.stephani2 <at> gmail.com>, 79682 <at> debbugs.gnu.org
Subject: Re: bug#79682: 31.0.50; Eglot does not support buffers containing special characters.
Date: Sun, 26 Oct 2025 08:39:57 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:

> Maybe. At the time, the people who worked on json.c (Philipp, AFAIR)
> were very adamant that we should flatly reject non-Unicode characters.

Now that you mention it, I do remember much adamance, but I never got the
point.

> I suggested to do this in Eglot, because this is about LSP use of
> JSON, not about JSON in general.

Not quite. This is about JSON, the data interchange format, being used
to interchange buffers or snippets of Emacs buffers. That's not
exclusive to LSP. Possibly (probably?) every other jsonrpc.el
application in the wild does something like that (dape the DAP client,
the countless copilot/AI plugins). So I really don't think Eglot is the
place to put this exception.

> Can someone test what other LSP clients do in these situations? For
> example, what does VSCode do with such source files? If it does
> somehow allow using LSP for such source files, it would be interesting
> to know what its LSP client sends to the server in those cases.
> Then perhaps we could teach json.c to do the same.

Agree. Also look at NeoVim, which seems closer to the Emacs ecosystem.

> Well, AFAIU "not starting Eglot" is what happens already by default,
> because of that error we raise. So if we decide to do that, it means
> there's nothing to do here except document the restriction better.

If the backtrace of the error which I missed can be reiterated here, we
could maybe make it a nicer message.

> This would need someone who knows more about LSP servers than I do to
> suggest a specific trick. Which is why I asked above what other LSP
> clients do in such cases.
>
> In general, sending invalid sequences is a riskier solution, because
> some servers might be unable to cope. For example, I'm guessing that
> servers written in Python will flatly reject such sequences because
> AFAIK Python cannot support them.

LSP has decent error reporting support. Eglot/jsonrpc models it well
IMO. I think the situation of a server rejecting an LSP document would
be much cleaner.

João