GNU bug report logs -
#61350
Eglot over Tramp freezes with large project
Previous Next
To reply to this bug, email your comments to 61350 AT debbugs.gnu.org.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 07 Feb 2023 18:49:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Thomas Koch <thomas <at> koch.ro>
:
New bug report received and forwarded. Copy sent to
bug-gnu-emacs <at> gnu.org
.
(Tue, 07 Feb 2023 18:49:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
This problem has already been reported to eglot, however it is probably
a problem with Tramp:
https://github.com/joaotavora/eglot/discussions/883
I've reproduced the problem with Emacs 27.1 from Debian and a minimal
Emacs 28.2. installed via nixpkg. The follwing describes a hopefully
reliable reproduction.
1. Have a client system with Emacs + eglot and an SSH server.
2. On the server:
git clone --depth 1 --no-tags --single-branch \
-b eglot-tramp-freeze-repro \
https://github.com/thkoch2001/yacy_search_server
3. On the server download and untar
https://download.eclipse.org/jdtls/milestones/1.19.0/jdt-language-server-1.19.0-202301171536.tar.gz
4. Still on server make a symlink from ~/bin/jdtls to bin/jdtls from the
tarball.
5. Install eglot and use this emacs config on client:
(custom-set-variables
'(tramp-remote-path
'("/run/current-system/sw/bin" "/bin" "/usr/bin" "/sbin" "/usr/sbin" "/usr/local/bin" "/usr/local/sbin" "/local/bin" tramp-own-remote-path))
'(tramp-use-ssh-controlmaster-options nil)
;; problem exists also without previous line
)
6. Open source/net/yacy/yacy.java over Tramp with ssh:
7. M-x eglot
Freeze should happen after a few seconds.
The project works fine with eglot with a local Emacs.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Fri, 17 Feb 2023 09:55:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Thomas Koch <thomas <at> koch.ro> writes:
Hi Thomas,
> I've reproduced the problem with Emacs 27.1 from Debian and a minimal
> Emacs 28.2. installed via nixpkg. The follwing describes a hopefully
> reliable reproduction.
>
> 1. Have a client system with Emacs + eglot and an SSH server.
>
> 2. On the server:
>
> git clone --depth 1 --no-tags --single-branch \
> -b eglot-tramp-freeze-repro \
> https://github.com/thkoch2001/yacy_search_server
>
> 3. On the server download and untar
> https://download.eclipse.org/jdtls/milestones/1.19.0/jdt-language-server-1.19.0-202301171536.tar.gz
>
> 4. Still on server make a symlink from ~/bin/jdtls to bin/jdtls from the
> tarball.
>
> 5. Install eglot and use this emacs config on client:
>
> (custom-set-variables
> '(tramp-remote-path
> '("/run/current-system/sw/bin" "/bin" "/usr/bin" "/sbin" "/usr/sbin" "/usr/local/bin" "/usr/local/sbin" "/local/bin" tramp-own-remote-path))
> '(tramp-use-ssh-controlmaster-options nil)
> ;; problem exists also without previous line
> )
>
> 6. Open source/net/yacy/yacy.java over Tramp with ssh:
>
> 7. M-x eglot
Thanks for the concise recipe. Since I have used a virgin Ubuntu VM as
test target, one additional step is needed: Install Java.
> Freeze should happen after a few seconds.
Yep, I could reproduce it with both Emacs 29.0.60 and 30.0.50. And the
backtrace is
--8<---------------cut here---------------start------------->8---
backtrace()
tramp-error((tramp-file-name "ssh" nil nil "ubuntu-2204" nil "/home/albinus/src/yacy_search_server/examples/Simp..." nil) quit "")
tramp-signal-hook-function(quit nil)
accept-process-output(#<process *tramp/ssh ubuntu-2204*> nil nil t)
tramp-accept-process-output(#<process *tramp/ssh ubuntu-2204*>)
tramp-wait-for-regexp(#<process *tramp/ssh ubuntu-2204*> nil "\\(?:^\\|\0\\)\\(?:[^\n#$]*///3f62f1d55bd34cba7fe730c633...")
tramp-wait-for-output(#<process *tramp/ssh ubuntu-2204*>)
tramp-send-command((tramp-file-name "ssh" nil nil "ubuntu-2204" nil "/home/albinus/src/yacy_search_server/examples/Simp..." nil) "\\readlink --canonicalize-missing /home/albinus/src...")
tramp-send-command-and-check((tramp-file-name "ssh" nil nil "ubuntu-2204" nil "/home/albinus/src/yacy_search_server/examples/Simp..." nil) "\\readlink --canonicalize-missing /home/albinus/src...")
tramp-sh-handle-file-truename("/ssh:ubuntu-2204:/home/albinus/src/yacy_search_ser...")
apply(tramp-sh-handle-file-truename "/ssh:ubuntu-2204:/home/albinus/src/yacy_search_ser...")
tramp-sh-file-name-handler(file-truename "/ssh:ubuntu-2204:/home/albinus/src/yacy_search_ser...")
apply(tramp-sh-file-name-handler file-truename "/ssh:ubuntu-2204:/home/albinus/src/yacy_search_ser...")
tramp-file-name-handler(file-truename "/ssh:ubuntu-2204:/home/albinus/src/yacy_search_ser...")
file-truename("/ssh:ubuntu-2204:/home/albinus/src/yacy_search_ser...")
find-buffer-visiting("/ssh:ubuntu-2204:/home/albinus/src/yacy_search_ser...")
#f(compiled-function (arg1 arg2 &rest rest) "Handle notification publishDiagnostics." #<bytecode 0x5bbf29480580234>)(#<eglot-lsp-server eglot-lsp-server-bfebda> textDocument/publishDiagnostics :uri "file:///home/albinus/src/yacy_search_server/exampl..." :diagnostics [])
apply(#f(compiled-function (arg1 arg2 &rest rest) "Handle notification publishDiagnostics." #<bytecode 0x5bbf29480580234>) #<eglot-lsp-server eglot-lsp-server-bfebda> textDocument/publishDiagnostics (:uri "file:///home/albinus/src/yacy_search_server/exampl..." :diagnostics []))
eglot-handle-notification(#<eglot-lsp-server eglot-lsp-server-bfebda> textDocument/publishDiagnostics :uri "file:///home/albinus/src/yacy_search_server/exampl..." :diagnostics [])
apply(eglot-handle-notification #<eglot-lsp-server eglot-lsp-server-bfebda> textDocument/publishDiagnostics (:uri "file:///home/albinus/src/yacy_search_server/exampl..." :diagnostics []))
#f(compiled-function (server method params) #<bytecode -0xa39f6ed8241f361>)(#<eglot-lsp-server eglot-lsp-server-bfebda> textDocument/publishDiagnostics (:uri "file:///home/albinus/src/yacy_search_server/exampl..." :diagnostics []))
jsonrpc-connection-receive(#<eglot-lsp-server eglot-lsp-server-bfebda> (:jsonrpc "2.0" :method "textDocument/publishDiagnostics" :params (:uri "file:///home/albinus/src/yacy_search_server/exampl..." :diagnostics [])))
jsonrpc--process-filter(#<process EGLOT (yacy_search_server/(java-mode java-ts-mode))> "pc\":\"2.0\",\"method\":\"$/progress\",\"params\":{\"token\":...")
--8<---------------cut here---------------end--------------->8---
The problem is clear. We are in jsonrpc--process-filter (accepting process
output). This runs eglot-handle-notification, and that runs
file-truename. The latter needs to check remote, so Tramp sends a remote
command, and waits for that output (calling accept-process-output again).
Since jsonrpc always accepts output from *all* running processes, there
could be (and is!) the constellation, that process output has been read
already, and Tramp didn't get it, waiting for the output forever. I
could mitigate the problem partly by changing the arguments of
accept-process-output in jsonrpc-request, and eglot could work
successfully, sometimes. And sometimes not. And even then, I ran into
the famous "Forbidden reentrant call" error, see bug#60534.
I will continue to investigate, but after playing with this for some
days, I feel we must add threads to Tramp, when there are several
concurrent processes. I've tried this already some years agao, and I
failed. Maybe it is time now for a new attempt.
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Fri, 17 Feb 2023 15:24:01 GMT)
Full text and
rfc822 format available.
Message #11 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Thanks a lot Michael for the analysis.
I don't know anything about Emacs internals, but I have a healthy fear of threads and could understand that it might be a can of worms.
Maybe there are ideas that could be taken from JavaScript, which also has only a single thread. As I understand, any threading problem can be translated into event driven code. Additionally, browsers now support web workers which (in theory?) should avoid locks by not sharing context with the main thread:
https://developer.mozilla.org/en-US/docs/Web/API/Worker
I forwarded your analysis to the GitHub issue:
https://github.com/joaotavora/eglot/discussions/883#discussioncomment-5008509
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sat, 18 Feb 2023 11:11:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 61350 <at> debbugs.gnu.org (full text, mbox):
CC eglot maintainer as requested by him.
Please bare with me Michael, if I ask a stupid question although you're a long time Emacs contributor. I saw now that concurrency in Emacs is an age old problem: https://www.emacswiki.org/emacs/ConcurrentEmacs
There are emacs-async and emacs-deferred. Couldn't one of them be used to write a non-blocking wrapper for file-io? NodeJS made this programming model popular when it appeared in 2009. I can't believe that nobody discussed this before for Emacs?
(Of course I otherwise despise JavaScript and NodeJS in particular with passion.)
Meanwhile I could hack eglot to not block so far by replacing find-buffer-visiting with get-file-buffer in eglot-handle-notification. Of course there will probably be other places that cause problems and I could also only use this hack because my project files are not opened via symlinks.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sat, 18 Feb 2023 12:08:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Thomas Koch <thomas <at> koch.ro> writes:
Hi Thomas,
> Please bare with me Michael, if I ask a stupid question although
> you're a long time Emacs contributor. I saw now that concurrency in
> Emacs is an age old problem:
> https://www.emacswiki.org/emacs/ConcurrentEmacs
Don't worry. You don't ask stupid questions, often it is useful to think
differently.
> There are emacs-async and emacs-deferred. Couldn't one of them be used
> to write a non-blocking wrapper for file-io? NodeJS made this
> programming model popular when it appeared in 2009. I can't believe
> that nobody discussed this before for Emacs?
emacs-async uses a second Emacs instance, AFAIU. I doubt that this
approach is helpful for Tramp, which needs to read files into buffers
and alike.
emacs-deferred I don't know.
> Meanwhile I could hack eglot to not block so far by replacing
> find-buffer-visiting with get-file-buffer in
> eglot-handle-notification. Of course there will probably be other
> places that cause problems and I could also only use this hack because
> my project files are not opened via symlinks.
Yes, that's similar to what I've done while trying to mitigate the
problem. But then, later, there was still the famous "Forbidden
reentrant call of Tramp", which we must fix anyway.
In fact, the threaded Tramp some years ago wasn't that bad. It worked in
many cases. The major problem with threads in Emacs is user
interaction. When there is a dialogue with a user, for example asking
for a password in the minibuffer, it happened that the user input didn't
return to the correct thread. Something like a UI-thread is needed in
Emacs, I believe. That is, all threads in Emacs delegate user
interaction to that UI-thread, and then wait for the result via a
mutex. Something like this.
For the new attempt to bring threads to Tramp, I plan another
approach. Not a whole operation (like find-file) shall be located into
an own thread. Instead, just the process communication,
process-send-string and accept-process-output, shall be delegated to
another thread. This are the places we have problems. Let's see how far
I could go. But pls don't hold your breath, it will take time until I
could show results.
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Thu, 23 Feb 2023 11:56:02 GMT)
Full text and
rfc822 format available.
Message #20 received at 61350 <at> debbugs.gnu.org (full text, mbox):
> Michael Albinus <michael.albinus <at> gmx.de> hat am 18.02.2023 14:07 EET geschrieben:
> Don't worry. You don't ask stupid questions, often it is useful to think
> differently.
Another [...] question:
Would it be possible for Tramp to open and close a new SSH connection for every new command?
Would this mitigate the problem as an immediate workaround?
I use tramp to connect to a local VM or a machine in my home network. Thus opening a new SSH connection with ControlMaster is fast.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Thu, 23 Feb 2023 12:16:01 GMT)
Full text and
rfc822 format available.
Message #23 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> wrote:
> The problem is clear. We are in jsonrpc--process-filter (accepting process
> output). This runs eglot-handle-notification, and that runs
> file-truename. The latter needs to check remote, so Tramp sends a remote
> command, and waits for that output (calling accept-process-output again).
>
> Since jsonrpc always accepts output from *all* running processes, there
> could be (and is!) the constellation, that process output has been read
> already, and Tramp didn't get it, waiting for the output forever. I
> could mitigate the problem partly by changing the arguments of
> accept-process-output in jsonrpc-request, and eglot could work
> successfully, sometimes. And sometimes not. And even then, I ran into
> the famous "Forbidden reentrant call" error, see bug#60534.
I see the problem more or less, indeed, thanks for reproducing it and
clarifying it.
Can you try your experiment again after evaluating these two lines?
(cl-defmethod eglot-handle-notification :around (_ _ &key &allow-other-keys)
(run-at-time 0 nil (lambda () (cl-call-next-method))))
This makes the notification handlers execute "outside" the LSP server's
process filter, so maybe it fix it.
I've given it very little testing and it might have other monumentally
severe consequences in non-TRAMP cases, so take it with a grain of salt.
If the fix works, maybe it could be done upstream in jsonrpc.el itself.
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Thu, 23 Feb 2023 14:19:01 GMT)
Full text and
rfc822 format available.
Message #26 received at 61350 <at> debbugs.gnu.org (full text, mbox):
> Can you try your experiment again after evaluating these two lines?
> (cl-defmethod eglot-handle-notification :around (_ _ &key &allow-other-keys)
> (run-at-time 0 nil (lambda () (cl-call-next-method))))
I forgot to mention that one should evaluate that in a
buffer were lexical-binding is t.
Better try this jsonrpc.el patch instead which does basically
the same, but won't have this pitfall.
diff --git a/lisp/jsonrpc.el b/lisp/jsonrpc.el
index f583d116d20..45e3d97ca41 100644
--- a/lisp/jsonrpc.el
+++ b/lisp/jsonrpc.el
@@ -184,8 +184,8 @@ jsonrpc-connection-receive
(apply #'jsonrpc--reply connection id reply)))
(;; A remote notification
method
- (funcall (jsonrpc--notification-dispatcher connection)
- connection (intern method) params))
+ (run-at-time 0 nil (jsonrpc--notification-dispatcher connection)
+ connection (intern method) params))
(;; A remote response
(setq continuations
(and id (gethash id (jsonrpc--request-continuations
connection))))
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Thu, 23 Feb 2023 14:48:02 GMT)
Full text and
rfc822 format available.
Message #29 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Thanks João, but with the jsonrpc patch it is the same as with the previous patch: Emacs still hangs. This time I could unfreeze it with C-g, but it hangs again a moment later.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Thu, 23 Feb 2023 15:23:02 GMT)
Full text and
rfc822 format available.
Message #32 received at 61350 <at> debbugs.gnu.org (full text, mbox):
On Thu, Feb 23, 2023 at 2:47 PM Thomas Koch <thomas <at> koch.ro> wrote:
>
> Thanks João, but with the jsonrpc patch it is the same as with the previous patch: Emacs still hangs. This time I could unfreeze it with C-g, but it hangs again a moment later.
I'll need you to describe the conditions that lead Emacs to hang.
An Emacs -Q run, a server, an example file, all in all, a MRE. See
how this user informally but very effectively reported a bug in Eglot:
https://lists.gnu.org/archive/html/emacs-devel/2023-02/msg00851.html
Here, I don't even understand whether you're trying this with TRAMP
or locally.
If I can't see the error you're facing in my machine, it's likely
I won't be of any help. I can report I've been working for a few
hours with that patch with the Clangd server with no problems using
a local project (no TRAMP).
We can also wait for Michael to try out the patch, as I'm sure he
can report any problems just as clearly as he last did in this
thread.
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Fri, 24 Feb 2023 17:20:02 GMT)
Full text and
rfc822 format available.
Message #35 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi,
sorry for the delay, I have been out of order the last days :-(
>> Thanks João, but with the jsonrpc patch it is the same as with the previous patch: Emacs still hangs. This time I could unfreeze it with C-g, but it hangs again a moment later.
I confirm, that with the jsonrpc patch the problem still happens to
me. I'm using the emacs-29 branch for test.
> I'll need you to describe the conditions that lead Emacs to hang.
> An Emacs -Q run, a server, an example file, all in all, a MRE. See
> how this user informally but very effectively reported a bug in Eglot:
>
> https://lists.gnu.org/archive/html/emacs-devel/2023-02/msg00851.html
>
> Here, I don't even understand whether you're trying this with TRAMP
> or locally.
I have prepared the test as described by Thomas in
<https://debbugs.gnu.org/cgi/bugreport.cgi?bug=61350#5>. After setting
up everything on the remote side, I have called
--8<---------------cut here---------------start------------->8---
# ~/src/emacs-29/src/emacs -Q /ssh:ubuntu-2204:/home/albinus/src/yacy_search_server/source/net/yacy/yacy.java -f eglot
--8<---------------cut here---------------end--------------->8---
The remote machine is a virgin Ubuntu VM.
> If I can't see the error you're facing in my machine, it's likely
> I won't be of any help. I can report I've been working for a few
> hours with that patch with the Clangd server with no problems using
> a local project (no TRAMP).
>
> We can also wait for Michael to try out the patch, as I'm sure he
> can report any problems just as clearly as he last did in this
> thread.
As said above, same result: eglot hangs. Your patch doesn't change the
game. You don't call jsonrpc--notification-dispatcher anymore in the
process filter directly. But calling it by a timer has the same effect:
The process filter (accepting output) is still active, and the called
function in the timer still uses file-truename, which tries to read
output from the Tramp process.
Again, I still believe we need a general solution in Tramp, using threads.
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Fri, 24 Feb 2023 17:47:01 GMT)
Full text and
rfc822 format available.
Message #38 received at 61350 <at> debbugs.gnu.org (full text, mbox):
On Fri, Feb 24, 2023 at 5:19 PM Michael Albinus <michael.albinus <at> gmx.de> wrote:
> sorry for the delay, I have been out of order the last days :-(
No problem. Hope you're feeling better!
> As said above, same result: eglot hangs. Your patch doesn't change the
> game.
Sorry to hear about that, I thought it was problem related to processes
inside processes.
> You don't call jsonrpc--notification-dispatcher anymore in the
> process filter directly. But calling it by a timer has the same effect:
> The process filter (accepting output) is still active,
What exactly do you mean by "still active" and what process are you
referring to? Jsonrpc's or Tramp?
> and the called
> function in the timer still uses file-truename, which tries to read
> output from the Tramp process.
This part I can understand. But I don't understand this that you wrote
in the original message:
> > Since jsonrpc always accepts output from *all* running processes, there
> > could be (and is!) the constellation, that process output has been read
> > already, and Tramp didn't get it, waiting for the output forever.
I could understand if we were talking C and read() here, but the process
output of other processes read by jsonrpc's call to accept-process-output
surely must have run through some filter function, it wasn't just lost
AFAIK. You've probably seen this trick a mililon times, but markers
in the process buffer is what is used commonly.
> Again, I still believe we need a general solution in Tramp, using threads.
I don't understand what is conceptually impossible (or very hard) to
achieve with evented IO like we have in Emacs.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sat, 25 Feb 2023 14:28:02 GMT)
Full text and
rfc822 format available.
Message #41 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
>> You don't call jsonrpc--notification-dispatcher anymore in the
>> process filter directly. But calling it by a timer has the same effect:
>> The process filter (accepting output) is still active,
>
> What exactly do you mean by "still active" and what process are you
> referring to? Jsonrpc's or Tramp?
Your timer is called immediately, while you're still in
jsonrpc--process-filter I believe.
>> > Since jsonrpc always accepts output from *all* running processes, there
>> > could be (and is!) the constellation, that process output has been read
>> > already, and Tramp didn't get it, waiting for the output forever.
>
> I could understand if we were talking C and read() here, but the process
> output of other processes read by jsonrpc's call to accept-process-output
> surely must have run through some filter function, it wasn't just lost
> AFAIK. You've probably seen this trick a mililon times, but markers
> in the process buffer is what is used commonly.
It wasn't lost. The process output was retrieved and placed into the
Tramp buffer, w/o Tramp's interaction.
Tramp doesn't use a process filter for its own connections. The problem
is rather, that Tramp must know where the output to be parsed starts in
the buffer.
If another process has consumed the output, even if it is pushed into
the correct buffer, Tramp doesn't know.
>> Again, I still believe we need a general solution in Tramp, using threads.
>
> I don't understand what is conceptually impossible (or very hard) to
> achieve with evented IO like we have in Emacs.
I've tried different patches, mainly in tramp-accept-process-output. It
improves the situation a little bit, but not reliably. Sometimes the
test works, sometimes it blocks. And even if it doesn't block, a while
later we run into the "Forbidden reentrant call of Tramp" error.
Honestly, I still don't understand really what's up. Let's see whether
adding threads to Tramp helps.
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sat, 25 Feb 2023 14:37:02 GMT)
Full text and
rfc822 format available.
Message #44 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Thomas Koch <thomas <at> koch.ro> writes:
Hi Thomas,
> Another [...] question:
>
> Would it be possible for Tramp to open and close a new SSH connection for every new command?
> Would this mitigate the problem as an immediate workaround?
It might help, don't know. But the concurrent accessing of process
output by different processes still exist, and I don't know exactly
(yet) why it doesn't work as expected.
> I use tramp to connect to a local VM or a machine in my home network. Thus opening a new SSH connection with ControlMaster is fast.
Yes, but I expect performance problems. We have already the Tramp "sudo"
method, which works like an ssh connection: open a process to the
target, and send commands via this process. Since it is a local
connection, this is even faster than ssh.
But there are security aware users, who dislike a permanent Tramp sudo
process, which would allow attacks with root permissions. For these
users, Tramp offers the "sudoedit" method. The difference is, that every
single command Tramp sends, is a new process, similar to what you
propose with ssh.
The performance of Tramp "sudoedit", compared with "sudo", is much
worse. Likely to be expected for a similar "ssh" approach.
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sat, 25 Feb 2023 23:08:01 GMT)
Full text and
rfc822 format available.
Message #47 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
> Hi João,
>
>>> You don't call jsonrpc--notification-dispatcher anymore in the
>>> process filter directly. But calling it by a timer has the same effect:
>>> The process filter (accepting output) is still active,
>>
>> What exactly do you mean by "still active" and what process are you
>> referring to? Jsonrpc's or Tramp?
>
> Your timer is called immediately, while you're still in
> jsonrpc--process-filter I believe.
I don't think so. Any timer function runs in its stack, after the stack
where jsonrpc--process-filter happened has unwound. But you've answered
my question: by "active" you mean "under the stack frame of
jsonrpc--process-filter".
>>> > Since jsonrpc always accepts output from *all* running processes, there
>>> > could be (and is!) the constellation, that process output has been read
>>> > already, and Tramp didn't get it, waiting for the output forever.
>>
>> I could understand if we were talking C and read() here, but the process
>> output of other processes read by jsonrpc's call to accept-process-output
>> surely must have run through some filter function, it wasn't just lost
>> AFAIK. You've probably seen this trick a mililon times, but markers
>> in the process buffer is what is used commonly.
>
> It wasn't lost. The process output was retrieved and placed into the
> Tramp buffer, w/o Tramp's interaction.
That's great and quite normal.
> Tramp doesn't use a process filter for its own connections.
Every process in Emacs has a process filter, though it might not be a
custom process filter. If you don't give it one, it is assigned
internal-default-process-filter, which as the docstring explains, simply
inserts process output into the buffer (of course you probably know
this, I'm just stating for other readers).
> is rather, that Tramp must know where the output to be parsed starts in
> the buffer.
Right, and this is where 'point' and 'process-mark' come in. Process
mark is where the internal filter last left the insertion, and point is
where you the programmer last left your parsing.
> If another process has consumed the output, even if it is pushed into
> the correct buffer, Tramp doesn't know.
Why? May I ask -- perhaps naively -- why can't you can't just
(with-current-buffer proc (= (point) (process-mark)))
to "know"?
In my experience, something like this is usually sufficient. One parses
the buffer for incoming messages looking for regexps, magic byte
sequences, etc. One always leaves point after a successful search
(search-forward-regexp has this behaviour and it's very convenient).
Eventually, point may be left exactly at process-mark, or not, depending
on how much data came in, a full message, multiple full messages, or
some fractional message.
Regardless, next time you want to get messages from the process, you
perform a check before you go on a potentially blocking call to fetch
more output. The check is usually "has process-mark advanced behind my
back from other libraries I don't control?" Here, jsonrpc.el's a-p-o
calls are the "behing your back". After checking, you know if you have
enough data to form a full message, or if you need to go on a
potentially blocking a-p-o call.
But somehow I suspect you know all this by heart already!! In fact,
from the backtrace you posted Fri, 17 Feb 2023, it's clear the hang
happend in 'tramp-wait-for-regexp' whic starts
(defun tramp-wait-for-regexp (proc timeout regexp)
...
(let ((found (tramp-check-for-regexp proc regexp)))
(cond (timeout
(with-timeout (timeout)
(while (not found)
(tramp-accept-process-output proc)
Here, exactly how I imaged, you first check for "found" before venturing
into the blocking tramp-accept-process-output call. So it looks to me
that you're doing conceptually the right thing, but it's
tramp-check-for-regexp who is missing the fact that there is a perfectly
good message in process buffer already. At least according to what I
understood from your account of the problem.
So my suspicion is in tramp-check-for-regexp. I found it a bit hard to
read to find an obvious culprit, and I still haven't a working setup to
reproduce this...
>>> Again, I still believe we need a general solution in Tramp, using threads.
>>
>> I don't understand what is conceptually impossible (or very hard) to
>> achieve with evented IO like we have in Emacs.
>
> I've tried different patches, mainly in tramp-accept-process-output. It
> improves the situation a little bit, but not reliably. Sometimes the
> test works, sometimes it blocks. And even if it doesn't block, a while
> later we run into the "Forbidden reentrant call of Tramp" error.
I had recently a problem with reentrant calls in
jsonrpc--process-filter. This is why you find a run-with-timer call
there, right at the beginning. This removes the reentrancy because as
written before, timers run in their own stack. This was the fix to a
nasty Eglot bug#60088 with similar "hanging" behaviour.
> Honestly, I still don't understand really what's up. Let's see whether
> adding threads to Tramp helps.
I'll try to setup a VM myself with the reproduction recipe that Thomas
used. I'm reasonably confident that two process-based extensions such
as Jsonrpc.el and TRAMP can coeexist if each follows process etiquette.
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sun, 26 Feb 2023 10:25:02 GMT)
Full text and
rfc822 format available.
Message #50 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael and I met yesterday for a debugging session.
Eventually we tried without SSH ControlMaster (via tramp-use-ssh-controlmaster-options). Eglot did not freeze anymore.
I actually found the hint to ControlMaster already early on but probably made a mistake when trying it:
- Either I set tramp-ssh-controlmaster-options to nil, which is the default and does not have any effect.
- Or SSH still used ControlMaster due to my ssh config:
Host *
ControlMaster auto
I still see one
error in process filter: tramp-error: peculiar error: "Forbidden reentrant call of Tramp"
I don't consider this a solution but just a workaround. After all I had tramp freeze my emacs also before I used eglot. After reading jsonrpc and tramp source code yesterday I also believe that Joãos proposal might help to make Tramp more stable.
Unfortunately I've not done any elisp programming yet and thus can't help with coding.
I was thinking whether there could be an abstract (and reliable) process interaction library to be used by tramp, jsonrpc and others. In my init.el I found at least magit (magit-process.el) and maybe org-export (ox.el) which also call external processes and need to reliably parse process output.
jsonrpc is special since it needs to match process output by message id to previous process input.
Otherwise it seems to me that there is a lot of shared logic?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sun, 26 Feb 2023 15:59:01 GMT)
Full text and
rfc822 format available.
Message #53 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Thomas Koch <thomas <at> koch.ro> writes:
Hi Thomas,
> Michael and I met yesterday for a debugging session.
>
> Eventually we tried without SSH ControlMaster (via tramp-use-ssh-controlmaster-options). Eglot did not freeze anymore.
Yep. It looks, like neither Eglot nor Tramp do it wrong.
> I actually found the hint to ControlMaster already early on but probably made a mistake when trying it:
>
> - Either I set tramp-ssh-controlmaster-options to nil, which is the default and does not have any effect.
Do not use this variable, it is Tramp internal. Instead, set the user
option tramp-use-ssh-controlmaster-options to nil *prior* establishing a
connection, as explained in the Tramp manual.
> - Or SSH still used ControlMaster due to my ssh config:
>
> Host *
> ControlMaster auto
This shouldn't be set for hosts you're communicating via Eglot.
> I still see one
>
> error in process filter: tramp-error: peculiar error: "Forbidden reentrant call of Tramp"
This is *another* error, it doesn't block Emacs as described in this
report. And since we have other bug reports about (bug#49954,
bug#60534), I prefer to not discuss it in *this* bug report.
> I don't consider this a solution but just a workaround. After all I
> had tramp freeze my emacs also before I used eglot.
I don't believe it is a workaround only. Using ssh's ControlMaster seems
to be problematic for large data to be transferred, for whatever
reason. It doesn't look like it is an error in Emacs. I might be wrong,
but as I said already, after reading the code again and again (and also
after extensive debugging) I have no clue what could be wrong in Eglot
or Tramp.
Therefore, I recommend that Eglot and Tramp manuals warn about using ssh
ControlMaster option when large data are on the wire.
> After reading jsonrpc and tramp source code yesterday I also believe
> that Joãos proposal might help to make Tramp more stable.
I'm not against changes in Eglot or jsonrpc. But I believe this won't
help with the "Forbidden reentrant call of Tramp" problem.
> I was thinking whether there could be an abstract (and reliable)
> process interaction library to be used by tramp, jsonrpc and
> others. In my init.el I found at least magit (magit-process.el) and
> maybe org-export (ox.el) which also call external processes and need
> to reliably parse process output.
magit-process.el offers some wrappers for process-file,
start-file-process, and even for tramp-sh-handle-process-file and
tramp-sh-handle-start-file-process. So the real work is still done in
Tramp.
In ox.el, start-process is used with a sentinel in
org-export-async-start. However, I don't see where accept-process-output
is called here, it might be distributed over the Org source tree (which
is much too large for me to read).
> jsonrpc is special since it needs to match process output by message
> id to previous process input.
>
> Otherwise it seems to me that there is a lot of shared logic?
What do you mean by this?
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sun, 26 Feb 2023 17:24:01 GMT)
Full text and
rfc822 format available.
Message #56 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
>> is rather, that Tramp must know where the output to be parsed starts in
>> the buffer.
>
> Right, and this is where 'point' and 'process-mark' come in. Process
> mark is where the internal filter last left the insertion, and point is
> where you the programmer last left your parsing.
Yes, but Tramp doesn't get the whole output with one
accept-process-output call. It must use a while loop, until everything
it expects has been arrived. So the process-mark also moves again and
again, and cannot be used by Tramp directly.
>> If another process has consumed the output, even if it is pushed into
>> the correct buffer, Tramp doesn't know.
>
> Why? May I ask -- perhaps naively -- why can't you can't just
>
> (with-current-buffer proc (= (point) (process-mark)))
>
> to "know"?
See above.
> In my experience, something like this is usually sufficient. One parses
> the buffer for incoming messages looking for regexps, magic byte
> sequences, etc. One always leaves point after a successful search
> (search-forward-regexp has this behaviour and it's very convenient).
> Eventually, point may be left exactly at process-mark, or not, depending
> on how much data came in, a full message, multiple full messages, or
> some fractional message.
>
> Regardless, next time you want to get messages from the process, you
> perform a check before you go on a potentially blocking call to fetch
> more output. The check is usually "has process-mark advanced behind my
> back from other libraries I don't control?" Here, jsonrpc.el's a-p-o
> calls are the "behing your back". After checking, you know if you have
> enough data to form a full message, or if you need to go on a
> potentially blocking a-p-o call.
Everything correct. But the problem was that not the whole expected
output hasn't arrived the buffer. I thought it was "stolen" by another
process. But yesterday's debugging has shown, that ssh ControlMaster
seems to be guilty; it cannot handle large amount of data reliably.
> But somehow I suspect you know all this by heart already!! In fact,
> from the backtrace you posted Fri, 17 Feb 2023, it's clear the hang
> happend in 'tramp-wait-for-regexp' whic starts
>
> (defun tramp-wait-for-regexp (proc timeout regexp)
> ...
> (let ((found (tramp-check-for-regexp proc regexp)))
> (cond (timeout
> (with-timeout (timeout)
> (while (not found)
> (tramp-accept-process-output proc)
>
> Here, exactly how I imaged, you first check for "found" before venturing
> into the blocking tramp-accept-process-output call. So it looks to me
> that you're doing conceptually the right thing, but it's
> tramp-check-for-regexp who is missing the fact that there is a perfectly
> good message in process buffer already. At least according to what I
> understood from your account of the problem.
Yes, that's the place. Tramp waits for a trailing shell prompt that it
tells that everything has been sent from remote. But this shell prompt
didn't arrive, and Tramp hangs in waiting for this.
>> Honestly, I still don't understand really what's up. Let's see whether
>> adding threads to Tramp helps.
>
> I'll try to setup a VM myself with the reproduction recipe that Thomas
> used. I'm reasonably confident that two process-based extensions such
> as Jsonrpc.el and TRAMP can coeexist if each follows process etiquette.
They will cooperate, definitely!
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sun, 26 Feb 2023 21:13:02 GMT)
Full text and
rfc822 format available.
Message #59 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
> Yes, but Tramp doesn't get the whole output with one
> accept-process-output call. It must use a while loop, until everything
> it expects has been arrived. So the process-mark also moves again and
> again, and cannot be used by Tramp directly.
Sure, but you wrote: "Tramp must know where the output to be parsed
starts in the buffer". And I suggested that you arrange for that
_starting point_ to be given by the Lisp function 'point', not
'process-mark'.
> Everything correct. But the problem was that not the whole expected
> output hasn't arrived the buffer. I thought it was "stolen" by another
> process.
OK, but earlier you said that the process _had_ arrived in Tramp's
buffer (though this arrival was triggered by jsonrpc.el) but that Tramp
couldn't be made aware of it. You wrote yesterday:
>>> It wasn't lost. The process output was retrieved and placed into the
>>> Tramp buffer, w/o Tramp's interaction.
Can you confirm that you no longer believe this to be the case?
> But yesterday's debugging has shown, that ssh ControlMaster
> seems to be guilty; it cannot handle large amount of data reliably.
So maybe one should disable it by default. How does one disable it? I
have trouble understanding from tramp-use-ssh-controlmaster-options and
tramp-ssh-controlmaster-options.
> Yes, that's the place. Tramp waits for a trailing shell prompt that it
> tells that everything has been sent from remote. But this shell prompt
> didn't arrive, and Tramp hangs in waiting for this.
So the problem is now that the shell prompt didn't arrive after all,
because of controlmaster, not because of jsonrpc.el. Can you confirm
this?
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sun, 26 Feb 2023 21:45:01 GMT)
Full text and
rfc822 format available.
Message #62 received at 61350 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
João Távora <joaotavora <at> gmail.com> writes:
> So maybe one should disable it by default. How does one disable it? I
> have trouble understanding from tramp-use-ssh-controlmaster-options and
> tramp-ssh-controlmaster-options.
In the meantime I have finally, with great Docker-relearning hardship,
setup a reproduction environment. You'll see it follows exactly
Thomas's reproduction recipe almost until the end. It sets up a server
on port 2022
I attach the dockerfile.
$ docker build -t sshubuntu .
$ docker run -d -p 2022:22 sshubuntu
$ /path/to/emacs -Q /ssh:sshuser <at> localhost#2022:/home/sshuser/yacy_search_server/source/net/yacy/yacy.java -f eglot
Now the odd bit. Sometimes this works, sometimes it doesn't!!!
Sometimes it finds '~/bin/jtdls' in the remote's PATH and sometimes it
doesn't! I have no idea why. Maybe someone can help me debug: how is
PATH to work? In the server I can see that on simple ssh login,
~/bin/jdtls _is_ found.
Anyway, when I _do_ manage to get it to find sshuser's '~/bin/jtdls':
* I do experience the hang regularly, but not always at the same point
in time.
* I don't know how to obtain the pretty Backtrace that you Michael got.
Michael how do you get that?? Maybe you're using some tramp flag,
because debug-on-error and debug-on-quit don't work.
* If I set
(setq tramp-ssh-controlmaster-options nil
tramp-use-ssh-controlmaster-options nil)
I _no longer_ experience the hang. This is after visiting 5 or 6
files, trying some completion, M-. xref-find-definitions, etc.
Michael can you confirm that these are per-buffer values? Would it
make sense for Eglot to set them like so in its minor mode hook?
João
[dockerfile (text/plain, inline)]
FROM ubuntu:latest
RUN apt update && apt install openssh-server sudo openjdk-17-jdk openjdk-17-jre git wget -y
# Create a user “sshuser” and group “sshgroup”
RUN groupadd sshgroup && useradd -ms /bin/bash -g sshgroup sshuser
# Create sshuser directory in home
RUN mkdir -p /home/sshuser/.ssh
# Copy the ssh public key in the authorized_keys file. The idkey.pub below is a public key file you get from ssh-keygen. They are under ~/.ssh directory by default.
COPY id_rsa.pub /home/sshuser/.ssh/authorized_keys
# change ownership of the key file.
RUN chown sshuser:sshgroup /home/sshuser/.ssh/authorized_keys && chmod 600 /home/sshuser/.ssh/authorized_keys
# Start SSH service
RUN service ssh start
# Expose docker port 22
EXPOSE 22
USER sshuser
WORKDIR /home/sshuser
RUN git clone --depth 1 --no-tags --single-branch -b eglot-tramp-freeze-repro https://github.com/thkoch2001/yacy_search_server
RUN wget https://download.eclipse.org/jdtls/milestones/1.19.0/jdt-language-server-1.19.0-202301171536.tar.gz
RUN mkdir jdtls
RUN tar xvfz jdt-language-server-1.19.0-202301171536.tar.gz -C jdtls
RUN mkdir bin
RUN ln -sf ~/jdtls/bin/jdtls ~/bin/jdtls
USER root
CMD ["/usr/sbin/sshd","-D"]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Mon, 27 Feb 2023 07:48:02 GMT)
Full text and
rfc822 format available.
Message #65 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
>>>> It wasn't lost. The process output was retrieved and placed into the
>>>> Tramp buffer, w/o Tramp's interaction.
>
> Can you confirm that you no longer believe this to be the case?
I don't know whether it was Tramp's accept-process-output call, or
another one. But yes, this doesn't seem to be a problem.
>> But yesterday's debugging has shown, that ssh ControlMaster
>> seems to be guilty; it cannot handle large amount of data reliably.
>
> So maybe one should disable it by default.
No, not by default. It improves performance.
> How does one disable it? I have trouble understanding from
> tramp-use-ssh-controlmaster-options and
> tramp-ssh-controlmaster-options.
RTFM. The Tramp manual says
--8<---------------cut here---------------start------------->8---
If the ‘~/.ssh/config’ file is configured appropriately for the above
behavior, then any changes to ‘ssh’ can be suppressed with this ‘nil’
setting:
(customize-set-variable 'tramp-use-ssh-controlmaster-options nil)
This should also be set to ‘nil’ if you use the ‘ProxyCommand’ or
‘ProxyJump’ options in your ‘ssh’ configuration.
--8<---------------cut here---------------end--------------->8---
I will extend this, descibing possible problems with a large amount of
data.
>> Yes, that's the place. Tramp waits for a trailing shell prompt that it
>> tells that everything has been sent from remote. But this shell prompt
>> didn't arrive, and Tramp hangs in waiting for this.
>
> So the problem is now that the shell prompt didn't arrive after all,
> because of controlmaster, not because of jsonrpc.el. Can you confirm
> this?
Yes.
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Mon, 27 Feb 2023 08:00:01 GMT)
Full text and
rfc822 format available.
Message #68 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
> * I don't know how to obtain the pretty Backtrace that you Michael got.
> Michael how do you get that?? Maybe you're using some tramp flag,
> because debug-on-error and debug-on-quit don't work.
There's a trick with tramp-verbose. If you set it to 10, you get a *lot*
of traces in the debug buffer, plus a backtrace in case of errors or quit.
> * If I set
>
> (setq tramp-ssh-controlmaster-options nil
> tramp-use-ssh-controlmaster-options nil)
>
> I _no longer_ experience the hang. This is after visiting 5 or 6
> files, trying some completion, M-. xref-find-definitions, etc.
Don't touch tramp-ssh-controlmaster-options, it is meant for internal purposes.
> Michael can you confirm that these are per-buffer values? Would it
> make sense for Eglot to set them like so in its minor mode hook?
No, they are global IIRC. I haven't checked. Maybe we could use them as
connection-local variables, that needs further checks.
And it won't help if a user has enabled ControlMaster in her
~/.ssh/config. So we must speak about in the doc anyway.
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Mon, 27 Feb 2023 09:34:02 GMT)
Full text and
rfc822 format available.
Message #71 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
> João Távora <joaotavora <at> gmail.com> writes:
>
> Hi João,
>
>>>>> It wasn't lost. The process output was retrieved and placed into the
>>>>> Tramp buffer, w/o Tramp's interaction.
>>
>> Can you confirm that you no longer believe this to be the case?
>
> I don't know whether it was Tramp's accept-process-output call, or
> another one. But yes, this doesn't seem to be a problem.
>
>>> But yesterday's debugging has shown, that ssh ControlMaster
>>> seems to be guilty; it cannot handle large amount of data reliably.
>>
>> So maybe one should disable it by default.
>
> No, not by default. It improves performance.
So would you buy a very fast car that only explodes 10% of the time :-)
I think that these kinds of features should be opt-in, at least until
one is more sure of their stability. Here, this looks like an unstable
feature.
>> How does one disable it? I have trouble understanding from
>> tramp-use-ssh-controlmaster-options and
>> tramp-ssh-controlmaster-options.
>
> RTFM. The Tramp manual says
>
> If the ‘~/.ssh/config’ file is configured appropriately for the above
> behavior, then any changes to ‘ssh’ can be suppressed with this ‘nil’
> setting:
>
> (customize-set-variable 'tramp-use-ssh-controlmaster-options nil)
>
> This should also be set to ‘nil’ if you use the ‘ProxyCommand’ or
> ‘ProxyJump’ options in your ‘ssh’ configuration.
I had read the docstrings. Neither the docstrings nor this manual entry
unequivocally states how to _guarante_ the SSH "controlmaster" are _not_
used for TRAMP. It is something like this?
(setq tramp-use-ssh-controlmaster-options t
tramp-ssh-controlmaster-options "")
I understand you appreciate this feature and it has advantages
elsewhere. But I we can recognize that, as things stand, it is a
non-essential performance optimization that wrecks Eglot functionally.
Until the problem is fixed, it's important for Eglot users to know how
to turn it off guaranteedly, at least for the duration Eglot sesssions.
This will helps us clear a big slice of doubt/uncertainty in all the
Eglot TRAMP-related bugs (of which there are many).
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Mon, 27 Feb 2023 09:41:01 GMT)
Full text and
rfc822 format available.
Message #74 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
> And it won't help if a user has enabled ControlMaster in her
> ~/.ssh/config.
Can't Tramp override this config it it wants to?
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Mon, 27 Feb 2023 20:11:01 GMT)
Full text and
rfc822 format available.
Message #77 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
>>> So maybe one should disable it by default.
>>
>> No, not by default. It improves performance.
>
> So would you buy a very fast car that only explodes 10% of the time :-)
> I think that these kinds of features should be opt-in, at least until
> one is more sure of their stability. Here, this looks like an unstable
> feature.
It is stable. Except in the case of large amount of data, which is
exceptional for Tramp usage.
> I had read the docstrings. Neither the docstrings nor this manual entry
> unequivocally states how to _guarante_ the SSH "controlmaster" are _not_
> used for TRAMP. It is something like this?
>
> (setq tramp-use-ssh-controlmaster-options t
> tramp-ssh-controlmaster-options "")
>
> I understand you appreciate this feature and it has advantages
> elsewhere. But I we can recognize that, as things stand, it is a
> non-essential performance optimization that wrecks Eglot functionally.
It is essential. I have been beaten by many Tramp users to support it.
> Until the problem is fixed, it's important for Eglot users to know how
> to turn it off guaranteedly, at least for the duration Eglot sesssions.
>
> This will helps us clear a big slice of doubt/uncertainty in all the
> Eglot TRAMP-related bugs (of which there are many).
Yes. I believe we must extend the meaning of
tramp-use-ssh-controlmaster-options. `t' shall mean that Tramp shall
apply its own settings. `nil' shall mean Tramp doesn't do anything, and
settings in ~/.ssh/config shall apply. And a new value, say `suppress',
shall suppress this feature even if enabled in ~/.ssh/config.
I'll try to implement something along this line.
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Mon, 27 Feb 2023 20:12:02 GMT)
Full text and
rfc822 format available.
Message #80 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
>> And it won't help if a user has enabled ControlMaster in her
>> ~/.ssh/config.
>
> Can't Tramp override this config it it wants to?
See my other message. I'll try to do so if instructed by the user.
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 28 Feb 2023 00:09:02 GMT)
Full text and
rfc822 format available.
Message #83 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
>> So would you buy a very fast car that only explodes 10% of the time :-)
>> I think that these kinds of features should be opt-in, at least until
>> one is more sure of their stability. Here, this looks like an unstable
>> feature.
>
> It is stable. Except in the case of large amount of data, which is
> exceptional for Tramp usage.
...is it though? :-) Can a feature that hangs when presented with more
data than usual (however one defines that) be considered stable?
>> I understand you appreciate this feature and it has advantages
>> elsewhere. But I we can recognize that, as things stand, it is a
>> non-essential performance optimization that wrecks Eglot functionally.
>
> It is essential. I have been beaten by many Tramp users to support it.
I'd say something "essential" is something you can't live without. But
that doesn't seem to be the case here. I could use Eglot over Tramp
with decent performance (although this is a local-to-local in disguise).
I wonder what Thomas has to say about performance with a real remote
server when he turns off ControlMaster.
Of course I do understand that it's a very sought after performance
optimization. Doesn't seem to be quite finished, but that's just MHO.
> settings in ~/.ssh/config shall apply. And a new value, say `suppress',
> shall suppress this feature even if enabled in ~/.ssh/config.
>
> I'll try to implement something along this line.
Thanks. I think that is a good start. But can Eglot or the user
somehow set 'suppress' for connections "motivated" by Eglot, i.e. M-x
eglot in some remote file? How?
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 28 Feb 2023 10:39:02 GMT)
Full text and
rfc822 format available.
Message #86 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
>> It is stable. Except in the case of large amount of data, which is
>> exceptional for Tramp usage.
>
> ...is it though? :-) Can a feature that hangs when presented with more
> data than usual (however one defines that) be considered stable?
Tramp supports ControlMaster since Emacs 24. Eglot is the first case
I've heard of problems. I don't deny it, but it seems to be premature to
me to disable the feature at all, falling back to an opt-in config.
>> It is essential. I have been beaten by many Tramp users to support it.
>
> I'd say something "essential" is something you can't live without. But
> that doesn't seem to be the case here.
Ohh. You haven't seen how much Tramp has been blamed because it doesn't
support it out of the box.
> I could use Eglot over Tramp with decent performance (although this is
> a local-to-local in disguise).
Of course. Tramp allows you to opt-out, which is an accepted
configuration possibility. I don't say the way to opt-out is perfect
(it's currently an all-or-nothing), but this can be improved.
>> settings in ~/.ssh/config shall apply. And a new value, say `suppress',
>> shall suppress this feature even if enabled in ~/.ssh/config.
>>
>> I'll try to implement something along this line.
>
> Thanks. I think that is a good start. But can Eglot or the user
> somehow set 'suppress' for connections "motivated" by Eglot, i.e. M-x
> eglot in some remote file? How?
Currently it's not possible. I'll investigate a way to disable
ControlMaster per process. By this, the main Tramp process and other
processes on the remote host could still profit from the performance
boost, and other processes (like Eglot) could opt-out.
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 28 Feb 2023 11:33:02 GMT)
Full text and
rfc822 format available.
Message #89 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
> João Távora <joaotavora <at> gmail.com> writes:
>
> Hi João,
>
>>> It is stable. Except in the case of large amount of data, which is
>>> exceptional for Tramp usage.
>> ...is it though? :-) Can a feature that hangs when presented with more
>> data than usual (however one defines that) be considered stable?
> Tramp supports ControlMaster since Emacs 24. Eglot is the first case
> I've heard of problems. I don't deny it
That could be because, as far as I understand, Eglot is the first
"major" package to use the transparent (make-process ... :file-handler
t) which, AFAIU, means that Eglot's process, which doesn't really
transmit _that_ much data, is piped transparently through Tramp's
process managing the remote file whose buffer M-x eglot is executed in.
>>> It is essential. I have been beaten by many Tramp users to support it.
>> I'd say something "essential" is something you can't live without. But
>> that doesn't seem to be the case here.
> Ohh. You haven't seen how much Tramp has been blamed because it doesn't
> support it out of the box.
I understand, but it's the occasionally-exploding racecar all over
again. I understand your reasoning: you're betting that it's acceptable
for the ocassionally-exploded user to learn opt-out for a slightly
slower car.
The thing is they will probably blame the steering wheel, Eglot,
because -- Tramp being "transparent" -- that's what they see.
Ideally, we would just fix this. It would be nice to understand what
actually happen and what information is lost, presumably to
ControlMaster's gremlins. I think bringing in Eli's knowledge of
process.c internals could be beneficial here.
> Currently it's not possible. I'll investigate a way to disable
> ControlMaster per process. By this, the main Tramp process and other
> processes on the remote host could still profit from the performance
> boost, and other processes (like Eglot) could opt-out.
One thing I don't understand, here and in other bugs, is the "process"
model of Tramp.
Here are some questions. Apologies in advance if some of them are in
the "FM".
1. When I open a single remote file on a remote host, I am creating
Tramp process, I presume.
2. If I kill this buffer and re-visit it, is the Tramp process re-used
somehow or is a new one created?
3. If I open another nearby file on the same host, I am sharing that
Tramp process?
4. What about in a different host?
5. Is there any way to get an overview of which Tramp processes are
"responsible" for each set of remote files?
6. Now if I M-x eglot in a Tramp remote file that I had been working on
Eglot-less for a while, is the Eglot process tunneled through the
existing Tramp process, reusing it, or is a new one started?
7. If an existing process is reused, is there any way to force Tramp to
open a new one, perhaps with slightly different configuration
options?
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 28 Feb 2023 13:00:02 GMT)
Full text and
rfc822 format available.
Message #92 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
>>> ...is it though? :-) Can a feature that hangs when presented with more
>>> data than usual (however one defines that) be considered stable?
>> Tramp supports ControlMaster since Emacs 24. Eglot is the first case
>> I've heard of problems. I don't deny it
>
> That could be because, as far as I understand, Eglot is the first
> "major" package to use the transparent (make-process ... :file-handler
> t) which, AFAIU, means that Eglot's process, which doesn't really
> transmit _that_ much data, is piped transparently through Tramp's
> process managing the remote file whose buffer M-x eglot is executed in.
Eglot is not the first user of this. But Eglot sends a huge amount of
data over the wire, which seems to trigger the problem.
For example, lsp-mode uses start-file-process-shell-command, which calls
under the hood make-process similarly.
>>>> It is essential. I have been beaten by many Tramp users to support it.
>>> I'd say something "essential" is something you can't live without. But
>>> that doesn't seem to be the case here.
>> Ohh. You haven't seen how much Tramp has been blamed because it doesn't
>> support it out of the box.
>
> I understand, but it's the occasionally-exploding racecar all over
> again. I understand your reasoning: you're betting that it's acceptable
> for the ocassionally-exploded user to learn opt-out for a slightly
> slower car.
>
> The thing is they will probably blame the steering wheel, Eglot,
> because -- Tramp being "transparent" -- that's what they see.
>
> Ideally, we would just fix this. It would be nice to understand what
> actually happen and what information is lost, presumably to
> ControlMaster's gremlins. I think bringing in Eli's knowledge of
> process.c internals could be beneficial here.
I'm not against changing Tramp, really. But first I must understand
what's the problem, and why a change will solve it. Without other
collateral damages.
> Here are some questions. Apologies in advance if some of them are in
> the "FM".
The Tramp manual misses detailed implementation notes. On my TODO for
years, but it didn't happen until now.
> 1. When I open a single remote file on a remote host, I am creating
> Tramp process, I presume.
Yes. In Tramp slang, this is the (main) connection process.
> 2. If I kill this buffer and re-visit it, is the Tramp process re-used
> somehow or is a new one created?
The same connection process is used.
> 3. If I open another nearby file on the same host, I am sharing that
> Tramp process?
Yes, as long as the remote identification (method, user, host) are the
same.
> 4. What about in a different host?
A new connection process. Also if just the method or user
changes. "/ssh::" and "/scp::" use different connection processes.
> 5. Is there any way to get an overview of which Tramp processes are
> "responsible" for each set of remote files?
The so called connection buffers. This is always a buffer with the name
"*tramp/METHOD HOST*" or "*tramp/METHOD USER <at> HOST*", depending whether
there is a user name, or not. See function tramp-buffer-name. The Tramp
connection processes are bound to these buffers.
> 6. Now if I M-x eglot in a Tramp remote file that I had been working on
> Eglot-less for a while, is the Eglot process tunneled through the
> existing Tramp process, reusing it, or is a new one started?
It depends. A synchronous process, created by process-file, is tunnelled
through the connection process. An asynchronous process, created by
make-process or start-file-process, is a new one.
> 7. If an existing process is reused, is there any way to force Tramp to
> open a new one, perhaps with slightly different configuration
> options?
No need for synchronous processes. And for asynchronous processes, they
have their own process. If configuration options are changed prior the
make-process call, they apply.
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 28 Feb 2023 14:19:01 GMT)
Full text and
rfc822 format available.
Message #95 received at 61350 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
João Távora <joaotavora <at> gmail.com> writes:
Hi Joao,
>> settings in ~/.ssh/config shall apply. And a new value, say `suppress',
>> shall suppress this feature even if enabled in ~/.ssh/config.
>>
>> I'll try to implement something along this line.
>
> Thanks. I think that is a good start. But can Eglot or the user
> somehow set 'suppress' for connections "motivated" by Eglot, i.e. M-x
> eglot in some remote file? How?
The appended patch fixes the issue for me with Thomas' example. It binds
tramp-use-ssh-controlmaster-options to nil before calling make-process,
and that's sufficient for the majority of users.
Users, who have set ControlMaster in their ~/.ssh/config (like Thomas),
are not satified. For them we need another setting of
tramp-use-ssh-controlmaster-options. I'm working on this.
> João
Best regards, Michael.
[Message part 2 (text/x-patch, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 28 Feb 2023 14:42:02 GMT)
Full text and
rfc822 format available.
Message #98 received at 61350 <at> debbugs.gnu.org (full text, mbox):
On Tue, Feb 28, 2023 at 12:59 PM Michael Albinus <michael.albinus <at> gmx.de> wrote:
> Eglot is not the first user of this. But Eglot sends a huge amount of
> data over the wire, which seems to trigger the problem.
>
> For example, lsp-mode uses start-file-process-shell-command, which calls
> under the hood make-process similarly.
Do you have reasons to think that lsp-mode's approach is better in
this regard? I'd be happy to switch, if feasible, but judging from
LSP mode's bug tracker it doesn't look very good. There are
quite a few Tramp-related issues and some of them seem to reference
hangs too.
So Eglot might not be the first ever victim here.
> I'm not against changing Tramp, really. But first I must understand
> what's the problem, and why a change will solve it. Without other
> collateral damages.
Of course. I just happen think changing the default of a variable
to a more conservative and safe one is a good starting point.
> > Here are some questions.
> [...]
> No need for synchronous processes. And for asynchronous processes, they
> have their own process. If configuration options are changed prior the
> make-process call, they apply.
That's good news. Thank you for your answers.
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 28 Feb 2023 14:52:02 GMT)
Full text and
rfc822 format available.
Message #101 received at 61350 <at> debbugs.gnu.org (full text, mbox):
On Tue, Feb 28, 2023 at 2:18 PM Michael Albinus <michael.albinus <at> gmx.de> wrote:
> The appended patch fixes the issue for me with Thomas' example. It binds
> tramp-use-ssh-controlmaster-options to nil before calling make-process,
> and that's sufficient for the majority of users.
Thanks. Something like that patch is acceptable for now.
> Users, who have set ControlMaster in their ~/.ssh/config (like Thomas),
> are not satified. For them we need another setting of
> tramp-use-ssh-controlmaster-options. I'm working on this.
Looking at the Tramp code, I think Thomas's idea of using
-o "ControlMaster=no" -o "ControlPath=none"
would probably be quite effective. It would work with previous
Tramp versions that Eglot users might be using. Thomas,
can you try this patch?
diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
index e20d209332d..83462633dd9 100644
--- a/lisp/progmodes/eglot.el
+++ b/lisp/progmodes/eglot.el
@@ -130,6 +130,8 @@
(defvar markdown-fontify-code-blocks-natively)
(defvar company-backends)
(defvar company-tooltip-align-annotations)
+(defvar tramp-ssh-controlmaster-options)
+(defvar tramp-use-ssh-controlmaster-options)
@@ -1247,7 +1249,15 @@ eglot--connect
(contact (cl-subseq contact 0 probe)))
`(:process
,(lambda ()
- (let ((default-directory default-directory))
+ (let ((default-directory default-directory)
+ ;; bug#61350: Tramp turns a feature on
+ ;; by default that can't (yet) handle
+ ;; very much data so we turn it off
+ ;; unconditionally -- just for our
+ ;; process.
+ (tramp-use-ssh-controlmaster-options t)
+ (tramp-ssh-controlmaster-options
+ "-o ControlMaster=no -o ControlPath=none"))
(make-process
:name readable-name
:command (setq server-info (eglot--cmd contact))
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 28 Feb 2023 15:02:02 GMT)
Full text and
rfc822 format available.
Message #104 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
> Looking at the Tramp code, I think Thomas's idea of using
>
> -o "ControlMaster=no" -o "ControlPath=none"
>
> would probably be quite effective. It would work with previous
> Tramp versions that Eglot users might be using.
Yes, that I have also in mind.
> Thomas, can you try this patch?
>
> diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
> index e20d209332d..83462633dd9 100644
> --- a/lisp/progmodes/eglot.el
> +++ b/lisp/progmodes/eglot.el
> @@ -130,6 +130,8 @@
> (defvar markdown-fontify-code-blocks-natively)
> (defvar company-backends)
> (defvar company-tooltip-align-annotations)
> +(defvar tramp-ssh-controlmaster-options)
> +(defvar tramp-use-ssh-controlmaster-options)
>
>
>
> @@ -1247,7 +1249,15 @@ eglot--connect
> (contact (cl-subseq contact 0 probe)))
> `(:process
> ,(lambda ()
> - (let ((default-directory default-directory))
> + (let ((default-directory default-directory)
> + ;; bug#61350: Tramp turns a feature on
> + ;; by default that can't (yet) handle
> + ;; very much data so we turn it off
> + ;; unconditionally -- just for our
> + ;; process.
> + (tramp-use-ssh-controlmaster-options t)
> + (tramp-ssh-controlmaster-options
> + "-o ControlMaster=no -o ControlPath=none"))
> (make-process
> :name readable-name
> :command (setq server-info (eglot--cmd contact))
Even better than my proposal :-)
There is the small risk that a user runs a local ssh client which is
not aware of the ControlMaster option. For such a user, Eglot on a
remote host might fail.
Tramp is busy to check, whether ControlMaster could be applied.
OTOH, I don't know if such ssh clients are still used in the wild, at
least by Eglot users.
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 28 Feb 2023 17:56:02 GMT)
Full text and
rfc822 format available.
Message #107 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Thank you both for being so constructively engaged in the bug I opened.
I'll give Joao's patch a try tomorrow. (I'm too tired now.) However I don't see ControlMaster as the root cause at the moment.
My current working theory is:
1. There is some buffer in SSH (or TCP) that gets filled by the language server sending data to eglot.
2. Tramp sends a command over SSH.
3. Tramp sets the JUST-THIS-ONE option of accept-process-output to t since https://debbugs.gnu.org/12145.
4. The response to 2. can not arrive due to the buffer being filled in 1. Tramp blocks the emacs main thread.
I tested my theory by deleting (and thus disabling) the JUST-THIS-ONE argument in all calls to accept-process-output in tramp.el and tramp-sh.el. Eglot did not freeze anymore in two tests with two different systems (but the same jdtls binary on the same Debian version). In one test however I saw this in the message buffer:
[jsonrpc] (warning) Invalid JSON: (control character 0xd <callback> 1 381 381) {"jsonrpc":"2.0","method":"textDocument/publishDiagnostics","params":{"uri":"file:///home/thk/git/yacy_search_server/source/net/yacy/htroot/ConfigSearchBox.java","diagnostics":[{"range":{"start":{"line":35,"character":86},"end":{"line":35,"character":94}},"severity":3,"code":"1102","source":"Java","message":"At least one of the problems in category \u0027unuseContent-Length: 798
{"jsonrpc":"2.0","method":"textDocument/publ
I already sent the above to Michael in an out-of-band thread (thanks for your patience!).
I have a vague feeling, that Tramp could be improved with a work queue such that requests to tramp from notification or timer threads get blocked while another tramp command is still waiting for a reply. Additionally I understand that Joao has an idea to use markers controlled by Tramp. - I'm sorry that I can not (yet) contribute with putting both ideas into code.
There are two statements in this bug thread that I don't yet understand and that might be worth more elaboration. I volunteer as a debugging rubber duck[1]. However if you both understand each other then never mind:
[1] https://en.wikipedia.org/wiki/Rubber_duck_debugging
Joao: "markers in the process buffer is what is used commonly"
I understand that tramp sends only one command at a time per connection. Otherwise the reentrant error is thrown. The command result gets written to a connection-specific buffer. The output is parsed from point-min and the buffer content deleted after parsing and before sending the next command. So what should there be improved? Jsonrpc needs its own markers because N messages can arrive at any given time and the buffer can not be deleted after each message.
Michael: "If another process has consumed the output, even if it is pushed into the correct buffer, Tramp doesn't know."
What exact scenario should that be? Emacs writes the output (if there's no filter) in the correct buffer for the process. Tramp might call accept-process-output unnecessarily because the output is already in the buffer due to a accept-process-output call of some other code. But in this case Tramp will find the output there eventually, maybe after having waited until a timeout. This could be improved by checking for already present output before calling accept-process-output?
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 28 Feb 2023 19:37:01 GMT)
Full text and
rfc822 format available.
Message #110 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
> João Távora <joaotavora <at> gmail.com> writes:
>
> Hi João,
>
>> Looking at the Tramp code, I think Thomas's idea of using
>>
>> -o "ControlMaster=no" -o "ControlPath=none"
>>
>> would probably be quite effective. It would work with previous
>> Tramp versions that Eglot users might be using.
>
> Yes, that I have also in mind.
>
>> Thomas, can you try this patch?
>>
>> diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
>> index e20d209332d..83462633dd9 100644
>> --- a/lisp/progmodes/eglot.el
>> +++ b/lisp/progmodes/eglot.el
>> @@ -130,6 +130,8 @@
>> (defvar markdown-fontify-code-blocks-natively)
>> (defvar company-backends)
>> (defvar company-tooltip-align-annotations)
>> +(defvar tramp-ssh-controlmaster-options)
>> +(defvar tramp-use-ssh-controlmaster-options)
>>
>>
>>
>> @@ -1247,7 +1249,15 @@ eglot--connect
>> (contact (cl-subseq contact 0 probe)))
>> `(:process
>> ,(lambda ()
>> - (let ((default-directory default-directory))
>> + (let ((default-directory default-directory)
>> + ;; bug#61350: Tramp turns a feature on
>> + ;; by default that can't (yet) handle
>> + ;; very much data so we turn it off
>> + ;; unconditionally -- just for our
>> + ;; process.
>> + (tramp-use-ssh-controlmaster-options t)
>> + (tramp-ssh-controlmaster-options
>> + "-o ControlMaster=no -o ControlPath=none"))
>> (make-process
>> :name readable-name
>> :command (setq server-info (eglot--cmd contact))
>
> Even better than my proposal :-)
>
> There is the small risk that a user runs a local ssh client which is
> not aware of the ControlMaster option. For such a user, Eglot on a
> remote host might fail.
>
> Tramp is busy to check, whether ControlMaster could be applied.
>
> OTOH, I don't know if such ssh clients are still used in the wild, at
> least by Eglot users.
There is this risk indeed. But considering the two demographics:
1. People who use ControlMaster in ~/.ssh/config
2. People who have very old ssh clients
I think that 2 is probably smaller than 1, especially if ControlMaster
is such a momentous thing. So i think the risk is worth taking.
Especially because an error for people in 2. is easier to recognize,
understand and report than an inexplicable Emacs hang. And Eglot is a
:core ELPA package, so bugfixes can be expedited.
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 01 Mar 2023 08:45:02 GMT)
Full text and
rfc822 format available.
Message #113 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
>> OTOH, I don't know if such ssh clients are still used in the wild, at
>> least by Eglot users.
>
> There is this risk indeed. But considering the two demographics:
>
> 1. People who use ControlMaster in ~/.ssh/config
> 2. People who have very old ssh clients
>
> I think that 2 is probably smaller than 1, especially if ControlMaster
> is such a momentous thing. So i think the risk is worth taking.
> Especially because an error for people in 2. is easier to recognize,
> understand and report than an inexplicable Emacs hang. And Eglot is a
> :core ELPA package, so bugfixes can be expedited.
I agree with your judgement. But we know that it is just a workaround. I
would sleep much better if I would know what's the root cause of the
problem.
Thomas seems to be on a good path with his analysis. Hopefully, we can
find something more on his way.
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 01 Mar 2023 10:47:01 GMT)
Full text and
rfc822 format available.
Message #116 received at 61350 <at> debbugs.gnu.org (full text, mbox):
>> Users, who have set ControlMaster in their ~/.ssh/config (like Thomas),
>> are not satified. For them we need another setting of
>> tramp-use-ssh-controlmaster-options. I'm working on this.
>
> Looking at the Tramp code, I think Thomas's idea of using
>
> -o "ControlMaster=no" -o "ControlPath=none"
>
> would probably be quite effective.
>
Instead of turning ControlMaster off, I suggest to first try to set
ControlPersist on. The "freeze" problem described here is typical of
multiplexing an ssh connection with ControlMaster set without also setting
ControlPersist. (If you set ControlMaster without also setting
ControlPersist, the socket is closed when the connection by which it was
created is closed, even if other connections are still using it.)
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 01 Mar 2023 11:07:01 GMT)
Full text and
rfc822 format available.
Message #119 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Gregory Heytings <gregory <at> heytings.org> writes:
>>> Users, who have set ControlMaster in their ~/.ssh/config (like
>>> Thomas), are not satified. For them we need another setting of
>>> tramp-use-ssh-controlmaster-options. I'm working on this.
>>
>> Looking at the Tramp code, I think Thomas's idea of using
>>
>> -o "ControlMaster=no" -o "ControlPath=none"
>>
>> would probably be quite effective.
>>
>
> Instead of turning ControlMaster off, I suggest to first try to set
> ControlPersist on. The "freeze" problem described here is typical of
> multiplexing an ssh connection with ControlMaster set without also
> setting ControlPersist. (If you set ControlMaster without also
> setting ControlPersist, the socket is closed when the connection by
> which it was created is closed, even if other connections are still
> using it.)
Still hangs. I tried my previously described test with this patch:
diff --git a/lisp/net/tramp-sh.el b/lisp/net/tramp-sh.el
index ec8437176db..f9ac83e4008 100644
--- a/lisp/net/tramp-sh.el
+++ b/lisp/net/tramp-sh.el
@@ -4974,10 +4974,10 @@ tramp-ssh-controlmaster-options
(when (zerop
(tramp-call-process
vec "ssh" nil nil nil
- "-G" "-o" "ControlPersist=no" "0.0.0.1"))
+ "-G" "-o" "ControlPersist=yes" "0.0.0.1"))
(setq tramp-ssh-controlmaster-options
(concat tramp-ssh-controlmaster-options
- " -o ControlPersist=no")))))))
+ " -o ControlPersist=yes")))))))
tramp-ssh-controlmaster-options)))
(defun tramp-scp-strict-file-name-checking (vec)
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 01 Mar 2023 11:15:02 GMT)
Full text and
rfc822 format available.
Message #122 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
>> I think that 2 is probably smaller than 1, especially if ControlMaster
>> is such a momentous thing. So i think the risk is worth taking.
>> Especially because an error for people in 2. is easier to recognize,
>> understand and report than an inexplicable Emacs hang. And Eglot is a
>> :core ELPA package, so bugfixes can be expedited.
>
> I agree with your judgement.
I just pushed the patch to Emacs 29.
> But we know that it is just a workaround. I would sleep much better if
> I would know what's the root cause of the problem. Thomas seems to be
> on a good path with his analysis. Hopefully, we can find something
> more on his way.
Agree.
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 01 Mar 2023 11:24:02 GMT)
Full text and
rfc822 format available.
Message #125 received at 61350 <at> debbugs.gnu.org (full text, mbox):
>
> Still hangs. I tried my previously described test with this patch:
>
Are you sure that this patch actually turns ControlPersist on? Instead of
trying to modify tramp-sh.el, I would try to add a section like
Host foo
...
ControlMaster yes
ControlPath /some/path/%r@%h:%p
ControlPersist yes
in my .ssh/config.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 01 Mar 2023 11:36:02 GMT)
Full text and
rfc822 format available.
Message #128 received at 61350 <at> debbugs.gnu.org (full text, mbox):
On Wed, Mar 1, 2023 at 11:23 AM Gregory Heytings <gregory <at> heytings.org> wrote:
> Are you sure that this patch actually turns ControlPersist on?
From Tramp's debug log.
11:34:29.869596 tramp-maybe-open-connection (3) # Sending command
‘exec ssh -l sshuser -p 2022 -o ControlMaster=auto -o
ControlPath=tramp.%C -o ControlPersist=yes -e none localhost’
11:34:29.869679 tramp-send-command (6) # exec ssh -l sshuser -p 2022
-o ControlMaster=auto -o ControlPath=tramp.%C -o ControlPersist=yes -e
none localhost
11:34:29.869814 tramp-process-actions (3) # Waiting for prompts from
remote shell...
> I would try to add a section like in my .ssh/config.
You can try your own experiments if you follow my Docker-based MRE.
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 01 Mar 2023 14:10:02 GMT)
Full text and
rfc822 format available.
Message #131 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Thomas Koch <thomas <at> koch.ro> writes:
> Thank you both for being so constructively engaged in the bug I opened.
>
> I'll give Joao's patch a try tomorrow. (I'm too tired now.) However I don't see ControlMaster as the root cause at the moment.
>
> My current working theory is:
>
> 1. There is some buffer in SSH (or TCP) that gets filled by the language server sending data to eglot.
> 2. Tramp sends a command over SSH.
> 3. Tramp sets the JUST-THIS-ONE option of accept-process-output to t since https://debbugs.gnu.org/12145.
> 4. The response to 2. can not arrive due to the buffer being filled in 1. Tramp blocks the emacs main thread.
> I tested my theory by deleting (and thus disabling) the JUST-THIS-ONE
> argument in all calls to accept-process-output in tramp.el and
> tramp-sh.el. Eglot did not freeze anymore
I don't think your observation necessarily validates your theory. If in
'1.' you mean _an Emacs buffer_, the buffer in question is populated by
Tramp's process filter (the default process filter, that is). Then
accept-process-output with non-nil JUST-THIS-ONE would not prevent that
buffer from getting the all the output, becasue the "THIS-ONE" is the
process.
But JUST-THIS-ONE _is_ relevant when there is more than one process.
Here, there is. There's one process, the jsonrpc.el process, henceforth
'jprocess', and the Tramp process, henceforth 'tprocess'. jprocess
receives only JSONRPC data from the LSP server. It "thinks" it is
talking directly to a JSONRPC server, but in Tramp scenarios it is being
fed data from tprocess, which is the process connected to the remote
host. In tprocess, other things, such as shell interactions are going
on.
Michael can probably confirm, correct or deny this.
When one (accept-process-output tprocess nil nil 'JUST-THIS-ONE=t) one
must be absolutely sure that tprocess is going to send _something_
"soon". If it doesn't, we'll hang indefinitely (until the process dies
or the user quits)
That's what has been confirmed through a backtrace. It's a particular
accept-process-output call in tramp-wait-for-regexp that hangs,
understandibly so.
The need to ensure safety of that call is why there is a prior for any
'found' messages in 'tramp-wait-for-regexp'. Only then is the "risky"
'accept-process-output' attempted.
(defun tramp-wait-for-regexp (proc timeout regexp)
...
(let ((found (tramp-check-for-regexp proc regexp)))
(cond (...)
(t
(while (not found)
(tramp-accept-process-output proc) ;; WE "HANG" HERE SOMETIMES
(unless (process-live-p proc)
(tramp-error-with-buffer
nil proc 'file-error "Process has died"))
(setq found (tramp-check-for-regexp proc regexp)))))
'found' relies on searching the contents of what is already in
tprocess's buffer, which tprocess's filter dumped there. So far so
good.
Now, 'tramp-check-for-regexp' uses a somewhat non-standard technique of
searching for messages: it searches them from the back, from the end of
the tprocess's buffer. I don't know what motivated this, but I find it
odd. I find one of its callees, tramp-search-regexp, particularly
suspicious:
(defun tramp-search-regexp (regexp)
"Search for REGEXP backwards, starting at point-max.""
(goto-char (point-max))
;; We restrict ourselves to the last 256 characters. There were
;; reports of a shell command "git ls-files -zco --exclude-standard"
;; with 85k files involved, which has blocked Tramp forever.
(re-search-backward regexp (max (point-min) (- (point) 256)) 'noerror))
See the comment there? Only 256 characters back are inspected.
So, finally, here's my conjecture:
1. Tramp goes into 'tramp-wait-for-regexp'. tprocess's buffer already
the message that 'found' is supposed to return, but it also has a lot
more stuff, say a lot of JSONRPC data from the LSP server that also
came into that tprocess buffer and is awaiting to be delivered to
jprocess.
2. This data is for piping into jprocess, where the JSONRPC message will
be decoded, but it will probably never arrive at its destination.
3. 'found' will be nil in tramp-wait-for-regexp, because of the
tramp-search-regexp limitation.
4. tramp-wait-for-regexp will issue the "risky" accept-process-output
call.
5. there is no more data that accept-process-output wants to put in the
buffer, because the LSP server is fine for the moment.
6. Emacs hang
Just a conjecture.
More comments to your comments.
> Jsonrpc needs its own markers because N messages can arrive at any
> given time and the buffer can not be deleted after each message.
Jsonrpc operates in jprocess's buffer, which is separate.
> I have a vague feeling, that Tramp could be improved with a work queue
> such that requests to tramp from notification or timer threads get
> blocked while another tramp command is still waiting for a
> reply.
There are no (usable) threads in Emacs. Timers are events, and so are
runs of each processe's process filter. Those two are what creates
asynchronicity and the emulation of simultaneity in Emacs. When
jprocess's filter sees a whole JSONRPC message, it calls the message
handler.
> Joao: "markers in the process buffer is what is used commonly"
Markers are positions in the buffer that maintain relative positions if
you add text before them. Point is a marker, and so is the
process-mark, which marks where process last left input. With that
concept in place, it's usually easi(er) to code processing of messages.
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 01 Mar 2023 14:52:01 GMT)
Full text and
rfc822 format available.
Message #134 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi,
> On Wed, Mar 1, 2023 at 11:23 AM Gregory Heytings <gregory <at> heytings.org> wrote:
>
>> Are you sure that this patch actually turns ControlPersist on?
>
> From Tramp's debug log.
>
> 11:34:29.869596 tramp-maybe-open-connection (3) # Sending command
> ‘exec ssh -l sshuser -p 2022 -o ControlMaster=auto -o
> ControlPath=tramp.%C -o ControlPersist=yes -e none localhost’
> 11:34:29.869679 tramp-send-command (6) # exec ssh -l sshuser -p 2022
> -o ControlMaster=auto -o ControlPath=tramp.%C -o ControlPersist=yes -e
> none localhost
> 11:34:29.869814 tramp-process-actions (3) # Waiting for prompts from
> remote shell...
I confirm João's observation. Setting ControlPersist=yes doesn't fix the
problem. And it is clear why it doesn't change anything:
- You open a remote connection. Tramp's main connection process becomes
the ControlMaster.
- Eglot is called, which reuses this ControlMaster socket. Both the
Tramp main connection process, and the Eglot process, persist, and are
used for data exchange. It doesn't make a difference what value we
have for ControlPersist.
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 01 Mar 2023 15:03:01 GMT)
Full text and
rfc822 format available.
Message #137 received at 61350 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
>
> I confirm João's observation. Setting ControlPersist=yes doesn't fix the
> problem.
>
Okay, thanks. That was just a suggestion, I've seen similar hanging
problems in setups without ControlPersist set, and AFAIR that setting
wasn't mentioned in this thread.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 01 Mar 2023 16:21:02 GMT)
Full text and
rfc822 format available.
Message #140 received at 61350 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On Wed, Mar 1, 2023, 14:08 João Távora <joaotavora <at> gmail.com> wrote:
> in Tramp scenarios it is being
> fed data from tprocess, which is the process connected to the remote
> host. In tprocess, other things, such as shell interactions are going
> on.
>
> Michael can probably confirm, correct or deny this.
>
In the mean time, someone reached out to me to point out this is the wrong
model for how things work in this scenario, particularly in the relation
between jprocess and tprocess. That may very well be. I'm very sure those
two processes exist, but they don't share data like that.
João
>
[Message part 2 (text/html, inline)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Thu, 02 Mar 2023 10:41:01 GMT)
Full text and
rfc822 format available.
Message #143 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Thomas Koch <thomas <at> koch.ro> writes:
Hi Thomas,
> My current working theory is:
>
> 1. There is some buffer in SSH (or TCP) that gets filled by the language server sending data to eglot.
>
> 2. Tramp sends a command over SSH.
>
> 3. Tramp sets the JUST-THIS-ONE option of accept-process-output to t since https://debbugs.gnu.org/12145.
>
> 4. The response to 2. can not arrive due to the buffer being filled in 1. Tramp blocks the emacs main thread.
It is a little bit more specific. Tramp speaks with the remote shell in
order to execute commands like '(file-truename "...")'. This
is transformed by Tramp into the shell command
--8<---------------cut here---------------start------------->8---
\readlink --canonicalize-missing /home/albinus/src/yacy_search_server/examples/SimpleSearchClient/pom.xml 2>/dev/null; echo tramp_exit_status $?
--8<---------------cut here---------------end--------------->8---
Tramp expects the result of the readlink call, and the return value (in
order to know whether this result can be used). And it expects a special
shell prompt, which is the indication, that the remote command has been
finished. So it calls tramp-accept-process-output, unitl the connection
buffer contains something like
--8<---------------cut here---------------start------------->8---
/home/albinus/src/yacy_search_server/examples/SimpleSearchClient/pom.xml
tramp_exit_status 0
///4b3b7d4fa561141e84c94a1cf25e8575#$
--8<---------------cut here---------------end--------------->8---
This special shell prompt is initialized at process startup, by setting
PS1.
This works fine, until there is the concurrency with the Eglot
process. According to my debugging, Tramp sees only part of the
output. The final shell prompt, ///4b3b7d4fa561141e84c94a1cf25e8575#$,
doesn't arrive, and so Tramp is blocked.
> I tested my theory by deleting (and thus disabling) the JUST-THIS-ONE
> argument in all calls to accept-process-output in tramp.el and
> tramp-sh.el. Eglot did not freeze anymore in two tests with two
> different systems (but the same jdtls binary on the same Debian
> version).
Yes. But I still don't understand what's the difference in scenario.
> In one test however I saw this in the message buffer:
>
> [jsonrpc] (warning) Invalid JSON: (control character 0xd <callback> 1
> 381 381)
> {"jsonrpc":"2.0","method":"textDocument/publishDiagnostics","params":{"uri":"file:///home/thk/git/yacy_search_server/source/net/yacy/htroot/ConfigSearchBox.java","diagnostics":[{"range":{"start":{"line":35,"character":86},"end":{"line":35,"character":94}},"severity":3,"code":"1102","source":"Java","message":"At
> least one of the problems in category \u0027unuseContent-Length: 798
>
> {"jsonrpc":"2.0","method":"textDocument/publ
>
> I already sent the above to Michael in an out-of-band thread (thanks
> for your patience!).
I suspect this is due to the "Forbidden reentrant call", but it isn't obvious.
> I have a vague feeling, that Tramp could be improved with a work queue
> such that requests to tramp from notification or timer threads get
> blocked while another tramp command is still waiting for a
> reply.
Exactly. This is what I want to achieve by using threads for protecting
the communication with the remote processes. Or at least with the main
connection process.
> Additionally I understand that Joao has an idea to use markers
> controlled by Tramp. - I'm sorry that I can not (yet) contribute with
> putting both ideas into code.
>
> There are two statements in this bug thread that I don't yet understand and that might be worth more elaboration. I volunteer as a debugging rubber duck[1]. However if you both understand each other then never mind:
>
> [1] https://en.wikipedia.org/wiki/Rubber_duck_debugging
>
> Joao: "markers in the process buffer is what is used commonly"
>
> I understand that tramp sends only one command at a time per
> connection. Otherwise the reentrant error is thrown. The command
> result gets written to a connection-specific buffer. The output is
> parsed from point-min and the buffer content deleted after parsing and
> before sending the next command. So what should there be improved?
> Jsonrpc needs its own markers because N messages can arrive at any
> given time and the buffer can not be deleted after each message.
Meanwhile I don't believe any longer that markers are the problem.
> Michael: "If another process has consumed the output, even if it is pushed into the correct buffer, Tramp doesn't know."
>
> What exact scenario should that be? Emacs writes the output (if
> there's no filter) in the correct buffer for the process. Tramp might
> call accept-process-output unnecessarily because the output is already
> in the buffer due to a accept-process-output call of some other
> code. But in this case Tramp will find the output there eventually,
> maybe after having waited until a timeout. This could be improved by
> checking for already present output before calling
> accept-process-output?
I stand corrected. The problem isn't that another process has consumed
the output, but rather that not everything arrives. For whatever reason.
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Thu, 02 Mar 2023 11:02:01 GMT)
Full text and
rfc822 format available.
Message #146 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
> But JUST-THIS-ONE _is_ relevant when there is more than one process.
> Here, there is. There's one process, the jsonrpc.el process, henceforth
> 'jprocess', and the Tramp process, henceforth 'tprocess'. jprocess
> receives only JSONRPC data from the LSP server. It "thinks" it is
> talking directly to a JSONRPC server, but in Tramp scenarios it is being
> fed data from tprocess, which is the process connected to the remote
> host. In tprocess, other things, such as shell interactions are going
> on.
>
> Michael can probably confirm, correct or deny this.
More or less correct. But I still can't say which process gets output
when, because I cannot debug accept-process-output (it's a C
function). And running Emacs under gdb changes timings, which is
important I believe.
> When one (accept-process-output tprocess nil nil 'JUST-THIS-ONE=t) one
> must be absolutely sure that tprocess is going to send _something_
> "soon". If it doesn't, we'll hang indefinitely (until the process dies
> or the user quits)
Yes. But Tramp calls accept-process-output only, if it has send a
command to the remote shell, and it expects something to be returned. At
least the shell prompt.
During my tests I have also changed this to
--8<---------------cut here---------------start------------->8---
(while (accept-process-output tprocess 0 nil 'JUST-THIS-ONE))
--8<---------------cut here---------------end--------------->8---
but it didn't help either.
> That's what has been confirmed through a backtrace. It's a particular
> accept-process-output call in tramp-wait-for-regexp that hangs,
> understandibly so.
Yes.
> Now, 'tramp-check-for-regexp' uses a somewhat non-standard technique of
> searching for messages: it searches them from the back, from the end of
> the tprocess's buffer. I don't know what motivated this, but I find it
> odd. I find one of its callees, tramp-search-regexp, particularly
> suspicious:
>
> (defun tramp-search-regexp (regexp)
> "Search for REGEXP backwards, starting at point-max.""
> (goto-char (point-max))
> ;; We restrict ourselves to the last 256 characters. There were
> ;; reports of a shell command "git ls-files -zco --exclude-standard"
> ;; with 85k files involved, which has blocked Tramp forever.
> (re-search-backward regexp (max (point-min) (- (point) 256)) 'noerror))
>
> See the comment there? Only 256 characters back are inspected.
Yes. But the regexp it searches for is the final shell prompt. Something
like "///4b3b7d4fa561141e84c94a1cf25e8575#$", which is shorter than 256
bytes for sure.
> So, finally, here's my conjecture:
>
> 1. Tramp goes into 'tramp-wait-for-regexp'. tprocess's buffer already
> the message that 'found' is supposed to return, but it also has a lot
> more stuff, say a lot of JSONRPC data from the LSP server that also
> came into that tprocess buffer and is awaiting to be delivered to
> jprocess.
>
> 2. This data is for piping into jprocess, where the JSONRPC message will
> be decoded, but it will probably never arrive at its destination.
>
> 3. 'found' will be nil in tramp-wait-for-regexp, because of the
> tramp-search-regexp limitation.
>
> 4. tramp-wait-for-regexp will issue the "risky" accept-process-output
> call.
>
> 5. there is no more data that accept-process-output wants to put in the
> buffer, because the LSP server is fine for the moment.
>
> 6. Emacs hang
>
> Just a conjecture.
Yes, this is more or less the scenario. But I still don't understand why
not all data are delivered through the socket ssh is using. Could it be
there is a limitation, how much data could be buffered by ssh?
>> I have a vague feeling, that Tramp could be improved with a work queue
>> such that requests to tramp from notification or timer threads get
>> blocked while another tramp command is still waiting for a
>> reply.
>
> There are no (usable) threads in Emacs.
There are. I made Tramp using threads, and it worked fine, when no
interactive dialogue inside a thread happened.
> Timers are events, and so are runs of each processe's process filter.
> Those two are what creates asynchronicity and the emulation of
> simultaneity in Emacs. When jprocess's filter sees a whole JSONRPC
> message, it calls the message handler.
Timers and process filters are the cause of the "Forbidden reentrant
call in Tramp" errors. Wwe must do anything, solving this.
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Thu, 02 Mar 2023 11:22:02 GMT)
Full text and
rfc822 format available.
Message #149 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
>> Michael can probably confirm, correct or deny this.
> More or less correct.
Actually, I don't think it is correct, at least not the way I meant it:
JSONRPC data never gets into the tprocess's buffer. Experiments also
seem to disprove it.
> But I still can't say which process gets output
> when, because I cannot debug accept-process-output (it's a C
> function). And running Emacs under gdb changes timings, which is
> important I believe.
Yes, we must probably write some gdb scripts. Eli is expert at that,
but I've done some it myself.
>> When one (accept-process-output tprocess nil nil 'JUST-THIS-ONE=t) one
>> must be absolutely sure that tprocess is going to send _something_
>> "soon". If it doesn't, we'll hang indefinitely (until the process dies
>> or the user quits)
>
> Yes. But Tramp calls accept-process-output only, if it has send a
> command to the remote shell, and it expects something to be returned. At
> least the shell prompt.
Yes. Tramp is doing the right thing. It really expects a response to
come. And more often than not, it does. But sometimes it doesn't, and
that's when we hang.
>> See the comment there? Only 256 characters back are inspected.
>
> Yes. But the regexp it searches for is the final shell prompt. Something
> like "///4b3b7d4fa561141e84c94a1cf25e8575#$", which is shorter than 256
> bytes for sure.
OK, but why can't one search for it from where you last left parsing,
(i.e. point) up to process-mark? Anyway, I increased that value
significantly and it didn't make a difference, so this is probably a red
herring.
>> So, finally, here's my conjecture:
>>
>> 1. Tramp goes into 'tramp-wait-for-regexp'. tprocess's buffer already
>> the message that 'found' is supposed to return, but it also has a lot
>> more stuff, say a lot of JSONRPC data from the LSP server that also
>> came into that tprocess buffer and is awaiting to be delivered to
>> jprocess.
>>
>> 2. This data is for piping into jprocess, where the JSONRPC message will
>> be decoded, but it will probably never arrive at its destination.
>>
>> 3. 'found' will be nil in tramp-wait-for-regexp, because of the
>> tramp-search-regexp limitation.
>>
>> 4. tramp-wait-for-regexp will issue the "risky" accept-process-output
>> call.
>>
>> 5. there is no more data that accept-process-output wants to put in the
>> buffer, because the LSP server is fine for the moment.
>>
>> 6. Emacs hang
>>
>> Just a conjecture.
>
> Yes, this is more or less the scenario. But I still don't understand why
> not all data are delivered through the socket ssh is using. Could it be
> there is a limitation, how much data could be buffered by ssh?
After testing with a enhanced tramp-backtrace that prints out the
contents of every process buffer, I don't think my conjecture is
correct.
diff --git a/lisp/net/tramp.el b/lisp/net/tramp.el
index df4b7dfca2c..f24a3b51074 100644
--- a/lisp/net/tramp.el
+++ b/lisp/net/tramp.el
@@ -2212,12 +2212,21 @@ tramp-backtrace
If VEC-OR-PROC is nil, the buffer *debug tramp* is used. FORCE
forces the backtrace even if `tramp-verbose' is less than 10.
This function is meant for debugging purposes."
- (let ((tramp-verbose (if force 10 tramp-verbose)))
+ (let ((tramp-verbose (if force 10 tramp-verbose))
+ (bt (lambda ()
+ (backtrace)
+ (dolist (p (process-list))
+ (let* ((buf (process-buffer p))
+ (name (and buf (buffer-name buf))))
+ (when buf
+ (princ (format "\n--8<---- begin contents of `%s' ------>8---\n" name))
+ (princ (with-current-buffer buf (buffer-string)))
+ (princ (format "\n--8<---- end contents of `%s' ------>8---\n" name))))))))
(when (>= tramp-verbose 10)
(if vec-or-proc
(tramp-message
- vec-or-proc 10 "\n%s" (with-output-to-string (backtrace)))
- (with-output-to-temp-buffer "*debug tramp*" (backtrace))))))
+ vec-or-proc 10 "\n%s" (with-output-to-string (funcall bt)))
+ (with-output-to-temp-buffer "*debug tramp*" (funcall bt))))))
When you press C-g after the hang occurs, the backtrace is correct but
the tprocess buffer is simply empty, according to the new logs. IOW the
response Tramp was waiting for never arrived.
>> There are no (usable) threads in Emacs.
> There are. I made Tramp using threads, and it worked fine, when no
> interactive dialogue inside a thread happened.
Right. There are so-called green threads, which could fit input-waiting
situations like this one, not for achieving true simultaneity. In my
opinion, they don't present any advantage over the evented model, which
we understand much better (wait... except we don't :-) )
I don't think we should do that until we understand what is happening.
>> Timers are events, and so are runs of each processe's process filter.
>> Those two are what creates asynchronicity and the emulation of
>> simultaneity in Emacs. When jprocess's filter sees a whole JSONRPC
>> message, it calls the message handler.
>
> Timers and process filters are the cause of the "Forbidden reentrant
> call in Tramp" errors.
I've had problems with reentrant calls to process filters before, and I
solved them with a timer. Not sure what that error is.
> Wwe must do anything, solving this.
I don't understand what you mean here.
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Thu, 02 Mar 2023 11:51:02 GMT)
Full text and
rfc822 format available.
Message #152 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
>>> Michael can probably confirm, correct or deny this.
>> More or less correct.
>
> Actually, I don't think it is correct, at least not the way I meant it:
> JSONRPC data never gets into the tprocess's buffer. Experiments also
> seem to disprove it.
Hmm, right. *When* output arrives, it is pushed into the right
buffer. But sometimes, the expected output doesn't arrive.
>> But I still can't say which process gets output
>> when, because I cannot debug accept-process-output (it's a C
>> function). And running Emacs under gdb changes timings, which is
>> important I believe.
>
> Yes, we must probably write some gdb scripts. Eli is expert at that,
> but I've done some it myself.
If you like, pls. I'm not so fluent with process handling in Emacs C core.
> When you press C-g after the hang occurs, the backtrace is correct but
> the tprocess buffer is simply empty, according to the new logs. IOW the
> response Tramp was waiting for never arrived.
Result is random. Sometimes the buffer is empty, sometimes the buffer
contains the beginning of the output. The point is, that not everything
is contained, and Tramp waits for the final shell prompt.
>> Wwe must do anything, solving this.
>
> I don't understand what you mean here.
Fixing the "Forbidden reentrant call ..." error now and forever.
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sun, 05 Mar 2023 11:22:02 GMT)
Full text and
rfc822 format available.
Message #155 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
>> Yes. But Tramp calls accept-process-output only, if it has send a
>> command to the remote shell, and it expects something to be returned. At
>> least the shell prompt.
>
> Yes. Tramp is doing the right thing. It really expects a response to
> come. And more often than not, it does. But sometimes it doesn't, and
> that's when we hang.
After digging further, I believe I understand now why it hangs. We have
the following scenario:
- Both Eglot and Tramp use the same ssh connection with enabled ControlMaster.
- Eglot gets JSON output from the remote LSP server. In
jsonrpc--process-filter, this output is handled. It includes a call to
file-truename, which triggers Tramp to send a request in its own
process to the remote side.
- The remote side returns the answer for Tramp (shell output). However,
the ssh socket is still full of the jsonrpc process output, which
waits to be handled.
- So the socket is blocked. The jsonrpc output cannot be read, because
jsonrpc--process-filter waits for the result of file-truename. And the
Tramp process output cannot be handled, because it is stuck in the
socket after the jsonrpc output.
The proper solution is indeed to have two connections, and to refuse use
of ControlMaster.
Surprisingly, this is not new to Tramp :-) But I've simply forgotten the
case ...
See tramp-integration.el. There is a comment for bug#45518, a similar
blocking in compile.el. And the solution is indeed to disable ssh
ControlMaster. Tramp hooks into compilation-mode-hook, and sets
tramp-use-ssh-controlmaster-options buffer-local to nil.
I'd like to apply the same solution for eglot.el. Unfortunately, there's
no hook Tramp could use. I've played with eglot-server-initialized-hook,
but this is applied too late.
So would you mind to add a hook to Eglot, which runs before calling
make-process, but in the proper process buffer? When you've added it,
I'll let Tramp hook into, and you don't need any longer the Tramp
specific code in eglot.el.
In parallel, I'll extend tramp-use-ssh-controlmaster-options to accept a
further value 'suppress', which overrides possible settings in ~/.ssh/config.
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sun, 05 Mar 2023 11:47:02 GMT)
Full text and
rfc822 format available.
Message #158 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Thanks Michael for digging! This is exactly what I also think happens.
However my conclusion is different. I consider the root-cause to be the use of JUST-THIS-ONE in tramps call to accept-process-output. Please check my comment to bug
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=12145
Applying a workaround (adding extra code to surpress ControlMaster with a newly introduced hook) would increase the complexity and thus brittleness. I believe the right thing to do is remove JUST-THIS-ONE in Tramp and fix find-name-dired and any other place that's broken.
As a practical roadmap you could add an option to Tramp to disable JUST-THIS-ONE and recommend its use. Later the options default could be toggled. Eventually the option can go away.
Eglot could emit a warning if it sees the above tramp option not being used. (... could lead to Emacs hanging. Are you sure to continue (y/n)?)
> Michael Albinus <michael.albinus <at> gmx.de> hat am 05.03.2023 13:21 EET geschrieben:
>
>
> João Távora <joaotavora <at> gmail.com> writes:
>
> Hi João,
>
> >> Yes. But Tramp calls accept-process-output only, if it has send a
> >> command to the remote shell, and it expects something to be returned. At
> >> least the shell prompt.
> >
> > Yes. Tramp is doing the right thing. It really expects a response to
> > come. And more often than not, it does. But sometimes it doesn't, and
> > that's when we hang.
>
> After digging further, I believe I understand now why it hangs. We have
> the following scenario:
>
> - Both Eglot and Tramp use the same ssh connection with enabled ControlMaster.
>
> - Eglot gets JSON output from the remote LSP server. In
> jsonrpc--process-filter, this output is handled. It includes a call to
> file-truename, which triggers Tramp to send a request in its own
> process to the remote side.
>
> - The remote side returns the answer for Tramp (shell output). However,
> the ssh socket is still full of the jsonrpc process output, which
> waits to be handled.
>
> - So the socket is blocked. The jsonrpc output cannot be read, because
> jsonrpc--process-filter waits for the result of file-truename. And the
> Tramp process output cannot be handled, because it is stuck in the
> socket after the jsonrpc output.
>
> The proper solution is indeed to have two connections, and to refuse use
> of ControlMaster.
>
> Surprisingly, this is not new to Tramp :-) But I've simply forgotten the
> case ...
>
> See tramp-integration.el. There is a comment for bug#45518, a similar
> blocking in compile.el. And the solution is indeed to disable ssh
> ControlMaster. Tramp hooks into compilation-mode-hook, and sets
> tramp-use-ssh-controlmaster-options buffer-local to nil.
>
> I'd like to apply the same solution for eglot.el. Unfortunately, there's
> no hook Tramp could use. I've played with eglot-server-initialized-hook,
> but this is applied too late.
>
> So would you mind to add a hook to Eglot, which runs before calling
> make-process, but in the proper process buffer? When you've added it,
> I'll let Tramp hook into, and you don't need any longer the Tramp
> specific code in eglot.el.
>
> In parallel, I'll extend tramp-use-ssh-controlmaster-options to accept a
> further value 'suppress', which overrides possible settings in ~/.ssh/config.
>
> > João
>
> Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sun, 05 Mar 2023 12:25:01 GMT)
Full text and
rfc822 format available.
Message #161 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Thomas Koch <thomas <at> koch.ro> writes:
Hi Thomas,
> Thanks Michael for digging! This is exactly what I also think happens.
>
> However my conclusion is different. I consider the root-cause to be the use of JUST-THIS-ONE in tramps call to accept-process-output. Please check my comment to bug
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=12145
I've seen this. And yes, we shall fix the problems described there.
> Applying a workaround (adding extra code to surpress ControlMaster
> with a newly introduced hook) would increase the complexity and thus
> brittleness. I believe the right thing to do is remove JUST-THIS-ONE
> in Tramp and fix find-name-dired and any other place that's broken.
I won't change Tramp this way in Emacs 29. It is in feature freeze,
closed to pretest. I cannot risk to destabilize it.
And as we see with the find-name-dired case, there might be collateral
damages if we change Tramp this way.
And I don't believe, that adding a hook in Eglot is worse than what we
have now, where Eglot manipulates Tramp data. Whenever we need to change
Tramp in this area, it wouldn't apply in Egloot. A hook is much better
suited, and we use the very same technique already with compile.el.
> As a practical roadmap you could add an option to Tramp to disable
> JUST-THIS-ONE and recommend its use. Later the options default could
> be toggled. Eventually the option can go away.
Might be applicable in the master branch. But first we need much more
checks that it doesn't break something else.
I'm always uncomfortable to change Tramp such a way that other packages
could be broken. Letting Tramp accept process output for all existing
processes doesn't seem to be the right thing to me, although I admit
that it seems to fix the specific problem we're discussing here.
> Eglot could emit a warning if it sees the above tramp option not being
> used. (... could lead to Emacs hanging. Are you sure to continue
> (y/n)?)
No check in Eglot. A hint in the manual would be sufficient I believe.
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Mon, 06 Mar 2023 12:43:02 GMT)
Full text and
rfc822 format available.
Message #164 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
Hi João,
> In parallel, I'll extend tramp-use-ssh-controlmaster-options to accept a
> further value 'suppress', which overrides possible settings in ~/.ssh/config.
I've pushed a respective change to the master branch. It will also be
included in Tramp 2.6.0.3, the next planned version on GNU ELPA.
If you want to check this feature, you might call
--8<---------------cut here---------------start------------->8---
(safe-local-variable-p 'tramp-use-ssh-controlmaster-options 'suppress)
--8<---------------cut here---------------end--------------->8---
>> João
Best regards, Michael.
PS: Starting on Wednesday, I'll be almost offline for a couple of
days. I have no idea how long, 'tho.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Mon, 06 Mar 2023 13:43:02 GMT)
Full text and
rfc822 format available.
Message #167 received at 61350 <at> debbugs.gnu.org (full text, mbox):
On Sun, Mar 5, 2023 at 11:21 AM Michael Albinus <michael.albinus <at> gmx.de> wrote:
>
> João Távora <joaotavora <at> gmail.com> writes:
>
> Hi João,
>
> >> Yes. But Tramp calls accept-process-output only, if it has send a
> >> command to the remote shell, and it expects something to be returned. At
> >> least the shell prompt.
> >
> > Yes. Tramp is doing the right thing. It really expects a response to
> > come. And more often than not, it does. But sometimes it doesn't, and
> > that's when we hang.
>
> After digging further, I believe I understand now why it hangs. We have
> the following scenario:
>
> - Both Eglot and Tramp use the same ssh connection with enabled ControlMaster.
>
> - Eglot gets JSON output from the remote LSP server. In
> jsonrpc--process-filter, this output is handled. It includes a call to
> file-truename, which triggers Tramp to send a request in its own
> process to the remote side.
>
> - The remote side returns the answer for Tramp (shell output). However,
> the ssh socket is still full of the jsonrpc process output, which
> waits to be handled.
>
> - So the socket is blocked. The jsonrpc output cannot be read, because
> jsonrpc--process-filter waits for the result of file-truename. And the
> Tramp process output cannot be handled, because it is stuck in the
> socket after the jsonrpc output.
OK, I think I understand. But here, the main point that makes this
deadlock possible is that tramp when called by the jsonrpc process
filter is calling accept-process-output with JUST-THIS-ONE set to true.
_That_ is why jsonrpc--process-filter cannot see the remaining output
that is stuckl in the ssh socket, as you describe.
> The proper solution is indeed to have two connections, and to refuse use
> of ControlMaster.
Here, I disagree. I think, like Thomas, that the proper solution is to
stop passing JUST-THIS-ONE to tramp's a-p-output calls.
> So would you mind to add a hook to Eglot, which runs before calling
> make-process, but in the proper process buffer? When you've added it,
> I'll let Tramp hook into, and you don't need any longer the Tramp
> specific code in eglot.el.
I don't think using this hook for this is the right idea, and I'm
fine with having that bit of Tramp-specific code in there until
we find a better solution. So no need to worry about that.
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Mon, 06 Mar 2023 13:46:02 GMT)
Full text and
rfc822 format available.
Message #170 received at 61350 <at> debbugs.gnu.org (full text, mbox):
On Mon, Mar 6, 2023 at 12:42 PM Michael Albinus <michael.albinus <at> gmx.de> wrote:
>
> Michael Albinus <michael.albinus <at> gmx.de> writes:
> --8<---------------cut here---------------start------------->8---
> (safe-local-variable-p 'tramp-use-ssh-controlmaster-options 'suppress)
> --8<---------------cut here---------------end--------------->8---
Thanks.
As Tramp is a :core GNU ELPA package, I think we can make Eglot
"Package-Require" tramp and use this new variable value. Or we
can just keep using slightly more hacky version for now (since turning
off ControlMaster is kind of a hack intrinsically). I don't
see any rush.
> PS: Starting on Wednesday, I'll be almost offline for a couple of
> days. I have no idea how long, 'tho.
Take care!
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 07 Mar 2023 12:50:02 GMT)
Full text and
rfc822 format available.
Message #173 received at 61350 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Michael Albinus <michael.albinus <at> gmx.de> writes:
Hi Thomas & João,
before going offline just a short status update.
>> However my conclusion is different. I consider the root-cause to be
>> the use of JUST-THIS-ONE in tramps call to
>> accept-process-output. Please check my comment to bug
>> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=12145
>
> I've seen this. And yes, we shall fix the problems described there.
>
>> As a practical roadmap you could add an option to Tramp to disable
>> JUST-THIS-ONE and recommend its use. Later the options default could
>> be toggled. Eventually the option can go away.
>
> Might be applicable in the master branch. But first we need much more
> checks that it doesn't break something else.
>
> I'm always uncomfortable to change Tramp such a way that other packages
> could be broken. Letting Tramp accept process output for all existing
> processes doesn't seem to be the right thing to me, although I admit
> that it seems to fix the specific problem we're discussing here.
Thinking about, I came to a less invasive idea: Just the processes which
possibly use the same socket for the ssh connection should accept their
output in time. This avoids to change the behavior of other processes,
which are not related to a given Tramp connection. And it makes me feel
much better :-)
The small patch below, on top of Tramp 2.6.0.2, seems to fix the
problem. I've also deactivated (in Emacs 29) João's Tramp fix, and I've
applied the recipe given by Thomas 10 times in a row, always starting
with "emacs -Q ...". 10 times success.
Could you please check how it works in your environment? You'll have
some days until I'll be back. If it also works for you, I will add it to
Tramp 2.6.0.3, and we could close this bug.
For completeness, tramp-test44-asynchronous-requests now fails in
tramp-tests.el. But this test is already flaky (not passing successfully
every time), so I'll investigate later what's up with this.
Best regards, Michael.
[Message part 2 (text/x-patch, attachment)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 07 Mar 2023 13:05:02 GMT)
Full text and
rfc822 format available.
Message #176 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Thanks Michael! What is the advantage of this patch over just removing the JUST-THIS-ONE argument? In both cases tramp is triggering accept-process-output for processes it does not own.
However with your patch there is a potential timing issue: blocking process output could still arrive between your new loop and tramps call to accept-process-output with JUST-THIS-ONE.
I'd think that there might be room for an essay "JUST-THIS-ONE considered evil"?
> Michael Albinus <michael.albinus <at> gmx.de> hat am 07.03.2023 14:49 EET geschrieben:
>
>
> Michael Albinus <michael.albinus <at> gmx.de> writes:
>
> Hi Thomas & João,
>
> before going offline just a short status update.
>
> >> However my conclusion is different. I consider the root-cause to be
> >> the use of JUST-THIS-ONE in tramps call to
> >> accept-process-output. Please check my comment to bug
> >> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=12145
> >
> > I've seen this. And yes, we shall fix the problems described there.
> >
> >> As a practical roadmap you could add an option to Tramp to disable
> >> JUST-THIS-ONE and recommend its use. Later the options default could
> >> be toggled. Eventually the option can go away.
> >
> > Might be applicable in the master branch. But first we need much more
> > checks that it doesn't break something else.
> >
> > I'm always uncomfortable to change Tramp such a way that other packages
> > could be broken. Letting Tramp accept process output for all existing
> > processes doesn't seem to be the right thing to me, although I admit
> > that it seems to fix the specific problem we're discussing here.
>
> Thinking about, I came to a less invasive idea: Just the processes which
> possibly use the same socket for the ssh connection should accept their
> output in time. This avoids to change the behavior of other processes,
> which are not related to a given Tramp connection. And it makes me feel
> much better :-)
>
> The small patch below, on top of Tramp 2.6.0.2, seems to fix the
> problem. I've also deactivated (in Emacs 29) João's Tramp fix, and I've
> applied the recipe given by Thomas 10 times in a row, always starting
> with "emacs -Q ...". 10 times success.
>
> Could you please check how it works in your environment? You'll have
> some days until I'll be back. If it also works for you, I will add it to
> Tramp 2.6.0.3, and we could close this bug.
>
> For completeness, tramp-test44-asynchronous-requests now fails in
> tramp-tests.el. But this test is already flaky (not passing successfully
> every time), so I'll investigate later what's up with this.
>
> Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 07 Mar 2023 13:35:01 GMT)
Full text and
rfc822 format available.
Message #179 received at 61350 <at> debbugs.gnu.org (full text, mbox):
On Tue, Mar 7, 2023 at 1:04 PM Thomas Koch <thomas <at> koch.ro> wrote:
>
> Thanks Michael! What is the advantage of this patch over just removing the JUST-THIS-ONE argument?
> In both cases tramp is triggering accept-process-output for processes it does not own.
Yes, I also ask myself that question. Though here, Tramp
has "shared" ownership of them, sort of, so it's more
conservative when compared to the current mechanics.
Michael, your patch goes in the right direction but AFAIR
a single accept-process-output is not guaranteed to bring
into the process filter all the buffered data.
In "40.9.4 Accepting Output from Processes":
[In accept-process-output, ] If PROCESS is
non-‘nil’ then this function does not return until some output has
[^^^^^^^^^^^^^^^^^]
been received from PROCESS or PROCESS has closed the connection.
Note "some", not "all". So while less common, I think the hang
can still happen.
So maybe you meant:
(while (accept-process-output p 0 nil t))
as suggested in that section?
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 07 Mar 2023 13:48:02 GMT)
Full text and
rfc822 format available.
Message #182 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Thomas Koch <thomas <at> koch.ro> writes:
> Thanks Michael!
Hi Thomas,
> What is the advantage of this patch over just removing the JUST-THIS-ONE argument? In both cases tramp is triggering accept-process-output for processes it does not own.
It touches only processes which share the same connection via ssh
ControlMaster (and putty share). Nothing else.
> However with your patch there is a potential timing issue: blocking
> process output could still arrive between your new loop and tramps
> call to accept-process-output with JUST-THIS-ONE.
You have seen the timing? I'll change the timeout of
accept-process-output to 0. This should prevent blocking.
> I'd think that there might be room for an essay "JUST-THIS-ONE considered evil"?
In my book, one shall touch only data one is related to. Processes, not
related to the same Tramp connection, are taboo.
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 07 Mar 2023 13:53:02 GMT)
Full text and
rfc822 format available.
Message #185 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
>> Thanks Michael! What is the advantage of this patch over just removing the JUST-THIS-ONE argument?
>> In both cases tramp is triggering accept-process-output for processes it does not own.
>
> Yes, I also ask myself that question. Though here, Tramp
> has "shared" ownership of them, sort of, so it's more
> conservative when compared to the current mechanics.
Indeed.
> Michael, your patch goes in the right direction but AFAIR
> a single accept-process-output is not guaranteed to bring
> into the process filter all the buffered data.
>
> In "40.9.4 Accepting Output from Processes":
>
> [In accept-process-output, ] If PROCESS is
> non-‘nil’ then this function does not return until some output has
> [^^^^^^^^^^^^^^^^^]
> been received from PROCESS or PROCESS has closed the connection.
>
>
> Note "some", not "all". So while less common, I think the hang
> can still happen.
>
> So maybe you meant:
>
> (while (accept-process-output p 0 nil t))
>
> as suggested in that section?
That was my first idea as well. But tramp-accept-process-output itself is
called in a loop, so there's no difference in practice[1].
Pls give my patch testing, in order to see whether Tramp still blocks.
> João
Best regards, Michael.
[1]: In theory, there's no difference between theory and practice. In
practice, there is. :-)
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 07 Mar 2023 14:04:02 GMT)
Full text and
rfc822 format available.
Message #188 received at 61350 <at> debbugs.gnu.org (full text, mbox):
On Tue, Mar 7, 2023 at 1:52 PM Michael Albinus <michael.albinus <at> gmx.de> wrote:
> > [In accept-process-output, ] If PROCESS is
> > non-‘nil’ then this function does not return until some output has
> > [^^^^^^^^^^^^^^^^^]
> > been received from PROCESS or PROCESS has closed the connection.
> >
> >
> > Note "some", not "all". So while less common, I think the hang
> > can still happen.
> >
> > So maybe you meant:
> >
> > (while (accept-process-output p 0 nil t))
> >
> > as suggested in that section?
>
> That was my first idea as well. But tramp-accept-process-output itself is
> called in a loop, so there's no difference in practice[1].
Hmmm, I still don't think they are equivalent. In tramp-a-p-o with
TIMEOUT set to nil is where it normally hangs, right? Well then, if
it does hang it will _not_ be called in a loop, by definition.
So the prior "hang removal" a-p-o call that you added must _still_
be made in a loop itself to ensure that the outher existing inner
call:
(accept-process-output tprocess nil nil t)
will not block due to the conditions you described.
Regardless, if this were my code. I'd put a big fat FIXME there
explaining that I want to remove the JUST-THIS-ONE, but I'm
afraid to :-)
> [1]: In theory, there's no difference between theory and practice. In
> practice, there is. :-)
hehe :-)
> Pls give my patch testing, in order to see whether Tramp still blocks.
I will do that later. But 10 out of 10 successes in your testing is
a pretty solid indication that this is on the right track.
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 07 Mar 2023 14:33:01 GMT)
Full text and
rfc822 format available.
Message #191 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
>> > So maybe you meant:
>> >
>> > (while (accept-process-output p 0 nil t))
>> >
>> > as suggested in that section?
>>
>> That was my first idea as well. But tramp-accept-process-output itself is
>> called in a loop, so there's no difference in practice[1].
>
> Hmmm, I still don't think they are equivalent. In tramp-a-p-o with
> TIMEOUT set to nil is where it normally hangs, right? Well then, if
> it does hang it will _not_ be called in a loop, by definition.
But the hang happens only when there is another process using the same
socket. As soon as such a parallel process is detected, my patch does
--8<---------------cut here---------------start------------->8---
(setq timeout (or timeout 0))
--8<---------------cut here---------------end--------------->8---
No blocking anymore.
>> Pls give my patch testing, in order to see whether Tramp still blocks.
>
> I will do that later. But 10 out of 10 successes in your testing is
> a pretty solid indication that this is on the right track.
Thanks.
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sat, 11 Mar 2023 09:06:02 GMT)
Full text and
rfc822 format available.
Message #194 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
Hi João & Thomas,
I'm back home. Not fully recovered yet, but able to sit at may desk :-)
>>> Pls give my patch testing, in order to see whether Tramp still blocks.
>>
>> I will do that later. But 10 out of 10 successes in your testing is
>> a pretty solid indication that this is on the right track.
No further comments, so I guess it is OK for you?
I plan to install the patch next days on master, unless you oppose.
>> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sat, 11 Mar 2023 10:15:02 GMT)
Full text and
rfc822 format available.
Message #197 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Hi Michael,
gute Besserung!
I'm sorry, I didn't test your patch yet. I didn't have the focus time to do so. But mainly it does not look good to me.
Even if our tests would indicate no problem, we couldn't possibly test for all side effects to all elisp code that works with processes. For me the conservative approach is to remove the JUST-THIS-ONE argument.
I digged into the history of the JUST-THIS-ONE argument. etc/NEWS.22 says, emphasis from me:
"""
*** Function 'accept-process-output' has a new optional fourth arg
JUST-THIS-ONE. If non-nil, only output from the specified process
is handled, suspending output from other processes. If value is an
integer, also inhibit running timers. THIS FEATURE IS GENERALLY NOT
RECOMMENDED, but may be necessary for specific applications, such as
speech synthesis.
"""
The argument was discussed here:
https://lists.gnu.org/archive/html/emacs-devel/2004-08/msg00141.html
and introduced in this commit:
https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=107ed38d4bdec03002b2a23619e205722cd5b8d1
I don't even think that the original motivation for introducing JUST-THIS-ONE was valid. Unfortunately there was not much discussion about it. It was argued, that it would be hard to make a process filter function reentrant. And I think that this was an invalid root cause analysis to start with.
First, the emacs manual says[1]: "Note that if any of those functions are called by the filter, the filter may be called recursively." - So you should make your filter reentrant, If I understand correctly.
[1] https://www.gnu.org/software/emacs/manual/html_node/elisp/Filter-Functions.html
A process filter function
Second, the manual further says: "Quitting is normally inhibited within a filter function". This indicates to me, that a filter function should be "side effect free" besides putting its input somewhere (e.g. in a buffer or message queue) and trigger an event if there is enough input for further processing. This also reduces the risk, that the function could be called recursively in a damaging way.
It seems to me, that there is not yet a standard way in Emacs for continuations (or event driven programming) although the Emacs Wiki refers to the emacs-deferred library:
https://www.emacswiki.org/emacs/ConcurrentEmacs
Because there is no such library in Emacs, people either write their own code for continuations (eglot?) or do too much work in a process filter function (speechd-el in 2004 which led to JUST-THIS-ONE).
I'm sending this email now to give you a quick response but will continue to read: https://jyp.github.io/posts/elisp-cps.html
Thanks, Thomas
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sat, 11 Mar 2023 11:41:01 GMT)
Full text and
rfc822 format available.
Message #200 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
>>> I will do that later. But 10 out of 10 successes in your testing is
>>> a pretty solid indication that this is on the right track.
>
> No further comments, so I guess it is OK for you?
I think the patch goes in the right direction. The force resetting of
'timeout' to 0, which you highlighted earlier, is odd.
+ (dolist (p (delq proc (process-list)))
+ (when (tramp-file-name-equal-p
+ (process-get proc 'vector) (process-get p 'vector))
+ (setq timeout (or timeout 0))
+ (accept-process-output p 0 nil t)))
That's because callers of 'tramp-accept-process-output' who call it with
TIMEOUT=nil will now see their expectations violated depending on
conditions they do not control. You fully control these callers, so you
may know better, but I think this patch makes more sense, because it
keeps the contract.
+ (dolist (p (delq proc (process-list)))
+ (when (tramp-file-name-equal-p
+ (process-get proc 'vector) (process-get p 'vector))
+ (while (accept-process-output p 0 nil t))))
(with-current-buffer (process-buffer proc)
(let ((inhibit-read-only t)
last-coding-system-used
This is even more "conservative" than your patch, because it deviates
less from the current behaviour. I've tested it (first removing the
eglot.el workaround) with my reproduction recipe to a 100% success rate
in 6-7 experiments (whereas without it it is 0%).
> I plan to install the patch next days on master, unless you oppose.
I don't, but consider my alternative.
João
PS: And I also think that Eglot over Tramp does seem to have become
faster with ControlMaster, at least in a subjective evaluation.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sat, 11 Mar 2023 11:46:01 GMT)
Full text and
rfc822 format available.
Message #203 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Thomas Koch <thomas <at> koch.ro> writes:
> I'm sorry, I didn't test your patch yet. I didn't have the focus time
> to do so. But mainly it does not look good to me.
> Even if our tests would indicate no problem, we couldn't possibly test
> for all side effects to all elisp code that works with processes. For
> me the conservative approach is to remove the JUST-THIS-ONE argument.
I agree that JUST-THIS-ONE would be cleaner, but I disagree that it
would be more conservative.
I can't speak for Michael, but in my view, "conservative" is a measure
of the amplitude of the change. The change that Michael and I are
proposing, despite being objectively ugly when compared to removing
JUST-THIS-ONE, is _less_ removed from the current Tramp mechanisms than
your idea. Thus, it is safer.
> I digged into the history of the JUST-THIS-ONE argument. etc/NEWS.22 says, emphasis from me:
>
> [...]
> integer, also inhibit running timers. THIS FEATURE IS GENERALLY NOT
> RECOMMENDED, but may be necessary for specific applications, such as
> speech synthesis.
> """
> [...]
> Because there is no such library in Emacs, people either write their
> own code for continuations (eglot?) or do too much work in a process
> filter function (speechd-el in 2004 which led to JUST-THIS-ONE).
I think this analysis is very interesting and valid. Not necessarily
for fixing this particular bug. You should send it to
emacs-devel <at> gnu.org so it becomes more visible to people who may not be
following the bug tracker.
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sat, 11 Mar 2023 12:28:01 GMT)
Full text and
rfc822 format available.
Message #206 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Thomas Koch <thomas <at> koch.ro> writes:
> Hi Michael,
Hi Thomas,
> I'm sorry, I didn't test your patch yet. I didn't have the focus time
> to do so.
No problem. Whatever we change, it won't go into the emacs-29 branch, so
we have plenty of time.
> But mainly it does not look good to me.
>
> Even if our tests would indicate no problem, we couldn't possibly test
> for all side effects to all elisp code that works with processes. For
> me the conservative approach is to remove the JUST-THIS-ONE argument.
I believe my patch is more conservative (as João has said as well). It
changes the behavior ONLY for processes created by Tramp's
make-process. If we would remove the JUST-THIS-ONE argument, it would
change the behavior of ANY process, even for processes out of Tramp
control.
> Thanks, Thomas
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sat, 11 Mar 2023 12:45:01 GMT)
Full text and
rfc822 format available.
Message #209 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
> I think the patch goes in the right direction. The force resetting of
> 'timeout' to 0, which you highlighted earlier, is odd.
Yes. But I've tried first w/o this setting, and the recipe from Thomas
still didn't work.
> + (dolist (p (delq proc (process-list)))
> + (when (tramp-file-name-equal-p
> + (process-get proc 'vector) (process-get p 'vector))
> + (setq timeout (or timeout 0))
> + (accept-process-output p 0 nil t)))
>
> That's because callers of 'tramp-accept-process-output' who call it with
> TIMEOUT=nil will now see their expectations violated depending on
> conditions they do not control. You fully control these callers, so you
> may know better, but I think this patch makes more sense, because it
> keeps the contract.
In fact, I'm even not sure that a nil timeout is still needed. This is
something from the very beginning of Tramp. Maybe I shall review the
code whether we still need a nil timeout. Because its initial idea, wait
in accept-process-output until the remote shell prompt appears, is no
longer needed, because we call tramp-accept-process-output (and
therefore accept-process-output) in a loop. This is something to be
changed later on.
A short scan over the Tramp sources shows, that there are 12 calls of
tramp-accept-process-output with a TIMEOUT of 0, and only two other
calls with a nil TIMEOUT. And these two calls are already in while loop,
so I guess it would be also OK to give them a default TIMEOUT of 0.
I will try it, but this needs lot of regression tests (I have 100+
different configurations to run tramp-tests.el).
> + (dolist (p (delq proc (process-list)))
> + (when (tramp-file-name-equal-p
> + (process-get proc 'vector) (process-get p 'vector))
> + (while (accept-process-output p 0 nil t))))
> (with-current-buffer (process-buffer proc)
> (let ((inhibit-read-only t)
> last-coding-system-used
>
> This is even more "conservative" than your patch, because it deviates
> less from the current behaviour. I've tested it (first removing the
> eglot.el workaround) with my reproduction recipe to a 100% success rate
> in 6-7 experiments (whereas without it it is 0%).
Hmm, in my experience there were still one or two blocking tests wit
Thomas' recipe, IIRC. This is because the blocking situation, data is
going through the only ssh socket for both tramp-accept-process-output
and jsonrpc, is not unblocked then.
So I would prefer to keep this (setq timeout (or timeout 0)) for
now. And I will reviw the code, whether tramp-accept-process-output
still needs a TIMEOUT ARGUMENT as sketched above.
>> I plan to install the patch next days on master, unless you oppose.
>
> I don't, but consider my alternative.
I consider it by removing the TIMEOUT argument, if possible.
> João
>
> PS: And I also think that Eglot over Tramp does seem to have become
> faster with ControlMaster, at least in a subjective evaluation.
Of course. That is the reason behind using it.
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sat, 11 Mar 2023 14:00:02 GMT)
Full text and
rfc822 format available.
Message #212 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
> Hmm, in my experience there were still one or two blocking tests wit
> Thomas' recipe, IIRC.
That's surprising to me. What tests are you running? I'm testing with
the same "Thomas recipe" AFAICT.
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sat, 11 Mar 2023 14:26:02 GMT)
Full text and
rfc822 format available.
Message #215 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
>> Hmm, in my experience there were still one or two blocking tests wit
>> Thomas' recipe, IIRC.
>
> That's surprising to me. What tests are you running? I'm testing with
> the same "Thomas recipe" AFAICT.
Me too.
Anyway, I've removed the TIMEOUT argument from t-a-p-o (defaulting it to
0 now), and I've started the Tramp regression test campaign. It takes 2
or 3 days usually, so let's see what's the outcome.
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sun, 12 Mar 2023 00:47:02 GMT)
Full text and
rfc822 format available.
Message #218 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
> Anyway, I've removed the TIMEOUT argument from t-a-p-o (defaulting it to
> 0 now), and I've started the Tramp regression test campaign. It takes 2
> or 3 days usually, so let's see what's the outcome.
OK, sounds reasonable. We're definitely in a better place now that the
problem is understood. After you push the patch, are you also going to
bump the Tramp version?
I'd like to remove the Tramp-specific workaround in Eglot, but I don't
know if I can: Eglot would have to "Package-Require" a certain version
of Tramp. Normally, :core packages should only require other :core
things. But Tramp is not a :core GNU ELPA package. At least according
to elpa.git/elpa-packages, it lives in its own repo. Wait, but Tramp is
_also_ in emacs.git. Huh? So maybe it's possible? I'm confused.
Stefan, can you help?
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sun, 12 Mar 2023 10:28:02 GMT)
Full text and
rfc822 format available.
Message #221 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
>> Anyway, I've removed the TIMEOUT argument from t-a-p-o (defaulting it to
>> 0 now), and I've started the Tramp regression test campaign. It takes 2
>> or 3 days usually, so let's see what's the outcome.
>
> OK, sounds reasonable. We're definitely in a better place now that the
> problem is understood. After you push the patch, are you also going to
> bump the Tramp version?
Yes, this patch is also marked for the next Tramp release on GNU
ELPA. This will be Tramp 2.6.0.3, roughly scheduled for end of March.
FTR, the regression tests have shown problems with older Emacsen, Emacs
26 and 27, to say. So I need to fix this somehow, before the regression
tests will continue.
To tell the truth, the regression tests use the upcoming 2.6.0.3
version. Once everything is OK, it will be merged into the Emacs master
branch.
> I'd like to remove the Tramp-specific workaround in Eglot, but I don't
> know if I can: Eglot would have to "Package-Require" a certain version
> of Tramp. Normally, :core packages should only require other :core
> things. But Tramp is not a :core GNU ELPA package. At least according
> to elpa.git/elpa-packages, it lives in its own repo. Wait, but Tramp is
> _also_ in emacs.git. Huh? So maybe it's possible? I'm confused.
> Stefan, can you help?
I'm not Stefan, but perhaps I could clarify. The Tramp version released
on GNU ELPA doesn't correspond to the Tramp in Emacs master. This is
mainly because I want to release a stable version on GNU ELPA. People
with a need for the bleeding edge Tramp can always use Emacs master, or
pull Tramp from its own repository on savannah.
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 14 Mar 2023 11:02:01 GMT)
Full text and
rfc822 format available.
Message #224 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
Hi João & Thomas,
>>> Anyway, I've removed the TIMEOUT argument from t-a-p-o (defaulting it to
>>> 0 now), and I've started the Tramp regression test campaign. It takes 2
>>> or 3 days usually, so let's see what's the outcome.
>>
>> OK, sounds reasonable. We're definitely in a better place now that the
>> problem is understood. After you push the patch, are you also going to
>> bump the Tramp version?
>
> Yes, this patch is also marked for the next Tramp release on GNU
> ELPA. This will be Tramp 2.6.0.3, roughly scheduled for end of March.
>
> FTR, the regression tests have shown problems with older Emacsen, Emacs
> 26 and 27, to say. So I need to fix this somehow, before the regression
> tests will continue.
>
> To tell the truth, the regression tests use the upcoming 2.6.0.3
> version. Once everything is OK, it will be merged into the Emacs master
> branch.
Regression tests are finished, and there were indeed some minor
annoyances I had to fix. I have pushed everything to Emacs master
branch. If you like, you could test it with eglot.el w/o the Tramp fix.
As said, release of Tramp 2.6.0.3 on GNU ELPA will follow end of
March. This gives us also the time to see whether there are collateral
damages, reported by somebody. Until then I guess the bug might still be
open.
>> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 14 Mar 2023 15:01:02 GMT)
Full text and
rfc822 format available.
Message #227 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
Hi João & Thomas,
> As said, release of Tramp 2.6.0.3 on GNU ELPA will follow end of
> March. This gives us also the time to see whether there are collateral
> damages, reported by somebody. Until then I guess the bug might still be
> open.
FTR, the first collateral damage is reported already on EMBA:
--8<---------------cut here---------------start------------->8---
FAILED file-notify-test04-autorevert-remote "Forbidden reentrant call of Tramp"
FAILED file-notify-test11-symlinks-remote ((should (file-notify--test-with-actions-check actions)) :form (file-notify--test-with-actions-check (nil)) :value nil :explanation "Received actions do not match expected actions\n(deleted stopped)\nnil")
--8<---------------cut here---------------end--------------->8---
Achhhhhh :-(
I will investigate. IMHO, this is the final indication that we shouldn't
allow a nil JUST-THIS-ONE in accept-process-output.
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 14 Mar 2023 15:19:02 GMT)
Full text and
rfc822 format available.
Message #230 received at 61350 <at> debbugs.gnu.org (full text, mbox):
On Tue, Mar 14, 2023 at 3:00 PM Michael Albinus <michael.albinus <at> gmx.de> wrote:
> Achhhhhh :-(
Pech!
> I will investigate. IMHO, this is the final indication that we shouldn't
> allow a nil JUST-THIS-ONE in accept-process-output.
I don't see this failure as an indication of that, at all!!
First, you didn't do that -- put nil in JUST-THIS-ONE -- did you?
At most this means that you are doing more or less the same
as setting JUST-THIS-ONE to nil, but in a much more convoluted
way, so that you may as well start setting JUST-THIS-ONE to nil...
...and bite the bullet, continuing this analysis. We're making
progress...
So, I reach a completely different conclusion.
For me, this means that those auto-revert and symlinks-remote tests,
whatever they are, must be investigated. It's good that you have these
tests! Do they fail in all Emacsen or just some? Always/sometimes?
Can I reproduce this here?
After we get a consistent failure, we should ask ourselves what exactly
is the reentrancy in question here. Perhaps you already have a conjecture
of the chain of events that leads to it.
FWIW I've recently solved a reentrancy bug in jsonrpc.el.
I identified the conditions (here, i was ignoring that process-send-string
when given enough input to send, will cause output acceptance, and
filters to run), and fixed it.
João
commit 8bf4cdcf79bc3254a9169f28f68922ab83bd4e78
Author: João Távora Date: Thu Dec 15 15:26:13 2022 +0000
Avoid recursive process filters in lisp/jsonrpc.el (bug#60088)
jsonrpc.el may lose JSON-RPC messages because of recursive process
filters. The problem happens in jsonrpc.el's jsonrpc--process-filter.
The code of the process filter didn't expect to be called recursively
and fails in that case.
But that can happen if the three conditions are verified.
1. the client code invoked by its jsonrpc--connection-receive inside
the process filter callee immediately sends follow-up input to
process within the same Lisp stack. This is a common scenario,
especially during LSP initialiation sequence used by Eglot, a
jsonrpc.el client.
2. that follow-up message is large enough for process-send-string to
send the input in bunches (output from processes can arrive in
between bunches).
3. the process happens to have already some more output ready
The fix in this commit detects recursive invocations and immediately
re-schedules them as non-recursive calls to the
jsonrpc--process-filter (but started from timers).
* lisp/jsonrpc.el (jsonrpc--process-filter): Rework.
(Version): Bump to 1.0.16.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Tue, 14 Mar 2023 15:43:01 GMT)
Full text and
rfc822 format available.
Message #233 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
Hi João,
>> I will investigate. IMHO, this is the final indication that we shouldn't
>> allow a nil JUST-THIS-ONE in accept-process-output.
>
> I don't see this failure as an indication of that, at all!!
>
> First, you didn't do that -- put nil in JUST-THIS-ONE -- did you?
> At most this means that you are doing more or less the same
> as setting JUST-THIS-ONE to nil, but in a much more convoluted
> way, so that you may as well start setting JUST-THIS-ONE to nil...
It is an indication, that, if some processes accept output at the same
time, and this output runs through process filters, anything can happen.
No, I didn't set JUST-THIS-ONE to nil. I just allow related processes,
belonging to the same Tramp connection, to do so. This is a case we have
under our control. When JUSt_THIS-ONE is set to nil, the same could
happen to any arbitrary process running randomly in concurrency to a
Tramp process, and this we wouldn't have under our control.
> ...and bite the bullet, continuing this analysis. We're making
> progress...
Sure, I'll do. Coincidently, the author of filenotify.el and
filenotify-tests.el is well known to us, I'll contact him :-)
> For me, this means that those auto-revert and symlinks-remote tests,
> whatever they are, must be investigated. It's good that you have these
> tests! Do they fail in all Emacsen or just some? Always/sometimes?
> Can I reproduce this here?
This happens with Emacs master (which runs on EMBA). And I could
reproduce it locally, with slightly modified error messages, by
--8<---------------cut here---------------start------------->8---
# make -C test filenotify-tests
--8<---------------cut here---------------end--------------->8---
> After we get a consistent failure, we should ask ourselves what exactly
> is the reentrancy in question here. Perhaps you already have a conjecture
> of the chain of events that leads to it.
Yes, I have. Contacting the author ...
> João
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 15 Mar 2023 17:48:03 GMT)
Full text and
rfc822 format available.
Message #236 received at 61350 <at> debbugs.gnu.org (full text, mbox):
> It is an indication, that, if some processes accept output at the same
> time, and this output runs through process filters, anything can happen.
Yes, this is a fundamental problem in Emacs's handling of asynchronous
execution. We should really revisit this.
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 15 Mar 2023 18:06:02 GMT)
Full text and
rfc822 format available.
Message #239 received at 61350 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On Wed, Mar 15, 2023, 17:47 Stefan Monnier <monnier <at> iro.umontreal.ca> wrote:
> > It is an indication, that, if some processes accept output at the same
> > time, and this output runs through process filters, anything can happen.
>
> Yes, this is a fundamental problem in Emacs's handling of asynchronous
> execution. We should really revisit this.
>
We could start by understanding exactly what is triggering this reentrancy.
AFAIU, we don't. If have a recipe to trigger is consistently, now we should
work backwards from there.
João
>
[Message part 2 (text/html, inline)]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 15 Mar 2023 18:31:02 GMT)
Full text and
rfc822 format available.
Message #242 received at 61350 <at> debbugs.gnu.org (full text, mbox):
> We could start by understanding exactly what is triggering this reentrancy.
> AFAIU, we don't. If have a recipe to trigger is consistently, now we should
> work backwards from there.
I think we understand some of the problems well enough to think
about solutions.
E.g. the case where we run a process filter for process P while we
haven't finished running a previous invocation of P's process filter.
[ While this tends to occur in hard-to-debug corner cases, it also
tends to occur when Edebugging process filters. ]
Another is that `process-send-string` should be more clear about its
asynchronous behavior: I'd argue that it should *never* block, but it
should additionally provide a way to check when the send is completed.
I suspect we also have some loose ends in timer handling, where we
should provide ways to distinguish "run every N seconds", "wait
N seconds between each run", and whether re-entrancy should be allowed
when running "every N seconds".
And I seem to remember seeing cases where upon a process's exit we run
the process sentinel before the last invocation of the process filter:
we should clarify the ordering of these things (where "these things"
should also include things like `process-live-p` and `process-status`).
I haven't bumped into such problem recently, so maybe they've been
fixed, but maybe it's just because I haven't played with those
tools recently (not sure what the doc says about these things either).
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 15 Mar 2023 19:43:01 GMT)
Full text and
rfc822 format available.
Message #245 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:
>> We could start by understanding exactly what is triggering this reentrancy.
>> AFAIU, we don't. If have a recipe to trigger is consistently, now we should
>> work backwards from there.
>
> I think we understand some of the problems well enough to think
> about solutions.
Does that "some" include _this_ problem? Let's take this one for
example.
make -C test filenotify-tests SELECTOR=file-notify-test04-autorevert
Michael reports that this reports "Forbidden reentrant call of Tramp" on
EMBA. I've confirmed it fails locally with a different error:
(file-error "‘tramp_stat_file_attributes /tmp/file-notify-testNkTum7/file-notify-testhEtklA’ does not return a valid Lisp expression: ‘’")
Good enough, a failure is a failure is a failure.
We know this was introduced by the recent changes
54ef338ba3670415cf47fabc33a92d4904707c7e. But we don't know a lot of
other stuff
* Why the difference between Local and EMBA?
* How is this test supposed to work?
* Can we split the test to be shorter, as currently it takes between 25
second and 120 seconds to run?
* What exactly breaks here? The test fails consistently enough, that we
should be able to log/instrument/debug.
IMHO, we can be answering this in parallel with -- but ideally before --
a deeper review of Emacs's process machinery.
We could also ask other questions, of course, like: could Tramp use a
process filter? IMO these are very powerful tools and bring a
substantial degree of freedom of implementation. Currently Tramp relies
on accept-p-o and then searches the process buffer for regexps. To me,
that just seems more difficult and error-prone than process filters.
But even my own questions I would defer until after we have understood
this particular failure of file-notify-test04-autorevert on master.
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 15 Mar 2023 20:13:02 GMT)
Full text and
rfc822 format available.
Message #248 received at 61350 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
João Távora <joaotavora <at> gmail.com> writes:
> Stefan Monnier <monnier <at> iro.umontreal.ca> writes:
> Does that "some" include _this_ problem? Let's take this one for
> example.
>
> make -C test filenotify-tests
> SELECTOR=file-notify-test04-autorevert
Sorry, here I meant to type 'file-notify-test04-autorevert-remote'
> We know this was introduced by the recent changes
> 54ef338ba3670415cf47fabc33a92d4904707c7e.
FWIW, I've run the following experiment:
1. Reverted 54ef338ba3670415cf47fabc33a92d4904707c7e
2. Applied a simpler, more conservative, patch, discussed with Michael
before. The patch keeps the protocol for tramp-a-p-o untouched. It
simply adds a "while" to the 'a-p-o' call issued for "related"
processes.
3. Ran the filenotify test suite. All but one tests pass.
'file-notify-test04-autorevert-remote' now passes. The failure I
get, 'file-notify-test06-dir-validity-remote', is _also_ in Emacs 29
4. Removed the Tramp-specific Eglot workaround in
lisp/progmodes/eglot.el which disabled Controlmaster.
5. Ran the Eglot-over-Tramp experiment with the JDTLS LSP server
multiple times with 0% failures.
6. The two Tramp-related failures in the eglot-tests.el test suite are
_also_ fixed. Ran this multiple times, 0% failures.
It seems evident that my simpler patch puts us in a much better place
than we are now, so I propose we push them. Patches are attached.
Of course, we can continue to explore other more elegant solutions.
João
[0001-Revert-Improve-Tramp-processes-to-accept-output-over.patch (text/x-patch, inline)]
From f8ea932928fe37573d28c197c1c9d2da9a08089d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jo=C3=A3o=20T=C3=A1vora?= <joaotavora <at> gmail.com>
Date: Wed, 15 Mar 2023 19:18:00 +0000
Subject: [PATCH 1/3] Revert "Improve Tramp processes to accept output over the
same socket"
This reverts commit 54ef338ba3670415cf47fabc33a92d4904707c7e.
---
lisp/net/tramp-adb.el | 2 +-
lisp/net/tramp-gvfs.el | 2 +-
lisp/net/tramp-sh.el | 14 +-------------
lisp/net/tramp-smb.el | 6 +++---
lisp/net/tramp-sudoedit.el | 2 +-
lisp/net/tramp.el | 33 +++++++++------------------------
6 files changed, 16 insertions(+), 43 deletions(-)
diff --git a/lisp/net/tramp-adb.el b/lisp/net/tramp-adb.el
index d338201ab72..64f45e7958d 100644
--- a/lisp/net/tramp-adb.el
+++ b/lisp/net/tramp-adb.el
@@ -990,7 +990,7 @@ tramp-adb-handle-make-process
(progn
(goto-char (point-min))
(not (search-forward "\n" nil t)))
- (tramp-accept-process-output p))
+ (tramp-accept-process-output p 0))
(delete-region (point-min) (point)))
;; Provide error buffer. This shows only
;; initial error messages; messages
diff --git a/lisp/net/tramp-gvfs.el b/lisp/net/tramp-gvfs.el
index c1ad37de1d2..266724c587f 100644
--- a/lisp/net/tramp-gvfs.el
+++ b/lisp/net/tramp-gvfs.el
@@ -1469,7 +1469,7 @@ tramp-gvfs-handle-file-notify-add-watch
(set-process-sentinel p #'tramp-file-notify-process-sentinel)
;; There might be an error if the monitor is not supported.
;; Give the filter a chance to read the output.
- (while (tramp-accept-process-output p))
+ (while (tramp-accept-process-output p 0))
(unless (process-live-p p)
(tramp-error
p 'file-notify-error "Monitoring not supported for `%s'" file-name))
diff --git a/lisp/net/tramp-sh.el b/lisp/net/tramp-sh.el
index 882b79b3ee7..48ebfff6cfe 100644
--- a/lisp/net/tramp-sh.el
+++ b/lisp/net/tramp-sh.el
@@ -2424,10 +2424,6 @@ tramp-do-copy-or-rename-file-out-of-band
copy-program copy-args)))
(tramp-message v 6 "%s" (string-join (process-command p) " "))
(process-put p 'vector v)
- ;; This is neded for ssh or PuTTY based processes, and
- ;; only if the respective options are set. Perhaps,
- ;; the setting could be more fine-grained.
- (process-put p 'shared-socket t)
(process-put p 'adjust-window-size-function #'ignore)
(set-process-query-on-exit-flag p nil)
@@ -3757,10 +3753,6 @@ tramp-sh-handle-file-notify-add-watch
(string-join sequence " "))
(tramp-message v 6 "Run `%s', %S" (string-join sequence " ") p)
(process-put p 'vector v)
- ;; This is neded for ssh or PuTTY based processes, and only if
- ;; the respective options are set. Perhaps, the setting could
- ;; be more fine-grained.
- (process-put p 'shared-socket t)
;; Needed for process filter.
(process-put p 'events events)
(process-put p 'watch-name localname)
@@ -3769,7 +3761,7 @@ tramp-sh-handle-file-notify-add-watch
(set-process-sentinel p #'tramp-file-notify-process-sentinel)
;; There might be an error if the monitor is not supported.
;; Give the filter a chance to read the output.
- (while (tramp-accept-process-output p))
+ (while (tramp-accept-process-output p 0))
(unless (process-live-p p)
(tramp-error
p 'file-notify-error "Monitoring not supported for `%s'" file-name))
@@ -5121,10 +5113,6 @@ tramp-maybe-open-connection
;; Set sentinel and query flag. Initialize variables.
(set-process-sentinel p #'tramp-process-sentinel)
(process-put p 'vector vec)
- ;; This is neded for ssh or PuTTY based processes, and
- ;; only if the respective options are set. Perhaps,
- ;; the setting could be more fine-grained.
- (process-put p 'shared-socket t)
(process-put p 'adjust-window-size-function #'ignore)
(set-process-query-on-exit-flag p nil)
(setq tramp-current-connection (cons vec (current-time)))
diff --git a/lisp/net/tramp-smb.el b/lisp/net/tramp-smb.el
index bb4ab9e3057..1aa4520eeb6 100644
--- a/lisp/net/tramp-smb.el
+++ b/lisp/net/tramp-smb.el
@@ -757,7 +757,7 @@ tramp-smb-action-get-acl
"Read ACL data from connection buffer."
(unless (process-live-p proc)
;; Accept pending output.
- (while (tramp-accept-process-output proc))
+ (while (tramp-accept-process-output proc 0))
(with-current-buffer (tramp-get-connection-buffer vec)
;; There might be a hidden password prompt.
(widen)
@@ -1363,7 +1363,7 @@ tramp-smb-action-set-acl
"Set ACL data."
(unless (process-live-p proc)
;; Accept pending output.
- (while (tramp-accept-process-output proc))
+ (while (tramp-accept-process-output proc 0))
(tramp-message
vec 10 "\n%s" (tramp-get-buffer-string (tramp-get-connection-buffer vec)))
(throw 'tramp-action 'ok)))
@@ -2023,7 +2023,7 @@ tramp-smb-wait-for-output
;; Read pending output.
(while (not (re-search-forward tramp-smb-prompt nil t))
- (while (tramp-accept-process-output p))
+ (while (tramp-accept-process-output p 0))
(goto-char (point-min)))
(tramp-message vec 6 "\n%s" (buffer-string))
diff --git a/lisp/net/tramp-sudoedit.el b/lisp/net/tramp-sudoedit.el
index 3cacde2468c..abb9afc570b 100644
--- a/lisp/net/tramp-sudoedit.el
+++ b/lisp/net/tramp-sudoedit.el
@@ -692,7 +692,7 @@ tramp-sudoedit-action-sudo
"Check, whether a sudo process has finished. Remove unneeded output."
;; There might be pending output for the exit status.
(unless (process-live-p proc)
- (while (tramp-accept-process-output proc))
+ (while (tramp-accept-process-output proc 0))
;; Delete narrowed region, it would be in the way reading a Lisp form.
(goto-char (point-min))
(widen)
diff --git a/lisp/net/tramp.el b/lisp/net/tramp.el
index b6e985db6b1..47173b95bea 100644
--- a/lisp/net/tramp.el
+++ b/lisp/net/tramp.el
@@ -5087,11 +5087,6 @@ tramp-handle-make-process
;; t. See Bug#51177.
(when filter
(set-process-filter p filter))
- (process-put p 'vector v)
- ;; This is neded for ssh or PuTTY based processes, and
- ;; only if the respective options are set. Perhaps, the
- ;; setting could be more fine-grained.
- (process-put p 'shared-socket t)
(process-put p 'remote-command orig-command)
(tramp-set-connection-property p "remote-command" orig-command)
@@ -5494,7 +5489,7 @@ tramp-handle-file-notify-rm-watch
;; There might be pending output. Avoid problems with reentrant
;; call of Tramp.
(ignore-errors
- (while (tramp-accept-process-output proc)))
+ (while (tramp-accept-process-output proc 0)))
(tramp-message proc 6 "Kill %S" proc)
(delete-process proc))
@@ -5646,13 +5641,13 @@ tramp-action-process-alive
"Check, whether a process has finished."
(unless (process-live-p proc)
;; There might be pending output.
- (while (tramp-accept-process-output proc))
+ (while (tramp-accept-process-output proc 0))
(throw 'tramp-action 'process-died)))
(defun tramp-action-out-of-band (proc vec)
"Check, whether an out-of-band copy has finished."
;; There might be pending output for the exit status.
- (while (tramp-accept-process-output proc))
+ (while (tramp-accept-process-output proc 0))
(cond ((and (not (process-live-p proc))
(zerop (process-exit-status proc)))
(tramp-message vec 3 "Process has finished.")
@@ -5683,7 +5678,7 @@ tramp-process-one-action
(while (not found)
;; Reread output once all actions have been performed.
;; Obviously, the output was not complete.
- (while (tramp-accept-process-output proc))
+ (while (tramp-accept-process-output proc 0))
(setq todo actions)
(while todo
(setq item (pop todo)
@@ -5800,21 +5795,11 @@ with-tramp-locked-connection
,@body)
(tramp-flush-connection-property ,proc "locked"))))
-(defun tramp-accept-process-output (proc &optional _timeout)
+(defun tramp-accept-process-output (proc &optional timeout)
"Like `accept-process-output' for Tramp processes.
This is needed in order to hide `last-coding-system-used', which is set
for process communication also.
If the user quits via `C-g', it is propagated up to `tramp-file-name-handler'."
- (declare (advertised-calling-convention (proc) "29.2"))
- ;; There could be other processes which use the same socket for
- ;; communication. This could block the output for the current
- ;; process. Read such output first. (Bug#61350)
- (when-let (((process-get proc 'shared-socket))
- (v (process-get proc 'vector)))
- (dolist (p (delq proc (process-list)))
- (when (tramp-file-name-equal-p v (process-get p 'vector))
- (accept-process-output p 0 nil t))))
-
(with-current-buffer (process-buffer proc)
(let ((inhibit-read-only t)
last-coding-system-used
@@ -5824,10 +5809,10 @@ tramp-accept-process-output
;; JUST-THIS-ONE is set due to Bug#12145. `with-local-quit'
;; returns t in order to report success.
(if (with-local-quit
- (setq result (accept-process-output proc 0 nil t)) t)
+ (setq result (accept-process-output proc timeout nil t)) t)
(tramp-message
- proc 10 "%s %s %s\n%s"
- proc (process-status proc) result (buffer-string))
+ proc 10 "%s %s %s %s\n%s"
+ proc timeout (process-status proc) result (buffer-string))
;; Propagate quit.
(keyboard-quit)))
result)))
@@ -6840,7 +6825,7 @@ tramp-interrupt-process
(tramp-get-remote-null-device (process-get proc 'vector))))
;; Wait, until the process has disappeared. If it doesn't,
;; fall back to the default implementation.
- (while (tramp-accept-process-output proc))
+ (while (tramp-accept-process-output proc 0))
(not (process-live-p proc))))))
(add-hook 'interrupt-process-functions #'tramp-interrupt-process)
--
2.39.2
[0003-Simpler-fix-for-bug-61350.patch (text/x-patch, inline)]
From 16e93416780e416d27ed514e3d8a876405a4f4bc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jo=C3=A3o=20T=C3=A1vora?= <joaotavora <at> gmail.com>
Date: Wed, 15 Mar 2023 20:02:43 +0000
Subject: [PATCH 3/3] Simpler fix for bug#61350
* lisp/net/tramp.el (tramp-accept-process-output): Accept process
from related processes.
---
lisp/net/tramp.el | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/lisp/net/tramp.el b/lisp/net/tramp.el
index 47173b95bea..885b29f9471 100644
--- a/lisp/net/tramp.el
+++ b/lisp/net/tramp.el
@@ -5800,6 +5800,11 @@ tramp-accept-process-output
This is needed in order to hide `last-coding-system-used', which is set
for process communication also.
If the user quits via `C-g', it is propagated up to `tramp-file-name-handler'."
+ (when-let (((process-get proc 'shared-socket))
+ (v (process-get proc 'vector)))
+ (dolist (p (delq proc (process-list)))
+ (when (tramp-file-name-equal-p v (process-get p 'vector))
+ (while (accept-process-output p 0 nil t)))))
(with-current-buffer (process-buffer proc)
(let ((inhibit-read-only t)
last-coding-system-used
--
2.39.2
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 15 Mar 2023 21:35:01 GMT)
Full text and
rfc822 format available.
Message #251 received at 61350 <at> debbugs.gnu.org (full text, mbox):
> @@ -990,7 +990,7 @@ tramp-adb-handle-make-process
> (progn
> (goto-char (point-min))
> (not (search-forward "\n" nil t)))
> - (tramp-accept-process-output p))
> + (tramp-accept-process-output p 0))
> (delete-region (point-min) (point)))
> ;; Provide error buffer. This shows only
> ;; initial error messages; messages
Could you explain why we need all those "0 second timeout" thingies?
They make me nervous because it really means busy waiting in a tight loop.
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 15 Mar 2023 21:44:02 GMT)
Full text and
rfc822 format available.
Message #254 received at 61350 <at> debbugs.gnu.org (full text, mbox):
>> I think we understand some of the problems well enough to think
>> about solutions.
> Does that "some" include _this_ problem?
Not on my side, sadly, no, but maybe on Michael's?
> We could also ask other questions, of course, like: could Tramp use a
> process filter? IMO these are very powerful tools and bring a
> substantial degree of freedom of implementation. Currently Tramp relies
> on accept-p-o and then searches the process buffer for regexps. To me,
> that just seems more difficult and error-prone than process filters.
FWIW, I believe the crux of the matter here is that process filters
receive an arbitrary chunk at a time, so we almost always need to make
them accumulate the received data into some kind of buffer-like data
structure until we have "enough" data.
IOW, most process filters need to do something like
(lambda (proc string)
(let* ((buffer (process-get proc 'my-buffer))
(new (concat buffer string)))
(if (not (had-enough-p new))
(process-put proc 'my-buffer new)
(process-put proc 'my-buffer nil)
(do-the-actual-processing new))))
Another issue for Tramp is that basically all of the API it has to
implement (for `file-name-handler-alist`) is synchronous, so that forces
it to resort to `accept-p-o` :-(
If we want Tramp to work better, we'll need to provide asynchronous
versions of most of those operations (presumably using some kind of
promise/future abstraction).
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 15 Mar 2023 21:49:01 GMT)
Full text and
rfc822 format available.
Message #257 received at 61350 <at> debbugs.gnu.org (full text, mbox):
On Wed, Mar 15, 2023 at 9:43 PM Stefan Monnier <monnier <at> iro.umontreal.ca> wrote:
> IOW, most process filters need to do something like
>
> (lambda (proc string)
> (let* ((buffer (process-get proc 'my-buffer))
> (new (concat buffer string)))
> (if (not (had-enough-p new))
> (process-put proc 'my-buffer new)
> (process-put proc 'my-buffer nil)
> (do-the-actual-processing new))))
>
> Another issue for Tramp is that basically all of the API it has to
> implement (for `file-name-handler-alist`) is synchronous, so that forces
> it to resort to `accept-p-o` :-(
SLY, jsonrpc.el, and other code is synchronous as well (the completion
API is synch, as you well know). `accept-p-o` + a filter that invokes
a closure that throws out of the loop is a great way to do this.
See jsonrpc-request, for example. No JUST-THIS-ONE, and has
been working fine in many forms since Emacs 24.3 AFAIK.
> If we want Tramp to work better, we'll need to provide asynchronous
> versions of most of those operations (presumably using some kind of
> promise/future abstraction).
We could start with using a filter, just like the one you described above
Tramp doesn't use a filter, and I think that's part of the problem.
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Wed, 15 Mar 2023 21:56:01 GMT)
Full text and
rfc822 format available.
Message #260 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Sorry 'bout my previous email.
> diff --git a/lisp/net/tramp.el b/lisp/net/tramp.el
> index 47173b95bea..885b29f9471 100644
> --- a/lisp/net/tramp.el
> +++ b/lisp/net/tramp.el
> @@ -5800,6 +5800,11 @@ tramp-accept-process-output
> This is needed in order to hide `last-coding-system-used', which is set
> for process communication also.
> If the user quits via `C-g', it is propagated up to `tramp-file-name-handler'."
> + (when-let (((process-get proc 'shared-socket))
> + (v (process-get proc 'vector)))
> + (dolist (p (delq proc (process-list)))
> + (when (tramp-file-name-equal-p v (process-get p 'vector))
> + (while (accept-process-output p 0 nil t)))))
> (with-current-buffer (process-buffer proc)
> (let ((inhibit-read-only t)
> last-coding-system-used
Looks OK to me (tho this `dolist` in the middle of all those
`(tramp-accept-process-output .. 0)` busy loops is definitely not the
most efficient use of resources :-).
Stefan
PS: An unrelated comment, is that `shared-socket` and `vector` should be
renamed to clarify that these are properties specific to Tramp processes
(i.e. use a `tramp-` prefix).
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Thu, 16 Mar 2023 06:25:02 GMT)
Full text and
rfc822 format available.
Message #263 received at 61350 <at> debbugs.gnu.org (full text, mbox):
On 3/15/2023 2:49 PM, João Távora wrote:
> SLY, jsonrpc.el, and other code is synchronous as well (the completion
> API is synch, as you well know). `accept-p-o` + a filter that invokes
> a closure that throws out of the loop is a great way to do this.
> See jsonrpc-request, for example. No JUST-THIS-ONE, and has
> been working fine in many forms since Emacs 24.3 AFAIK.
Assuming I understand the context here correctly, this sort of thing is
part of why Chris Wellons' emacs-aio package uses 'run-at-time' for
handling resolved promises[1]:
> If the result is ready call the callback in the next event loop turn using run-at-time. This is important because it keeps all the asynchronous components isolated from one another. They won’t see each others’ frames on the call stack, nor frames from aio. This is so important that the Promises/A+ specification is explicit about it.
While a solution specific to the problem in this bug likely doesn't
require an entire asynchrony framework, if we get one into Emacs, it
should definitely support this case. (That said, a lot of the discussion
from this is probably best left to the emacs-devel thread so as not to
cause too much distraction... :))
[1] https://nullprogram.com/blog/2019/03/10/
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Thu, 16 Mar 2023 13:27:01 GMT)
Full text and
rfc822 format available.
Message #266 received at 61350 <at> debbugs.gnu.org (full text, mbox):
>> SLY, jsonrpc.el, and other code is synchronous as well (the completion
>> API is synch, as you well know). `accept-p-o` + a filter that invokes
>> a closure that throws out of the loop is a great way to do this.
>> See jsonrpc-request, for example. No JUST-THIS-ONE, and has
>> been working fine in many forms since Emacs 24.3 AFAIK.
>
> Assuming I understand the context here correctly, this sort of thing is part
> of why Chris Wellons' emacs-aio package uses 'run-at-time' for handling
> resolved promises[1]:
>> If the result is ready call the callback in the next event loop turn using
>> run-at-time. This is important because it keeps all the asynchronous
>> components isolated from one another. They won’t see each others’ frames
>> on the call stack, nor frames from aio. This is so important that the
>> Promises/A+ specification is explicit about it.
My futur.el code does the same, indeed (tho it uses `funcall-later`
which I implemented in C to expose to ELisp the `pending_funcalls` we
already had in C).
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Thu, 16 Mar 2023 13:27:02 GMT)
Full text and
rfc822 format available.
Message #269 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Jim Porter <jporterbugs <at> gmail.com> writes:
> On 3/15/2023 2:49 PM, João Távora wrote:
>> SLY, jsonrpc.el, and other code is synchronous as well (the completion
>> API is synch, as you well know). `accept-p-o` + a filter that invokes
>> a closure that throws out of the loop is a great way to do this.
>> See jsonrpc-request, for example. No JUST-THIS-ONE, and has
>> been working fine in many forms since Emacs 24.3 AFAIK.
>
> Assuming I understand the context here correctly, this sort of thing
> is part of why Chris Wellons' emacs-aio package uses 'run-at-time' for
> handling resolved promises[1]:
There's a relation, but it's not a direct relation. emacs-aio is a
promise/future/cps library which uses timers. I was describing this
design pattern for _synch_ requests over Emacs's asynchronous processes
(set-process-filter
proc (lambda (proc output)
(internal-default-process-filter proc output)
(when (and (process-get proc 'busy) (buffer-live-p (process-buffer proc)))
(with-current-buffer (process-buffer proc)
(when (search-forward ", DAVE.\n")
(unwind-protect (throw 'done (buffer-substring (point-min) (point)))
(process-put proc 'busy nil)
(delete-region (point-min) (point))))))))
(setq answer-from-proc (catch 'done
(process-put proc 'busy t)
(process-send-string proc "OPEN THE POD BAY DOORS\n")
(while (accept-process-output))))
This is very simplified (and untested), but that's the gist. You can
then add cancellation, timeouts, etc. Also normally (in SLY and
JSONRPC) instead of a single flag 'busy, you look up a table of pending
requests
This pattern has worked very well for me, for a long time, 10+ years.
SLIME used this pattern back in 2000's, there where I saw it (and
distilled it). It's similar to what "await" does other languages.
>> If the result is ready call the callback in the next event loop turn
>> using run-at-time. This is important because it keeps all the
>> asynchronous components isolated from one another. They won’t see
>> each others’ frames on the call stack, nor frames from aio. This is
>> so important that the Promises/A+ specification is explicit about
>> it.
> While a solution specific to the problem in this bug likely doesn't
> require an entire asynchrony framework, if we get one into Emacs, it
> should definitely support this case.
Sure, I'd be happy with just using the stuff that's been in Emacs for
ages.
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Thu, 16 Mar 2023 13:27:02 GMT)
Full text and
rfc822 format available.
Message #272 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:
> Sorry 'bout my previous email.
>
>> diff --git a/lisp/net/tramp.el b/lisp/net/tramp.el
>> index 47173b95bea..885b29f9471 100644
>> --- a/lisp/net/tramp.el
>> +++ b/lisp/net/tramp.el
>> @@ -5800,6 +5800,11 @@ tramp-accept-process-output
>> This is needed in order to hide `last-coding-system-used', which is set
>> for process communication also.
>> If the user quits via `C-g', it is propagated up to `tramp-file-name-handler'."
>> + (when-let (((process-get proc 'shared-socket))
>> + (v (process-get proc 'vector)))
>> + (dolist (p (delq proc (process-list)))
>> + (when (tramp-file-name-equal-p v (process-get p 'vector))
>> + (while (accept-process-output p 0 nil t)))))
>> (with-current-buffer (process-buffer proc)
>> (let ((inhibit-read-only t)
>> last-coding-system-used
>
> Looks OK to me (tho this `dolist` in the middle of all those
> `(tramp-accept-process-output .. 0)` busy loops is definitely not the
> most efficient use of resources :-).
It sure ain't pretty, but that's what is known to fix it for now.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Thu, 16 Mar 2023 15:58:02 GMT)
Full text and
rfc822 format available.
Message #275 received at 61350 <at> debbugs.gnu.org (full text, mbox):
João Távora <joaotavora <at> gmail.com> writes:
> Jim Porter <jporterbugs <at> gmail.com> writes:
> (setq answer-from-proc (catch 'done
> (process-put proc 'busy t)
> (process-send-string proc "OPEN THE POD BAY DOORS\n")
> (while (accept-process-output))))
>
> This is very simplified (and untested), but that's the gist.
The gist it is, but also very broken. As ever, vapourware sucks. This
is what that snippet should have been:
;;; hal.el --- asdasd -*- lexical-binding: t; -*-
(defvar hal-proc nil)
(defun hal-connect()
(interactive)
(when (processp hal-proc) (delete-process hal-proc))
(setq hal-proc
(make-process
:name "halproc"
:command (list (expand-file-name "hal" default-directory))
:buffer (generate-new-buffer "*halbuf*")
:filter (lambda (proc output)
(internal-default-process-filter proc output)
(when-let* ((buffer (process-buffer proc))
(callback (and (buffer-live-p buffer)
(process-get proc 'callback))))
(with-current-buffer buffer
(save-excursion
(goto-char (point-min))
(when (search-forward ", DAVE\n" nil t)
(unwind-protect (funcall callback
(buffer-substring
(point-min) (point)))
(process-put proc 'callback nil)
(delete-region (point-min)
(point)))))))))))
(defun hal-command () "Synch command to HAL that doesn't block others."
(interactive)
(message
(catch 'hal-done
(process-put hal-proc 'callback
(lambda (result) (throw 'hal-done result)))
(process-send-string hal-proc "OPEN THE POD BAY DOORS!\n")
(while (accept-process-output hal-proc)))))
You can test this with this very complex C++ program
#include <cstdlib>
#include <iostream>
#include <string>
#include <array>
int main() {
std::string line;
const char* retorts[] = {
"I'M AFRAID I CAN'T DO THAT",
"I'D RATHER NOT",
"THINK OF THE CONSEQUENCES",
"PLEASE STOP"
};
while (std::cin >> line)
std::cout << retorts[std::rand()%4] << ", DAVE" << std::endl;
}
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Thu, 16 Mar 2023 20:37:02 GMT)
Full text and
rfc822 format available.
Message #278 received at 61350 <at> debbugs.gnu.org (full text, mbox):
> The gist it is, but also very broken. As ever, vapourware sucks. This
> is what that snippet should have been:
>
> ;;; hal.el --- asdasd -*- lexical-binding: t; -*-
> (defvar hal-proc nil)
> (defun hal-connect()
> (interactive)
> (when (processp hal-proc) (delete-process hal-proc))
> (setq hal-proc
> (make-process
> :name "halproc"
> :command (list (expand-file-name "hal" default-directory))
> :buffer (generate-new-buffer "*halbuf*")
> :filter (lambda (proc output)
> (internal-default-process-filter proc output)
> (when-let* ((buffer (process-buffer proc))
> (callback (and (buffer-live-p buffer)
> (process-get proc 'callback))))
> (with-current-buffer buffer
> (save-excursion
> (goto-char (point-min))
> (when (search-forward ", DAVE\n" nil t)
> (unwind-protect (funcall callback
> (buffer-substring
> (point-min) (point)))
> (process-put proc 'callback nil)
> (delete-region (point-min)
> (point)))))))))))
>
> (defun hal-command () "Synch command to HAL that doesn't block others."
> (interactive)
> (message
> (catch 'hal-done
> (process-put hal-proc 'callback
> (lambda (result) (throw 'hal-done result)))
> (process-send-string hal-proc "OPEN THE POD BAY DOORS!\n")
> (while (accept-process-output hal-proc)))))
:-)
I think doing more of the work asynchronously (i.e. from the process's
filters/sentinels rather than after `accept-process-output`), like in
this example, is virtually always better, yes.
The use of `catch/throw` is not applicable to code which may be used
asynchronously (i.e. where we don't know that there's always a `catch`
waiting for our `throw`), so I hope we can get code to work just as
reliably without using `catch/throw`.
This mission is too important for me to allow you to jeopardize it.
Stefan
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Thu, 16 Mar 2023 22:04:01 GMT)
Full text and
rfc822 format available.
Message #281 received at 61350 <at> debbugs.gnu.org (full text, mbox):
On Thu, Mar 16, 2023 at 8:36 PM Stefan Monnier <monnier <at> iro.umontreal.ca> wrote:
> I think doing more of the work asynchronously (i.e. from the process's
> filters/sentinels rather than after `accept-process-output`), like in
> this example, is virtually always better, yes.
>
> The use of `catch/throw` is not applicable to code which may be used
> asynchronously (i.e. where we don't know that there's always a `catch`
> waiting for our `throw`), so I hope we can get code to work just as
> reliably without using `catch/throw`.
Of course, normally you want the async/sync pair of primitives:
(defun hal-async-command (callback)
"Send command, call CALLBACK."
(when (process-get hal-proc 'callback)
(error "HAL only takes one command at a time")
(process-put hal-proc 'callback callback)
(process-send-string hal-proc "OPEN THE POD BAY DOORS!\n"))
(defun hal-sync-command ()
(catch 'hal-done
(hal-async-command (lambda (r) (throw 'hal-done r))))
So what I mean is:
* when synchronous APIs are needed (that is presumably what TRAMP needs,
according to you) then *do* use catch/throw/filter/accept-process-output
* when async is needed, use callbacks.
You can mix in the upcoming emacs-io.el/futur.el/whatever frameworks,
but when processes are involved, some variation on that technique
should be used somewhere.
My main point here is that this bug's root cause is the lack
of good filter+accept-process-output design. I just think that
basic tools to achieve that design are already in place and
reasonably easy to grasp. I'm not opposed to newer tools.
And I already mentioned that the above approach can be
enhanced to include timeouts and cancellation on user input
(for the latter, sit-for is usually preferable to a-p-o).
> This mission is too important for me to allow you to jeopardize it.
:-) Sorry to interrupt the festivities, Dave
João
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Sat, 18 Mar 2023 12:35:02 GMT)
Full text and
rfc822 format available.
Message #284 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:
Hi Stefan,
> PS: An unrelated comment, is that `shared-socket` and `vector` should be
> renamed to clarify that these are properties specific to Tramp processes
> (i.e. use a `tramp-` prefix).
That's true. There are three properties which shall keep their names,
`remote-tty', `remote-pid' and `remote-command'. They are documented in
(info "(tramp) Remote processes")
All other Trmp internal properties I have prefixed with `tramp-', as
proposed. Pushed to master.
> Stefan
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Mon, 24 Apr 2023 01:45:02 GMT)
Full text and
rfc822 format available.
Message #287 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Hello all. I wanted to share that changes related to this bug,
specifically the changes in commit 54ef338 ("Improve Tramp processes
to accept output over the same socket") break a package I use:
https://github.com/jscheid/prettier.el
It seems that prettier.el calls tramp-accept-process-output with a 20s
timeout, and when that timeout is not respected it thinks that the
process has died. Here is the relevant code:
https://github.com/jscheid/prettier.el/blob/e419bb7a916c38a4a62632c49233de1523f41218/prettier.el#L1251-L1256
I don't know enough about process handling to understand if
prettier.el is doing the wrong thing, and if so how to fix it.
Thanks,
Aaron
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Fri, 05 May 2023 11:33:02 GMT)
Full text and
rfc822 format available.
Message #290 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Aaron Madlon-Kay <aaron <at> madlon-kay.com> writes:
> Hello all.
Hi,
> I wanted to share that changes related to this bug,
> specifically the changes in commit 54ef338 ("Improve Tramp processes
> to accept output over the same socket") break a package I use:
> https://github.com/jscheid/prettier.el
>
> It seems that prettier.el calls tramp-accept-process-output with a 20s
> timeout, and when that timeout is not respected it thinks that the
> process has died. Here is the relevant code:
> https://github.com/jscheid/prettier.el/blob/e419bb7a916c38a4a62632c49233de1523f41218/prettier.el#L1251-L1256
>
> I don't know enough about process handling to understand if
> prettier.el is doing the wrong thing, and if so how to fix it.
tramp-accept-process-output is not a public function, it could change
internally in an incompatible way. So you might contact the Tramp team
on the respective mailing list in order to check what we could do with prettier.el.
> Thanks,
> Aaron
Best regards, Michael.
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Fri, 05 May 2023 13:19:01 GMT)
Full text and
rfc822 format available.
Message #293 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
> tramp-accept-process-output is not a public function, it could change
> internally in an incompatible way.
I'm confused -- I don't see "private" in its name (no double dash), nor
did its docstring say anything about it. If this function is in fact
private, is there any documentation on this topic?
--
Best,
RY
[Please note that this mail might go to spam due to some
misconfiguration in my mail server -- still investigating.]
Information forwarded
to
bug-gnu-emacs <at> gnu.org
:
bug#61350
; Package
emacs
.
(Fri, 05 May 2023 14:54:01 GMT)
Full text and
rfc822 format available.
Message #296 received at 61350 <at> debbugs.gnu.org (full text, mbox):
Ruijie Yu <ruijie <at> netyu.xyz> writes:
Hi,
>> tramp-accept-process-output is not a public function, it could change
>> internally in an incompatible way.
>
> I'm confused -- I don't see "private" in its name (no double dash), nor
> did its docstring say anything about it. If this function is in fact
> private, is there any documentation on this topic?
The interface to be used for Tramp are the functions described in the
Elisp manual at (info "(elisp) Magic File Names")
All Tramp functions should be regarded as internal, except some very few
commands described the Tramp manual. In (info "(tramp) External packages")
there is
--8<---------------cut here---------------start------------->8---
8.2 Integrating with external Lisp packages
===========================================
In general, it is not recommended to use TRAMP functions and variables
not described in this manual. They might change their signature
and/or semantics without any announcement.
--8<---------------cut here---------------end--------------->8---
Best regards, Michael.
This bug report was last modified 1 year and 253 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.