GNU bug report logs -
#33018
26.1.50; thread starvation with async processes and accept-process-output
To reply to this bug, email your comments to 33018 AT debbugs.gnu.org.
Report forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Thu, 11 Oct 2018 14:59:02 GMT)
Acknowledgement sent to "Basil L. Contovounesios" <contovob <at> tcd.ie>: New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 11 Oct 2018 14:59:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[test.el (application/emacs-lisp, attachment)]
[Message part 2 (text/plain, inline)]
I attach a sample program test.el whose central function, test-slave,
invokes wget asynchronously before waiting for the process to exit.
The issue I'm facing is that running test-slave twice in succession,
each time in a new thread, causes accept-process-output to hang with no
output (unless a timeout argument is given, in which case the function
returns nil) the second time around.
When this happens, the process sentinel is never called, which is why
I'm assuming accept-process-output is indeed "hanging" in some sense and
it's not just that the process has already exited and so has no further
output.
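For readers without the attachment, the pattern described above can be sketched roughly as follows. This is not the attached test.el: the `sketch-` function names are hypothetical, the command is passed in by the caller, and the threads are joined from the calling thread rather than from a dedicated master thread. It only illustrates "start an asynchronous process, poll it with accept-process-output, and do so in a fresh thread each time".

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical sketch of the pattern under discussion; NOT the attached test.el.

(defun sketch-wait-for-process (command)
  "Run COMMAND (a list of strings) asynchronously and wait for it to stop.
Return a list (STATUS EXIT-CODE)."
  (let ((proc (make-process
               :name "sketch"
               :command command
               :connection-type 'pipe
               :sentinel #'ignore)))
    ;; With a timeout, `accept-process-output' returns nil when no output
    ;; arrived in time, so the loop re-checks the status instead of
    ;; blocking indefinitely.
    (while (eq (process-status proc) 'run)
      (accept-process-output proc 5))
    (list (process-status proc) (process-exit-status proc))))

(defun sketch-run-in-threads (command count)
  "Run COMMAND COUNT times, each in a new thread joined before the next.
Return the list of results from `sketch-wait-for-process'."
  (let (results)
    (dotimes (_ count)
      ;; The closure captures RESULTS lexically, so the new thread can
      ;; store its result where the caller will see it.
      (let ((thread (make-thread
                     (lambda ()
                       (push (sketch-wait-for-process command) results)))))
        (thread-join thread)))
    (nreverse results)))
```

Lexical binding matters here: a dynamic `let` binding made in one thread is not visible in another, so the closure has to capture the result variable lexically.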
I could very well be doing or assuming something incorrectly, but what
baffles me is that the "hang" does not occur either when Emacs is run
non-interactively, or when "https://en.wikipedia.org/wiki/Emacs" is
replaced with "https://www.gnu.org/software/emacs/", or when test-slave
is run in the current thread (and not in make-thread).
Since I can reliably reproduce this on both an optimised build of master
and a non-optimised build of emacs-26, I hope to be able to provide
further insights using gdb as time allows. Please let me know if there
are any specific details/output you would like me to provide. As a
relatively inexperienced gdb user I welcome any tips and tricks for
debugging threads and processes.
Here are some ways test.el can be run to illustrate the issue:
# All five processes exit successfully.
emacs -batch -l test.el -f test-no-threads
# All five processes (and threads) exit successfully.
emacs -batch -l test.el -f test-threads
# All five processes exit successfully.
emacs -Q -l test.el -f test-no-threads
# First process and thread exit successfully,
# but accept-process-output starts timing out in second thread.
# Warning: may leave empty wget-log files lying around.
emacs -Q -l test.el -f test-threads
Details of the two Emacs versions I'm using follow:
In GNU Emacs 26.1.50 (build 2, x86_64-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
of 2018-10-11 built on thunk
Repository revision: a7ebc6bf633bd3849ccab032dad6b1fd31b1ef43
Windowing system distributor 'The X.Org Foundation', version 11.0.12001000
System Description: Debian GNU/Linux testing (buster)
Configured using:
'configure 'CC=ccache gcc' 'CFLAGS=-O0 -g3 -ggdb -gdwarf-4 -pipe'
--config-cache --prefix=/home/blc/.local --program-suffix=26
--enable-checking=yes,glyphs --enable-check-lisp-object-type
--with-mailutils --with-x-toolkit=lucid --with-modules
--with-file-notification=yes --with-x'
Configured features:
XAW3D XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS
GLIB NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT
ZLIB TOOLKIT_SCROLL_BARS LUCID X11 XDBE XIM MODULES THREADS LIBSYSTEMD
LCMS2
In GNU Emacs 27.0.50 (build 21, x86_64-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
of 2018-10-11 built on thunk
Repository revision: 5bd8cfc14d4b0c78c07e65a583f42a10c4cbc06d
Windowing system distributor 'The X.Org Foundation', version 11.0.12001000
System Description: Debian GNU/Linux buster/sid
Configured using:
'configure --config-cache --prefix=/home/blc/.local --with-mailutils
--with-x-toolkit=lucid --with-modules --with-file-notification=yes
--with-x 'CC=ccache gcc' 'CFLAGS=-O2 -march=native -pipe''
Configured features:
XAW3D XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS
GLIB NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT
ZLIB TOOLKIT_SCROLL_BARS LUCID X11 XDBE XIM MODULES THREADS LIBSYSTEMD
JSON LCMS2 GMP
Thanks,
--
Basil
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Fri, 12 Oct 2018 08:08:02 GMT)
Message #8 received at 33018 <at> debbugs.gnu.org (full text, mbox):
> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> Date: Thu, 11 Oct 2018 15:57:50 +0100
>
> I attach a sample program test.el whose central function, test-slave,
> invokes wget asynchronously before waiting for the process to exit.
>
> The issue I'm facing is that running test-slave twice in succession,
> each time in a new thread, causes accept-process-output to hang with no
> output (unless a timeout argument is given, in which case the function
> returns nil) the second time around.
When the hang happens, is there any wget process still alive, or did
they all exit? Please use OS tools to find that out, don't rely on
what Emacs thinks.
> I could very well be doing or assuming something incorrectly, but what
> baffles me is that the "hang" does not occur either when Emacs is run
> non-interactively, or when "https://en.wikipedia.org/wiki/Emacs" is
> replaced with "https://www.gnu.org/software/emacs/"
Could be different properties of the servers related to async
connections, like TLS handshake or even async getaddrinfo.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Fri, 12 Oct 2018 12:03:01 GMT)
Message #11 received at 33018 <at> debbugs.gnu.org (full text, mbox):
"Basil L. Contovounesios" <contovob <at> tcd.ie> writes:
Hi Basil,
> I attach a sample program test.el whose central function, test-slave,
> invokes wget asynchronously before waiting for the process to exit.
>
> The issue I'm facing is that running test-slave twice in succession,
> each time in a new thread, causes accept-process-output to hang with no
> output (unless a timeout argument is given, in which case the function
> returns nil) the second time around.
If you want a process to communicate in a given thread, you must call
`set-process-thread'. See the elisp manual.
> Thanks,
Best regards, Michael.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Fri, 12 Oct 2018 12:44:01 GMT)
Message #14 received at 33018 <at> debbugs.gnu.org (full text, mbox):
> From: Michael Albinus <michael.albinus <at> gmx.de>
> Date: Fri, 12 Oct 2018 14:02:46 +0200
> Cc: 33018 <at> debbugs.gnu.org
>
> If you want a process to communicate in a given thread, you must call
> `set-process-thread'. See the elisp manual.
But the default is that the process is locked to the thread that
created it, so it sounds like this should have just worked (if that is
the problem).
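The default locking mentioned here can be checked from Lisp. The following is a minimal sketch, assuming an Emacs built with thread support and a `true` executable on PATH; the function name is hypothetical. It creates a process inside a new thread and compares `process-thread' with `current-thread'.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical helper; assumes a `true' executable is available.

(defun sketch-process-locked-to-creator-p ()
  "Return non-nil if a process made in a new thread is locked to that thread."
  (let* ((locked nil)
         (thread
          (make-thread
           (lambda ()
             (let ((proc (make-process :name "sketch-lock"
                                       :command '("true")
                                       :connection-type 'pipe
                                       :sentinel #'ignore)))
               ;; `process-thread' reports the thread PROC is locked to.
               (setq locked (eq (process-thread proc) (current-thread)))
               ;; (set-process-thread proc nil) would unlock PROC so that
               ;; any thread may wait for its output.
               (while (eq (process-status proc) 'run)
                 (accept-process-output proc 1)))))))
    (thread-join thread)
    locked))
```

If this returns non-nil, an explicit (set-process-thread proc (current-thread)) in the creating thread is redundant, which is the point being made above.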
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Fri, 12 Oct 2018 12:50:01 GMT)
Message #17 received at 33018 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> If you want a process to communicate in a given thread, you must call
>> `set-process-thread'. See the elisp manual.
>
> But the default is that the process is locked to the thread that
> created it, so it sounds like this should have just worked (if that is
> the problem).
I'm not sure. In the branch feature/tramp-thread-safe there were also
mysterious blockings in accept-process-output, until I've applied
set-process-thread explicitly.
At least it is worth a try.
Best regards, Michael.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Sun, 14 Oct 2018 15:02:01 GMT)
Message #20 received at 33018 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
>> Date: Thu, 11 Oct 2018 15:57:50 +0100
>>
>> I attach a sample program test.el whose central function, test-slave,
>> invokes wget asynchronously before waiting for the process to exit.
>>
>> The issue I'm facing is that running test-slave twice in succession,
>> each time in a new thread, causes accept-process-output to hang with no
>> output (unless a timeout argument is given, in which case the function
>> returns nil) the second time around.
>
> When the hang happens, is there any wget process still alive, or did
> they all exit? Please use OS tools to find that out, don't rely on
> what Emacs thinks.
When the hang happens, the wget process launched by the waiting thread
is still alive but asleep (idle), as reported by ps and top.
--
Basil
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Sun, 14 Oct 2018 15:18:02 GMT)
Message #23 received at 33018 <at> debbugs.gnu.org (full text, mbox):
> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> Cc: <33018 <at> debbugs.gnu.org>
> Date: Sun, 14 Oct 2018 16:00:56 +0100
>
> > When the hang happens, is there any wget process still alive, or did
> > they all exit? Please use OS tools to find that out, don't rely on
> > what Emacs thinks.
>
> When the hang happens, the wget process launched by the waiting thread
> is still alive but asleep (idle), as reported by ps and top.
Any idea why that is?
And if this is the situation, doesn't it explain why
accept-process-output times out?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Sun, 14 Oct 2018 15:18:02 GMT)
Message #26 received at 33018 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Michael Albinus <michael.albinus <at> gmx.de> writes:
> "Basil L. Contovounesios" <contovob <at> tcd.ie> writes:
>
>> I attach a sample program test.el whose central function, test-slave,
>> invokes wget asynchronously before waiting for the process to exit.
>>
>> The issue I'm facing is that running test-slave twice in succession,
>> each time in a new thread, causes accept-process-output to hang with no
>> output (unless a timeout argument is given, in which case the function
>> returns nil) the second time around.
>
> If you want a process to communicate in a given thread, you must call
> `set-process-thread'. See the elisp manual.
Thanks, this is the first thing I tried when earlier experiments started
to hang. I tried both the following redundant but explicit call:
[test.diff (text/x-diff, inline)]
diff -u --label /tmp/test.el --label \#\<buffer\ /tmp/test.el\> /tmp/test.el /tmp/buffer-content-hY0EDf
--- /tmp/test.el
+++ #<buffer /tmp/test.el>
@@ -24,6 +24,7 @@
                :command '("wget" "-qO-" "https://en.wikipedia.org/wiki/Emacs")
                :connection-type 'pipe
                :sentinel #'test-sentinel)))
+    (set-process-thread proc (current-thread))
     (while (eq (process-status proc) 'run)
       (test-debug proc 'accept-output (accept-process-output proc 5)))
     (test-debug proc 'exit (process-status proc) (process-exit-status proc))))
Diff finished. Sun Oct 14 16:04:48 2018
[Message part 3 (text/plain, inline)]
as well as replacing (current-thread) with nil, to unlock the process.
Adding test-debug calls before and after accept-process-output revealed
nothing out of the ordinary, and explicitly un/locking the process
didn't fix the hang. Should I be calling set-process-thread elsewhere?
--
Basil
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Sun, 14 Oct 2018 15:37:02 GMT)
Message #29 received at 33018 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
>> Cc: <33018 <at> debbugs.gnu.org>
>> Date: Sun, 14 Oct 2018 16:00:56 +0100
>>
>> > When the hang happens, is there any wget process still alive, or did
>> > they all exit? Please use OS tools to find that out, don't rely on
>> > what Emacs thinks.
>>
>> When the hang happens, the wget process launched by the waiting thread
>> is still alive but asleep (idle), as reported by ps and top.
>
> Any idea why that is?
No idea, sorry. I haven't yet found the time to look into this deeper.
> And if this is the situation, doesn't it explain why
> accept-process-output times out?
Possibly, but the question is why do we enter either of these situations
(process sleeping and accept-process-output timing out) exclusively when
using Emacs threads, no?
By the way, the problem isn't specific to wget; the same thing happens
with curl. I also find it strange that the first thread doesn't suffer
from the same problem as the second thread, especially given each thread
is created only after the last thread exited.
--
Basil
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Mon, 15 Oct 2018 08:04:02 GMT)
Message #32 received at 33018 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
"Basil L. Contovounesios" <contovob <at> tcd.ie> writes:
Hi Basil,
>> If you want a process to communicate in a given thread, you must call
>> `set-process-thread'. See the elisp manual.
>
> Thanks, this is the first thing I tried when earlier experiments started
> to hang. I tried both the following redundant but explicit call:
Well, I've played with your example. As Eli said, `set-process-thread'
is not needed here.
With your original example, I could reproduce the problem. However, if I
call
emacs -l test.el -f test-threads
the problem does NOT happen. My .emacs is quite long, so I didn't bisect
in order to find out what makes the difference.
I have changed your example a little bit wrt `thread-join', see
appended. This version runs w/o any problem even if emacs is called with
-Q. Maybe this helps you to debug further.
Best regards, Michael.
[Message part 2 (text/plain, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Tue, 16 Oct 2018 01:16:01 GMT)
Message #35 received at 33018 <at> debbugs.gnu.org (full text, mbox):
[test.el (application/emacs-lisp, attachment)]
[Message part 2 (text/plain, inline)]
Michael Albinus <michael.albinus <at> gmx.de> writes:
> Well, I've played with your example. As Eli said, `set-process-thread'
> is not needed here.
>
> With your original example, I could reproduce the problem. However, if I
> call
>
> emacs -l test.el -f test-threads
>
> the problem does NOT happen. My .emacs is quite long, so I didn't bisect
> in order to find out what makes the difference.
>
> I have changed your example a little bit wrt `thread-join', see
> appended. This version runs w/o any problem even if emacs is called with
> -Q. Maybe this helps you to debug further.
Thanks, creating all threads before waiting for any of them to exit
indeed does not suffer from the same hang. Doing this twice (see
attached update), however, still hangs.
There's something about going through a complete create-join cycle more
than once within a non-main-thread which is triggering this behaviour.
--
Basil
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Tue, 16 Oct 2018 13:55:02 GMT)
Message #38 received at 33018 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
"Basil L. Contovounesios" <contovob <at> tcd.ie> writes:
> There's something about going through a complete create-join cycle more
> than once within a non-main-thread which is triggering this behaviour.
I'm not sure that it is related to threads. It looks like some of your
processes do not exit properly, and then thread-join is blocked.
I've modified your example, again. It runs perfectly. And during its
work, you could call "M-x list-threads" and see how the threads are
created and die.
Best regards, Michael.
[Message part 2 (text/plain, attachment)]
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Tue, 16 Oct 2018 14:56:01 GMT)
Message #41 received at 33018 <at> debbugs.gnu.org (full text, mbox):
Michael Albinus <michael.albinus <at> gmx.de> writes:
> "Basil L. Contovounesios" <contovob <at> tcd.ie> writes:
>
>> There's something about going through a complete create-join cycle more
>> than once within a non-main-thread which is triggering this behaviour.
>
> I'm not sure that it is related to threads.
It has to be, because there is never an issue when I run the same
asynchronous wget processes without threads, and with threads the hang
reliably occurs 100% of the time.
> It looks like some of your processes do not exit properly, and then
> thread-join is blocked.
Indeed, but there is something about the interaction of Emacs threads
and subprocesses which is causing unsuccessful process termination.
Note that I am not ruling out pilot error; I simply haven't debugged
this issue any deeper yet. The fact that no-one has yet pointed out any
obvious blunders on my part gives me more confidence that there is
indeed some ghost in the wire.
> I've modified your example, again. It runs perfectly.
Indeed, there are many subprocess-within-a-thread examples which don't
suffer from a hang, e.g. by using a different URL. I would like to get
to the bottom of why network programs like wget/curl in particular
eventually hang, though.
Thanks,
--
Basil
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Tue, 16 Oct 2018 14:59:02 GMT)
Message #44 received at 33018 <at> debbugs.gnu.org (full text, mbox):
> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> Date: Tue, 16 Oct 2018 02:15:27 +0100
> Cc: 33018 <at> debbugs.gnu.org
>
> Thanks, creating all threads before waiting for any of them to exit
> indeed does not suffer from the same hang. Doing this twice (see
> attached update), however, still hangs.
>
> There's something about going through a complete create-join cycle more
> than once within a non-main-thread which is triggering this behaviour.
Can you attach a debugger to the wget process that's stuck, and see
where it is stuck? You will probably need to rebuild wget with debug
info, or install one from your package repository (if they offer
such). This could give us hints for where to look further.
Thanks.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Wed, 17 Oct 2018 17:38:01 GMT)
Message #47 received at 33018 <at> debbugs.gnu.org (full text, mbox):
[gdb.txt (text/plain, attachment)]
[Message part 2 (text/plain, inline)]
Eli Zaretskii <eliz <at> gnu.org> writes:
> Can you attach a debugger to the wget process that's stuck, and see
> where it is stuck? You will probably need to rebuild wget with debug
> info, or install one from your package repository (if they offer
> such). This could give us hints for where to look further.
I did the following:
0. Build wget 1.19.5 from Debian's repositories:
     apt-get build-dep wget
     apt-get source wget
     configure CC='ccache gcc' CFLAGS='-O0 -g3 -ggdb -gdwarf-4 -pipe'
       --config-cache --enable-assert --with-gnu-ld
     make
1. Substitute resulting wget file name in original test.el program
2. emacs26 -Q -l test.el -f test-threads
3. gdb -p <pid of stuck wget>
4. set logging on
5. bt
I attach the resulting gdb.txt log file.
Thanks,
--
Basil
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Wed, 17 Oct 2018 17:57:02 GMT)
Message #50 received at 33018 <at> debbugs.gnu.org (full text, mbox):
> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> Cc: <michael.albinus <at> gmx.de>, <33018 <at> debbugs.gnu.org>
> Date: Wed, 17 Oct 2018 18:37:00 +0100
>
> #0 0x00007ffff766a2a4 in __GI___libc_write (fd=1, buf=0x5555559c1b30, nbytes=4096)
> at ../sysdeps/unix/sysv/linux/write.c:27
> #1 0x00007ffff75fb56d in _IO_new_file_write (f=0x7ffff7739760 <_IO_2_1_stdout_>,
> data=0x5555559c1b30, n=4096) at fileops.c:1203
> #2 0x00007ffff75fa88f in new_do_write (fp=0x7ffff7739760 <_IO_2_1_stdout_>,
> data=0x5555559c1b30 "pan></span></h2>\n<p>Emacs is primarily a <a href=\"/wiki/Text_editor\" title=\"Text editor\">text editor</a> and is designed for manipulating pieces of text, although it is capable of formatting and print"..., to_do=to_do <at> entry=4096) at fileops.c:457
> #3 0x00007ffff75fc6f9 in _IO_new_do_write (fp=<optimized out>, data=<optimized out>,
> to_do=4096) at fileops.c:433
> #4 0x00007ffff75fa6d8 in _IO_new_file_sync (fp=0x7ffff7739760 <_IO_2_1_stdout_>)
> at fileops.c:813
> #5 0x00007ffff75ef6ed in __GI__IO_fflush (fp=0x7ffff7739760 <_IO_2_1_stdout_>) at iofflush.c:40
> #6 0x00005555555918bb in write_data (out=0x7ffff7739760 <_IO_2_1_stdout_>, out2=0x0,
> buf=0x5555559a9590 "pan></span></h2>\n<p>Emacs is primarily a <a href=\"/wiki/Text_editor\" title=\"Text editor\">text editor</a> and is designed for manipulating pieces of text, although it is capable of formatting and print"..., bufsize=4096, skip=0x7fffffffd0e8,
> written=0x7fffffffd0e0) at retr.c:207
> #7 0x000055555559204f in fd_read_body (downloaded_filename=0x5555555e18a0 "-", fd=4,
> out=0x7ffff7739760 <_IO_2_1_stdout_>, toread=198224, startpos=0, qtyread=0x7fffffffda20,
> qtywritten=0x7fffffffd9d0, elapsed=0x7fffffffda28, flags=1, out2=0x0) at retr.c:498
Looks like the buffer of the pipe through which Emacs reads the stuff
is full, and wget waits for some space there?
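The general mechanism suspected here (a child blocked in write(2) because nobody drains its pipe) can be reproduced in isolation. The sketch below makes several assumptions: a POSIX `sh` and `head -c` are available, the pipe buffer is much smaller than the amount written (on Linux it is typically 64 KiB), and the function name is hypothetical. It uses the documented behaviour that a process filter of t makes Emacs stop accepting output from the process. It says nothing about why Emacs would fail to read in the threaded case.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical demonstration; assumes `sh' and `head -c' are available.

(defun sketch-stalled-writer (nbytes)
  "Start a child that writes NBYTES to a pipe while Emacs does not read it.
Return a list (STATUS-WHILE-UNREAD STATUS-AFTER-READING BYTES-RECEIVED)."
  (let* ((received 0)
         (proc (make-process
                :name "sketch-writer"
                :command (list "sh" "-c"
                               (format "head -c %d /dev/zero" nbytes))
                :connection-type 'pipe
                :coding 'binary
                :sentinel #'ignore))
         stalled)
    ;; A filter of t tells Emacs to stop accepting output from PROC.
    (set-process-filter proc t)
    (sleep-for 1)
    ;; If NBYTES exceeds the pipe buffer, the child should still be alive
    ;; here, blocked while writing.
    (setq stalled (process-status proc))
    ;; Install a counting filter: reading resumes and the child can finish.
    (set-process-filter proc (lambda (_proc chunk)
                               (setq received (+ received (length chunk)))))
    (while (eq (process-status proc) 'run)
      (accept-process-output proc 1))
    ;; Drain whatever is still buffered after the child has exited.
    (while (accept-process-output proc 0.1))
    (list stalled (process-status proc) received)))
```

Calling it with a size well above the pipe buffer, for example (sketch-stalled-writer 1000000), and then with a small size such as 10, shows the contrast between a writer that stalls and one that exits without being read.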
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Wed, 17 Oct 2018 18:07:02 GMT)
Message #53 received at 33018 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
>> Cc: <michael.albinus <at> gmx.de>, <33018 <at> debbugs.gnu.org>
>> Date: Wed, 17 Oct 2018 18:37:00 +0100
>>
>> #0 0x00007ffff766a2a4 in __GI___libc_write (fd=1, buf=0x5555559c1b30, nbytes=4096)
>> at ../sysdeps/unix/sysv/linux/write.c:27
>> #1 0x00007ffff75fb56d in _IO_new_file_write (f=0x7ffff7739760 <_IO_2_1_stdout_>,
>> data=0x5555559c1b30, n=4096) at fileops.c:1203
>> #2 0x00007ffff75fa88f in new_do_write (fp=0x7ffff7739760 <_IO_2_1_stdout_>,
>> data=0x5555559c1b30 "pan></span></h2>\n<p>Emacs is primarily a <a
>> href=\"/wiki/Text_editor\" title=\"Text editor\">text editor</a> and is
>> designed for manipulating pieces of text, although it is capable of formatting
>> and print"..., to_do=to_do <at> entry=4096) at fileops.c:457
>> #3 0x00007ffff75fc6f9 in _IO_new_do_write (fp=<optimized out>, data=<optimized out>,
>> to_do=4096) at fileops.c:433
>> #4 0x00007ffff75fa6d8 in _IO_new_file_sync (fp=0x7ffff7739760 <_IO_2_1_stdout_>)
>> at fileops.c:813
>> #5 0x00007ffff75ef6ed in __GI__IO_fflush (fp=0x7ffff7739760 <_IO_2_1_stdout_>)
>> at iofflush.c:40
>> #6 0x00005555555918bb in write_data (out=0x7ffff7739760 <_IO_2_1_stdout_>, out2=0x0,
>> buf=0x5555559a9590 "pan></span></h2>\n<p>Emacs is primarily a <a
>> href=\"/wiki/Text_editor\" title=\"Text editor\">text editor</a> and is
>> designed for manipulating pieces of text, although it is capable of formatting
>> and print"..., bufsize=4096, skip=0x7fffffffd0e8,
>> written=0x7fffffffd0e0) at retr.c:207
>> #7 0x000055555559204f in fd_read_body (downloaded_filename=0x5555555e18a0 "-", fd=4,
>> out=0x7ffff7739760 <_IO_2_1_stdout_>, toread=198224, startpos=0, qtyread=0x7fffffffda20,
>> qtywritten=0x7fffffffd9d0, elapsed=0x7fffffffda28, flags=1, out2=0x0) at retr.c:498
>
> Looks like the buffer of the pipe through which Emacs reads the stuff
> is full, and wget waits for some space there?
Would that imply that different threads/processes are (re)using the same
buffer/pipe?
FWIW, strace -p <pid of stuck emacs> gives:
pselect6(14, [6 7], [], NULL, {tv_sec=99975, tv_nsec=320947003}, {NULL, 8}
and strace -p <pid of stuck wget> gives:
write(1, ">]</span></span></h2>\n<p>Emacs i"..., 4096
The former reminded me of bug#24201: https://debbugs.gnu.org/24201
--
Basil
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Wed, 17 Oct 2018 18:21:01 GMT)
Message #56 received at 33018 <at> debbugs.gnu.org (full text, mbox):
> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> Cc: <michael.albinus <at> gmx.de>, <33018 <at> debbugs.gnu.org>
> Date: Wed, 17 Oct 2018 19:05:59 +0100
>
> > Looks like the buffer of the pipe through which Emacs reads the stuff
> > is full, and wget waits for some space there?
>
> Would that imply that different threads/processes are (re)using the same
> buffer/pipe?
Could be, but it's more likely that Emacs simply doesn't read the
output from wget.
> pselect6(14, [6 7], [], NULL, {tv_sec=99975, tv_nsec=320947003}, {NULL, 8}
>
> and strace -p <pid of stuck wget> gives:
>
> write(1, ">]</span></span></h2>\n<p>Emacs i"..., 4096
>
> The former reminded me of bug#24201: https://debbugs.gnu.org/24201
I could be wrong, but it doesn't look similar to me. In that bug, the
CPU was pegged, whereas you said that CPU is idle.
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Wed, 17 Oct 2018 18:26:01 GMT)
Message #59 received at 33018 <at> debbugs.gnu.org (full text, mbox):
> Date: Wed, 17 Oct 2018 21:20:22 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: michael.albinus <at> gmx.de, 33018 <at> debbugs.gnu.org
>
> > From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> > Cc: <michael.albinus <at> gmx.de>, <33018 <at> debbugs.gnu.org>
> > Date: Wed, 17 Oct 2018 19:05:59 +0100
> >
> > > Looks like the buffer of the pipe through which Emacs reads the stuff
> > > is full, and wget waits for some space there?
> >
> > Would that imply that different threads/processes are (re)using the same
> > buffer/pipe?
>
> Could be, but it's more likely that Emacs simply doesn't read the
> output from wget.
I think the relevant code should be instrumented to show which thread
waits for what process(es).
Btw, are you sure this is not a bug in your program? Michael caused
your program to work twice by simple changes, AFAIU, no?
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Wed, 17 Oct 2018 20:04:02 GMT)
Message #62 received at 33018 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
>> Cc: <michael.albinus <at> gmx.de>, <33018 <at> debbugs.gnu.org>
>> Date: Wed, 17 Oct 2018 19:05:59 +0100
>>
>> pselect6(14, [6 7], [], NULL, {tv_sec=99975, tv_nsec=320947003}, {NULL, 8}
>>
>> and strace -p <pid of stuck wget> gives:
>>
>> write(1, ">]</span></span></h2>\n<p>Emacs i"..., 4096
>>
>> The former reminded me of bug#24201: https://debbugs.gnu.org/24201
>
> I could be wrong, but it doesn't look similar to me. In that bug, the
> CPU was pegged, whereas you said that CPU is idle.
Right, I was only reminded of that bug because of the common
accept-process-output/pselect hang in the context of a network-related
process. I'm not suggesting the underlying cause is the same.
--
Basil
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Wed, 17 Oct 2018 20:47:02 GMT)
Message #65 received at 33018 <at> debbugs.gnu.org (full text, mbox):
Eli Zaretskii <eliz <at> gnu.org> writes:
>> Date: Wed, 17 Oct 2018 21:20:22 +0300
>> From: Eli Zaretskii <eliz <at> gnu.org>
>> Cc: michael.albinus <at> gmx.de, 33018 <at> debbugs.gnu.org
>>
>> > From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
>> > Cc: <michael.albinus <at> gmx.de>, <33018 <at> debbugs.gnu.org>
>> > Date: Wed, 17 Oct 2018 19:05:59 +0100
>> >
>> > > Looks like the buffer of the pipe through which Emacs reads the stuff
>> > > is full, and wget waits for some space there?
>> >
>> > Would that imply that different threads/processes are (re)using the same
>> > buffer/pipe?
>>
>> Could be, but it's more likely that Emacs simply doesn't read the
>> output from wget.
>
> I think the relevant code should be instrumented to show which thread
> waits for what process(es).
Each thread launches a single wget process which it then waits for
before dying, and current-thread is always eq to that thread around
calls to accept-process-output. Or are you talking about some other
type of thread?
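At the Lisp level, the claim that the waiting thread is always the process's own thread could be logged with a small wrapper around accept-process-output. This is only a sketch with a hypothetical function name, and it is not the test-debug helper from test.el. It cannot show which file descriptors the underlying select call waits on; that would need instrumentation in the C code.

```elisp
;;; -*- lexical-binding: t; -*-
;; Hypothetical Lisp-level logging wrapper; not part of test.el.

(defun sketch-logged-accept (proc &optional timeout)
  "Call `accept-process-output' on PROC with TIMEOUT and log the context.
Return a list (GOT-OUTPUT SAME-THREAD STATUS)."
  (let* ((waiter (current-thread))
         ;; The thread PROC is locked to, or nil if it is unlocked.
         (owner (process-thread proc))
         (got-output (accept-process-output proc timeout))
         (report (list got-output
                       (eq waiter owner)
                       (process-status proc))))
    (message "waiter=%S owner=%S got-output=%S same-thread=%S status=%S"
             waiter owner
             (nth 0 report) (nth 1 report) (nth 2 report))
    report))
```

Replacing a direct accept-process-output call in the waiting loop with this wrapper leaves one line per call in *Messages* (or on stderr in batch mode).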
Either way, I'll report back when I've had a deeper look into what Emacs
is doing, unless someone beats me to it.
> Btw, are you sure this is not a bug in your program?
No, but the fact that it reliably works when Emacs is run with -batch,
and reliably hangs when run with -Q is at least somewhat intriguing.
> Michael caused your program to work twice by simple changes, AFAIU,
> no?
Michael avoided the hang by rewriting the create-join-...-create-join
sequence as create-create-...-join-join. But if the latter is done
twice from within the same master thread, the hang still occurs. As I
said:
> There's something about going through a complete create-join cycle more
> than once within a non-main-thread which is triggering this behaviour.
(Actually, I haven't checked whether the hang occurs when two
create-join cycles are completed within main-thread; I was just
specifically describing my sample program.)
In his second rewrite, Michael replaced wget with echo, which does not
suffer from any hangs. As I said:
> Indeed, there are many subprocess-within-a-thread examples which don't
> suffer from a hang, e.g. by using a different URL. I would like to get
> to the bottom of why network programs like wget/curl in particular
> eventually hang, though.
In other words, I don't (yet) see why my recipe shouldn't work, and I'm
curious to eventually get to the bottom of this.
I'm sorry I've been talking more than doing, but university and house
hunting will dominate my free time for the next few weeks.
Thanks,
--
Basil
Information forwarded to bug-gnu-emacs <at> gnu.org: bug#33018; Package emacs. (Sat, 20 Oct 2018 08:36:01 GMT)
Message #68 received at 33018 <at> debbugs.gnu.org (full text, mbox):
> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> Cc: <michael.albinus <at> gmx.de>, <33018 <at> debbugs.gnu.org>
> Date: Wed, 17 Oct 2018 21:46:39 +0100
>
> > I think the relevant code should be instrumented to show which thread
> > waits for what process(es).
>
> Each thread launches a single wget process which it then waits for
> before dying, and current-thread is always eq to that thread around
> calls to accept-process-output. Or are you talking about some other
> type of thread?
I was talking about low-level details: we set up the file-descriptor
mask passed to pselect, to tell it which descriptors to wait on. The
call to accept-process-output is supposed to arrange for the
descriptor where the corresponding process will write to be one of
those on which the corresponding pselect will wait. I was thinking
that perhaps we become confused and don't ask pselect called by a
thread to wait on the process which was launched by that thread.
> Either way, I'll report back when I've had a deeper look into what Emacs
> is doing, unless someone beats me to it.
Thanks.
> I'm sorry I've been talking more than doing, but university and house
> hunting will dominate my free time for the next few weeks.
No need to be sorry, we all have our lives.
This bug report was last modified 6 years and 191 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.