GNU bug report logs - #33198
27.0.50; emacs_abort on EBADF during accept-process-output in non-main thread


Package: emacs;

Reported by: Gemini Lasswell <gazally <at> runbox.com>

Date: Mon, 29 Oct 2018 22:12:01 UTC

Severity: normal

Found in version 27.0.50

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 33198 in the body.
You can then email your comments to 33198 AT debbugs.gnu.org in the normal way.




Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#33198; Package emacs. (Mon, 29 Oct 2018 22:12:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Gemini Lasswell <gazally <at> runbox.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 29 Oct 2018 22:12:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Gemini Lasswell <gazally <at> runbox.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 27.0.50;
 emacs_abort on EBADF during accept-process-output in non-main thread
Date: Mon, 29 Oct 2018 15:10:39 -0700
I've hit the emacs_abort at line 5510 in process.c a few times in the
last week.  I haven't found a way to make it reproduce on demand.  I
tried to narrow the code it's happening in down to a smaller test case,
without success.  I'd appreciate suggestions for how to track down what
is going wrong.
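Background on the failure mode: according to the report title, the abort at process.c:5510 happens when the wait for process output fails with EBADF, and that wait is done with pselect. The standalone C sketch below is not Emacs code; it only shows how pselect produces EBADF. A descriptor is added to the read set and then closed before the call, so pselect fails immediately with `errno == EBADF` instead of waiting. The function name `pselect_errno_after_close` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Return the errno left by pselect when the read set contains a
   descriptor that was closed after being added to the set, 0 if
   pselect unexpectedly succeeded, or -1 if the pipe could not be made.  */
int pselect_errno_after_close (void)
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1;

  fd_set readfds;
  FD_ZERO (&readfds);
  FD_SET (fds[0], &readfds);

  /* Close the read end after it is already in the set; this mimics a
     descriptor being closed while a wait mask still refers to it.  */
  close (fds[0]);

  struct timespec zero = { 0, 0 };   /* poll only, never block */
  int n = pselect (fds[0] + 1, &readfds, NULL, NULL, &zero, NULL);
  int err = (n < 0) ? errno : 0;

  close (fds[1]);
  return err;
}
```

With several Lisp threads, one plausible (unconfirmed) way to reach this state is for one thread's wait mask to still include a process descriptor that has already been closed elsewhere.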

I'm working on a Lisp program whose work can be done in
parallel, and I'm implementing it using threads.  My code has 4 worker
threads which pick jobs to do off of a queue (which is made thread-safe
with a mutex and condition variables).  The jobs consist of an argument
to a shell script, which the threads run asynchronously using
start-file-process and accept-process-output.  This allows the worker
threads to be responsive to a user command to cancel the work in
progress, although I haven't been using that cancel command when the bug
happens.  When it has happened, it's been after I run a command which
adds 6 jobs to the queue for the 4 threads to process.
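The worker arrangement described above is a standard producer/consumer queue. As a rough illustration only, here is the same shape written in C with POSIX threads: four workers block on a condition variable until a job is queued or the queue is closed. This is not the reporter's erb-build.el code. `struct job_queue`, `worker`, and `run_jobs` are hypothetical names, and the job body is a placeholder where the Lisp code would start a process. Emacs Lisp threads also differ from raw pthreads in that only one of them runs Lisp code at a time.

```c
#include <pthread.h>
#include <stdbool.h>

#define NUM_WORKERS 4
#define MAX_JOBS 16

/* Job queue protected by a mutex and a condition variable.  */
struct job_queue {
  pthread_mutex_t lock;
  pthread_cond_t nonempty;
  int jobs[MAX_JOBS];
  int head, tail;     /* jobs[head..tail) are still pending */
  bool closed;        /* no more jobs will be added */
  int completed;      /* number of jobs finished by workers */
};

static void queue_push (struct job_queue *q, int job)
{
  pthread_mutex_lock (&q->lock);
  q->jobs[q->tail++] = job;
  pthread_cond_signal (&q->nonempty);   /* wake one idle worker */
  pthread_mutex_unlock (&q->lock);
}

static void queue_close (struct job_queue *q)
{
  pthread_mutex_lock (&q->lock);
  q->closed = true;
  pthread_cond_broadcast (&q->nonempty); /* wake every worker so it can exit */
  pthread_mutex_unlock (&q->lock);
}

static void *worker (void *arg)
{
  struct job_queue *q = arg;
  for (;;)
    {
      pthread_mutex_lock (&q->lock);
      while (q->head == q->tail && !q->closed)
        pthread_cond_wait (&q->nonempty, &q->lock);
      if (q->head == q->tail)   /* closed and fully drained */
        {
          pthread_mutex_unlock (&q->lock);
          return NULL;
        }
      int job = q->jobs[q->head++];
      pthread_mutex_unlock (&q->lock);

      (void) job;   /* placeholder: the real code would run a script here */

      pthread_mutex_lock (&q->lock);
      q->completed++;
      pthread_mutex_unlock (&q->lock);
    }
}

/* Run NJOBS jobs (at most MAX_JOBS) on NUM_WORKERS threads and return
   how many were completed.  */
int run_jobs (int njobs)
{
  struct job_queue q = { .head = 0, .tail = 0, .closed = false, .completed = 0 };
  pthread_mutex_init (&q.lock, NULL);
  pthread_cond_init (&q.nonempty, NULL);

  pthread_t threads[NUM_WORKERS];
  for (int i = 0; i < NUM_WORKERS; i++)
    pthread_create (&threads[i], NULL, worker, &q);

  for (int j = 0; j < njobs && j < MAX_JOBS; j++)
    queue_push (&q, j);
  queue_close (&q);

  for (int i = 0; i < NUM_WORKERS; i++)
    pthread_join (threads[i], NULL);

  pthread_mutex_destroy (&q.lock);
  pthread_cond_destroy (&q.nonempty);
  return q.completed;   /* safe to read unlocked: all workers joined */
}
```

Calling `run_jobs (6)` mirrors the reported scenario of 6 jobs shared among 4 workers.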

The crash has happened with two different shell scripts, one which just
consists of "exit 1" and another which makes a directory and a symlink.
Neither script prints anything to standard output.

I've tried using the process object instead of nil as the first argument
to accept-process-output and have seen the same crash both ways.

Here are the two main functions in my worker threads,
'erb--builder-func' which is passed to 'make-thread' to create the
threads, and 'erb--build' which runs the child processes.

[erb-build.el (text/plain, attachment)]
Thread 6 "ERB control" hit Breakpoint 1, terminate_due_to_signal (
    sig=sig@entry=6, backtrace_limit=backtrace_limit@entry=40) at emacs.c:369
369	{
(gdb) bt
#0  terminate_due_to_signal (sig=sig@entry=6,
    backtrace_limit=backtrace_limit@entry=40) at emacs.c:369
#1  0x0000000000511a23 in emacs_abort () at sysdep.c:2429
#2  0x00000000005b68c1 in wait_reading_process_output (
    time_limit=<optimized out>, nsecs=<optimized out>, read_kbd=read_kbd@entry=0,
    do_display=do_display@entry=false, wait_for_cell=wait_for_cell@entry=XIL(0),
    wait_proc=<optimized out>, just_wait_proc=0) at process.c:5510
#3  0x00000000005b6eea in Faccept_process_output (process=XIL(0),
    seconds=<optimized out>, millisec=<optimized out>, just_this_one=XIL(0))
    at process.c:4677
#4  0x000000000056e815 in Ffuncall (nargs=3, args=args@entry=0x7fffd366d360)
    at eval.c:2856
#5  0x00000000005aa740 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=nargs@entry=1, args=<optimized out>,
    args@entry=0x15dede8 <bss_sbrk_buffer+10104552>) at bytecode.c:632
#6  0x0000000000571416 in funcall_lambda (fun=XIL(0x7fffd366d360),
    nargs=nargs@entry=1, arg_vector=0x15dede8 <bss_sbrk_buffer+10104552>,
    arg_vector@entry=0x7fffd366d600) at eval.c:3057
#7  0x000000000056e793 in Ffuncall (nargs=2, args=args@entry=0x7fffd366d5f8)
    at eval.c:2870
#8  0x00000000005aa740 in exec_byte_code (bytestr=<optimized out>,
    vector=<optimized out>, maxdepth=<optimized out>,
    args_template=<optimized out>, nargs=nargs@entry=0, args=<optimized out>,
    args@entry=0x15deca8 <bss_sbrk_buffer+10104232>) at bytecode.c:632
#9  0x0000000000571416 in funcall_lambda (fun=XIL(0x7fffd366d5f8),
    nargs=nargs@entry=0, arg_vector=0x15deca8 <bss_sbrk_buffer+10104232>,
    arg_vector@entry=0x1423c58 <bss_sbrk_buffer+8289624>) at eval.c:3057
#10 0x000000000056e793 in Ffuncall (nargs=nargs@entry=1,
    args=args@entry=0x1423c50 <bss_sbrk_buffer+8289616>) at eval.c:2870
#11 0x00000000005d425b in invoke_thread_function () at thread.c:684
#12 0x000000000056d9ef in internal_condition_case (
    bfun=bfun@entry=0x5d4220 <invoke_thread_function>,
    handlers=handlers@entry=XIL(0xc3c0),
    hfun=hfun@entry=0x5d3ae0 <record_thread_error>) at eval.c:1373
#13 0x00000000005d414b in run_thread (state=0x1423c30 <bss_sbrk_buffer+8289584>)
    at thread.c:723
#14 0x00007ffff15a65a7 in start_thread ()
   from /nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libpthread.so.0
#15 0x00007ffff0c4122f in clone ()
   from /nix/store/fg4yq8i8wd08xg3fy58l6q73cjy8hjr2-glibc-2.27/lib/libc.so.6

Lisp Backtrace:
"accept-process-output" (0xd366d368)
"erb--build" (0xd366d600)
"erb--builder-func" (0x1423c58)


In GNU Emacs 27.0.50 (build 8, x86_64-pc-linux-gnu, GTK+ Version 3.22.30)
 of 2018-10-28 built on sockeye
Repository revision: f7638edcb06fac3b58b986062ea679f6919d81d7
Windowing system distributor 'The X.Org Foundation', version 11.0.11906000
System Description: NixOS 18.09.git.ad56635 (Jellyfish)

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Configured using:
 'configure --prefix=/home/gem/src/emacs/master/bin --with-modules
 --with-x-toolkit=gtk3 --with-xft --config-cache'

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND DBUS GSETTINGS GLIB NOTIFY LIBSELINUX
GNUTLS LIBXML2 FREETYPE XFT ZLIB TOOLKIT_SCROLL_BARS GTK3 X11 XDBE XIM
MODULES THREADS GMP

Important settings:
  value of $EMACSLOADPATH:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny seq byte-opt gv
bytecomp byte-compile cconv dired dired-loaddefs format-spec rfc822 mml
easymenu mml-sec password-cache epa derived epg epg-config gnus-util
rmail rmail-loaddefs time-date mm-decode mm-bodies mm-encode mail-parse
rfc2231 mailabbrev gmm-utils mailheader cl-loaddefs cl-lib sendmail
rfc2047 rfc2045 ietf-drums mm-util mail-prsvr mail-utils elec-pair
mule-util tooltip eldoc electric uniquify ediff-hook vc-hooks
lisp-float-type mwheel term/x-win x-win term/common-win x-dnd tool-bar
dnd fontset image regexp-opt fringe tabulated-list replace newcomment
text-mode elisp-mode lisp-mode prog-mode register page menu-bar
rfn-eshadow isearch timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote threads dbusbind
inotify dynamic-setting system-font-setting font-render-setting
move-toolbar gtk x-toolkit x multi-tty make-network-process emacs)

Memory information:
((conses 16 95415 9749)
 (symbols 48 20031 1)
 (strings 32 28349 1783)
 (string-bytes 1 753921)
 (vectors 16 14931)
 (vector-slots 8 508718 9684)
 (floats 8 47 70)
 (intervals 56 209 0)
 (buffers 992 11))

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33198; Package emacs. (Tue, 30 Oct 2018 06:51:01 GMT) Full text and rfc822 format available.

Message #8 received at 33198 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Gemini Lasswell <gazally <at> runbox.com>
Cc: 33198 <at> debbugs.gnu.org
Subject: Re: bug#33198: 27.0.50;
 emacs_abort on EBADF during accept-process-output in non-main thread
Date: Tue, 30 Oct 2018 08:50:19 +0200
> From: Gemini Lasswell <gazally <at> runbox.com>
> Date: Mon, 29 Oct 2018 15:10:39 -0700
> 
> I've hit the emacs_abort at line 5510 in process.c a few times in the
> last week.  I haven't found a way to make it reproduce on demand.  I
> tried to narrow the code it's happening in down to a smaller test case,
> without success.  I'd appreciate suggestions for how to track down what
> is going wrong.

I suggest instrumenting the code that determines which thread will
listen to which descriptors in its pselect call.  This happens inside
compute_input_wait_mask and compute_non_keyboard_wait_mask, and the
data those use is set by several add_*_fd functions.  The
instrumentation should output the descriptor, the thread ID, and what
it is used for.  Then I think you will be able to see where the bad
descriptor came from, and how it came to be bad.
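A minimal sketch of the kind of trace line suggested here (descriptor, thread ID, purpose). `format_fd_trace` and `trace_fd` are hypothetical helpers, not Emacs functions; the assumed usage is to call `trace_fd` at the top of each add_*_fd function and wherever the wait masks are computed. Printing `pthread_self ()` as an unsigned long assumes Linux/glibc, where `pthread_t` is an integer type.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical helper: format one trace line naming the registering
   function FN, the descriptor FD, the calling thread, and PURPOSE.
   Returns the snprintf result (length that would have been written).  */
int format_fd_trace (char *buf, size_t size, const char *fn,
                     int fd, const char *purpose)
{
  /* Casting pthread_t to unsigned long is a Linux/glibc assumption.  */
  return snprintf (buf, size, "%s: fd=%d thread=%lu purpose=%s",
                   fn, fd, (unsigned long) pthread_self (), purpose);
}

/* Hypothetical helper: write the trace line to stderr.  */
void trace_fd (const char *fn, int fd, const char *purpose)
{
  char line[256];
  format_fd_trace (line, sizeof line, fn, fd, purpose);
  fprintf (stderr, "%s\n", line);
}
```

For example, `trace_fd ("add_read_fd", fd, "process output")` would print one line naming the function, the descriptor, and the calling thread; sorting the log by descriptor then shows which thread registered it last.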

You will also need to determine which descriptor is the bad one; the
usual way to do that is to call 'fcntl (fd, F_GETFD)' on each
descriptor on which pselect was asked to wait, and see which ones
return -1 with errno = EBADF.
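The fcntl check can be sketched as follows: scan every descriptor present in a mask of the kind passed to pselect and record those for which `fcntl (fd, F_GETFD)` fails with EBADF. `find_bad_descriptors` is a hypothetical helper; in an instrumented build it would be run on the masks right after pselect fails. Another thread can close or reuse a descriptor between the failure and the check, so the result is a diagnostic hint rather than proof.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>

/* Scan descriptors 0..NFDS-1 that are set in MASK and store those that
   fcntl reports as not open into BAD (at most MAXBAD entries).
   Returns the total number of bad descriptors found.  */
int find_bad_descriptors (int nfds, const fd_set *mask, int *bad, int maxbad)
{
  int count = 0;
  for (int fd = 0; fd < nfds; fd++)
    {
      if (!FD_ISSET (fd, mask))
        continue;
      /* F_GETFD fails with EBADF when FD is not an open descriptor.  */
      if (fcntl (fd, F_GETFD) == -1 && errno == EBADF)
        {
          if (count < maxbad)
            bad[count] = fd;
          count++;
        }
    }
  return count;
}
```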




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#33198; Package emacs. (Tue, 02 Feb 2021 15:03:01 GMT) Full text and rfc822 format available.

Message #11 received at 33198 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Gemini Lasswell <gazally <at> runbox.com>, 33198 <at> debbugs.gnu.org
Subject: Re: bug#33198: 27.0.50; emacs_abort on EBADF during
 accept-process-output in non-main thread
Date: Tue, 02 Feb 2021 16:02:37 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

> You will also need to determine which descriptor is the bad one; the
> usual way to do that is to call 'fcntl (fd, F_GETFD)' on each
> descriptor on which pselect was asked to wait, and see which ones
> return -1 with errno = EBADF.

This was two years ago, so I'm guessing there's little chance of there
being any progress with this crash, and I'm closing this bug report.  If
this is a problem that persists, please respond to the debbugs address
and we'll reopen.





bug closed, send any further explanations to 33198 <at> debbugs.gnu.org and Gemini Lasswell <gazally <at> runbox.com> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Tue, 02 Feb 2021 15:03:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 03 Mar 2021 12:24:09 GMT) Full text and rfc822 format available.




GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.