GNU bug report logs - #32338
26.1; term.el broken on macOS


Package: emacs;

Reported by: Constantine Vetoshev <vetoshev <at> gmail.com>

Date: Tue, 31 Jul 2018 19:30:02 UTC

Severity: normal

Found in version 26.1

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 32338 in the body.
You can then email your comments to 32338 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#32338; Package emacs. (Tue, 31 Jul 2018 19:30:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Constantine Vetoshev <vetoshev <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 31 Jul 2018 19:30:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Constantine Vetoshev <vetoshev <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 26.1; term.el broken on macOS
Date: Tue, 31 Jul 2018 15:29:21 -0400
Both M-x term and M-x ansi-term cause Emacs to lock up on macOS
10.13.4. Steps to reproduce with -Q:

1. M-x ansi-term
2. Type in /bin/bash, hit Enter

You'll notice that a prompt does not appear in the resulting window
(*ansi-term* buffer). Hit Enter. Emacs will display "Writing to process:
Input/output error, *ansi-term*" in the minibuffer. If you started Emacs
from the terminal, it will display "Fatal error 4: Illegal
instruction". After this, Emacs becomes unstable and will eventually
stop responding to input. All this worked fine in 25.3 and earlier
(i.e., a shell prompt appears and can be readily used).


In GNU Emacs 26.1 (build 1, x86_64-apple-darwin17.5.0, NS
appkit-1561.40 Version 10.13.4 (Build 17E202))
 of 2018-05-29 built on athena
Windowing system distributor 'Apple', version 10.3.1561
Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Configured using:
 'configure --with-ns'

Configured features:
NOTIFY ACL GNUTLS LIBXML2 ZLIB TOOLKIT_SCROLL_BARS NS THREADS

Important settings:
  value of $LC_COLLATE: C
  value of $LC_CTYPE: en_US.UTF-8
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny seq byte-opt gv
bytecomp byte-compile cconv cl-loaddefs cl-lib dired dired-loaddefs
format-spec rfc822 mml easymenu mml-sec password-cache epa derived epg
epg-config gnus-util rmail rmail-loaddefs mm-decode mm-bodies mm-encode
mail-parse rfc2231 mailabbrev gmm-utils mailheader sendmail rfc2047
rfc2045 ietf-drums mm-util mail-prsvr mail-utils elec-pair time-date
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel term/ns-win ns-win ucs-normalize mule-util term/common-win
tool-bar dnd fontset image regexp-opt fringe tabulated-list replace
newcomment text-mode elisp-mode lisp-mode prog-mode register page
menu-bar rfn-eshadow isearch timer select scroll-bar mouse jit-lock
font-lock syntax facemenu font-core term/tty-colors frame cl-generic
cham georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese composite charscript charprop
case-table epa-hook jka-cmpr-hook help simple abbrev obarray minibuffer
cl-preloaded nadvice loaddefs button faces cus-face macroexp files
text-properties overlay sha1 md5 base64 format env code-pages mule
custom widget hashtable-print-readable backquote kqueue cocoa ns
multi-tty make-network-process emacs)

Memory information:
((conses 16 203885 9219)
 (symbols 48 20051 1)
 (miscs 40 43 146)
 (strings 32 28780 1228)
 (string-bytes 1 760308)
 (vectors 16 34997)
 (vector-slots 8 713320 17782)
 (floats 8 48 68)
 (intervals 56 202 0)
 (buffers 992 11))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#32338; Package emacs. (Wed, 19 Sep 2018 23:31:01 GMT) Full text and rfc822 format available.

Message #8 received at 32338 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: Constantine Vetoshev <vetoshev <at> gmail.com>
Cc: 32338 <at> debbugs.gnu.org
Subject: Re: bug#32338: 26.1; term.el broken on macOS
Date: Wed, 19 Sep 2018 19:30:41 -0400
Constantine Vetoshev <vetoshev <at> gmail.com> writes:

> Both M-x term and M-x ansi-term cause Emacs to lock up on macOS
> 10.13.4. Steps to reproduce with -Q:
>
> 1. M-x ansi-term
> 2. Type in /bin/bash, hit Enter
>
> You'll notice that a prompt does not appear in the resulting window
> (*ansi-term* buffer). Hit Enter. Emacs will display "Writing to process:
> Input/output error, *ansi-term*" in the minibuffer. If you started Emacs
> from the terminal, it will display "Fatal error 4: Illegal
> instruction".

Do you mean Emacs itself is hitting this error and crashing, or
/bin/bash running inside *ansi-term* does?  Can you run under a debugger
and get a backtrace?

> After this, Emacs becomes unstable and will eventually
> stop responding to input. All this worked fine in 25.3 and earlier
> (i.e., a shell prompt appears and can be readily used).

Does it currently work fine on 25.3 (i.e., is it possible an OS update
caused this to stop working on earlier Emacs versions as well)?  If you
load term.el from 25.3 into 26.1 does it work then?

> In GNU Emacs 26.1 (build 1, x86_64-apple-darwin17.5.0, NS
> appkit-1561.40 Version 10.13.4 (Build 17E202))
>  of 2018-05-29 built on athena
> Windowing system distributor 'Apple', version 10.3.1561




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#32338; Package emacs. (Thu, 20 Sep 2018 14:16:01 GMT) Full text and rfc822 format available.

Message #11 received at 32338 <at> debbugs.gnu.org (full text, mbox):

From: Constantine Vetoshev <vetoshev <at> gmail.com>
To: npostavs <at> gmail.com
Cc: 32338 <at> debbugs.gnu.org
Subject: Re: bug#32338: 26.1; term.el broken on macOS
Date: Thu, 20 Sep 2018 07:15:17 -0700
On Wed, Sep 19, 2018 at 4:30 PM Noam Postavsky <npostavs <at> gmail.com> wrote:
> Do you mean Emacs itself is hitting this error and crashing, or
> /bin/bash running inside *ansi-term* does?  Can you run under a debugger
> and get a backtrace?

The error occurs inside Emacs. It does not matter which process I run
under term, they all do this. I did a little more digging, and
observed the following:

1. Right after executing /bin/bash, the Emacs process forks, which
makes sense. However, immediately afterwards, both processes slam the
CPU to nearly 100% (60% for the parent and 30% for the child).

2. When I Ctrl-C (sigint) the main process, it exits with this error message:

2018-09-20 07:01:57.542 Emacs[80069:65713609] *** -[NSAutoreleasePool
release]: This pool has already been released, do not drain it (double
release).

3. The child process does not respond to sigint, continues to consume
heavy CPU, and requires a sigkill to terminate.

4. I am attaching DTrace files for the two processes. I took two
samples of the main (parent) process, then a sample of the child
process, and then a sample of the child process after sending the main
process a sigint.

5. When I run Emacs 26.1 under lldb, everything works! No crashes, no
error messages. So I can't provide a crash-time stack trace, at least
not with this build.

> Does it currently work fine on 25.3 (i.e., is it possible an OS update
> caused this to stop working on earlier Emacs versions as well)?  If you
> load term.el from 25.3 into 26.1 does it work then?

Everything works perfectly on 25.3. I had to revert to using 25.3
because of this problem. Same OS version.

After copying term.el from 25.3 into 26.1 (and deleting term.elc),
26.1 still exhibits the bug.

Something must have changed in the macOS-specific pieces of Emacs
between 25.3 and 26.1 which started causing this crash. It's bizarre
that the problem goes away when run under a debugger environment.
(Sometimes I've seen this happen with a race condition or other
contention which goes away when the program is slowed down a
little...)
[dtrace-second-process-sample-1.txt (text/plain, attachment)]
[dtrace-second-process-sample-after-sigint.txt (text/plain, attachment)]
[dtrace-main-process-sample-2.txt (text/plain, attachment)]
[dtrace-main-process-sample-1.txt (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#32338; Package emacs. (Fri, 21 Sep 2018 00:36:01 GMT) Full text and rfc822 format available.

Message #14 received at 32338 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: Constantine Vetoshev <vetoshev <at> gmail.com>
Cc: 32338 <at> debbugs.gnu.org
Subject: Re: bug#32338: 26.1; term.el broken on macOS
Date: Thu, 20 Sep 2018 20:35:10 -0400
Constantine Vetoshev <vetoshev <at> gmail.com> writes:

> 5. When I run Emacs 26.1 under lldb, everything works! No crashes, no
> error messages.

Ugh, one of those.

> After copying term.el from 25.3 into 26.1 (and deleting term.elc),
> 26.1 still exhibits the bug.
>
> Something must have changed in the macOS-specific pieces of Emacs
> between 25.3 and 26.1 which started causing this crash.

Okay, I'm going to guess it's the change to use vfork.  Here's a patch
which should undo that change (I can't test it on macOS, and it's just
assembled from git log --grep=vfork, so I may have missed something).
Try it out and see if it avoids the crash.

[0001-Revert-vfork-for-Darwin-changes.patch (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#32338; Package emacs. (Sat, 22 Sep 2018 18:13:01 GMT) Full text and rfc822 format available.

Message #17 received at 32338 <at> debbugs.gnu.org (full text, mbox):

From: Constantine Vetoshev <vetoshev <at> gmail.com>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: 32338 <at> debbugs.gnu.org
Subject: Re: bug#32338: 26.1; term.el broken on macOS
Date: Sat, 22 Sep 2018 11:12:15 -0700
On Thu, Sep 20, 2018 at 5:35 PM Noam Postavsky <npostavs <at> gmail.com> wrote:
> Okay, I'm going to guess it's the change to use vfork.  Here's a patch
> which should undo that change (I can't test it on macOS, and it's just
> assembled from git log --grep=vfork, so I may have missed something).
> Try it out and see if it avoids the crash.

Thanks! I just applied the patch to the 26.1 release source tree (it
applied cleanly) and rebuilt Emacs. This build still hangs, but
there's one difference: it no longer prints "Fatal error 11:
Segmentation fault" immediately after forking. When I sigint the Emacs
process after the hang, it still prints the NSAutoreleasePool error
message, and it still requires a sigkill to avoid eating CPU.

Is there anything else I should try to help track this down before
going down the road of running 'git bisect'?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#32338; Package emacs. (Sat, 22 Sep 2018 22:15:01 GMT) Full text and rfc822 format available.

Message #20 received at 32338 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: Constantine Vetoshev <vetoshev <at> gmail.com>
Cc: 32338 <at> debbugs.gnu.org
Subject: Re: bug#32338: 26.1; term.el broken on macOS
Date: Sat, 22 Sep 2018 18:14:25 -0400
Constantine Vetoshev <vetoshev <at> gmail.com> writes:

> On Thu, Sep 20, 2018 at 5:35 PM Noam Postavsky <npostavs <at> gmail.com> wrote:

> Is there anything else I should try to help track this down before
> going down the road of running 'git bisect'?

Hmm, I can't think of anything else.  I think my guess was wrong, but
I'm not even sure.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#32338; Package emacs. (Sun, 23 Sep 2018 05:43:01 GMT) Full text and rfc822 format available.

Message #23 received at 32338 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: 32338 <at> debbugs.gnu.org, vetoshev <at> gmail.com
Subject: Re: bug#32338: 26.1; term.el broken on macOS
Date: Sun, 23 Sep 2018 08:42:22 +0300
> From: Noam Postavsky <npostavs <at> gmail.com>
> Date: Sat, 22 Sep 2018 18:14:25 -0400
> Cc: 32338 <at> debbugs.gnu.org
> 
> Constantine Vetoshev <vetoshev <at> gmail.com> writes:
> 
> > On Thu, Sep 20, 2018 at 5:35 PM Noam Postavsky <npostavs <at> gmail.com> wrote:
> 
> > Is there anything else I should try to help track this down before
> > going down the road of running 'git bisect'?
> 
> Hmm, I can't think of anything else.  I think my guess was wrong, but
> I'm not even sure.

Thanks for your efforts.  I hope one of the NS experts will chime in
soon.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#32338; Package emacs. (Sat, 29 Sep 2018 23:53:02 GMT) Full text and rfc822 format available.

Message #26 received at 32338 <at> debbugs.gnu.org (full text, mbox):

From: Constantine Vetoshev <vetoshev <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 32338 <at> debbugs.gnu.org, Noam Postavsky <npostavs <at> gmail.com>
Subject: Re: bug#32338: 26.1; term.el broken on macOS
Date: Sat, 29 Sep 2018 16:52:40 -0700
On Sat, Sep 22, 2018 at 10:42 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> Thanks for your efforts.  I hope one of the NS experts will chime in
> soon.

After bisecting between 25.3 and 26.1, I tracked down the breaking
commit, 4cdd14eabe5a6121691daa2d9c5e814c5f53f3e5.

It seems like it was supposed to only impact Windows. I was so
surprised that I checked twice to confirm. The breaking change was to
src/emacs.c, in the type change from long to rlim_t (two places).
Reverting just that one file's change fixes the crash. I
double-checked by applying the relevant patch to the 26.1 release
source tree, and that fixed it.

Checking my (macOS) system, rlim_t seems to be typedefed to
__uint64_t, which definitely makes a difference, though I admit the
nature of the crash is rather mystifying compared to what has actually
changed. Eli, since you were working on this, any ideas about the
right approach to fixing the problem other than blindly reverting the
change?
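
As an illustration, the width and signedness of rlim_t on a given
system can be checked with a few lines of C. This is only a sketch:
the helper names rlim_t_is_unsigned and rlim_t_bits are made up for
this example and do not exist in Emacs.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: true when rlim_t cannot hold negative values.
   Casting -1 to an unsigned type yields its maximum value, which
   compares greater than zero.  */
bool
rlim_t_is_unsigned (void)
{
  return (rlim_t) -1 > (rlim_t) 0;
}

/* Hypothetical helper: storage width of rlim_t in bits.  */
size_t
rlim_t_bits (void)
{
  return sizeof (rlim_t) * CHAR_BIT;
}
```

On a system where rlim_t is a 64-bit unsigned type, as reported above,
the first helper returns true and the second returns 64.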




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#32338; Package emacs. (Sun, 30 Sep 2018 06:01:01 GMT) Full text and rfc822 format available.

Message #29 received at 32338 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Constantine Vetoshev <vetoshev <at> gmail.com>
Cc: 32338 <at> debbugs.gnu.org, npostavs <at> gmail.com
Subject: Re: bug#32338: 26.1; term.el broken on macOS
Date: Sun, 30 Sep 2018 08:59:57 +0300
> From: Constantine Vetoshev <vetoshev <at> gmail.com>
> Date: Sat, 29 Sep 2018 16:52:40 -0700
> Cc: Noam Postavsky <npostavs <at> gmail.com>, 32338 <at> debbugs.gnu.org
> 
> On Sat, Sep 22, 2018 at 10:42 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > Thanks for your efforts.  I hope one of the NS experts will chime in
> > soon.
> 
> After bisecting between 25.3 and 26.1, I tracked down the breaking
> commit, 4cdd14eabe5a6121691daa2d9c5e814c5f53f3e5.

Amazing!  Thank you for your efforts.

> Checking my (macOS) system, rlim_t seems to be typedefed to
> __uint64_t, which definitely makes a difference, though I admit the
> nature of the crash is rather mystifying compared to what has actually
> changed. Eli, since you were working on this, any ideas about the
> right approach to fixing the problem

Please show the definition of 'struct rlimit' on your system.  It
should be in the header sys/resource.h, I think.  Then perhaps I will
have some insight.

> other than blindly reverting the change?

Reverting the change is out of the question, sorry.  The 'long' data
type used there before the change is too narrow to support some of the
systems which use this code.  We will have to find another solution,
if indeed this is the problem.  But first we need to understand the
problem.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#32338; Package emacs. (Sun, 30 Sep 2018 09:17:01 GMT) Full text and rfc822 format available.

Message #32 received at 32338 <at> debbugs.gnu.org (full text, mbox):

From: Alan Third <alan <at> idiocy.org>
To: Constantine Vetoshev <vetoshev <at> gmail.com>
Cc: 32338 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 Noam Postavsky <npostavs <at> gmail.com>
Subject: Re: bug#32338: 26.1; term.el broken on macOS
Date: Sun, 30 Sep 2018 10:16:30 +0100
On Sat, Sep 29, 2018 at 04:52:40PM -0700, Constantine Vetoshev wrote:
> On Sat, Sep 22, 2018 at 10:42 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > Thanks for your efforts.  I hope one of the NS experts will chime in
> > soon.
> 
> After bisecting between 25.3 and 26.1, I tracked down the breaking
> commit, 4cdd14eabe5a6121691daa2d9c5e814c5f53f3e5.

I can’t reproduce this (although I haven’t checked with the actual
release version of Emacs 26 as I don’t have a copy lying around).

I’m on 10.13.6, are you still running 10.13.4?
-- 
Alan Third




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#32338; Package emacs. (Sun, 30 Sep 2018 15:26:01 GMT) Full text and rfc822 format available.

Message #35 received at 32338 <at> debbugs.gnu.org (full text, mbox):

From: Constantine Vetoshev <vetoshev <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>, alan <at> idiocy.org
Cc: 32338 <at> debbugs.gnu.org, Noam Postavsky <npostavs <at> gmail.com>
Subject: Re: bug#32338: 26.1; term.el broken on macOS
Date: Sun, 30 Sep 2018 08:25:20 -0700
On Sat, Sep 29, 2018 at 11:00 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> Please show the definition of 'struct rlimit' on your system.  It
> should be in the header sys/resource.h, I think.  Then perhaps I will
> have some insight.

Here it is:

struct rlimit {
    rlim_t    rlim_cur;        /* current (soft) limit */
    rlim_t    rlim_max;        /* maximum value for rlim_cur */
};
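
For reference, a minimal sketch of how these two fields are read for
the stack limit. The wrapper name read_stack_limits is hypothetical;
only getrlimit, RLIMIT_STACK and struct rlimit come from
sys/resource.h.

```c
#include <sys/resource.h>

/* Hypothetical wrapper: store the soft and hard stack limits in
   *SOFT and *HARD.  Returns 0 on success and -1 if getrlimit fails.  */
int
read_stack_limits (rlim_t *soft, rlim_t *hard)
{
  struct rlimit rlim;

  if (getrlimit (RLIMIT_STACK, &rlim) != 0)
    return -1;

  *soft = rlim.rlim_cur;   /* current (soft) limit */
  *hard = rlim.rlim_max;   /* ceiling for the soft limit */
  return 0;
}
```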

On Sun, Sep 30, 2018 at 2:16 AM Alan Third <alan <at> idiocy.org> wrote:
> I can’t reproduce this (although I haven’t checked with the actual
> release version of Emacs 26 as I don’t have a copy lying around).
>
> I’m on 10.13.6, are you still running 10.13.4?

I'm on 10.13.6. Early-2015 MBP with a Core i7 and 16GB RAM. I just
rebooted the machine just to see if maybe it's a transient problem,
but no dice.

Did you definitely run the binary as
/path/to/Emacs.app/Contents/MacOS/Emacs -Q
? I can't reproduce the crash under a debugger, but it happens 100% of
the time otherwise. That, and the error message

2018-09-30 08:19:22.656 Emacs[1407:25736] *** -[NSAutoreleasePool
release]: This pool has already been released, do not drain it (double
release)

make me think there's a race condition in a memory release somewhere
that doesn't get hit when the system runs a little slower.

It might be helpful to compare with 10.14/Mojave, but I'm holding off
installing it until 10.14.1 comes out.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#32338; Package emacs. (Sun, 30 Sep 2018 17:15:02 GMT) Full text and rfc822 format available.

Message #38 received at 32338 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Constantine Vetoshev <vetoshev <at> gmail.com>
Cc: 32338 <at> debbugs.gnu.org, alan <at> idiocy.org, npostavs <at> gmail.com
Subject: Re: bug#32338: 26.1; term.el broken on macOS
Date: Sun, 30 Sep 2018 20:13:43 +0300
> From: Constantine Vetoshev <vetoshev <at> gmail.com>
> Date: Sun, 30 Sep 2018 08:25:20 -0700
> Cc: Noam Postavsky <npostavs <at> gmail.com>, 32338 <at> debbugs.gnu.org
> 
> On Sat, Sep 29, 2018 at 11:00 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > Please show the definition of 'struct rlimit' on your system.  It
> > should be in the header sys/resource.h, I think.  Then perhaps I will
> > have some insight.
> 
> Here it is:
> 
> struct rlimit {
>     rlim_t    rlim_cur;        /* current (soft) limit */
>     rlim_t    rlim_max;        /* maximum value for rlim_cur */
> };

Then please step with a debugger through the code starting from the
call to getrlimit, and please show the values of related variables,
such as newlim, all the way until the call to setrlimit and the
computed value of emacs_re_safe_alloca.  Please do that once with the
current code and then once again with the code before the offending
commit.  I'd like to see the differences, because I meanwhile see
nothing wrong with using rlim_t here.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#32338; Package emacs. (Tue, 02 Oct 2018 22:52:01 GMT) Full text and rfc822 format available.

Message #41 received at 32338 <at> debbugs.gnu.org (full text, mbox):

From: Constantine Vetoshev <vetoshev <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 32338 <at> debbugs.gnu.org, Alan Third <alan <at> idiocy.org>,
 Noam Postavsky <npostavs <at> gmail.com>
Subject: Re: bug#32338: 26.1; term.el broken on macOS
Date: Tue, 2 Oct 2018 15:51:11 -0700
On Sun, Sep 30, 2018 at 10:14 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> Then please step with a debugger through the code starting from the
> call to getrlimit, and please show the values of related variables,
> such as newlim, all the way until the call to setrlimit and the
> computed value of emacs_re_safe_alloca.  Please do that once with the
> current code and then once again with the code before the offending
> commit.  I'd like to see the differences, because I meanwhile see
> nothing wrong with using rlim_t here.

One change from my past reports: after compiling Emacs with -g flags,
I have now managed to reproduce the crash under lldb, including
attaching to the forked process which eats CPU after the crash.
Backtrace from that process is attached.

Here are my results from stepping through the code. Note this all runs
at Emacs startup, long before anything forks.

The highlights (as far as I noticed) are:
- emacs_re_max_failures and the older re_max_failures are not
initialized at this point
- in the working branch, newlim is reset to rlim.rlim_max; in the
broken branch, it is not
- in the working branch, setrlimit does not get called; in the broken
branch, it does

I'm guessing the problem is with the uninitialized values for
*_re_max_failures and the resulting values being assigned to lim and
newlim. It seems to only work on the working branch by accident
because, for whatever reason, newlim always gets reset to
rlim.rlim_max and setrlimit doesn't get called.

-----
master branch (commit 3eedabaef37e), use of rlim_t:

- immediately after getrlimit call, lim is assigned, value: 0
- lim is then assigned rlim.rlim_cur, value: 67104768
- min_ratio is initialized, value: 160
- ratio is initialized, value: 213
- try_to_grow_stack ends up assigned, value: true

The code proceeds into the try_to_grow_stack condition:

- newlim is assigned, value: 10020000
- BUT: emacs_re_max_failures defined at that point and used to
calculate newlim has a very large size_t value: 6500256977556508423
- looks like newlim has overflowed here to fit unsigned long long
- pagesize is assigned, value 4096
- newlim is incremented by (pagesize - 1), value: 10024095
- condition checking if rlim.rlim_max < newlim; rlim.rlim_max is
67104768 so the condition evaluates to false (emacs.c:880)
- condition checking if pagesize <= (newlim - lim) evaluates to true:
this happens because (newlim < lim), and the subtraction wraps
around (newlim - lim returns an unsigned long long with value
18446744073652469760); consequently, setrlimit is called and succeeds

The try_to_grow_stack condition ends.

- emacs_re_safe_alloca is assigned, value: 4435280473597425792. I'm
not sure if that's a reasonable value for a value of type ptrdiff_t.
-----

-----
last working revision (commit 6cdd1c333034b), use of long:

Please note that this code predates the introduction of emacs_re_safe_alloca.

- immediately after getrlimit call, lim is assigned, value: 0
- lim is then assigned rlim.rlim_cur, value: 67104768
- ratio is then initialized: 160
- and subsequently incremented, value: 213
- try_to_grow_stack ends up assigned, value: true

The code proceeds into the try_to_grow_stack condition:

- newlim is assigned, value: 67104578
- BUT: re_max_failures defined at that point and used to calculate
newlim has a very large size_t value: 16107485546189635934
- newlim has obviously overflowed here to fit a signed long
- pagesize is assigned, value 4096
- newlim is incremented by (pagesize - 1), value: 67108673
- condition checking if rlim.rlim_max < newlim; rlim.rlim_max is
67104768 so the condition evaluates to true and newlim is set to
rlim.rlim_max (emacs.c:862)
- newlim decrement by newlim % pagesize is a noop
- condition checking if pagesize <= (newlim - lim) evaluates to false,
skipping the setrlimit call
-----

I am attaching lldb session transcripts for both runs in case you want
to look more closely at what's going on.
[fork-process-crash-backtrace.txt (text/plain, attachment)]
[lldb-session-working.txt (text/plain, attachment)]
[lldb-session-broken.txt (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#32338; Package emacs. (Wed, 03 Oct 2018 15:00:02 GMT) Full text and rfc822 format available.

Message #44 received at 32338 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Constantine Vetoshev <vetoshev <at> gmail.com>
Cc: 32338 <at> debbugs.gnu.org, alan <at> idiocy.org, npostavs <at> gmail.com
Subject: Re: bug#32338: 26.1; term.el broken on macOS
Date: Wed, 03 Oct 2018 17:59:14 +0300
> From: Constantine Vetoshev <vetoshev <at> gmail.com>
> Date: Tue, 2 Oct 2018 15:51:11 -0700
> Cc: Alan Third <alan <at> idiocy.org>, Noam Postavsky <npostavs <at> gmail.com>, 32338 <at> debbugs.gnu.org
> 
> One change from my past reports: after compiling Emacs with -g flags,
> I have now managed to reproduce the crash under lldb, including
> attaching to the forked process which eats CPU after the crash.
> Backtrace from that process is attached.

Great, thanks.

> The highlights (as far as I noticed) are:
> - emacs_re_max_failures and the older re_max_failures are not
> initialized at this point

I believe this is incorrect.  re_max_failures is statically assigned a
value of 40000 in regex-emacs.c, and should be initialized at link
time.  Your build is with optimizations, isn't it?  I think the
optimizer reordered instructions, which creates the illusion that
re_max_failures has a garbled value at that point.  The value of
newlim, 10022912, is correct, you can confirm that by calculating it
by hand assuming that re_max_failures is 40000 and using the other
values your debugging session shows.
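
The hand calculation can be sketched in C as below. The function name
requested_stack_limit is made up for this illustration, and the
"extra" term of 1500000 is an assumption inferred from the reported
values (10020000 - 40000 * 213); it is not stated in this thread. The
clamp to rlim.rlim_max is left out because rlim_max (67104768) is
larger than the result here.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the arithmetic discussed above:
   compute the wanted stack size, then round it to a page boundary
   by adding (pagesize - 1) and dropping the remainder.  */
uint64_t
requested_stack_limit (uint64_t max_failures, uint64_t ratio,
                       uint64_t extra, uint64_t pagesize)
{
  uint64_t newlim = max_failures * ratio + extra;

  newlim += pagesize - 1;
  newlim -= newlim % pagesize;
  return newlim;
}
```

With max_failures = 40000, ratio = 213, extra = 1500000 (assumed) and
pagesize = 4096, the intermediate values are 10020000 and 10024095,
and the function returns 10022912.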

> - in the working branch, newlim is reset to rlim.rlim_max; in the
> broken branch, it is not
> - in the working branch, setrlimit does not get called; in the broken
> branch, it does

Right.

> I'm guessing the problem is with the uninitialized values for
> *_re_max_failures and the resulting values being assigned to lim and
> newlim. It seems to only work on the working branch by accident
> because, for whatever reason, newlim always gets reset to
> rlim.rlim_max and setrlimit doesn't get called.

No, I think the problem is with this line:

   > 884            if (pagesize <= newlim - lim)

In your case newlim is smaller than lim, but rlim_t is an unsigned
data type on your system, so the subtraction wraps around and produces
a large positive value, which then tricks Emacs into thinking it needs
to enlarge the stack, whereas in reality the stack space already
available, 67MB, is large enough.  (Btw, that value sounds too large,
I wonder if it's some problem with getrlimit on your system.)
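
To illustrate the wraparound, here is a small self-contained sketch
using the values from the debugging session. It assumes a 64-bit
unsigned stand-in type (rlim_sketch_t) instead of the real rlim_t, and
the two function names are hypothetical; it is not the Emacs code
itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for rlim_t on a system where it is 64-bit unsigned.  */
typedef uint64_t rlim_sketch_t;

/* The original test: with unsigned operands, newlim - lim wraps to a
   huge positive value whenever newlim < lim.  */
bool
unguarded_wants_grow (rlim_sketch_t newlim, rlim_sketch_t lim,
                      rlim_sketch_t pagesize)
{
  return pagesize <= newlim - lim;
}

/* The guarded test: compare first, subtract only when it is safe.  */
bool
guarded_wants_grow (rlim_sketch_t newlim, rlim_sketch_t lim,
                    rlim_sketch_t pagesize)
{
  return newlim > lim && pagesize <= newlim - lim;
}
```

With newlim = 10022912, lim = 67104768 and pagesize = 4096, the
unguarded test is true (the difference wraps to 18446744073652469760),
while the guarded test is false.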

So please try the patch below with the emacs-26 branch, and see if the
problem goes away.

diff --git a/src/emacs.c b/src/emacs.c
index 483e848..c0b4bd9 100644
--- a/src/emacs.c
+++ b/src/emacs.c
@@ -875,7 +875,8 @@ main (int argc, char **argv)
 	    newlim = rlim.rlim_max;
 	  newlim -= newlim % pagesize;
 
-	  if (pagesize <= newlim - lim)
+	  if (newlim > lim	/* in case rlim_t is an unsigned type */
+	      && pagesize <= newlim - lim)
 	    {
 	      rlim.rlim_cur = newlim;
 	      if (setrlimit (RLIMIT_STACK, &rlim) == 0)
@@ -884,9 +885,9 @@ main (int argc, char **argv)
 	}
       /* If the stack is big enough, let regex.c more of it before
          falling back to heap allocation.  */
-      emacs_re_safe_alloca = max
-        (min (lim - extra, SIZE_MAX) * (min_ratio / ratio),
-         MAX_ALLOCA);
+      emacs_re_safe_alloca =
+	max (min (min (0, lim - extra), SIZE_MAX) * (min_ratio / ratio),
+	     MAX_ALLOCA);
     }
 #endif /* HAVE_SETRLIMIT and RLIMIT_STACK and not CYGWIN */
 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#32338; Package emacs. (Wed, 03 Oct 2018 19:22:02 GMT) Full text and rfc822 format available.

Message #47 received at 32338 <at> debbugs.gnu.org (full text, mbox):

From: Constantine Vetoshev <vetoshev <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 32338 <at> debbugs.gnu.org, Alan Third <alan <at> idiocy.org>,
 Noam Postavsky <npostavs <at> gmail.com>
Subject: Re: bug#32338: 26.1; term.el broken on macOS
Date: Wed, 3 Oct 2018 12:21:26 -0700
On Wed, Oct 3, 2018 at 7:59 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> So please try the patch below with the emacs-26 branch, and see if the
> problem goes away.

Yes, that fixed it! Thanks!




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Thu, 04 Oct 2018 16:16:02 GMT) Full text and rfc822 format available.

Notification sent to Constantine Vetoshev <vetoshev <at> gmail.com>:
bug acknowledged by developer. (Thu, 04 Oct 2018 16:16:02 GMT) Full text and rfc822 format available.

Message #52 received at 32338-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Constantine Vetoshev <vetoshev <at> gmail.com>
Cc: 32338-done <at> debbugs.gnu.org, alan <at> idiocy.org, npostavs <at> gmail.com
Subject: Re: bug#32338: 26.1; term.el broken on macOS
Date: Thu, 04 Oct 2018 19:15:06 +0300
> From: Constantine Vetoshev <vetoshev <at> gmail.com>
> Date: Wed, 3 Oct 2018 12:21:26 -0700
> Cc: Alan Third <alan <at> idiocy.org>, Noam Postavsky <npostavs <at> gmail.com>, 32338 <at> debbugs.gnu.org
> 
> On Wed, Oct 3, 2018 at 7:59 AM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > So please try the patch below with the emacs-26 branch, and see if the
> > problem goes away.
> 
> Yes, that fixed it! Thanks!

Thanks, pushed to the release branch.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 02 Nov 2018 11:24:04 GMT) Full text and rfc822 format available.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.