GNU bug report logs - #34655
26.1.92; Segfault in module with --module-assertions

Previous Next

Package: emacs;

Reported by: "Basil L. Contovounesios" <contovob <at> tcd.ie>

Date: Mon, 25 Feb 2019 21:02:01 UTC

Severity: normal

Merged with 31238

Found in version 26.1.92

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 34655 in the body.
You can then email your comments to 34655 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Mon, 25 Feb 2019 21:02:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Basil L. Contovounesios" <contovob <at> tcd.ie>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Mon, 25 Feb 2019 21:02:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: bug-gnu-emacs <at> gnu.org
Subject: 26.1.92; Segfault in module with --module-assertions
Date: Mon, 25 Feb 2019 21:00:41 +0000
[gdb.txt (text/plain, attachment)]
[Message part 2 (text/plain, inline)]
I have written a dynamic module which provides a function
realpath-truename similar to file-truename, but which uses
canonicalize_file_name(3) internally and which doesn't respect file name
handlers.  It is hosted at the following URL:

https://gitlab.com/basil-conto/realpath

Repeatedly calling realpath-truename on a directory name in an Emacs
started with --module-assertions segfaults in a call to
file-name-as-directory.  The precise recipe I am using is the following:

0. git clone https://gitlab.com/basil-conto/realpath.git
1. cd realpath
2. make
3. cd /path/to/emacs/src
4. gdb emacs
5. set logging on
6. run -Q --module-assertions
7. (progn (module-load "/path/to/realpath.so")
          (dotimes (_ 1000)
            (realpath-truename user-emacs-directory)))
   C-j
9. bt full

I attach the corresponding GDB log.

I am not confident that the module is written properly, in particular
w.r.t. nonlocal exits, but I don't see what, if anything, I am doing
wrong, so any help would be greatly appreciated.

Thanks,

-- 
Basil

In GNU Emacs 26.1.92 (build 1, x86_64-pc-linux-gnu, X toolkit, Xaw3d scroll bars)
 of 2019-02-25 built on thunk
Repository revision: 1dff09739346037a588a3b9290800c09a9b3409a
Windowing system distributor 'The X.Org Foundation', version 11.0.12003000
System Description:	Debian GNU/Linux buster/sid

Configured using:
 'configure 'CC=ccache gcc' 'CFLAGS=-O0 -g3 -ggdb -gdwarf-4 -pipe'
 --config-cache --prefix=/home/blc/.local --program-suffix=26
 --enable-checking=yes,glyphs --enable-check-lisp-object-type
 --with-mailutils --with-x-toolkit=lucid --with-modules
 --with-file-notification=yes --with-x'

Configured features:
XAW3D XPM JPEG TIFF GIF PNG RSVG IMAGEMAGICK SOUND GPM DBUS GSETTINGS
GLIB NOTIFY ACL LIBSELINUX GNUTLS LIBXML2 FREETYPE M17N_FLT LIBOTF XFT
ZLIB TOOLKIT_SCROLL_BARS LUCID X11 XDBE XIM MODULES THREADS LIBSYSTEMD
LCMS2

Important settings:
  value of $LANG: en_IE.UTF-8
  locale-coding-system: utf-8-unix

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Tue, 26 Feb 2019 03:00:03 GMT) Full text and rfc822 format available.

Message #8 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: "Basil L. Contovounesios" <contovob <at> tcd.ie>
Cc: 34655 <at> debbugs.gnu.org
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Mon, 25 Feb 2019 21:59:40 -0500
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > I have written a dynamic module which provides a function
  > realpath-truename

The function might be useful, but that is not the right name for it.
In the GNU system we do not use "path" to mean a file name.

Could you describe in words what job the function does?
Then I could suggest a name.

-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Tue, 26 Feb 2019 11:18:02 GMT) Full text and rfc822 format available.

Message #11 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Richard Stallman <rms <at> gnu.org>
Cc: 34655 <at> debbugs.gnu.org
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Tue, 26 Feb 2019 11:16:56 +0000
Richard Stallman <rms <at> gnu.org> writes:

>   > I have written a dynamic module which provides a function
>   > realpath-truename
>
> The function might be useful, but that is not the right name for it.
> In the GNU system we do not use "path" to mean a file name.

Thanks, I am aware of this.  See below for how the name came to be.

> Could you describe in words what job the function does?
> Then I could suggest a name.

Thanks, but just to clarify: I have not written this module with the
intention of advertising it for others to use; I don't think anyone
would find it of great use.

I was motivated to write it to test Emacs' new (at the time) module
system, and because, with enough packages installed, file-truename
slowed down package-initialize (and thus my emacs-init-time) by a
non-negligible amount.  I thus overrode file-truename with
realpath-truename using advice, with the caveat that realpath-truename
does not respect file name handlers (in practice I never needed this).

The new package-quickstart feature has made the module largely
unnecessary.  Either way, I regard it as a personal experiment.

Re: the name, I chose 'realpath' because the original implementation was
just that: a wrapper around realpath(3).  It was only in a later version
that I switched to using canonicalize_file_name(3) instead.  Perhaps a
better name would have been 'truename' or similar, but I don't think
this is important enough a matter to justify renaming it now.  Do you?

The reason I submitted this bug report is because I don't know whether
the switch --module-assertions has unveiled an issue in my module or
Emacs' module implementation.

-- 
Basil




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Tue, 26 Feb 2019 15:28:01 GMT) Full text and rfc822 format available.

Message #14 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Basil L. Contovounesios" <contovob <at> tcd.ie>
Cc: 34655 <at> debbugs.gnu.org, rms <at> gnu.org
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Tue, 26 Feb 2019 17:27:30 +0200
> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> Date: Tue, 26 Feb 2019 11:16:56 +0000
> Cc: 34655 <at> debbugs.gnu.org
> 
> The reason I submitted this bug report is because I don't know whether
> the switch --module-assertions has unveiled an issue in my module or
> Emacs' module implementation.

So without using --module-assertions the crash doesn't happen, is that
what you are saying?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Tue, 26 Feb 2019 15:46:01 GMT) Full text and rfc822 format available.

Message #17 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Basil L. Contovounesios" <contovob <at> tcd.ie>
Cc: 34655 <at> debbugs.gnu.org
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Tue, 26 Feb 2019 17:45:21 +0200
> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> Date: Mon, 25 Feb 2019 21:00:41 +0000
> 
> Starting program: /home/blc/.local/src/emacs26/src/emacs -Q --module-assertions
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> [New Thread 0x7ffff01cb700 (LWP 8299)]
> [New Thread 0x7fffef9ac700 (LWP 8300)]
> [New Thread 0x7fffef1ab700 (LWP 8301)]
> 
> Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
> re_search_2 (bufp=0xbf5d00 <searchbufs+384>, str1=0x0, size1=0, str2=0x0, size2=18, startpos=0, 
>     range=18, regs=0x0, stop=18) at regex.c:4354
> 4354				buf_ch = STRING_CHAR_AND_LENGTH (d, buf_charlen);
> #0  0x0000000000608594 in re_search_2
>     (bufp=0xbf5d00 <searchbufs+384>, str1=0x0, size1=0, str2=0x0, size2=18, startpos=0, range=18, regs=0x0, stop=18) at regex.c:4354
>         buf_charlen = 0
>         irange = 18
>         lim = 0
>         d = 0x0
>         buf_ch = 18
>         val = 691541629
>         string1 = 0x0
>         string2 = 0x0
>         fastmap = 0xbf5d38 <searchbufs+440> ""
>         translate = make_number(0)
>         total_size = 18
>         endpos = 18
>         anchored_start = 0 '\000'
>         multibyte = 1 '\001'
> #1  0x0000000000607f91 in re_search
>     (bufp=0xbf5d00 <searchbufs+384>, string=0x0, size=18, startpos=0, range=18, regs=0x0)
>     at regex.c:4181
> #2  0x00000000005f3fd0 in fast_string_match_internal
>     (regexp=XIL(0x8c761c), string=XIL(0x3036ec4), table=XIL(0)) at search.c:485
>         val = 140737488336288
>         bufp = 0xbf5d00 <searchbufs+384>

Here's your problem: fast_string_match_internal got a Lisp
string=XIL(0x3036ec4), but its data passed to re_search as the 2nd arg
is a NULL pointer.  You need to find out how this happens, e.g. by
setting a watchpoint on string's data inside Ffile_name_as_directory.
Or maybe the string is already corrupted there?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Tue, 26 Feb 2019 18:43:02 GMT) Full text and rfc822 format available.

Message #20 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 34655 <at> debbugs.gnu.org, rms <at> gnu.org
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Tue, 26 Feb 2019 18:42:29 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
>> Date: Tue, 26 Feb 2019 11:16:56 +0000
>> Cc: 34655 <at> debbugs.gnu.org
>> 
>> The reason I submitted this bug report is because I don't know whether
>> the switch --module-assertions has unveiled an issue in my module or
>> Emacs' module implementation.
>
> So without using --module-assertions the crash doesn't happen, is that
> what you are saying?

Yes.  I can't guarantee the crash doesn't happen without
--module-assertions of course, but in my tests it only seems to crop up
with the switch (and reliably so).

-- 
Basil




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Wed, 27 Feb 2019 04:11:01 GMT) Full text and rfc822 format available.

Message #23 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Richard Stallman <rms <at> gnu.org>
To: "Basil L. Contovounesios" <contovob <at> tcd.ie>
Cc: 34655 <at> debbugs.gnu.org
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Tue, 26 Feb 2019 23:10:11 -0500
[[[ To any NSA and FBI agents reading my email: please consider    ]]]
[[[ whether defending the US Constitution against all enemies,     ]]]
[[[ foreign or domestic, requires you to follow Snowden's example. ]]]

  > Re: the name, I chose 'realpath' because the original implementation was
  > just that: a wrapper around realpath(3).  It was only in a later version
  > that I switched to using canonicalize_file_name(3) instead.  Perhaps a
  > better name would have been 'truename' or similar, but I don't think
  > this is important enough a matter to justify renaming it now.  Do you?

If you don't think it is worth publicizing, and you don't really want
to use it, I won't claim that minor fixes in it are important.

A good name might be

   nonmagic-file-truename


-- 
Dr Richard Stallman
President, Free Software Foundation (https://gnu.org, https://fsf.org)
Internet Hall-of-Famer (https://internethalloffame.org)






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Sun, 17 Mar 2019 16:40:02 GMT) Full text and rfc822 format available.

Message #26 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 34655 <at> debbugs.gnu.org, Philipp Stephani <p.stephani2 <at> gmail.com>
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Sun, 17 Mar 2019 16:38:58 +0000
[gdb.txt (text/plain, attachment)]
[Message part 2 (text/plain, inline)]
[CCing Philipp as the author of module assertions.]
       
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
>> Date: Mon, 25 Feb 2019 21:00:41 +0000
>> 
>> Starting program: /home/blc/.local/src/emacs26/src/emacs -Q --module-assertions
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> [New Thread 0x7ffff01cb700 (LWP 8299)]
>> [New Thread 0x7fffef9ac700 (LWP 8300)]
>> [New Thread 0x7fffef1ab700 (LWP 8301)]
>> 
>> Thread 1 "emacs" received signal SIGSEGV, Segmentation fault.
>> re_search_2 (bufp=0xbf5d00 <searchbufs+384>, str1=0x0, size1=0, str2=0x0, size2=18, startpos=0, 
>>     range=18, regs=0x0, stop=18) at regex.c:4354
>> 4354				buf_ch = STRING_CHAR_AND_LENGTH (d, buf_charlen);
>> #0  0x0000000000608594 in re_search_2
>>     (bufp=0xbf5d00 <searchbufs+384>, str1=0x0, size1=0, str2=0x0, size2=18, startpos=0, range=18, regs=0x0, stop=18) at regex.c:4354
>>         buf_charlen = 0
>>         irange = 18
>>         lim = 0
>>         d = 0x0
>>         buf_ch = 18
>>         val = 691541629
>>         string1 = 0x0
>>         string2 = 0x0
>>         fastmap = 0xbf5d38 <searchbufs+440> ""
>>         translate = make_number(0)
>>         total_size = 18
>>         endpos = 18
>>         anchored_start = 0 '\000'
>>         multibyte = 1 '\001'
>> #1  0x0000000000607f91 in re_search
>>     (bufp=0xbf5d00 <searchbufs+384>, string=0x0, size=18, startpos=0, range=18, regs=0x0)
>>     at regex.c:4181
>> #2  0x00000000005f3fd0 in fast_string_match_internal
>>     (regexp=XIL(0x8c761c), string=XIL(0x3036ec4), table=XIL(0)) at search.c:485
>>         val = 140737488336288
>>         bufp = 0xbf5d00 <searchbufs+384>
>
> Here's your problem: fast_string_match_internal got a Lisp
> string=XIL(0x3036ec4), but its data passed to re_search as the 2nd arg
> is a NULL pointer.  You need to find out how this happens, e.g. by
> setting a watchpoint on string's data inside Ffile_name_as_directory.
> Or maybe the string is already corrupted there?

The file name string is already corrupted when Ffile_name_as_directory
is called; see below.

First, I rewrote the dynamic module[1] to add nonlocal exit checks after
almost every call to the module API.

[1]: https://gitlab.com/basil-conto/realpath

I also added the following assertions to src/emacs-module.c:

[module-asserts.patch (text/x-diff, inline)]
diff --git a/src/emacs-module.c b/src/emacs-module.c
index 0abfd3f6f1..4f2b0ecd4b 100644
--- a/src/emacs-module.c
+++ b/src/emacs-module.c
@@ -458,6 +458,8 @@ module_make_function (emacs_env *env, ptrdiff_t min_arity, ptrdiff_t max_arity,
   return lisp_to_value (env, result);
 }
 
+static bool my_check = false;
+
 static emacs_value
 module_funcall (emacs_env *env, emacs_value fun, ptrdiff_t nargs,
 		emacs_value args[])
@@ -473,6 +475,7 @@ module_funcall (emacs_env *env, emacs_value fun, ptrdiff_t nargs,
     xsignal0 (Qoverflow_error);
   SAFE_ALLOCA_LISP (newargs, nargs1);
   newargs[0] = value_to_lisp (fun);
+  my_check = EQ (newargs[0], Qfile_name_as_directory);
   for (ptrdiff_t i = 0; i < nargs; i++)
     newargs[1 + i] = value_to_lisp (args[i]);
   emacs_value result = lisp_to_value (env, Ffuncall (nargs1, newargs));
@@ -581,8 +584,10 @@ module_make_string (emacs_env *env, const char *str, ptrdiff_t length)
   /* FIXME: AUTO_STRING_WITH_LEN requires STR to be null-terminated,
      but we shouldn't require that.  */
   AUTO_STRING_WITH_LEN (lstr, str, length);
-  return lisp_to_value (env,
-                        code_convert_string_norecord (lstr, Qutf_8, false));
+  Lisp_Object lisp = code_convert_string_norecord (lstr, Qutf_8, false);
+  eassert (STRINGP (lisp) && SSDATA (lisp));
+  my_check = true;
+  return lisp_to_value (env, lisp);
 }
 
 static emacs_value
@@ -955,6 +960,8 @@ value_to_lisp_bits (emacs_value v)
 static Lisp_Object
 value_to_lisp (emacs_value v)
 {
+  bool b = my_check;
+  my_check = false;
   if (module_assertions)
     {
       /* Check the liveness of the value by iterating over all live
@@ -972,7 +979,11 @@ value_to_lisp (emacs_value v)
             {
               Lisp_Object *p = XSAVE_POINTER (XCAR (values), 0);
               if (p == optr)
-                return *p;
+                {
+                  if (b)
+                    eassert (STRINGP (*p) && SSDATA (*p));
+                  return *p;
+                }
               ++num_values;
             }
           ++num_environments;
@@ -1008,6 +1019,8 @@ lisp_to_value_bits (Lisp_Object o)
 static emacs_value
 lisp_to_value (emacs_env *env, Lisp_Object o)
 {
+  bool b = my_check;
+  my_check = false;
   if (module_assertions)
     {
       /* Add the new value to the list of values allocated from this
@@ -1020,6 +1033,11 @@ lisp_to_value (emacs_env *env, Lisp_Object o)
       ATTRIBUTE_MAY_ALIAS emacs_value ret = vptr;
       struct emacs_env_private *priv = env->private_members;
       priv->values = Fcons (make_save_ptr (ret), priv->values);
+      if (b)
+        {
+          eassert (STRINGP (o) && SSDATA (o));
+          eassert (STRINGP (*optr) && SSDATA (*optr));
+        }
       return ret;
     }
 
[Message part 4 (text/plain, inline)]
These reveal that value_to_lisp eventually returns a corrupted string,
but I don't know why.  I've seen comments in src/fileio.c referring to
string-relocation during GC; could this be at play here?  Either way, do
you have any suggestions on how to proceed?

Here's the full recipe again, now with debugging symbols for the module:

0. git clone https://gitlab.com/basil-conto/realpath.git
1. cd realpath
2. make DEBUG=1
3. cd /path/to/emacs/src
4. gdb emacs
5. set logging on
6. r -Q --module-assertions
7. (progn (module-load "/path/to/realpath.so")
          (dotimes (_ 1000)
            (realpath-truename user-emacs-directory)))
   C-j

   [The loop usually trips the assertions on its first run, but if it
    doesn't, just try again once or twice.]

8. bt full
9. f 2
10. p p
11. pr [#<INVALID_LISP_OBJECT 0x03059c90>]
12. xpr

I attach the resulting GDB log.

Thanks,

-- 
Basil

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Sun, 17 Mar 2019 17:09:01 GMT) Full text and rfc822 format available.

Message #29 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Basil L. Contovounesios" <contovob <at> tcd.ie>
Cc: 34655 <at> debbugs.gnu.org, p.stephani2 <at> gmail.com
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Sun, 17 Mar 2019 19:08:01 +0200
> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> Cc: <34655 <at> debbugs.gnu.org>, Philipp Stephani <p.stephani2 <at> gmail.com>
> Date: Sun, 17 Mar 2019 16:38:58 +0000
> 
> These reveal that value_to_lisp eventually returns a corrupted string,
> but I don't know why.

Did you try to identify the code which causes the corruption, i.e. the
data is valid before that code runs, and invalid after that?  If not,
can you try?  The way to do that is by painstakingly stepping through
the code while examining the relevant data, possibly with help of
watchpoints and displays set up by the GDB "display" command.

> I've seen comments in src/fileio.c referring to string-relocation
> during GC; could this be at play here?

It could be, if your module code holds onto C pointers to Lisp string
data while Emacs runs parts of the interpreter which could GC.  Does
that happen anywhere in your code or in the code involved in
module-assertions?

> Either way, do you have any suggestions on how to proceed?

See above.

I tried at the time to reproduce your problem, and failed.  But I did
that on Windows, where I needed to replace the non-existent realpath
by an equivalent function, so it's not a faithful reproduction.  I
will see if I can find time to look at this on a GNU machine, unless
someone beats me to it.

> 8. bt full
> 9. f 2
> 10. p p
> 11. pr [#<INVALID_LISP_OBJECT 0x03059c90>]
> 12. xpr

Why did you expect 'p' to be a valid Lisp object?  It's actually a
pointer to a Lisp object, i.e. try

  (gdb) p *p
  (gdb) xpr




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Sun, 17 Mar 2019 23:54:02 GMT) Full text and rfc822 format available.

Message #32 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 34655 <at> debbugs.gnu.org, p.stephani2 <at> gmail.com
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Sun, 17 Mar 2019 23:52:55 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
>> Cc: <34655 <at> debbugs.gnu.org>, Philipp Stephani <p.stephani2 <at> gmail.com>
>> Date: Sun, 17 Mar 2019 16:38:58 +0000
>> 
>> These reveal that value_to_lisp eventually returns a corrupted string,
>> but I don't know why.
>
> Did you try to identify the code which causes the corruption, i.e. the
> data is valid before that code runs, and invalid after that?  If not,
> can you try?  The way to do that is by painstakingly stepping through
> the code while examining the relevant data, possibly with help of
> watchpoints and displays set up by the GDB "display" command.

The patch adding assertions to emacs-module.c narrows the problematic
code to lines 123--127 of the dynamic module[1]:

      if (rp_lisp_string (env, &file, nbuf)
          && rp_funcall (env, &dir, "directory-name-p", 1, &dir)
          && env->is_not_nil (env, dir))
        /* Return directory name when given one à la Ffile_truename.  */
        rp_funcall (env, &file, "file-name-as-directory", 1, &file);

[1]: https://gitlab.com/basil-conto/realpath/blob/master/realpath.c#L123-127

On line 123, 'file' is set to an Emacs string created from the C string
'nbuf' ('rp_lisp_string' wraps 'module_make_string' along with a
nonlocal exit check, and similarly 'rp_funcall' wraps 'module_funcall').
On line 127, 'file' is passed to 'file-name-as-directory'.

The assertions added to 'module_make_string' and 'lisp_to_value' never
fail, suggesting the string returned by them is fine (though the
assertions in 'lisp_to_value' only target intermediate Lisp_Objects, not
the returned emacs_value).

The assertion added to 'value_to_lisp' via 'module_funcall', OTOH, does
fail.  I'll see if I can step through this, though I'm not yet sure how
I'll distinguish the problematic call to the module function from the
hundreds of unproblematic ones before it.  There's probably a way to
teach GDB how to inspect emacs_values which I'm not yet familiar with.

>> I've seen comments in src/fileio.c referring to string-relocation
>> during GC; could this be at play here?
>
> It could be, if your module code holds onto C pointers to Lisp string
> data while Emacs runs parts of the interpreter which could GC.  Does
> that happen anywhere in your code or in the code involved in
> module-assertions?

I can't speak for emacs-module.c (I haven't yet understood how
Vmodule_environments and its save pointers work), but the only exchange
between C and Lisp strings in my code is via the module API,
i.e. module_make_string and module_copy_string_contents.  I would hope
the API and its opaque emacs_value type make it difficult for such
issues to arise.

>> Either way, do you have any suggestions on how to proceed?
>
> See above.
>
> I tried at the time to reproduce your problem, and failed.  But I did
> that on Windows, where I needed to replace the non-existent realpath
> by an equivalent function, so it's not a faithful reproduction.  I
> will see if I can find time to look at this on a GNU machine, unless
> someone beats me to it.

Replacing 'canonicalize_file_name' with 'strdup' still reproduces the
issue for me.  Perhaps increasing the number of calls to
realpath-truename from 1000 to 5000 will also help.

>> 8. bt full
>> 9. f 2
>> 10. p p
>> 11. pr [#<INVALID_LISP_OBJECT 0x03059c90>]
>> 12. xpr
>
> Why did you expect 'p' to be a valid Lisp object?  It's actually a
> pointer to a Lisp object, i.e. try
>
>   (gdb) p *p
>   (gdb) xpr

Oops, that was a thinko.  The only difference is GDB reports XIL(...)
instead of (Lisp_Object *), though.

Thank you for your help, I'll report more as time allows.

-- 
Basil




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Mon, 18 Mar 2019 16:22:01 GMT) Full text and rfc822 format available.

Message #35 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Basil L. Contovounesios" <contovob <at> tcd.ie>
Cc: 34655 <at> debbugs.gnu.org, p.stephani2 <at> gmail.com
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Mon, 18 Mar 2019 18:21:25 +0200
> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> Cc: <34655 <at> debbugs.gnu.org>,  <p.stephani2 <at> gmail.com>
> Date: Sun, 17 Mar 2019 23:52:55 +0000
> 
> > I tried at the time to reproduce your problem, and failed.  But I did
> > that on Windows, where I needed to replace the non-existent realpath
> > by an equivalent function, so it's not a faithful reproduction.  I
> > will see if I can find time to look at this on a GNU machine, unless
> > someone beats me to it.
> 
> Replacing 'canonicalize_file_name' with 'strdup' still reproduces the
> issue for me.  Perhaps increasing the number of calls to
> realpath-truename from 1000 to 5000 will also help.

Right, the strdup part did that for me.  (My previous attempt also
used strdup as part of the replacement, but still failed to reproduce.
Memory corruption bugs are frequently weird that way.)

So I modified your recipe slightly, like this:

  (progn (module-load "/path/to/realpath.so")
	 (setq garbage-collection-messages t)
	 (dotimes (i 5000)
	   (message "%d" i)
	   (realpath-truename user-emacs-directory)))

put it on a file named rp.el, and then ran it under GDB:

  (gdb) r -batch --module-assertions -l ./rp.el

Here's what I see:

  [...]
  3077
  3078
  3079
  3080
  Garbage collecting...
  Garbage collecting...done

  Thread 1 received signal SIGSEGV, Segmentation fault.
  0x011e9918 in rpl_re_search_2 (bufp=0x17c0260 <searchbufs+4736>, str1=0x0,
      size1=0, str2=0x0, size2=20, startpos=0, range=20, regs=0x0, stop=20)
      at regex-emacs.c:3322
  3322                            buf_ch = STRING_CHAR_AND_LENGTH (d, buf_charlen)

Looks like it indeed crashes after a GC, and on my system needs more
than 3000 iterations.  So let's run it with a breakpoint at the
beginning of GC:

  (gdb) break alloc.c:6044
  Breakpoint 3 at 0x11fa1bb: file alloc.c, line 6044.
  (gdb) r -batch --module-assertions -l ./rp.el
  [...]
  3080

  Thread 1 hit Breakpoint 3, garbage_collect_1 (gcst=0x82bb84) at alloc.c:6044
  6044      record_in_backtrace (QAutomatic_GC, 0, 0);

The backtrace at this point:

  (gdb) bt
  #0  garbage_collect_1 (gcst=0x82bb84) at alloc.c:6044
  #1  0x011fa88e in garbage_collect () at alloc.c:6241
  #2  0x01149adc in maybe_gc () at lisp.h:5028
  #3  0x0123b012 in Ffuncall (nargs=2, args=0x82bcb0) at eval.c:2829
  #4  0x0128c3cf in module_funcall (env=0x6122730, fun=0x6122868, nargs=1,
      args=0x82bd50) at emacs-module.c:483
  #5  0x62d8136f in rp_funcall (env=0x6122730, value=0x82bd50,
      name=0x62d83060 "directory-name-p", nargs=1, args=0x82bd50)
      at realpath.c:62
  #6  0x62d815cc in Frealpath_truename (env=0x6122730, nargs=1, args=0x82bd90,
      data=0x0) at realpath.c:124
  [...]

  Lisp Backtrace:
  "directory-name-p" (0x82bcb8)
  "realpath-truename" (0x82bf80)
  "while" (0x82c2c8)
  "let" (0x82c538)
  "eval-buffer" (0x82cab0)
  "load-with-code-conversion" (0x82d0f0)
  "load" (0x82d9b8)
  "command-line-1" (0x82e3d0)
  "command-line" (0x82efe8)
  "normal-top-level" (0x82f690)

As you see, the call to Ffuncall is the one that triggers GC from time
to time.

What happens with our 'file' at this point?

  (gdb) fr 6
  (gdb) p file
  $1 = (emacs_value) 0x6122848
  (gdb) p *file
  $2 = <incomplete type>
  (gdb) p *(Lisp_Object *)file
  $3 = XIL(0x8000000006121ed0)
  (gdb) xtype
  Lisp_String
  (gdb) xstring
  $4 = (struct Lisp_String *) 0x6121ed0
  "d:/usr/eli/.emacs.d/"

Still valid.  Now let's see who thrashes it:

  (gdb) p *$4
  $5 = {
    u = {
      s = {
	size = 20,
	size_byte = 20,
	intervals = 0x0,
	data = 0x611e9fc "d:/usr/eli/.emacs.d/"
      },
      next = 0x14,
      gcaligned = 20 '\024'
    }
  }
  (gdb) watch -l $4->u.s.data
  Hardware watchpoint 4: -location $4->u.s.data
  (gdb) c
  Continuing.
  Garbage collecting...

  Thread 1 hit Hardware watchpoint 4: -location $4->u.s.data

  Old value = (unsigned char *) 0x611e9fc "\024"
  New value = (unsigned char *) 0x0
  sweep_strings () at alloc.c:2163
  2163                      NEXT_FREE_LISP_STRING (s) = string_free_list;
  (gdb) list
  2158                      /* Reset the strings's `data' member so that we
  2159                         know it's free.  */
  2160                      s->u.s.data = NULL;
  2161
  2162                      /* Put the string on the free-list.  */
  2163                      NEXT_FREE_LISP_STRING (s) = string_free_list;
  2164                      string_free_list = ptr_bounds_clip (s, sizeof *s);
  2165                      ++nfree;
  2166                    }
  2167                }

Bingo!  This is sweep_strings freeing our string, because it evidently
doesn't think it's a Lisp object that is being still referenced.

The culprit is this fragment from emacs-module.c, which is called each
time you receive a Lisp object from Emacs which you want to use in
your module:

  /* Convert O to an emacs_value.  Allocate storage if needed; this can
     signal if memory is exhausted.  Must be an injective function.  */
  static emacs_value
  lisp_to_value (emacs_env *env, Lisp_Object o)
  {
    if (module_assertions)
      {
	/* Add the new value to the list of values allocated from this
	   environment.  The value is actually a pointer to the
	   Lisp_Object cast to emacs_value.  We make a copy of the
	   object on the free store to guarantee unique addresses.  */
	ATTRIBUTE_MAY_ALIAS Lisp_Object *optr = xmalloc (sizeof o);
	*optr = o;
	void *vptr = optr;
	ATTRIBUTE_MAY_ALIAS emacs_value ret = vptr;
	struct emacs_env_private *priv = env->private_members;
	priv->values = Fcons (make_mint_ptr (ret), priv->values);
	return ret;
      }

What this does is make a copy of each Lisp object you get from Emacs,
store that copy in memory allocated off the heap, and hand your module
a pointer to the copy instead of the original object.  So when you
call, e.g., directory-name-p, an Emacs function, it gets that copy of
the object.  But memory allocation by xmalloc doesn't record the
allocated memory in the red-black tree we maintain for the purposes of
detecting Lisp objects referenced by C stack-based variables.  So when
GC comes to examine the C stack, it doesn't consider the variable
'file' in your module as being a pointer to a live Lisp object, and so
it doesn't mark it.  Then the sweep phase of GC recycles your Lisp
object, which in this case involves setting the string's data to a
NULL pointer.

The patch to fix this is below; it simply marks these copied values by
hand, thus preventing them from being GCed.  It ran successfully with
even 50,000 iterations.

Philipp, any comments/objections?

--- src/emacs-module.c~0	2019-02-25 10:18:35.000000000 +0200
+++ src/emacs-module.c	2019-03-18 08:33:00.564476000 +0200
@@ -1133,6 +1133,15 @@ mark_modules (void)
       mark_object (priv->non_local_exit_symbol);
       mark_object (priv->non_local_exit_data);
       mark_object (priv->values);
+      if (module_assertions)
+	{
+	  for (Lisp_Object values = priv->values;
+	       CONSP (values); values = XCDR (values))
+	    {
+	      Lisp_Object *p = xmint_pointer (XCAR (values));
+	      mark_object (*p);
+	    }
+	}
     }
 }
 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Mon, 18 Mar 2019 16:59:02 GMT) Full text and rfc822 format available.

Message #38 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 34655 <at> debbugs.gnu.org, p.stephani2 <at> gmail.com
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Mon, 18 Mar 2019 16:58:35 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:

> The patch to fix this is below; it simply marks these copied values by
> hand, thus preventing them from being GCed.  It ran successfully with
> even 50,000 iterations.

I can confirm that your patch fixes the issue.

I am very grateful not only for your looking into this, but also for
taking the time to explain the whole process; it has been enlightening
and would have taken me a lot of time to figure out alone.

Thanks,

-- 
Basil




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Mon, 18 Mar 2019 17:54:02 GMT) Full text and rfc822 format available.

Message #41 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Basil L. Contovounesios" <contovob <at> tcd.ie>
Cc: 34655 <at> debbugs.gnu.org, p.stephani2 <at> gmail.com
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Mon, 18 Mar 2019 19:53:03 +0200
> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> Cc: <34655 <at> debbugs.gnu.org>,  <p.stephani2 <at> gmail.com>
> Date: Mon, 18 Mar 2019 16:58:35 +0000
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > The patch to fix this is below; it simply marks these copied values by
> > hand, thus preventing them from being GCed.  It ran successfully with
> > even 50,000 iterations.
> 
> I can confirm that your patch fixes the issue.

Great, thanks for testing.

> I am very grateful not only for your looking into this, but also for
> taking the time to explain the whole process; it has been enlightening
> and would have taken me a lot of time to figure out alone.

You are welcome.

I will wait for a few days to give Philipp and others a chance to
comment, and push then if no one comes up with objections.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Thu, 21 Mar 2019 16:12:01 GMT) Full text and rfc822 format available.

Message #44 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Philipp Stephani <p.stephani2 <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Thu, 21 Mar 2019 17:11:41 +0100
Am Mo., 18. März 2019 um 18:53 Uhr schrieb Eli Zaretskii <eliz <at> gnu.org>:
>
> > From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> > Cc: <34655 <at> debbugs.gnu.org>,  <p.stephani2 <at> gmail.com>
> > Date: Mon, 18 Mar 2019 16:58:35 +0000
> >
> > Eli Zaretskii <eliz <at> gnu.org> writes:
> >
> > > The patch to fix this is below; it simply marks these copied values by
> > > hand, thus preventing them from being GCed.  It ran successfully with
> > > even 50,000 iterations.
> >
> > I can confirm that your patch fixes the issue.
>
> Great, thanks for testing.
>
> > I am very grateful not only for your looking into this, but also for
> > taking the time to explain the whole process; it has been enlightening
> > and would have taken me a lot of time to figure out alone.
>
> You are welcome.
>
> I will wait for a few days to give Philipp and others a chance to
> comment, and push then if no one comes up with objections.

I haven't checked everything in detail, but my impression is that this
is rather another instance of bug#31238. Fixing this only when module
assertions are enabled will probably not fix anything, but rather mask
issues. Reverting commit 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a is
still the right approach here. Can you please hold off a bit? I've
almost completed the revert, but haven't pushed it yet. Once that's in
we can check whether it also fixes this issue.

Thanks!




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Thu, 21 Mar 2019 17:01:02 GMT) Full text and rfc822 format available.

Message #47 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Philipp Stephani <p.stephani2 <at> gmail.com>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: contovob <at> tcd.ie, 34655 <at> debbugs.gnu.org
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Thu, 21 Mar 2019 19:00:42 +0200
> From: Philipp Stephani <p.stephani2 <at> gmail.com>
> Date: Thu, 21 Mar 2019 17:11:41 +0100
> Cc: "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org
> 
> I haven't checked everything in detail, but my impression is that this
> is rather another instance of bug#31238. Fixing this only when module
> assertions are enabled will probably not fix anything, but rather mask
> issues. Reverting commit 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a is
> still the right approach here. Can you please hold off a bit? I've
> almost completed the revert, but haven't pushed it yet. Once that's in
> we can check whether it also fixes this issue.

I will CC Stefan, who committed 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a.

I'm not sure we should revert that; we could instead add GC protection
for those parts that need it.  And the list consed by
module-assertions certainly is one of those.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Thu, 21 Mar 2019 18:29:02 GMT) Full text and rfc822 format available.

Message #50 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Philipp Stephani <p.stephani2 <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Thu, 21 Mar 2019 19:28:07 +0100
Am Do., 21. März 2019 um 18:00 Uhr schrieb Eli Zaretskii <eliz <at> gnu.org>:
>
> > From: Philipp Stephani <p.stephani2 <at> gmail.com>
> > Date: Thu, 21 Mar 2019 17:11:41 +0100
> > Cc: "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org
> >
> > I haven't checked everything in detail, but my impression is that this
> > is rather another instance of bug#31238. Fixing this only when module
> > assertions are enabled will probably not fix anything, but rather mask
> > issues. Reverting commit 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a is
> > still the right approach here. Can you please hold off a bit? I've
> > almost completed the revert, but haven't pushed it yet. Once that's in
> > we can check whether it also fixes this issue.
>
> I will CC Stefan, who committed 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a.
>
> I'm not sure we should revert that; we could instead add GC protection
> for those parts that need it.

Yes, that's what reverting that commit does :-)
We need to mark the objects in all cases, not just when module
assertions are enabled.
Note that both the designer of the module API (Daniel) and I as one of
its main implementers disagree with commit
3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a. I'm happy to discuss
alternatives, but for now we should revert it and discuss the
alternatives *before* implementing them. I've already confirmed that
reverting commit 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a fixes
bug#31238, and I can try it with this bug as well.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Thu, 21 Mar 2019 19:24:02 GMT) Full text and rfc822 format available.

Message #53 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Philipp Stephani <p.stephani2 <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Thu, 21 Mar 2019 20:23:30 +0100
Am Do., 21. März 2019 um 19:28 Uhr schrieb Philipp Stephani
<p.stephani2 <at> gmail.com>:
>
> Am Do., 21. März 2019 um 18:00 Uhr schrieb Eli Zaretskii <eliz <at> gnu.org>:
> >
> > > From: Philipp Stephani <p.stephani2 <at> gmail.com>
> > > Date: Thu, 21 Mar 2019 17:11:41 +0100
> > > Cc: "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org
> > >
> > > I haven't checked everything in detail, but my impression is that this
> > > is rather another instance of bug#31238. Fixing this only when module
> > > assertions are enabled will probably not fix anything, but rather mask
> > > issues. Reverting commit 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a is
> > > still the right approach here. Can you please hold off a bit? I've
> > > almost completed the revert, but haven't pushed it yet. Once that's in
> > > we can check whether it also fixes this issue.
> >
> > I will CC Stefan, who committed 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a.
> >
> > I'm not sure we should revert that; we could instead add GC protection
> > for those parts that need it.
>
> Yes, that's what reverting that commit does :-)
> We need to mark the objects in all cases, not just when module
> assertions are enabled.
> Note that both the designer of the module API (Daniel) and I as one of
> its main implementers disagree with commit
> 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a. I'm happy to discuss
> alternatives, but for now we should revert it and discuss the
> alternatives *before* implementing them. I've already confirmed that
> reverting commit 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a fixes
> bug#31238, and I can try it with this bug as well.

I wasn't able to reproduce bug#34655 myself (these things tend to be
rather flaky), but I've now reverted commit
3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a, and at least bug#31238 is
now consistently fixed (for me at least). Basil, can you check whether
you can still reproduce bug#34655 with the current master?

Thanks!




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Thu, 21 Mar 2019 19:28:02 GMT) Full text and rfc822 format available.

Message #56 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Philipp Stephani <p.stephani2 <at> gmail.com>
Cc: contovob <at> tcd.ie, 34655 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Thu, 21 Mar 2019 21:27:25 +0200
> > I will CC Stefan, who committed 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a.
> >
> > I'm not sure we should revert that; we could instead add GC protection
> > for those parts that need it.
> 
> Yes, that's what reverting that commit does :-)

AFAIU, it does much more.  Stefan intended for the conservative stack
marking to do the job, so maybe there's a little more that should be
done to get there.  Or maybe Stefan didn't consider some important
factor(s).  In either case, I'd like to hear his POV on this before we
decide how to proceed.

> We need to mark the objects in all cases, not just when module
> assertions are enabled.

If we get stack marking to work, we won't need to mark objects
explicitly.

> Note that both the designer of the module API (Daniel) and I as one of
> its main implementers disagree with commit
> 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a.

OK, but I think Stefan's opinion is not less important.

> I've already confirmed that reverting commit
> 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a fixes bug#31238, and I can
> try it with this bug as well.

Please do, it's important to know that, I think.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Thu, 21 Mar 2019 19:35:01 GMT) Full text and rfc822 format available.

Message #59 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Philipp Stephani <p.stephani2 <at> gmail.com>
Cc: contovob <at> tcd.ie, 34655 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Thu, 21 Mar 2019 21:34:06 +0200
> From: Philipp Stephani <p.stephani2 <at> gmail.com>
> Date: Thu, 21 Mar 2019 20:23:30 +0100
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org
> 
> I wasn't able to reproduce bug#34655 myself (these things tend to be
> rather flaky), but I've now reverted commit
> 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a

And I've just reverted the revert.

I'm sorry, but you cannot apply rules unilaterally: if you think it's
not OK to make changes without waiting for consensus, please adhere to
the same principles when you want to make your own changes.

Let's wait for this discussion to run to completion, before we are
doing any more changes in this area.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Thu, 21 Mar 2019 19:38:02 GMT) Full text and rfc822 format available.

Message #62 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Philipp Stephani <p.stephani2 <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Thu, 21 Mar 2019 20:37:24 +0100
Am Do., 21. März 2019 um 20:27 Uhr schrieb Eli Zaretskii <eliz <at> gnu.org>:
>
> > > I will CC Stefan, who committed 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a.
> > >
> > > I'm not sure we should revert that; we could instead add GC protection
> > > for those parts that need it.
> >
> > Yes, that's what reverting that commit does :-)
>
> AFAIU, it does much more.  Stefan intended for the conservative stack
> marking to do the job, so maybe there's a little more that should be
> done to get there.  Or maybe Stefan didn't consider some important
> factor(s).  In either case, I'd like to hear his POV on this before we
> decide how to proceed.

Let's go back to the known good state first, and then discuss how to
go from there.

>
> > We need to mark the objects in all cases, not just when module
> > assertions are enabled.
>
> If we get stack marking to work, we won't need to mark objects
> explicitly.

We can't get stack marking to work, even theoretically.

A module is free to do

emacs_value x = ...;
uintptr_t y = ((uintrptr_t) x) ^ 0x123456u;
(garbage-collect)
emacs_value z = (emacs_value) (y ^ 0x123456u);
... use z ...

During the garbage collection, x isn't on the stack anywhere, and the
garbage collector couldn't possibly find it.

Or:

emacs_value x = ...;
emacs_value *y = malloc (sizeof emacs_value);
*y = x;
... stop using x...
(garbage-collect)
...use *y ...

Again, during garbage collection x is no longer on the stack.

We can only use stack scanning in Emacs because we control the Emacs
source code and can make sure these patterns don't occur. Module code
is completely arbitrary.

>
> > Note that both the designer of the module API (Daniel) and I as one of
> > its main implementers disagree with commit
> > 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a.
>
> OK, but I think Stefan's opinion is not less important.

I value his opinion, but again: let's make the thing work first, and
then discuss options.

>
> > I've already confirmed that reverting commit
> > 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a fixes bug#31238, and I can
> > try it with this bug as well.
>
> Please do, it's important to know that, I think.

Basil, could you check that with the revert your code now works? Thanks!




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Thu, 21 Mar 2019 19:51:02 GMT) Full text and rfc822 format available.

Message #65 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Philipp Stephani <p.stephani2 <at> gmail.com>
Cc: contovob <at> tcd.ie, 34655 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Thu, 21 Mar 2019 21:50:36 +0200
> From: Philipp Stephani <p.stephani2 <at> gmail.com>
> Date: Thu, 21 Mar 2019 20:37:24 +0100
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org
> 
> Let's go back to the known good state first, and then discuss how to
> go from there.

I don't see why that is better than discuss first and then go to where
we decide to go.  It's not like Emacs 27 will be released any time
soon, so there's no rush.

> We can't get stack marking to work, even theoretically.
> 
> A module is free to do
> 
> emacs_value x = ...;
> uintptr_t y = ((uintrptr_t) x) ^ 0x123456u;
> (garbage-collect)
> emacs_value z = (emacs_value) (y ^ 0x123456u);
> ... use z ...
> 
> During the garbage collection, x isn't on the stack anywhere

Why do you think x isn't on the stack in this case?

Moreover, emacs_value is actually a pointer to a Lisp object, so this
object is also somewhere on the stack, right?

> emacs_value x = ...;
> emacs_value *y = malloc (sizeof emacs_value);
> *y = x;
> ... stop using x...
> (garbage-collect)
> ...use *y ...
> 
> Again, during garbage collection x is no longer on the stack.

Why do you think it isn't on the stack?

> We can only use stack scanning in Emacs because we control the Emacs
> source code

Actually, we do nothing special about stack-based values in our
sources, except avoiding undefined behavior.

> > OK, but I think Stefan's opinion is not less important.
> 
> I value his opinion, but again: let's make the thing work first, and
> then discuss options.

Fixing one bug doesn't necessarily mean things now "work"; there's
always one more bug.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Thu, 21 Mar 2019 20:03:02 GMT) Full text and rfc822 format available.

Message #68 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Philipp Stephani <p.stephani2 <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org,
 Daniel Colascione <dancol <at> dancol.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Thu, 21 Mar 2019 21:01:43 +0100
Am Do., 21. März 2019 um 20:50 Uhr schrieb Eli Zaretskii <eliz <at> gnu.org>:
>
> > From: Philipp Stephani <p.stephani2 <at> gmail.com>
> > Date: Thu, 21 Mar 2019 20:37:24 +0100
> > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org
> >
> > Let's go back to the known good state first, and then discuss how to
> > go from there.
>
> I don't see why that is better than discuss first and then go to where
> we decide to go.  It's not like Emacs 27 will be released any time
> soon, so there's no rush.

For one, it becomes harder and harder to revert commits the older they
get. Also such discussions tend to turn into endless debates about the
"perfect" solution until one side gives up, without improving
anything. I strongly prefer fixing actual bugs that affect users in
practice and then discussing (or not discussing) the gold-plating
steps later.

>
> > We can't get stack marking to work, even theoretically.
> >
> > A module is free to do
> >
> > emacs_value x = ...;
> > uintptr_t y = ((uintrptr_t) x) ^ 0x123456u;
> > (garbage-collect)
> > emacs_value z = (emacs_value) (y ^ 0x123456u);
> > ... use z ...
> >
> > During the garbage collection, x isn't on the stack anywhere
>
> Why do you think x isn't on the stack in this case?

Because the compiler reused the stack slot for something else?
Because the module is written in a language that doesn't use the stack
in a way that the garbage collector expects? There's no reason to
assume modules have any form of C-compatible stack layout.

>
> Moreover, emacs_value is actually a pointer to a Lisp object, so this
> object is also somewhere on the stack, right?
>
> > emacs_value x = ...;
> > emacs_value *y = malloc (sizeof emacs_value);
> > *y = x;
> > ... stop using x...
> > (garbage-collect)
> > ...use *y ...
> >
> > Again, during garbage collection x is no longer on the stack.
>
> Why do you think it isn't on the stack?

Same as above.

>
> > We can only use stack scanning in Emacs because we control the Emacs
> > source code
>
> Actually, we do nothing special about stack-based values in our
> sources, except avoiding undefined behavior.

(Stack scanning is undefined behavior, but that's not the point.)
We do something very specific with the stack: we make sure that
Lisp_Objects are never manipulated in a way similar to the above, and
we use the C language.

>
> > > OK, but I think Stefan's opinion is not less important.
> >
> > I value his opinion, but again: let's make the thing work first, and
> > then discuss options.
>
> Fixing one bug doesn't necessarily mean things now "work"; there's
> always one more bug.

That might be theoretically true, but shouldn't impact decisions until
that bug is actually found. All regression tests still pass after
reverting the commit.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Thu, 21 Mar 2019 20:15:02 GMT) Full text and rfc822 format available.

Message #71 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Philipp Stephani <p.stephani2 <at> gmail.com>
Cc: contovob <at> tcd.ie, 34655 <at> debbugs.gnu.org, dancol <at> dancol.org,
 monnier <at> iro.umontreal.ca
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Thu, 21 Mar 2019 22:14:46 +0200
> From: Philipp Stephani <p.stephani2 <at> gmail.com>
> Date: Thu, 21 Mar 2019 21:01:43 +0100
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org, 
> 	Daniel Colascione <dancol <at> dancol.org>
> 
> Am Do., 21. März 2019 um 20:50 Uhr schrieb Eli Zaretskii <eliz <at> gnu.org>:
> >
> > > From: Philipp Stephani <p.stephani2 <at> gmail.com>
> > > Date: Thu, 21 Mar 2019 20:37:24 +0100
> > > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org
> > >
> > > Let's go back to the known good state first, and then discuss how to
> > > go from there.
> >
> > I don't see why that is better than discuss first and then go to where
> > we decide to go.  It's not like Emacs 27 will be released any time
> > soon, so there's no rush.
> 
> For one, it becomes harder and harder to revert commits the older they
> get. Also such discussions tend to turn into endless debates about the
> "perfect" solution until one side gives up, without improving
> anything. I strongly prefer fixing actual bugs that affect users in
> practice and then discussing (or not discussing) the gold-plating
> steps later.

I also prefer fixing bugs (which is why I spent several hours looking
into Basil's crash, when no one else was replying to that bug report),
but this is a community project, so we should discuss first and act
later.  Especially when controversial issues are involved.

> > > We can't get stack marking to work, even theoretically.
> > >
> > > A module is free to do
> > >
> > > emacs_value x = ...;
> > > uintptr_t y = ((uintrptr_t) x) ^ 0x123456u;
> > > (garbage-collect)
> > > emacs_value z = (emacs_value) (y ^ 0x123456u);
> > > ... use z ...
> > >
> > > During the garbage collection, x isn't on the stack anywhere
> >
> > Why do you think x isn't on the stack in this case?
> 
> Because the compiler reused the stack slot for something else?

How can it?  You are using the same pointer later.  Garbage collection
cannot happen unless you call an Emacs function, such as Ffuncall.
Calling a function means that even if the pointer to a Lisp object was
in a register, it will be put on the stack when calling Emacs.

> Because the module is written in a language that doesn't use the stack
> in a way that the garbage collector expects?

Which language is that, and how can it use the emacs-module machinery?

> > Moreover, emacs_value is actually a pointer to a Lisp object, so this
> > object is also somewhere on the stack, right?

No answer?

> We do something very specific with the stack: we make sure that
> Lisp_Objects are never manipulated in a way similar to the above, and
> we use the C language.

If worse comes to worst, we can request module writers to adhere to
the same discipline.  We already request them to do/not to do quite a
few extraordinary things.

> All regression tests still pass after reverting the commit.

Didn't they also pass before?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Thu, 21 Mar 2019 20:27:01 GMT) Full text and rfc822 format available.

Message #74 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Philipp Stephani <p.stephani2 <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org,
 Daniel Colascione <dancol <at> dancol.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Thu, 21 Mar 2019 21:26:25 +0100
Am Do., 21. März 2019 um 21:14 Uhr schrieb Eli Zaretskii <eliz <at> gnu.org>:
>
> > From: Philipp Stephani <p.stephani2 <at> gmail.com>
> > Date: Thu, 21 Mar 2019 21:01:43 +0100
> > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org,
> >       Daniel Colascione <dancol <at> dancol.org>
> >
> > Am Do., 21. März 2019 um 20:50 Uhr schrieb Eli Zaretskii <eliz <at> gnu.org>:
> > >
> > > > From: Philipp Stephani <p.stephani2 <at> gmail.com>
> > > > Date: Thu, 21 Mar 2019 20:37:24 +0100
> > > > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org
> > > >
> > > > Let's go back to the known good state first, and then discuss how to
> > > > go from there.
> > >
> > > I don't see why that is better than discuss first and then go to where
> > > we decide to go.  It's not like Emacs 27 will be released any time
> > > soon, so there's no rush.
> >
> > For one, it becomes harder and harder to revert commits the older they
> > get. Also such discussions tend to turn into endless debates about the
> > "perfect" solution until one side gives up, without improving
> > anything. I strongly prefer fixing actual bugs that affect users in
> > practice and then discussing (or not discussing) the gold-plating
> > steps later.
>
> I also prefer fixing bugs (which is why I spent several hours looking
> into Basil's crash, when no one else was replying to that bug report),
> but this is a community project, so we should discuss first and act
> later.  Especially when controversial issues are involved.

Well, as you can see, these discussions seem to lead nowhere, and both
bug#34655 and bug#31238 remain unfixed.

>
> > > > We can't get stack marking to work, even theoretically.
> > > >
> > > > A module is free to do
> > > >
> > > > emacs_value x = ...;
> > > > uintptr_t y = ((uintrptr_t) x) ^ 0x123456u;
> > > > (garbage-collect)
> > > > emacs_value z = (emacs_value) (y ^ 0x123456u);
> > > > ... use z ...
> > > >
> > > > During the garbage collection, x isn't on the stack anywhere
> > >
> > > Why do you think x isn't on the stack in this case?
> >
> > Because the compiler reused the stack slot for something else?
>
> How can it?  You are using the same pointer later.

Assume I don't use x later, and only y is on the stack during GC.

>  Garbage collection
> cannot happen unless you call an Emacs function, such as Ffuncall.
> Calling a function means that even if the pointer to a Lisp object was
> in a register, it will be put on the stack when calling Emacs.

No matter whether y here is in a register or on the stack, it's not a
Lisp_Value, so the GC can't find it.

>
> > Because the module is written in a language that doesn't use the stack
> > in a way that the garbage collector expects?
>
> Which language is that, and how can it use the emacs-module machinery?

Go, for example. It uses green threads with its own stack management
and calling conventions. The GC couldn't ever find such a stack.

>
> > > Moreover, emacs_value is actually a pointer to a Lisp object, so this
> > > object is also somewhere on the stack, right?
>
> No answer?

An emacs_value in the current implementation *is* a Lisp_Object cast
to emacs_value. If the emacs_value is not on the stack, then there's
no way to find the Lisp_Object.

>
> > We do something very specific with the stack: we make sure that
> > Lisp_Objects are never manipulated in a way similar to the above, and
> > we use the C language.
>
> If worse comes to worst, we can request module writers to adhere to
> the same discipline.  We already request them to do/not to do quite a
> few extraordinary things.

No we can't. Modules can contain arbitrary code and call arbitrary
libraries, which we can't ever control. We need to work with
everything that provides a C-compatible interface.

>
> > All regression tests still pass after reverting the commit.
>
> Didn't they also pass before?

Yes, but the reproduction steps in
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=31238 didn't.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Thu, 21 Mar 2019 20:46:01 GMT) Full text and rfc822 format available.

Message #77 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Philipp Stephani <p.stephani2 <at> gmail.com>
Cc: contovob <at> tcd.ie, 34655 <at> debbugs.gnu.org, dancol <at> dancol.org,
 monnier <at> iro.umontreal.ca
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Thu, 21 Mar 2019 22:44:58 +0200
> From: Philipp Stephani <p.stephani2 <at> gmail.com>
> Date: Thu, 21 Mar 2019 21:26:25 +0100
> Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org, 
> 	Daniel Colascione <dancol <at> dancol.org>
> 
> > I also prefer fixing bugs (which is why I spent several hours looking
> > into Basil's crash, when no one else was replying to that bug report),
> > but this is a community project, so we should discuss first and act
> > later.  Especially when controversial issues are involved.
> 
> Well, as you can see, these discussions seem to lead nowhere, and both
> bug#34655 and bug#31238 remain unfixed.

We have been talking about this just a few minutes, so please don't
exaggerate.  And bug#34655 could be fixed days ago, it isn't yet
because I wanted to hear your opinion, and you asked me to hold off
the changes.

> > > > Why do you think x isn't on the stack in this case?
> > >
> > > Because the compiler reused the stack slot for something else?
> >
> > How can it?  You are using the same pointer later.
> 
> Assume I don't use x later, and only y is on the stack during GC.

If you don't use x, it can be GC'ed.

> >  Garbage collection
> > cannot happen unless you call an Emacs function, such as Ffuncall.
> > Calling a function means that even if the pointer to a Lisp object was
> > in a register, it will be put on the stack when calling Emacs.
> 
> No matter whether y here is in a register or on the stack, it's not a
> Lisp_Value, so the GC can't find it.

But x is.  And there's the original Lisp object too, somewhere on the
stack.

> > > Because the module is written in a language that doesn't use the stack
> > > in a way that the garbage collector expects?
> >
> > Which language is that, and how can it use the emacs-module machinery?
> 
> Go, for example. It uses green threads with its own stack management
> and calling conventions. The GC couldn't ever find such a stack.

So you are saying that Emacs modules cannot be written in Go?

> > > > Moreover, emacs_value is actually a pointer to a Lisp object, so this
> > > > object is also somewhere on the stack, right?
> >
> > No answer?
> 
> An emacs_value in the current implementation *is* a Lisp_Object cast
> to emacs_value.

Under module-assertions, it's a pointer to a (copy of a) Lisp object.

> > If worse comes to worst, we can request module writers to adhere to
> > the same discipline.  We already request them to do/not to do quite a
> > few extraordinary things.
> 
> No we can't. Modules can contain arbitrary code and call arbitrary
> libraries, which we can't ever control. We need to work with
> everything that provides a C-compatible interface.

Emacs modules are written to work with Emacs, so surely we can request
them to observe certain rules, especially if they don't want to crash
Emacs.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Thu, 21 Mar 2019 20:49:02 GMT) Full text and rfc822 format available.

Message #80 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: "Daniel Colascione" <dancol <at> dancol.org>
To: "Philipp Stephani" <p.stephani2 <at> gmail.com>
Cc: "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org,
 Eli Zaretskii <eliz <at> gnu.org>, Daniel Colascione <dancol <at> dancol.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Thu, 21 Mar 2019 13:48:28 -0700
> Am Do., 21. März 2019 um 21:14 Uhr schrieb Eli Zaretskii <eliz <at> gnu.org>:
>>
>> > From: Philipp Stephani <p.stephani2 <at> gmail.com>
>> > Date: Thu, 21 Mar 2019 21:01:43 +0100
>> > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, "Basil L.
>> Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org,
>> >       Daniel Colascione <dancol <at> dancol.org>
>> >
>> > Am Do., 21. März 2019 um 20:50 Uhr schrieb Eli Zaretskii
>> <eliz <at> gnu.org>:
>> > >
>> > > > From: Philipp Stephani <p.stephani2 <at> gmail.com>
>> > > > Date: Thu, 21 Mar 2019 20:37:24 +0100
>> > > > Cc: Stefan Monnier <monnier <at> iro.umontreal.ca>, "Basil L.
>> Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org
>> > > >
>> > > > Let's go back to the known good state first, and then discuss how
>> to
>> > > > go from there.
>> > >
>> > > I don't see why that is better than discuss first and then go to
>> where
>> > > we decide to go.  It's not like Emacs 27 will be released any time
>> > > soon, so there's no rush.
>> >
>> > For one, it becomes harder and harder to revert commits the older they
>> > get. Also such discussions tend to turn into endless debates about the
>> > "perfect" solution until one side gives up, without improving
>> > anything. I strongly prefer fixing actual bugs that affect users in
>> > practice and then discussing (or not discussing) the gold-plating
>> > steps later.
>>
>> I also prefer fixing bugs (which is why I spent several hours looking
>> into Basil's crash, when no one else was replying to that bug report),
>> but this is a community project, so we should discuss first and act
>> later.  Especially when controversial issues are involved.
>
> Well, as you can see, these discussions seem to lead nowhere, and both
> bug#34655 and bug#31238 remain unfixed.
>
>>
>> > > > We can't get stack marking to work, even theoretically.
>> > > >
>> > > > A module is free to do
>> > > >
>> > > > emacs_value x = ...;
>> > > > uintptr_t y = ((uintrptr_t) x) ^ 0x123456u;
>> > > > (garbage-collect)
>> > > > emacs_value z = (emacs_value) (y ^ 0x123456u);
>> > > > ... use z ...
>> > > >
>> > > > During the garbage collection, x isn't on the stack anywhere
>> > >
>> > > Why do you think x isn't on the stack in this case?
>> >
>> > Because the compiler reused the stack slot for something else?
>>
>> How can it?  You are using the same pointer later.
>
> Assume I don't use x later, and only y is on the stack during GC.
>
>>  Garbage collection
>> cannot happen unless you call an Emacs function, such as Ffuncall.
>> Calling a function means that even if the pointer to a Lisp object was
>> in a register, it will be put on the stack when calling Emacs.
>
> No matter whether y here is in a register or on the stack, it's not a
> Lisp_Value, so the GC can't find it.
>
>>
>> > Because the module is written in a language that doesn't use the stack
>> > in a way that the garbage collector expects?
>>
>> Which language is that, and how can it use the emacs-module machinery?
>
> Go, for example. It uses green threads with its own stack management
> and calling conventions. The GC couldn't ever find such a stack.
>
>>
>> > > Moreover, emacs_value is actually a pointer to a Lisp object, so
>> this
>> > > object is also somewhere on the stack, right?
>>
>> No answer?
>
> An emacs_value in the current implementation *is* a Lisp_Object cast
> to emacs_value. If the emacs_value is not on the stack, then there's
> no way to find the Lisp_Object.
>
>>
>> > We do something very specific with the stack: we make sure that
>> > Lisp_Objects are never manipulated in a way similar to the above, and
>> > we use the C language.
>>
>> If worse comes to worst, we can request module writers to adhere to
>> the same discipline.  We already request them to do/not to do quite a
>> few extraordinary things.
>
> No we can't. Modules can contain arbitrary code and call arbitrary
> libraries, which we can't ever control. We need to work with
> everything that provides a C-compatible interface.

Modules can contain arbitrary code, but they don't have to do arbitrary
things with that code. Right now, the contract between modules and Emacs
is something like "any value that, I, Emacs, can't find on an
Emacs-findable thread is fair game for memory reclaimation." In practice,
that works okay most of the time, but if we have to deal with environments
that can't guarantee that Emacs values remain on the C stack, we can
extend the contract with something like "I, module, am handing you, Emacs,
an array of emacs_values, and in addition to my stack, you should check
this array before considering a value dead" --- that is, we could just
provide a way for a module to associate a bunch of additional GC roots
with a given context. Then something like Go could stick any temporary
Emacs values in this array.

Or we could just have these environments create permanent references.






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Thu, 21 Mar 2019 21:30:02 GMT) Full text and rfc822 format available.

Message #83 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Philipp Stephani <p.stephani2 <at> gmail.com>
Cc: 34655 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Thu, 21 Mar 2019 21:29:14 +0000
Philipp Stephani <p.stephani2 <at> gmail.com> writes:

> Am Do., 21. März 2019 um 19:28 Uhr schrieb Philipp Stephani
> <p.stephani2 <at> gmail.com>:
>>
>> Am Do., 21. März 2019 um 18:00 Uhr schrieb Eli Zaretskii <eliz <at> gnu.org>:
>> >
>> > > From: Philipp Stephani <p.stephani2 <at> gmail.com>
>> > > Date: Thu, 21 Mar 2019 17:11:41 +0100
>> > > Cc: "Basil L. Contovounesios" <contovob <at> tcd.ie>, 34655 <at> debbugs.gnu.org
>> > >
>> > > I haven't checked everything in detail, but my impression is that this
>> > > is rather another instance of bug#31238. Fixing this only when module
>> > > assertions are enabled will probably not fix anything, but rather mask
>> > > issues. Reverting commit 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a is
>> > > still the right approach here. Can you please hold off a bit? I've
>> > > almost completed the revert, but haven't pushed it yet. Once that's in
>> > > we can check whether it also fixes this issue.
>> >
>> > I will CC Stefan, who committed 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a.
>> >
>> > I'm not sure we should revert that; we could instead add GC protection
>> > for those parts that need it.
>>
>> Yes, that's what reverting that commit does :-)
>> We need to mark the objects in all cases, not just when module
>> assertions are enabled.
>> Note that both the designer of the module API (Daniel) and I as one of
>> its main implementers disagree with commit
>> 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a. I'm happy to discuss
>> alternatives, but for now we should revert it and discuss the
>> alternatives *before* implementing them. I've already confirmed that
>> reverting commit 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a fixes
>> bug#31238, and I can try it with this bug as well.
>
> I wasn't able to reproduce bug#34655 myself (these things tend to be
> rather flaky), but I've now reverted commit
> 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a, and at least bug#31238 is
> now consistently fixed (for me at least). Basil, can you check whether
> you can still reproduce bug#34655 with the current master?

FWIW, I cannot reproduce bug#34655 after reverting Stefan's commit.

Thanks,

-- 
Basil




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Thu, 21 Mar 2019 21:32:02 GMT) Full text and rfc822 format available.

Message #86 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
To: Philipp Stephani <p.stephani2 <at> gmail.com>
Cc: 34655 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Thu, 21 Mar 2019 21:31:23 +0000
Philipp Stephani <p.stephani2 <at> gmail.com> writes:

> Am Do., 21. März 2019 um 20:27 Uhr schrieb Eli Zaretskii <eliz <at> gnu.org>:
>>
>> > I've already confirmed that reverting commit
>> > 3eb93c07f7a60ac9ce8a16f10c3afd5a3a31243a fixes bug#31238, and I can
>> > try it with this bug as well.
>>
>> Please do, it's important to know that, I think.
>
> Basil, could you check that with the revert your code now works? Thanks!

The revert indeed makes bug#34655 go away.

Thanks,

-- 
Basil




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Fri, 22 Mar 2019 00:57:02 GMT) Full text and rfc822 format available.

Message #89 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: contovob <at> tcd.ie, 34655 <at> debbugs.gnu.org,
 Philipp Stephani <p.stephani2 <at> gmail.com>
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Thu, 21 Mar 2019 20:56:17 -0400
> OK, but I think Stefan's opinion is not less important.

I think the module API is already so completely different from what I'd
like it to be that it's OK to revert my change here.


        Stefan




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Fri, 22 Mar 2019 07:12:01 GMT) Full text and rfc822 format available.

Message #92 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Basil L. Contovounesios" <contovob <at> tcd.ie>
Cc: 34655 <at> debbugs.gnu.org, p.stephani2 <at> gmail.com, monnier <at> iro.umontreal.ca
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Fri, 22 Mar 2019 10:11:13 +0300
> From: "Basil L. Contovounesios" <contovob <at> tcd.ie>
> Cc: Eli Zaretskii <eliz <at> gnu.org>,  Stefan Monnier <monnier <at> iro.umontreal.ca>,  34655 <at> debbugs.gnu.org
> Date: Thu, 21 Mar 2019 21:29:14 +0000
> 
> FWIW, I cannot reproduce bug#34655 after reverting Stefan's commit.

Great, thanks for testing.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Fri, 22 Mar 2019 08:17:02 GMT) Full text and rfc822 format available.

Message #95 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: contovob <at> tcd.ie, 34655 <at> debbugs.gnu.org, p.stephani2 <at> gmail.com
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Fri, 22 Mar 2019 11:16:28 +0300
merge 31238 34655
close 34655
thanks

> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: Philipp Stephani <p.stephani2 <at> gmail.com>,  contovob <at> tcd.ie,  34655 <at> debbugs.gnu.org
> Date: Thu, 21 Mar 2019 20:56:17 -0400
> 
> > OK, but I think Stefan's opinion is not less important.
> 
> I think the module API is already so completely different from what I'd
> like it to be that it's OK to revert my change here.

OK, so I reverted my revert, and I'm marking this bug done.

Thanks.




Merged 31238 34655. Request was from Eli Zaretskii <eliz <at> gnu.org> to control <at> debbugs.gnu.org. (Fri, 22 Mar 2019 08:17:03 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 34655 <at> debbugs.gnu.org and "Basil L. Contovounesios" <contovob <at> tcd.ie> Request was from Eli Zaretskii <eliz <at> gnu.org> to control <at> debbugs.gnu.org. (Fri, 22 Mar 2019 08:17:03 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#34655; Package emacs. (Fri, 22 Mar 2019 08:19:01 GMT) Full text and rfc822 format available.

Message #102 received at 34655 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: "Daniel Colascione" <dancol <at> dancol.org>
Cc: contovob <at> tcd.ie, 34655 <at> debbugs.gnu.org, p.stephani2 <at> gmail.com,
 dancol <at> dancol.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#34655: 26.1.92; Segfault in module with --module-assertions
Date: Fri, 22 Mar 2019 11:17:56 +0300
> Date: Thu, 21 Mar 2019 13:48:28 -0700
> From: "Daniel Colascione" <dancol <at> dancol.org>
> Cc: "Eli Zaretskii" <eliz <at> gnu.org>,
>  "Stefan Monnier" <monnier <at> iro.umontreal.ca>,
>  "Basil L. Contovounesios" <contovob <at> tcd.ie>,
>  34655 <at> debbugs.gnu.org,
>  "Daniel Colascione" <dancol <at> dancol.org>
> 
> > No we can't. Modules can contain arbitrary code and call arbitrary
> > libraries, which we can't ever control. We need to work with
> > everything that provides a C-compatible interface.
> 
> Modules can contain arbitrary code, but they don't have to do arbitrary
> things with that code. Right now, the contract between modules and Emacs
> is something like "any value that, I, Emacs, can't find on an
> Emacs-findable thread is fair game for memory reclaimation." In practice,
> that works okay most of the time, but if we have to deal with environments
> that can't guarantee that Emacs values remain on the C stack, we can
> extend the contract with something like "I, module, am handing you, Emacs,
> an array of emacs_values, and in addition to my stack, you should check
> this array before considering a value dead" --- that is, we could just
> provide a way for a module to associate a bunch of additional GC roots
> with a given context. Then something like Go could stick any temporary
> Emacs values in this array.
> 
> Or we could just have these environments create permanent references.

OK, thanks.

What are your opinions regarding usability of stack marking for
emacs_value variables used by modules?




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Fri, 19 Apr 2019 11:24:06 GMT) Full text and rfc822 format available.

This bug report was last modified 5 years and 9 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.