GNU bug report logs - #12471
Avoid some signal-handling races, and simplify.

Package: emacs;

Reported by: Paul Eggert <eggert <at> cs.ucla.edu>

Date: Tue, 18 Sep 2012 23:42:02 UTC

Severity: normal

Tags: patch

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 12471 in the body.
You can then email your comments to 12471 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Tue, 18 Sep 2012 23:42:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Paul Eggert <eggert <at> cs.ucla.edu>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 18 Sep 2012 23:42:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: bug-gnu-emacs <at> gnu.org
Cc: Juanma Barranquero <lekktu <at> gmail.com>, Eli Zaretskii <eliz <at> gnu.org>
Subject: Avoid some signal-handling races, and simplify.
Date: Tue, 18 Sep 2012 16:39:18 -0700

[Message part 1 (text/plain, inline)]

Tags: patch

I looked through emacs for race conditions involving signals and found
some problems, including:

 * Signal handlers can interrupt each other, leading to races.

 * Signals can be mishandled if they arrive right in the middle of
   code that is keeping track of whether signals have arrived.

 * Some variables are modified by signal handlers but are not
   declared 'volatile', which means accesses to them could be
   incorrectly optimized.

 * When debugging, the debugging code can get into an infinite
   signal-handling loop if there's a bug in the fatal error handler.

 * There are some bugs involving running out of memory in the middle
   of a vfork, in which signal handlers aren't restored correctly and
   Emacs is likely to misbehave.

 * Signals are always redirected to the main thread, resulting in
   incorrect backtraces when, for example, a subsidiary thread has
   a segmentation violation.  Thread-specific signals like SIGSEGV
   should have thread-specific backtraces.

 * When in batch mode, Emacs doesn't ignore SIGINT if the invoker has
   purposely ignored SIGINT.  Similarly for SIGTERM.  Emacs gets
   SIGHUP right, but the other two signals should be consistent
   with the usual behavior for batch programs.

 * Emacs isn't consistent about what tests it uses to decide whether
   it is in batch mode, leading to glitches when the tests disagree.

 * Emacs catches SIGPIPE, but it's better to ignore it, as this avoids
   races.

 * Emacs catches SIGFPE, but on IEEE hosts catching SIGFPE isn't
   needed and can mask bugs; it's better to catch SIGFPE only on (the
   now quite rare) non-IEEE hosts.

Attached is a patch to fix some of the problems that I found, and to
simplify nearby signal-handling code.  I'd like to install this into
the trunk before it freezes.  The patch is about 3300 lines so I've
taken the liberty of compressing it with gzip.

This patch affects a few bits of the Windows code; I haven't tested
that part.  I'm CC:ing this to Eli and Juanma to give them a heads-up.

[syssignap.txt.gz (application/x-gzip, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Wed, 19 Sep 2012 16:21:02 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: lekktu <at> gmail.com, bug-gnu-emacs <at> gnu.org
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Wed, 19 Sep 2012 19:18:41 +0300

> Date: Tue, 18 Sep 2012 16:39:18 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: Eli Zaretskii <eliz <at> gnu.org>, Juanma Barranquero <lekktu <at> gmail.com>
> 
> Attached is a patch to fix some of the problems that I found, and to
> simplify nearby signal-handling code.

Thanks.  I have a few questions/comments:

  -#define UNBLOCK_INPUT 				\
  -  do						\
  -    {						\
  -      --interrupt_input_blocked;		\
  -      if (interrupt_input_blocked == 0)		\
  -	{					\
  -	  if (interrupt_input_pending)		\
  -	    reinvoke_input_signal ();		\
  -	  if (pending_atimers)			\
  -	    do_pending_atimers ();		\
  -	}					\
  -      else if (interrupt_input_blocked < 0)	\
  -	emacs_abort ();				\
  -    }						\
  -  while (0)
  +extern void unblock_input (void);

Why is it a good idea to replace this macro with a function?  The
macro is used quite frequently in the code; replacing it with a
function might slow down Emacs, and for what good reason?

  +extern void UNBLOCK_INPUT_TO (int);

A function whose name is in CAPS?  Why?
 
   /* Report a fatal error due to signal SIG, output a backtrace of at
      most BACKTRACE_LIMIT lines, and exit.  */
   _Noreturn void
  -fatal_error_backtrace (int sig, int backtrace_limit)
  +terminate_due_to_signal (int sig, int backtrace_limit)
   {
     fatal_error_code = sig;
  -  signal (sig, SIG_DFL);

     TOTALLY_UNBLOCK_INPUT;

  @@ -316,9 +302,8 @@
       }

     /* Signal the same code; this time it will really be fatal.
  -     Remember that since we're in a signal handler, the signal we're
  -     going to send is probably blocked, so we have to unblock it if we
  -     want to really receive it.  */
  +     Since we're in a signal handler, the signal is blocked, so we
  +     have to unblock it if we want to really receive it.  */
   #ifndef MSDOS
     {
       sigset_t unblocked;
  @@ -328,7 +313,7 @@
     }
   #endif

  -  kill (getpid (), fatal_error_code);
  +  raise (fatal_error_code);

If there are good reasons to prefer 'raise' to 'kill' in this context,
can we please special-case SIGABRT here and call 'emacs_abort'
directly, at least for hosts that cannot produce the backtrace?
Otherwise, assertion violations will not end up in 'emacs_abort',
AFAICS.
 
   static void
   deliver_interrupt_signal (int sig)
   {
  -  handle_on_main_thread (sig, handle_interrupt_signal);
  +  deliver_process_signal (sig, handle_interrupt_signal);
   }

Do we really need this double indirection?  Why not call
deliver_process_signal directly (here and elsewhere)?

  -#ifdef SIGTERM
	 parse_signal ("term", SIGTERM);
  -#endif

I don't understand why did you remove the conditional.  This won't
compile if SIGTERM isn't defined.  Did you assume that it is always
available?

  +  if (! IEEE_FLOATING_POINT)
  +    sigaddset (&action->sa_mask, SIGFPE);

Why is IEEE_FLOATING_POINT, which detects the representation of FP
numbers, related to SIGFPE?

  -# ifdef SIGTERM
	 sys_siglist[SIGTERM] = "Terminated";
  -# endif

This won't compile if SIGTERM is not defined.

  +  maybe_fatal_sig (SIGTERM);

Same here.

  +  sigaction (SIGTERM, &process_fatal_action, 0);

And here.

  +  if (! IEEE_FLOATING_POINT)
  +    {
  +      emacs_sigaction_init (&action, deliver_arith_signal);
  +      sigaction (SIGFPE, &action, 0);
  +    }

See above: why?

  -  return ret;
  +  return nev;

This should be in a separate commit, as it is unrelated.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Wed, 19 Sep 2012 16:48:01 GMT) Full text and rfc822 format available.

Message #11 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Jan Djärv <jan.h.d <at> swipnet.se>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 12471 <at> debbugs.gnu.org, Juanma Barranquero <lekktu <at> gmail.com>
Subject: Re: bug#12471: Avoid some signal-handling races, and simplify.
Date: Wed, 19 Sep 2012 18:45:39 +0200

Hello.

19 sep 2012 kl. 01:39 skrev Paul Eggert <eggert <at> cs.ucla.edu>:

> * Signals are always redirected to the main thread, resulting in
>   incorrect backtraces when, for example, a subsidiary thread has
>   a segmentation violation.  Thread-specific signals like SIGSEGV
>   should have thread-specific backtraces.

and

+       * atimer.c (deliver_alarm_signal):
+       Remove.  No need to deliver this to the parent; any thread can
+       handle this signal now.  All uses replaced by underlying handler.

and

-static void
-deliver_alarm_signal (int sig)
-{
-  handle_on_main_thread (sig, handle_alarm_signal);
-}
-
-
-/* Call alarm signal handler for pending timers.  */

is totally wrong.  Thread sarted by Gnome/gtk+ plugins can not handle SIGALRM, so Emacs will terminate.   This is the reason for this in the first place. Don't install this and other stuff that removes redirecting signals to the main thread.

	Jan D.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Wed, 19 Sep 2012 20:00:01 GMT) Full text and rfc822 format available.

Message #14 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jan Djärv <jan.h.d <at> swipnet.se>
Cc: 12471 <at> debbugs.gnu.org, Juanma Barranquero <lekktu <at> gmail.com>
Subject: Re: bug#12471: Avoid some signal-handling races, and simplify.
Date: Wed, 19 Sep 2012 12:58:23 -0700

On 09/19/2012 09:45 AM, Jan Djärv wrote:> Hello.

> Thread sarted by Gnome/gtk+ plugins can not handle SIGALRM,
> so Emacs will terminate.

Thanks for looking at the patch.  Emacs doesn't terminate for me
(Fedora 17, GTK3), but perhaps that's because the problem is
specific to non-GNU/Linux platforms.  So could you please
explain the issue a bit more?

Here are some more details to help explain that part of the
proposed change.  In the Emacs trunk, a thread started by
those plugins can already get SIGALRM.  If it does, the
Emacs-supplied signal handler masks out SIGALRM in the
thread, resignals the process with SIGALRM so that some other
thread will get the signal next time, and then exits.  The
thread then resumes doing whatever it was doing, and the
main thread eventually gets signaled by SIGALRM.  So each
Gnome/gtk+ plugin thread might get signaled by SIGALRM,
altough it should handle that signal at most once.

Under the patch, a thread may handle SIGALRM more than once.
Each time it does so, it does something *very* simple (it
sets pending_signals to 1).  This shouldn't disturb what's
happening in the plugin thread, since the plugin thread
shouldn't be looking at pending_signals.

I've looked at the existing code, and tracked it back to
Emacs trunk bzr 58781 dated 2004-12-07, but couldn't find
any discussion of what the exact problem was.  Back then,
alarm timers could run lots of fancy code so it made sense
to insist that they run only in the main thread.  But
nowadays this should no longer be necessary.

The patch could be altered so that it retains the
signal-mask dance in the Gnome/gtk+ threads, but why bother?
Fidding with signal masks in a signal handler is
error-prone, because there are race conditions if the
interrupted code is itself fiddling with signal masks.  It's
OK to do this with handlers of fatal signals, because the
interrupted code won't be returned to, but for non-fatal
signals like SIGALRM it's problematic, and it's better to
avoid it if we don't need it.

If the above explanation doesn't suffice, could you please
give a recipe for reproducing the problem, or point me at a
bug report?  Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Wed, 19 Sep 2012 21:39:02 GMT) Full text and rfc822 format available.

Message #17 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#12471: Avoid some signal-handling races, and simplify.
Date: Wed, 19 Sep 2012 22:36:44 +0100

On Wed 19 Sep 2012, Paul Eggert wrote:

> Attached is a patch to fix some of the problems that I found, and to
> simplify nearby signal-handling code.  I'd like to install this into
> the trunk before it freezes.  The patch is about 3300 lines so I've
> taken the liberty of compressing it with gzip.

I'm not qualified to comment on the subtleties of signal handling
addressed by this patch, but it does seem that several issues have been
conflated together.

It would surely be easier to review and to bisect for bug hunting if it
were split into several smaller patches.

    AndyM

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Thu, 20 Sep 2012 06:10:01 GMT) Full text and rfc822 format available.

Message #20 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: lekktu <at> gmail.com, bug-gnu-emacs <at> gnu.org
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Wed, 19 Sep 2012 23:07:21 -0700

[Message part 1 (text/plain, inline)]

On 09/19/2012 09:18 AM, Eli Zaretskii wrote:

> Why is it a good idea to replace this macro with a function?

The function makes it easier to control access to volatile variables,
that is, to access them in a minimal number of times to lessen the
number of race conditions.   This requires a local variable.  This
can also be done with macros, but functions are nicer.

An advantage of using a function, is that it shrinks the code size
(Emacs's text size is about 1.3% smaller with the patch, not all due
to this part of the change).

> The macro is used quite frequently in the code; replacing
> it with a function might slow down Emacs

Thanks for mentioning that.  I benchmarked the patch, and found out
that one of my changes (to the QUIT macro) slowed things down
significantly.  I fixed that.  The patched Emacs now runs slightly
faster than the original on my benchmark.

> A function whose name is in CAPS?  Why?

I was lazy and left the callers alone.  That's easy to fix;
revised patch attached.  It renames BLOCK_INPUT_TO to block_input_to,
etc.

>   -  kill (getpid (), fatal_error_code);
>   +  raise (fatal_error_code);
>
> If there are good reasons to prefer 'raise' to 'kill' in this context,

On POSIX hosts, the change is helpful because it fixes some
race conditions.  raise (SIGABRT) is equivalent to
pthread_kill (pthread_self (), SIGABRT), and we do want the
signal to be delivered to the main thread rather than to
some other thread (which 'kill' might do).

> can we please special-case SIGABRT here and call 'emacs_abort'
> directly, at least for hosts that cannot produce the backtrace?
> Otherwise, assertion violations will not end up in 'emacs_abort',
> AFAICS.

Sorry, I'm not following.  By "assertion violations" do you
mean eassert failure?  If so, currently:

  eassert fails.
  'die' prints "Emacs fatal error" with some details.
  fatal_error_backtrace resets SIGABRT to SIG_DFL
  fatal_error_backtrace outputs a backtrace if available
  fatal_error_backtrace unblocks SIGABRT
  fatal_error_backtrace uses 'kill' with SIGABRT to terminate Emacs

so emacs_abort is never called.  Under the proposed patch, the
actions do basically the same thing:

  eassert fails
  'die' resets SIGABRT to SIG_DFL
  'die' prints "Emacs fatal error" with some details.
  terminate_due_to_signal resets SIGABRT to SIG_DFL
  terminate_due_to_signal outputs a backtrace if available
  terminate_due_to_signal unblocks SIGABRT
  terminate_due_to_signal uses 'raise' with SIGABRT to terminate Emacs

emacs_abort is not called in either the old code or the new.

>   -  handle_on_main_thread (sig, handle_interrupt_signal);
>   +  deliver_process_signal (sig, handle_interrupt_signal);
>    }
>
> Do we really need this double indirection?

Yes, because Gnome/gtk+ threads shouldn't execute handle_interrupt_signal
at the same time that the Emacs main thread is running.  That signal
handler doesn't do an atomic action, and if its actions are done in
parallel with the Emacs main thread, bad things could happen.
There's commentary to this effect in deliver_process_signal.

>   -#ifdef SIGTERM
> 	 parse_signal ("term", SIGTERM);
>   -#endif
>
> I don't understand why did you remove the conditional.  This won't
> compile if SIGTERM isn't defined.

Emacs has been assuming C89 for a while, and C89 specifies SIGABRT,
SIGFPE, SIGILL, SIGINT, SIGSEGV, SIGTERM.  These days any C implementation
that has <signal.h> should have these signals defined to something.

> Why is IEEE_FLOATING_POINT, which detects the representation of FP
> numbers, related to SIGFPE?

Most hosts nowadays use IEEE floating point, so they do not trap on
exceptions and don't need an exception handler.  The exception handler
is problematic (it longjmps out of a signal handler, which leads
to some real problems) and should be omitted unless it's really
needed.  I'll add a comment along those lines.

>   -  return ret;
>   +  return nev;
>
> This should be in a separate commit, as it is unrelated.

OK, thanks, I did that in trunk bzr 110104 and 110106.  It still seems
to me that the windows socket hook isn't returning the proper positive
value when it should, but evidently the code's been working with it
returning 0 so I'm not sure how high a priority it is to look into
that.

Thanks for the review.  Revised patch (with above comments taken into
account) attached.  This patch is relative to trunk bzr 110110.

[syssignaq.txt.gz (application/x-gzip, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Thu, 20 Sep 2012 06:29:01 GMT) Full text and rfc822 format available.

Message #23 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Jan Djärv <jan.h.d <at> swipnet.se>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 12471 <at> debbugs.gnu.org, Juanma Barranquero <lekktu <at> gmail.com>
Subject: Re: bug#12471: Avoid some signal-handling races, and simplify.
Date: Thu, 20 Sep 2012 08:27:09 +0200

19 sep 2012 kl. 21:58 skrev Paul Eggert <eggert <at> cs.ucla.edu>:

> On 09/19/2012 09:45 AM, Jan Djärv wrote:> Hello.
> 
>> Thread sarted by Gnome/gtk+ plugins can not handle SIGALRM,
>> so Emacs will terminate.
> 
> Thanks for looking at the patch.  Emacs doesn't terminate for me
> (Fedora 17, GTK3), but perhaps that's because the problem is
> specific to non-GNU/Linux platforms.  So could you please
> explain the issue a bit more?

I think this is the starting point:
http://lists.gnu.org/archive/html/emacs-devel/2005-02/msg01142.html

I found some other relevant threads that show how the code evolved:

http://lists.gnu.org/archive/html/emacs-pretest-bug/2006-08/msg00005.html
http://lists.gnu.org/archive/html/emacs-devel/2006-08/msg00633.html
http://lists.gnu.org/archive/html/bug-gnu-emacs/2007-06/msg00051.html

Basically other threads are created by some libraries (DBus, Gtk+ file dialog plugins).
It is generally undefined to which thread a signal get delivered, and some operations are only safe to do in the main thread (i.e. malloc).

> 
> Here are some more details to help explain that part of the
> proposed change.  In the Emacs trunk, a thread started by
> those plugins can already get SIGALRM.  If it does, the
> Emacs-supplied signal handler masks out SIGALRM in the
> thread, resignals the process with SIGALRM so that some other
> thread will get the signal next time, and then exits.  The
> thread then resumes doing whatever it was doing, and the
> main thread eventually gets signaled by SIGALRM.  So each
> Gnome/gtk+ plugin thread might get signaled by SIGALRM,
> altough it should handle that signal at most once.
> 
> Under the patch, a thread may handle SIGALRM more than once.
> Each time it does so, it does something *very* simple (it
> sets pending_signals to 1).  This shouldn't disturb what's
> happening in the plugin thread, since the plugin thread
> shouldn't be looking at pending_signals.

Ok, I missed that part, if we only do simple stuff, it will be ok.  But previously a lot of stuff happend in the signal handler.  If we can remove those cases, all is well.

	Jan D.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Thu, 20 Sep 2012 17:47:02 GMT) Full text and rfc822 format available.

Message #26 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Thu, 20 Sep 2012 20:44:51 +0300

> Date: Wed, 19 Sep 2012 23:07:21 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: bug-gnu-emacs <at> gnu.org, lekktu <at> gmail.com
> 
> On 09/19/2012 09:18 AM, Eli Zaretskii wrote:
> 
> > Why is it a good idea to replace this macro with a function?
> 
> The function makes it easier to control access to volatile variables,
> that is, to access them in a minimal number of times to lessen the
> number of race conditions.   This requires a local variable.

I don't understand.  The only volatile variable that is involved is
interrupt_input_blocked, which is a global variable.  How does a
function help in this regard?

> >   -  kill (getpid (), fatal_error_code);
> >   +  raise (fatal_error_code);
> >
> > If there are good reasons to prefer 'raise' to 'kill' in this context,
> 
> On POSIX hosts, the change is helpful because it fixes some
> race conditions.  raise (SIGABRT) is equivalent to
> pthread_kill (pthread_self (), SIGABRT), and we do want the
> signal to be delivered to the main thread rather than to
> some other thread (which 'kill' might do).

In that case, can we use 'pthread_kill' as you show above, instead of
'raise'?  I'd like to avoid redefining 'raise' on Windows; see below.

> > can we please special-case SIGABRT here and call 'emacs_abort'
> > directly, at least for hosts that cannot produce the backtrace?
> > Otherwise, assertion violations will not end up in 'emacs_abort',
> > AFAICS.
> 
> Sorry, I'm not following.  By "assertion violations" do you
> mean eassert failure?  If so, currently:
> 
>   eassert fails.
>   'die' prints "Emacs fatal error" with some details.
>   fatal_error_backtrace resets SIGABRT to SIG_DFL
>   fatal_error_backtrace outputs a backtrace if available
>   fatal_error_backtrace unblocks SIGABRT
>   fatal_error_backtrace uses 'kill' with SIGABRT to terminate Emacs
> 
> so emacs_abort is never called.  Under the proposed patch, the
> actions do basically the same thing:
> 
>   eassert fails
>   'die' resets SIGABRT to SIG_DFL
>   'die' prints "Emacs fatal error" with some details.
>   terminate_due_to_signal resets SIGABRT to SIG_DFL
>   terminate_due_to_signal outputs a backtrace if available
>   terminate_due_to_signal unblocks SIGABRT
>   terminate_due_to_signal uses 'raise' with SIGABRT to terminate Emacs
> 
> emacs_abort is not called in either the old code or the new.

It does, on Windows.  (I thought other platforms shared that, but
perhaps I was wrong.)  On Windows, 'kill' is redefined to call
'emacs_abort', when its first arg is Emacs's own PID and the second
arg is SIGABRT.

(Originally, 'die' called 'abort', which on Windows was redirected to
'w32_abort'.  Then calls to 'abort' were changed to call
'emacs_abort', and then 'w32_abort' was renamed to 'emacs_abort'.  But
then, when you made 'die' call fatal_error_backtrace, that made Emacs
on Windows exit silently upon assertion violations.  So I changed
'sys_kill' to call 'emacs_abort' on Windows.)

If you leave 'raise' there, it will invoke 'abort' from the system
library on Windows, which is not what we want.  And I don't want to
redefine yet another library function, if it can be avoided.

Incidentally, I think this feature of producing a backtrace on a fatal
error will be hated by both users and Emacs maintainers, but that's
another subject for another thread.

> >   -  handle_on_main_thread (sig, handle_interrupt_signal);
> >   +  deliver_process_signal (sig, handle_interrupt_signal);
> >    }
> >
> > Do we really need this double indirection?
> 
> Yes, because Gnome/gtk+ threads shouldn't execute handle_interrupt_signal
> at the same time that the Emacs main thread is running.  That signal
> handler doesn't do an atomic action, and if its actions are done in
> parallel with the Emacs main thread, bad things could happen.
> There's commentary to this effect in deliver_process_signal.

We are miscommunicating.  I meant why we need a bunch of functions,
called deliver_FOO_signal, whose body is a single line that calls
handle_on_main_thread (now deliver_process_signal, which is even a
more confusing name, IMO -- the previous one at least hinted at what
it did).  Then handle_on_main_thread does some thread-related magic
and/or calls the _real_ handler for the signal.  Thus "double
indirection": instead of calling the handler, we call a function that
calls another function, which calls the real handler.  (And btw, some
of these handlers are also one-liners: they call
fatal_error_backtrace, now renamed to terminate_due_to_signal, yet
another function.  Can you say "triple indirection"?)

This makes the code hard to understand, especially since these
functions are spread all over in several different source files.

Can we do better than that?  Ideally, there should be a single
function, the signal handler, which does everything.  Then a given
signal will have just two places to look up: the place where you call
sigaction to set up the handler, and another one where the handler is
implemented.

As for the FORWARD_SIGNAL_TO_MAIN_THREAD stuff, you could make each
signal handler invoke a function that does just this part.

Or maybe it would work to set up things so that no non-main threads
ever get any signals -- is there some pthread_* function that can
setup a callback to run each time a thread is created? if so, perhaps
we could use such a callback to block signals in each thread.

If any of that can be done, I think the signal-related code will
really become much easier to study, understand, and maintain.

> > Why is IEEE_FLOATING_POINT, which detects the representation of FP
> > numbers, related to SIGFPE?
> 
> Most hosts nowadays use IEEE floating point, so they do not trap on
> exceptions and don't need an exception handler.

Well, SIGFPE could happen for reasons entirely unrelated to math
calculations and functions.  A hardware problem, for example.

> The exception handler is problematic (it longjmps out of a signal
> handler, which leads to some real problems) and should be omitted
> unless it's really needed.

Given that SIGFPE will not happen due to calculations, how about
changing its handler to call fatal_error_backtrace, rather than signal
a Lisp-level error?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Fri, 21 Sep 2012 07:45:01 GMT) Full text and rfc822 format available.

Message #29 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Fri, 21 Sep 2012 00:42:45 -0700

On 09/20/2012 10:44 AM, Eli Zaretskii wrote:
> The only volatile variable that is involved is
> interrupt_input_blocked, which is a global variable.  How does a
> function help in this regard?

UNBLOCK_INPUT (the macro in the current trunk) accesses that variable
four times in the usual case.  unblock_input (the function in the
proposed patch) accesses it just twice.  This lessens the number of
race conditions and makes the code run a bit faster.  (It doesn't
eliminate the races, but that's OK, one bug fix at a time.)

unblock_input does fewer accesses by using a local variable 'level'.
Local variables like that are cleaner when they're in functions.  One
can put them into macros too, but then one has the problem of capture.
It's a relatively minor thing; it could still be done with a macro.
But since the whole thing basically needed to be rewritten anyway, I
figured we might as well do it more cleanly, with a fucntion.

> can we use 'pthread_kill' as you show above, instead of 'raise'?

That would not work on older POSIXish hosts that lack pthread_kill.
In contrast, 'raise' should be portable to all C89-or-better hosts.

> On Windows, 'kill' is redefined to call 'emacs_abort', when its
> first arg is Emacs's own PID and the second arg is SIGABRT.

For consistency, how about if Windows also defines 'raise' to call
'emacs_abort', when the signal is SIGABRT?  That should fix the
problem, and it shouldn't require any change to the mainline code.  It
could be something as simple as this in a Windows-specific include
file, perhaps:

	#define raise(sig) kill (getpid (), sig)

> And I don't want to redefine yet another library function, if it can
> be avoided.

But it does seem the simpler approach here, overall.

> I think this feature of producing a backtrace on a fatal
> error will be hated by both users and Emacs maintainers,
> but that's another subject for another thread.

> Can we do better than that?  Ideally, there should be a single
> function, the signal handler, which does everything.

I tried that initially but the code was bulkier and more error-prone.
Each handler needed to have an extra local variable (to save errno)
and to call two auxiliary-library functions, one for setup and the
other for teardown.  In contrast, the proposed approach requires no
additional locals and one auxiliary-library function.  It's true that
it requires two functions for each per-process signal, one for the
thread catching the signal and one for the main process, but I find
that actually simpler, since it makes it clearer which handler code is
run where.  Anyway, pointers to signal handlers are standard practice
in POSIXish code, and it's not particularly exotic to pass such a
pointer around or to call the handler via the pointer.

> Or maybe it would work to set up things so that no non-main threads
> ever get any signals

Yes, that was my first thought too!  I'd rather do that; it'd be simpler.

> is there some pthread_* function that can
> setup a callback to run each time a thread is created?

Unfortunately not, which is why I settled with the proposed approach

> Given that SIGFPE will not happen due to calculations, how about
> changing its handler to call fatal_error_backtrace, rather than signal
> a Lisp-level error?

Yes, that makes sense, and it's what the proposed patch already
does on IEEE hosts.  But I see that the code isn't very clear
about that.  I'll replace it with the following, which I hope
makes it clearer.

  /* Typically SIGFPE is thread-specific and is fatal, like SIGILL.
     But on a non-IEEE host SIGFPE can come from a trap in the Lisp
     interpreter's floating point operations, so treat SIGFPE as an
     arith-error if it arises in the main thread.  */
  if (IEEE_FLOATING_POINT)
    sigaction (SIGFPE, &thread_fatal_action, 0);
  else
    {
      emacs_sigaction_init (&action, deliver_arith_signal);
      sigaction (SIGFPE, &action, 0);
    }

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Fri, 21 Sep 2012 08:34:02 GMT) Full text and rfc822 format available.

Message #32 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Fri, 21 Sep 2012 11:31:25 +0300

> Date: Fri, 21 Sep 2012 00:42:45 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
> 
> > On Windows, 'kill' is redefined to call 'emacs_abort', when its
> > first arg is Emacs's own PID and the second arg is SIGABRT.
> 
> For consistency, how about if Windows also defines 'raise' to call
> 'emacs_abort', when the signal is SIGABRT?  That should fix the
> problem, and it shouldn't require any change to the mainline code.  It
> could be something as simple as this in a Windows-specific include
> file, perhaps:
> 
> 	#define raise(sig) kill (getpid (), sig)

I don't want to do that.  I guess I'd special-case WINDOWSNT then, if
we cannot do better.

> > Can we do better than that?  Ideally, there should be a single
> > function, the signal handler, which does everything.
> 
> I tried that initially but the code was bulkier and more error-prone.
> Each handler needed to have an extra local variable (to save errno)
> and to call two auxiliary-library functions, one for setup and the
> other for teardown.  In contrast, the proposed approach requires no
> additional locals and one auxiliary-library function.  It's true that
> it requires two functions for each per-process signal, one for the
> thread catching the signal and one for the main process, but I find
> that actually simpler, since it makes it clearer which handler code is
> run where.

To me, the code looks overly complicated.  By the time you get to the
meat, you have already done 2 or 3 hops, which makes tracing
processing of a specific signal unduly complex.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Fri, 21 Sep 2012 17:29:01 GMT) Full text and rfc822 format available.

Message #35 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Fri, 21 Sep 2012 10:26:59 -0700

On 09/21/2012 01:31 AM, Eli Zaretskii wrote:
>> 	#define raise(sig) kill (getpid (), sig)
> I don't want to do that.  I guess I'd special-case WINDOWSNT then, if
> we cannot do better.

If the special case is in a w32 or dos file or somewhere in the nt
directory, that's obviously fine.  I hope any such special case
doesn't clutter the mainline code, though, as it's Windows-specific.

> To me, the code looks overly complicated.

I'd also like a simpler way to do it.  But so far this
is the simplest code we've found, that avoids the bugs
in question.  And it's not *that* bad.  (Though perhaps my
standards are too high -- I've been reading the garbage
collector....)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Fri, 21 Sep 2012 17:41:01 GMT) Full text and rfc822 format available.

Message #38 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Fri, 21 Sep 2012 20:38:56 +0300

> Date: Fri, 21 Sep 2012 10:26:59 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
> 
> On 09/21/2012 01:31 AM, Eli Zaretskii wrote:
> >> 	#define raise(sig) kill (getpid (), sig)
> > I don't want to do that.  I guess I'd special-case WINDOWSNT then, if
> > we cannot do better.
> 
> If the special case is in a w32 or dos file or somewhere in the nt
> directory, that's obviously fine.  I hope any such special case
> doesn't clutter the mainline code, though, as it's Windows-specific.

I see no way to do that in any place but where 'raise' is called,
sorry.  That's why I asked if there was a way to do that without
calling 'raise'.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Fri, 21 Sep 2012 18:16:01 GMT) Full text and rfc822 format available.

Message #41 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Fri, 21 Sep 2012 11:13:47 -0700

On 09/21/2012 10:38 AM, Eli Zaretskii wrote:
> I see no way to do that in any place but where 'raise' is called,
> sorry.

Well I'm no Windows expert, but if I understand things correctly,
the following should do the trick, since it uses the same
pattern that the Windows code already uses for
'kill', 'signal', and 'sigaction'.  This approach should avoid
the macro problems that my earlier suggestion had.

=== modified file 'nt/ChangeLog'
--- nt/ChangeLog	2012-09-18 10:49:33 +0000
+++ nt/ChangeLog	2012-09-21 18:10:25 +0000
@@ -1,3 +1,7 @@
+2012-09-21  Paul Eggert  <eggert <at> cs.ucla.edu>
+
+	* inc/ms-w32.h (raise): New macro.
+
 2012-09-18  Eli Zaretskii  <eliz <at> gnu.org>
 
 	* configure.bat: Include stddef.h before gif_lib.h, to have size_t

=== modified file 'nt/inc/ms-w32.h'
--- nt/inc/ms-w32.h	2012-09-07 08:20:07 +0000
+++ nt/inc/ms-w32.h	2012-09-21 18:10:25 +0000
@@ -200,6 +200,7 @@
 /* Subprocess calls that are emulated.  */
 #define spawnve sys_spawnve
 #define wait    sys_wait
+#define raise   sys_raise
 #define kill    sys_kill
 #define signal  sys_signal
 

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2012-09-21 07:21:06 +0000
+++ src/ChangeLog	2012-09-21 18:10:25 +0000
@@ -124,6 +124,7 @@
 	(emacs_backtrace): Output backtrace for the appropriate thread,
 	which is not necessarily the main thread.
 	* syssignal.h: Include <stdbool.h>.
+	* w32proc.c (sys_raise): New function.
 	* xterm.c (x_connection_signal): Remove; no longer needed
 	now that we use sigaction.
 	(x_connection_closed): No need to mess with sigmask now.

=== modified file 'src/w32proc.c'
--- src/w32proc.c	2012-09-15 08:03:11 +0000
+++ src/w32proc.c	2012-09-21 18:10:25 +0000
@@ -1421,6 +1421,12 @@
 }
 
 int
+sys_raise (int sig)
+{
+  sys_kill (getpid (), sig);
+}
+
+int
 sys_kill (int pid, int sig)
 {
   child_process *cp;

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Fri, 21 Sep 2012 18:30:02 GMT) Full text and rfc822 format available.

Message #44 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Fri, 21 Sep 2012 21:27:28 +0300

> Date: Fri, 21 Sep 2012 11:13:47 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
> 
> +#define raise   sys_raise

I already told that I don't want to do that.  Shadowing library
functions is a maintenance burden in the long run.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Fri, 21 Sep 2012 20:01:01 GMT) Full text and rfc822 format available.

Message #47 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Fri, 21 Sep 2012 12:59:07 -0700

On 09/21/2012 11:27 AM, Eli Zaretskii wrote:>> +#define raise   sys_raise
> I already told that I don't want to do that.

Yes, but the most-recent patch did something else than what you had
objected to, and should avoid its problems.

> Shadowing library functions is a maintenance burden in the long run.

It's a maintenance burden no matter how we do it.  When it's easy, as
is the case here, it's better to address Windows-specific problems in
the Windows code.  This relieves the mainline maintainers of the
maintenance burden of reading Windows code, and that is a win.

Besides, this particular patch has almost no burden on the Windows
side.  We're talking about six simple lines that fit nicely into an
already-existing framework for this sort of thing.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Sat, 22 Sep 2012 08:05:02 GMT) Full text and rfc822 format available.

Message #50 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Sat, 22 Sep 2012 11:02:18 +0300

> Date: Fri, 21 Sep 2012 12:59:07 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
> 
> On 09/21/2012 11:27 AM, Eli Zaretskii wrote:>> +#define raise   sys_raise
> > I already told that I don't want to do that.
> 
> Yes, but the most-recent patch did something else than what you had
> objected to, and should avoid its problems.

What I object to is replacing existing library functions when that can
be avoided.  If the most-recent patch allows avoiding that, please
tell how and where.

> > Shadowing library functions is a maintenance burden in the long run.
> 
> It's a maintenance burden no matter how we do it.

I will always prefer a small burden to a larger one.

> When it's easy, as is the case here, it's better to address
> Windows-specific problems in the Windows code.  This relieves the
> mainline maintainers of the maintenance burden of reading Windows
> code, and that is a win.

It's a win for you, but it isn't a win in general.  And anyway, I
don't understand why you take the liberty of removing me and Juanma
from the list of "mainline maintainers".  May I suggest that you
review the commit logs and look up our names?  On my part, I can tell
that there would have been a few more entries there, if it were not
for the constant flux of low-level changes lately that need constant
attention to keep the Windows port in working condition.  How do you
enter that into the "win" equation?

> Besides, this particular patch has almost no burden on the Windows
> side.  We're talking about six simple lines that fit nicely into an
> already-existing framework for this sort of thing.

No, it does not fit in the existing framework.  The existing framework
is to replace library functions and APIs that do not exist on Windows,
or are severely limited compared to the expectations of the Emacs
application code.  This isn't the case here: 'raise' has no known
problems on Windows, when called with signals it supports (SIGABRT is
one of them).

Please note that 'sys_kill' was written to emulate delivery of fatal
signals to Emacs subprocesses, not to the Emacs process.  Adding the
two lines there to support aborting Emacs was already too far-fetched;
I did that as a temporary measure of getting a sane behavior while
waiting for the dust to settle -- as I was certain (and now proved
right) that the changes done then are not the last word on this.  What
you suggest now is that I add to 'sys_kill' a lot of stuff for it to
be able to replace 'raise', and then maintain that stuff for whatever
changes will be done (and I'm sure they will) in this area, because a
replacement of a library function needs to be at least as good as the
function it replaces.  _That_ is the maintenance burden I want to
avoid.  If you can compare it with a couple of #ifdef's and still
claim with a straight face that it's a lesser evil, then we will just
have to agree to disagree.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Sat, 22 Sep 2012 08:50:01 GMT) Full text and rfc822 format available.

Message #53 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Sat, 22 Sep 2012 01:47:47 -0700

On 09/22/2012 01:02 AM, Eli Zaretskii wrote:

> What you suggest now is that I add to 'sys_kill' a lot of stuff for it to
> be able to replace 'raise'

Sorry, I don't follow.  I did not suggest that, at any rate.

Currently, emacs.c does the equivalent of this:

  kill (getpid (), sig);

which on Windows is macro-expanded to

  sys_kill (_getpid (), sig);

The proposed patch causes Emacs to do the
same Windows actions via the redefined 'raise'.  That is,
emacs.c will do 'raise (sig)', and the implementation
of 'raise (sig)' will do 'sys_kill (_getpid (), sig)'.

If there is a problem with the proposed patch with respect
to Emacs sending a signal to itself, why doesn't the
current Emacs trunk have the same problem?  And if so,
shouldn't that problem be addressed independently, regardless
of the patch proposed for Bug#12471?

> I will always prefer a small burden to a larger one.

I was referring to the overall burden, for all maintainers,
not just the burden for the Windows maintainers in particular.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Sat, 22 Sep 2012 09:13:01 GMT) Full text and rfc822 format available.

Message #56 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Sat, 22 Sep 2012 12:10:04 +0300

> Date: Sat, 22 Sep 2012 01:47:47 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
> 
> On 09/22/2012 01:02 AM, Eli Zaretskii wrote:
> 
> > What you suggest now is that I add to 'sys_kill' a lot of stuff for it to
> > be able to replace 'raise'
> 
> Sorry, I don't follow.  I did not suggest that, at any rate.
> 
> Currently, emacs.c does the equivalent of this:
> 
>   kill (getpid (), sig);
> 
> which on Windows is macro-expanded to
> 
>   sys_kill (_getpid (), sig);
> 
> The proposed patch causes Emacs to do the
> same Windows actions via the redefined 'raise'.  That is,
> emacs.c will do 'raise (sig)', and the implementation
> of 'raise (sig)' will do 'sys_kill (_getpid (), sig)'.

Which means 'sys_kill' will need to do everything that's expected from
'raise'.  Like I wrote:

> because a replacement of a library function needs to be at least as
> good as the function it replaces.

> If there is a problem with the proposed patch with respect
> to Emacs sending a signal to itself, why doesn't the
> current Emacs trunk have the same problem?

I already explained that, too:

> Please note that 'sys_kill' was written to emulate delivery of fatal
> signals to Emacs subprocesses, not to the Emacs process.  Adding the
> two lines there to support aborting Emacs was already too far-fetched;
> I did that as a temporary measure of getting a sane behavior while
> waiting for the dust to settle -- as I was certain (and now proved
> right) that the changes done then are not the last word on this.

> And if so, shouldn't that problem be addressed independently,
> regardless of the patch proposed for Bug#12471?

It will be, once the dust settles on this (or we enter the feature
freeze, whichever comes first).  I was asking not to aggravate the
situation, if it's possible.

> > I will always prefer a small burden to a larger one.
> 
> I was referring to the overall burden, for all maintainers,
> not just the burden for the Windows maintainers in particular.

So was I.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Sat, 22 Sep 2012 09:43:02 GMT) Full text and rfc822 format available.

Message #59 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Sat, 22 Sep 2012 02:40:41 -0700

On 09/22/2012 02:10 AM, Eli Zaretskii wrote:
> Which means 'sys_kill' will need to do everything that's expected from
> 'raise'. 

Unless I'm missing something, the only thing that Emacs will
expect from 'raise' is that it act like 'sys_kill' on itself,
so the proposed patch shouldn't affect what 'sys_kill' needs
to do.

> I was asking not to aggravate the situation, if it's possible.

Yes, but I don't see how the proposed patch aggravates the
situation.  It causes Emacs to execute the same Windows
primitives as before.  These primitives may have problems,
but those problems are the same problems as before.

I'm perhaps stating the obvious here, but just in case
I'm misunderstanding please bear with me: the Windows
substitutes for kill, raise, etc. need not support every
POSIX feature.  They need to support only the features that
Emacs uses.  So sys_raise need not support every feature that
POSIX requires for 'raise'; it needs to support only the
'raise' features that Emacs uses.  Likewise for 'sys_kill'
versus POSIX 'kill'.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Sat, 22 Sep 2012 10:11:02 GMT) Full text and rfc822 format available.

Message #62 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Sat, 22 Sep 2012 13:07:52 +0300

> Date: Sat, 22 Sep 2012 02:40:41 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
> 
> On 09/22/2012 02:10 AM, Eli Zaretskii wrote:
> > Which means 'sys_kill' will need to do everything that's expected from
> > 'raise'. 
> 
> Unless I'm missing something, the only thing that Emacs will
> expect from 'raise' is that it act like 'sys_kill' on itself,
> so the proposed patch shouldn't affect what 'sys_kill' needs
> to do.

'sys_kill' currently supports only SIGABRT when called for the Emacs
process itself.  E.g., it will happily ignore SIGSEGV or SIGILL, and
when passed SIGTRAP, it will try to kill some subprocess.  It's quite
apparent from the code and from the comments that in its current shape
'sys_kill' does not do what's expected from 'raise', far from it.

> I'm perhaps stating the obvious here, but just in case
> I'm misunderstanding please bear with me: the Windows
> substitutes for kill, raise, etc. need not support every
> POSIX feature.  They need to support only the features that
> Emacs uses.  So sys_raise need not support every feature that
> POSIX requires for 'raise'; it needs to support only the
> 'raise' features that Emacs uses.  Likewise for 'sys_kill'
> versus POSIX 'kill'.

This is true for APIs, like 'kill', that don't exist in the library.
Those can be extended one feature at a time, because the missing
features don't work anyway.

But for existing library functions, the only sane way to replace them
is to have the replacement support all the features supported by the
function being replaced.  Otherwise, one would need to analyze every
possible call to the function, both directly and indirectly, in order
to understand what needs and what needs not be supported; that
analysis needs to be done for every change in related code, to reveal
the new features that need to be supported.

And even if we want to support "only" the features you had in mind,
'sys_kill' will need to be extended to support all the fatal signals
that wind up in 'fatal_error_backtrace'/'terminate_due_to_signal' or
whatever it is called these days.  All that in order to call a single
function anyway.  I don't want to do it.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Sat, 22 Sep 2012 10:57:02 GMT) Full text and rfc822 format available.

Message #65 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Sat, 22 Sep 2012 03:55:13 -0700

On 09/22/2012 03:07 AM, Eli Zaretskii wrote:

> 'sys_kill' does not do what's expected from 'raise', far from it.

It may not do what's expected for 'raise' for arbitrary Windows
applications, yes.  But it doesn't need to do that.  All that's
needed is what Emacs expects for 'raise'.

> for existing library functions, the only sane way to replace them
> is to have the replacement support all the features supported by the
> function being replaced.

But the proposed patch does not replace the existing 'raise'. so
it doesn't need to worry about supporting everything that Windows
'raise' does.  All the patch does is to tell Emacs to call
'sys_raise' where emacs.c would normally call 'raise'.  This
doesn't affect other calls to 'raise'.  The Windows 'raise'
function still exists, and will still do what it normally does
on Windows, when any non-Emacs module calls it.

> even if we want to support "only" the features you had in mind,
> 'sys_kill' will need to be extended to support all the fatal signals

The current trunk is already invoking sys_kill with all those signals.
If this behavior isn't correct for Windows, it needs to be fixed,
regardless of whether the proposed patch is applied.  The proposed
patch doesn't make this problem any worse, or any better.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Sat, 22 Sep 2012 11:19:01 GMT) Full text and rfc822 format available.

Message #68 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
Subject: Re: Avoid some signal-handling races, and simplify.
Date: Sat, 22 Sep 2012 14:16:12 +0300

> Date: Sat, 22 Sep 2012 03:55:13 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com
> 
> On 09/22/2012 03:07 AM, Eli Zaretskii wrote:
> 
> > 'sys_kill' does not do what's expected from 'raise', far from it.
> 
> It may not do what's expected for 'raise' for arbitrary Windows
> applications, yes.  But it doesn't need to do that.  All that's
> needed is what Emacs expects for 'raise'.

'sys_kill' doesn't support what Emacs expects from 'raise', except for
SIGABRT.  Please read the code, it's quite self-explanatory.

> > for existing library functions, the only sane way to replace them
> > is to have the replacement support all the features supported by the
> > function being replaced.
> 
> But the proposed patch does not replace the existing 'raise'.

It does, as far as Emacs code is concerned.

> > even if we want to support "only" the features you had in mind,
> > 'sys_kill' will need to be extended to support all the fatal signals
> 
> The current trunk is already invoking sys_kill with all those signals.
> If this behavior isn't correct for Windows, it needs to be fixed,
> regardless of whether the proposed patch is applied.  The proposed
> patch doesn't make this problem any worse, or any better.

Yes, it does, in that it enters 'raise' into this messy picture.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Sat, 22 Sep 2012 20:31:01 GMT) Full text and rfc822 format available.

Message #71 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#12471: Avoid some signal-handling races, and simplify.
Date: Sat, 22 Sep 2012 16:28:26 -0400

> But for existing library functions, the only sane way to replace them
> is to have the replacement support all the features supported by the
> function being replaced.

Agreed.  Maybe a better solution is to use `emacs_raise' which can then
either use `raise' (on POSIX hosts) or something else (on Windows
hosts).


        Stefan

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Sat, 22 Sep 2012 21:58:01 GMT) Full text and rfc822 format available.

Message #74 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#12471: Avoid some signal-handling races, and simplify.
Date: Sat, 22 Sep 2012 14:55:23 -0700

[Message part 1 (text/plain, inline)]

On 09/22/2012 01:28 PM, Stefan Monnier wrote:
> Maybe a better solution is to use `emacs_raise' which can then
> either use `raise' (on POSIX hosts) or something else (on Windows
> hosts).

I still don't see why that helps, but since you and
Eli both seem to prefer that sort of solution I wrote an additional
patch to do it that way, as follows.  Updated total
patch (compressed) attached.

=== modified file 'nt/ChangeLog'
--- nt/ChangeLog	2012-09-21 18:10:25 +0000
+++ nt/ChangeLog	2012-09-22 21:44:27 +0000
@@ -1,6 +1,6 @@
-2012-09-21  Paul Eggert  <eggert <at> cs.ucla.edu>
+2012-09-22  Paul Eggert  <eggert <at> cs.ucla.edu>
 
-	* inc/ms-w32.h (raise): New macro.
+	* inc/ms-w32.h (emacs_raise): New macro.
 
 2012-09-18  Eli Zaretskii  <eliz <at> gnu.org>
 

=== modified file 'nt/inc/ms-w32.h'
--- nt/inc/ms-w32.h	2012-09-21 18:10:25 +0000
+++ nt/inc/ms-w32.h	2012-09-22 21:44:27 +0000
@@ -200,10 +200,12 @@
 /* Subprocess calls that are emulated.  */
 #define spawnve sys_spawnve
 #define wait    sys_wait
-#define raise   sys_raise
 #define kill    sys_kill
 #define signal  sys_signal
 
+/* Internal signals.  */
+#define emacs_raise(sig) kill (getpid (), sig)
+
 /* termcap.c calls that are emulated.  */
 #define tputs   sys_tputs
 #define tgetstr sys_tgetstr

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2012-09-22 21:13:11 +0000
+++ src/ChangeLog	2012-09-22 21:44:27 +0000
@@ -40,7 +40,7 @@
 	(terminate_due_to_signal): Rename from fatal_error_backtrace, since
 	it doesn't always backtrace.  All uses changed.  No need to reset
 	signal to default, since sigaction and/or die does that for us now.
-	Use raise (FOO), not kill (getpid (), FOO).
+	Use emacs_raise (FOO), not kill (getpid (), FOO).
 	(main): Check more-accurately whether we're dumping.
 	Move fatal-error setup to sysdep.c
 	* floatfns.c: Do not include "syssignal.h"; no longer needed.
@@ -124,7 +124,7 @@
 	(emacs_backtrace): Output backtrace for the appropriate thread,
 	which is not necessarily the main thread.
 	* syssignal.h: Include <stdbool.h>.
-	* w32proc.c (sys_raise): New function.
+	(emacs_raise): New macro.
 	* xterm.c (x_connection_signal): Remove; no longer needed
 	now that we use sigaction.
 	(x_connection_closed): No need to mess with sigmask now.

=== modified file 'src/emacs.c'
--- src/emacs.c	2012-09-22 21:13:11 +0000
+++ src/emacs.c	2012-09-22 21:44:27 +0000
@@ -311,7 +311,7 @@
   }
 #endif
 
-  raise (sig);
+  emacs_raise (sig);
 
   /* This shouldn't be executed, but it prevents a warning.  */
   exit (1);

=== modified file 'src/syssignal.h'
--- src/syssignal.h	2012-09-18 21:00:00 +0000
+++ src/syssignal.h	2012-09-22 21:44:27 +0000
@@ -40,6 +40,10 @@
 # define NSIG NSIG_MINIMUM
 #endif
 
+#ifndef emacs_raise
+# define emacs_raise(sig) raise (sig)
+#endif
+
 /* On bsd, [man says] kill does not accept a negative number to kill a pgrp.
    Must do that using the killpg call.  */
 #ifdef BSD_SYSTEM

=== modified file 'src/w32proc.c'
--- src/w32proc.c	2012-09-21 18:10:25 +0000
+++ src/w32proc.c	2012-09-22 21:44:27 +0000
@@ -1421,12 +1421,6 @@
 }
 
 int
-sys_raise (int sig)
-{
-  sys_kill (getpid (), sig);
-}
-
-int
 sys_kill (int pid, int sig)
 {
   child_process *cp;

[syssignaq.txt.gz (application/x-gzip, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Sun, 23 Sep 2012 03:55:02 GMT) Full text and rfc822 format available.

Message #77 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com, eggert <at> cs.ucla.edu
Subject: Re: bug#12471: Avoid some signal-handling races, and simplify.
Date: Sun, 23 Sep 2012 05:52:16 +0200

> From: Stefan Monnier <monnier <at> iro.umontreal.ca>
> Cc: Paul Eggert <eggert <at> cs.ucla.edu>,  12471 <at> debbugs.gnu.org,  lekktu <at> gmail.com
> Date: Sat, 22 Sep 2012 16:28:26 -0400
> 
> > But for existing library functions, the only sane way to replace them
> > is to have the replacement support all the features supported by the
> > function being replaced.
> 
> Agreed.  Maybe a better solution is to use `emacs_raise' which can then
> either use `raise' (on POSIX hosts) or something else (on Windows
> hosts).

Yes, that'd be good, I think.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Sun, 23 Sep 2012 03:58:01 GMT) Full text and rfc822 format available.

Message #80 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 12471 <at> debbugs.gnu.org, lekktu <at> gmail.com, monnier <at> iro.umontreal.ca
Subject: Re: bug#12471: Avoid some signal-handling races, and simplify.
Date: Sun, 23 Sep 2012 05:55:44 +0200

> Date: Sat, 22 Sep 2012 14:55:23 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: Eli Zaretskii <eliz <at> gnu.org>, 12471 <at> debbugs.gnu.org, 
>  lekktu <at> gmail.com
> 
> +/* Internal signals.  */
> +#define emacs_raise(sig) kill (getpid (), sig)

Windows will use a different definition.  I'll probably just implement
'emacs_raise' as a function.  I want to keep 'sys_kill' only for
sending "signals" to subprocesses.

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Sun, 23 Sep 2012 09:39:01 GMT) Full text and rfc822 format available.

Notification sent to Paul Eggert <eggert <at> cs.ucla.edu>:
bug acknowledged by developer. (Sun, 23 Sep 2012 09:39:02 GMT) Full text and rfc822 format available.

Message #85 received at 12471-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 12471-done <at> debbugs.gnu.org
Subject: installed into trunk
Date: Sun, 23 Sep 2012 02:36:18 -0700

As all the issues seem to be resolved
I installed this into the trunk as bzr 110152
and am marking this as done.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Sun, 23 Sep 2012 15:25:02 GMT) Full text and rfc822 format available.

Message #88 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#12471: installed into trunk
Date: Sun, 23 Sep 2012 16:22:23 +0100

On Sun 23 Sep 2012, Paul Eggert wrote:

> As all the issues seem to be resolved
> I installed this into the trunk as bzr 110152
> and am marking this as done.

...and again breaks the Windows build.

This is with MinGW GCC 4.6.2.

gcc -I. -c -gdwarf-2 -g3  -DEMACSDEBUG -fno-crossjumping  -IC:/emacs/devel/libxml2-2.7.8/include/libxml2 -IC:/emacs/devel/giflib-4.1.4-1/include -IC:/emacs/devel/jpeg-6b-4/include -IC:/emacs/devel/tiff-3.8.2-1/include -IC:/emacs/devel/libpng-1.4.3-1/include -IC:/emacs/devel/xpm-3.5.1-1/include -IC:/emacs/devel/xpm-3.5.1-1/src/xpm/3.5.1/libXpm-3.5.1-src/lib -IC:/emacs/devel/zlib-1.2.5-2/include -Demacs=1 -I../lib -I../nt/inc -DHAVE_NTGUI=1 -DUSE_CRT_DLL=1 -o oo/i386/sysdep.o sysdep.c
sysdep.c: In function 'init_signals':
sysdep.c:1766:62: error: 'SA_NODEFER' undeclared (first use in this function)
sysdep.c:1766:62: note: each undeclared identifier is reported only once for each function it appears in
makefile:323: recipe for target 'oo/i386/sysdep.o' failed

It seems MinGW does not define SA_NODEFER at all.

   AndyM

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Sun, 23 Sep 2012 16:26:01 GMT) Full text and rfc822 format available.

Message #91 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#12471: installed into trunk
Date: Sun, 23 Sep 2012 17:23:15 +0100

On Sun 23 Sep 2012, Andy Moreton wrote:

> On Sun 23 Sep 2012, Paul Eggert wrote:
>
>> As all the issues seem to be resolved
>> I installed this into the trunk as bzr 110152
>> and am marking this as done.
>
> ...and again breaks the Windows build.
>
> sysdep.c: In function 'init_signals':
> sysdep.c:1766:62: error: 'SA_NODEFER' undeclared (first use in this function)
> sysdep.c:1766:62: note: each undeclared identifier is reported only once for
> each function it appears in
> makefile:323: recipe for target 'oo/i386/sysdep.o' failed
>
> It seems MinGW does not define SA_NODEFER at all.

There's also a stray use of interrupt_input_pending in w32term.c which
should be removed.

     AndyM

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Sun, 23 Sep 2012 17:40:01 GMT) Full text and rfc822 format available.

Message #94 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andy Moreton <andrewjmoreton <at> gmail.com>
Cc: 12471 <at> debbugs.gnu.org
Subject: Re: bug#12471: installed into trunk
Date: Sun, 23 Sep 2012 19:37:03 +0200

> From: Andy Moreton <andrewjmoreton <at> gmail.com>
> Date: Sun, 23 Sep 2012 16:22:23 +0100
> 
> It seems MinGW does not define SA_NODEFER at all.

MinGW doesn't have 'sigaction', it's all being faked.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#12471; Package emacs. (Sun, 23 Sep 2012 17:40:02 GMT) Full text and rfc822 format available.

Message #97 received at 12471 <at> debbugs.gnu.org (full text, mbox):

From: Juanma Barranquero <lekktu <at> gmail.com>
To: Andy Moreton <andrewjmoreton <at> gmail.com>
Cc: 12471 <at> debbugs.gnu.org
Subject: Re: bug#12471: installed into trunk
Date: Sun, 23 Sep 2012 19:37:21 +0200

On Sun, Sep 23, 2012 at 6:23 PM, Andy Moreton <andrewjmoreton <at> gmail.com> wrote:

> There's also a stray use of interrupt_input_pending in w32term.c which
> should be removed.

Done.

    Juanma

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 22 Oct 2012 11:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 12 years and 254 days ago.

Previous Next

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.

GNU bug report logs - #12471 Avoid some signal-handling races, and simplify.

GNU bug report logs - #12471
Avoid some signal-handling races, and simplify.