GNU bug report logs - #36609
27.0.50; Possible race-condition in threading implementation

Previous Next

Package: emacs;

Reported by: Andreas Politz <politza <at> hochschule-trier.de>

Date: Thu, 11 Jul 2019 20:52:02 UTC

Severity: normal

Tags: fixed

Found in version 27.0.50

Fixed in version 28.1

Done: dick <dick.r.chiang <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 36609 in the body.
You can then email your comments to 36609 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Thu, 11 Jul 2019 20:52:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Andreas Politz <politza <at> hochschule-trier.de>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 11 Jul 2019 20:52:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andreas Politz <politza <at> hochschule-trier.de>
To: bug-gnu-emacs <at> gnu.org
Subject: 27.0.50; Possible race-condition in threading implementation
Date: Thu, 11 Jul 2019 22:51:10 +0200
[Message part 1 (text/plain, inline)]
I think there is a race-condition in the implementation of threads.  I
tried to find a minimal test-case, without success.  Thus, I've attached
a lengthy source-file.  Loading that file should trigger this bug and
may freeze your session.  

Indications:

1. The main-thread has the name of one of created threads (XEmacs in
   this case), instead of "emacs".

2. Emacs stops processing all keyboard/mouse input while looping in
   wait_reading_process_output.  Sending commands via emacsclient still
   works.

GDB output:

(gdb) info threads
  Id   Target Id                                        Frame 
* 1    Thread 0x7ffff17f5d40 (LWP 26264) "XEmacs"       0x000055555576eac0 in XPNTR (a=XIL(0x7ffff1312533)) at alloc.c:535
  2    Thread 0x7ffff0ac4700 (LWP 26265) "gmain"        0x00007ffff50d1667 in poll () from /usr/lib/libc.so.6
  3    Thread 0x7fffebd1a700 (LWP 26266) "gdbus"        0x00007ffff50d1667 in poll () from /usr/lib/libc.so.6
  4    Thread 0x7fffeb519700 (LWP 26267) "dconf worker" 0x00007ffff50d1667 in poll () from /usr/lib/libc.so.6

(gdb) bt full
#0  0x00007ffff50d3f76 in pselect () at /usr/lib/libc.so.6
#1  0x0000555555832e48 in really_call_select (arg=0x7fffffffd150) at thread.c:586
        sa = 0x7fffffffd150
        self = 0x555555c154e0 <main_thread>
        oldset = {
          __val = {0, 93825011232085, 140737488343064, 31632, 31632, 51840, 140737488343136, 93824994672716, 4294967298, 140737488343184, 93824999850720, 0, 0, 140737488343136, 93824993869565, 4041340661}
        }
#2  0x0000555555774449 in flush_stack_call_func (func=0x555555832d81 <really_call_select>, arg=0x7fffffffd150) at alloc.c:4969
        end = 0x7fffffffd100
        self = 0x555555c154e0 <main_thread>
        sentry = {
          o = {
            __max_align_ll = 140737488343296, 
            __max_align_ld = <invalid float value>
          }
        }
#3  0x0000555555832f40 in thread_select (func=0x7ffff50d3eb0 <pselect>, max_fds=6, rfds=0x7fffffffd260, wfds=0x7fffffffd2e0, efds=0x0, timeout=0x7fffffffd8b0, sigmask=0x0) at thread.c:616
        sa = {
          func = 0x7ffff50d3eb0 <pselect>, 
          max_fds = 6, 
          rfds = 0x7fffffffd260, 
          wfds = 0x7fffffffd2e0, 
          efds = 0x0, 
          timeout = 0x7fffffffd8b0, 
          sigmask = 0x0, 
          result = 210347336
        }
#4  0x000055555585fea3 in xg_select (fds_lim=6, rfds=0x7fffffffd920, wfds=0x7fffffffd9a0, efds=0x0, timeout=0x7fffffffd8b0, sigmask=0x0) at xgselect.c:117
        all_rfds = {
          fds_bits = {32, 0 <repeats 15 times>}
        }
        all_wfds = {
          fds_bits = {0 <repeats 16 times>}
        }
        tmo = {
          tv_sec = 7, 
          tv_nsec = 140737488343264
        }
        tmop = 0x7fffffffd8b0
        context = 0x555555dae320
        have_wfds = true
        gfds_buf = {{
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = 32, 
            events = 0, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = 7, 
            events = 0, 
            revents = 0
          }, {
            fd = 8, 
            events = 0, 
            revents = 0
          }, {
            fd = 15, 
            events = 0, 
            revents = 0
          }, {
            fd = -11072, 
            events = 32767, 
            revents = 0
          }, {
            fd = 1460278864, 
            events = 21845, 
            revents = 0
          }, {
            fd = 64, 
            events = 0, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = 2, 
            events = 48, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = 91, 
            events = 124, 
            revents = 0
          }, {
            fd = -397131776, 
            events = 35414, 
            revents = 11125
          }, {
            fd = 1460278864, 
            events = 21845, 
            revents = 0
          }, {
            fd = 1, 
            events = 0, 
            revents = 0
          }, {
            fd = 2, 
            events = 0, 
            revents = 0
          }, {
            fd = -11072, 
            events = 32767, 
            revents = 0
          }, {
            fd = 1439161536, 
            events = 21845, 
            revents = 0
          }, {
            fd = -11024, 
            events = 32767, 
            revents = 0
          }, {
            fd = -11088, 
            events = 32767, 
            revents = 0
          }, {
            fd = -182671191, 
            events = 32767, 
            revents = 0
          }, {
            fd = 1, 
            events = 0, 
            revents = 0
          }, {
            fd = -182670875, 
            events = 32767, 
            revents = 0
          }, {
            fd = 194871296, 
            events = 59160, 
            revents = 48198
          }, {
            fd = -1175563804, 
            events = 19, 
            revents = 0
          }, {
            fd = 24, 
            events = 0, 
            revents = 0
          }, {
            fd = 1460278864, 
            events = 21845, 
            revents = 0
          }, {
            fd = 2, 
            events = 0, 
            revents = 0
          }, {
            fd = 1439882448, 
            events = 21845, 
            revents = 0
          }, {
            fd = 2, 
            events = 0, 
            revents = 0
          }, {
            fd = 2, 
            events = 0, 
            revents = 0
          }, {
            fd = 1, 
            events = 2, 
            revents = 0
          }, {
            fd = 1460278864, 
            events = 21845, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = -397131776, 
            events = 35414, 
            revents = 11125
          }, {
            fd = 24, 
            events = 0, 
            revents = 0
          }, {
            fd = 1439161536, 
            events = 21845, 
            revents = 0
          }, {
            fd = 2, 
            events = 0, 
            revents = 0
          }, {
            fd = 1, 
            events = 0, 
            revents = 0
          }, {
            fd = 1439161536, 
            events = 21845, 
            revents = 0
          }, {
            fd = -182606722, 
            events = 32767, 
            revents = 0
          }, {
            fd = -10960, 
            events = 32767, 
            revents = 0
          }, {
            fd = 1433487750, 
            events = 21845, 
            revents = 0
          }, {
            fd = -10976, 
            events = 32767, 
            revents = 0
          }, {
            fd = 1439161536, 
            events = 21845, 
            revents = 0
          }, {
            fd = -30, 
            events = 0, 
            revents = 0
          }, {
            fd = 1, 
            events = 0, 
            revents = 0
          }, {
            fd = 1, 
            events = 64, 
            revents = 0
          }, {
            fd = 1562877925, 
            events = 0, 
            revents = 0
          }, {
            fd = 31, 
            events = 0, 
            revents = 0
          }, {
            fd = 1562877925, 
            events = 0, 
            revents = 0
          }, {
            fd = -10960, 
            events = 32767, 
            revents = 0
          }, {
            fd = 1434542700, 
            events = 21845, 
            revents = 0
          }, {
            fd = -10912, 
            events = 32767, 
            revents = 0
          }, {
            fd = 1439161536, 
            events = 21845, 
            revents = 0
          }, {
            fd = 1562877925, 
            events = 0, 
            revents = 0
          }, {
            fd = -397131776, 
            events = 35414, 
            revents = 11125
          }, {
            fd = -10848, 
            events = 32767, 
            revents = 0
          }, {
            fd = 1434543288, 
            events = 21845, 
            revents = 0
          }, {
            fd = 1460274293, 
            events = 21845, 
            revents = 0
          }, {
            fd = 1385447426, 
            events = 931, 
            revents = 0
          }, {
            fd = 95390, 
            events = 0, 
            revents = 0
          }, {
            fd = 1433865373, 
            events = 7256, 
            revents = 10134
          }, {
            fd = 1562877925, 
            events = 0, 
            revents = 0
          }, {
            fd = 1439161536, 
            events = 21845, 
            revents = 0
          }, {
            fd = 664149, 
            events = 0, 
            revents = 0
          }, {
            fd = 80000, 
            events = 0, 
            revents = 0
          }, {
            fd = 1562877925, 
            events = 0, 
            revents = 0
          }, {
            fd = 664149080, 
            events = 0, 
            revents = 0
          }, {
            fd = 1562877925, 
            events = 0, 
            revents = 0
          }, {
            fd = 664149080, 
            events = 0, 
            revents = 0
          }, {
            fd = -10752, 
            events = 32767, 
            revents = 0
          }, {
            fd = 1434543494, 
            events = 21845, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = -10544, 
            events = 32767, 
            revents = 0
          }, {
            fd = 499622378, 
            events = 0, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = 499622378, 
            events = 0, 
            revents = 0
          }, {
            fd = -10688, 
            events = 32767, 
            revents = 0
          }, {
            fd = 1434939197, 
            events = 21845, 
            revents = 0
          }, {
            fd = 1562877925, 
            events = 0, 
            revents = 0
          }, {
            fd = 164526702, 
            events = 0, 
            revents = 0
          }, {
            fd = 1562877925, 
            events = 0, 
            revents = 0
          }, {
            fd = 664149080, 
            events = 0, 
            revents = 0
          }, {
            fd = -10544, 
            events = 32767, 
            revents = 0
          }, {
            fd = 499622378, 
            events = 41450, 
            revents = 7623
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = 1562877925, 
            events = 0, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = -1, 
            events = 65535, 
            revents = 65535
          }, {
            fd = -10432, 
            events = 32767, 
            revents = 0
          }, {
            fd = 1433359581, 
            events = 21845, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = 1443725184, 
            events = 85, 
            revents = 0
          }, {
            fd = 1450650965, 
            events = 21845, 
            revents = 0
          }, {
            fd = 1450650965, 
            events = 21845, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = 16384, 
            events = 0, 
            revents = 0
          }, {
            fd = 5, 
            events = 0, 
            revents = 0
          }, {
            fd = -10496, 
            events = 32767, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = -1, 
            events = 65535, 
            revents = 65535
          }, {
            fd = 1562877925, 
            events = 0, 
            revents = 0
          }, {
            fd = 164526702, 
            events = 0, 
            revents = 0
          }, {
            fd = 719, 
            events = 0, 
            revents = 0
          }, {
            fd = 893997157, 
            events = 0, 
            revents = 0
          }, {
            fd = 1439269600, 
            events = 21845, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = -10496, 
            events = 32767, 
            revents = 0
          }, {
            fd = 1433288445, 
            events = 21845, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = 164532384, 
            events = 0, 
            revents = 0
          }, {
            fd = 1562977925, 
            events = 0, 
            revents = 0
          }, {
            fd = 1562977925, 
            events = 0, 
            revents = 0
          }, {
            fd = 164532384, 
            events = 0, 
            revents = 0
          }, {
            fd = -10368, 
            events = 32767, 
            revents = 0
          }, {
            fd = 1434938909, 
            events = 21845, 
            revents = 0
          }, {
            fd = 100000, 
            events = 0, 
            revents = 0
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }, {
            fd = 1562877925, 
            events = 0, 
            revents = 0
          }, {
            fd = 164532384, 
            events = 0, 
            revents = 0
          }, {
            fd = -1, 
            events = 65535, 
            revents = 65535
          }, {
            fd = 0, 
            events = 0, 
            revents = 0
          }}
        gfds = 0x7fffffffd360
        gfds_size = 128
        n_gfds = -1
        retval = 0
        our_fds = 0
        max_fds = 5
        context_acquired = false
        i = 0
        nfds = 135
        tmo_in_millisec = 1030
        must_free = 0
        need_to_dispatch = false
#5  0x0000555555802a78 in wait_reading_process_output (time_limit=0, nsecs=0, read_kbd=-1, do_display=true, wait_for_cell=XIL(0), wait_proc=0x0, just_wait_proc=0) at process.c:5423
        process_skipped = false
        channel = 6
        nfds = 1
        Available = {
          fds_bits = {32, 0 <repeats 15 times>}
        }
        Writeok = {
          fds_bits = {0 <repeats 16 times>}
        }
        check_write = true
        check_delay = 0
        no_avail = false
        xerrno = 11
        proc = XIL(0x7fffffffd9a0)
        timeout = {
          tv_sec = 0, 
          tv_nsec = 499622378
        }
        end_time = {
          tv_sec = 93824993869565, 
          tv_nsec = 0
        }
        timer_delay = {
          tv_sec = 0, 
          tv_nsec = 499622378
        }
        got_output_end_time = {
          tv_sec = 1562977205, 
          tv_nsec = 270561059
        }
        wait = FOREVER
        got_some_output = -1
        prev_wait_proc_nbytes_read = 0
        retry_for_async = false
        count = 4
        now = {
          tv_sec = 0, 
          tv_nsec = -1
        }
#6  0x00005555556f4718 in kbd_buffer_get_event (kbp=0x7fffffffdc70, used_mouse_menu=0x7fffffffe235, end_time=0x0) at keyboard.c:3836
        do_display = true
        obj = make_fixnum(1073741816)
#7  0x00005555556f06c5 in read_event_from_main_queue (end_time=0x0, local_getcjmp=0x7fffffffe040, used_mouse_menu=0x7fffffffe235) at keyboard.c:2138
        c = XIL(0)
        save_jump = {{
            __jmpbuf = {0, 0, 0, 0, 0, 0, 0, 0}, 
            __mask_was_saved = 0, 
            __saved_mask = {
              __val = {0 <repeats 16 times>}
            }
          }}
        kb = 0x7fffffffdcd0
        count = 3
#8  0x00005555556f099b in read_decoded_event_from_main_queue (end_time=0x0, local_getcjmp=0x7fffffffe040, prev_event=XIL(0), used_mouse_menu=0x7fffffffe235) at keyboard.c:2202
        nextevt = XIL(0)
        frame = 0x1dcd30d9
        terminal = 0x9
        events = {XIL(0), XIL(0x1dcd30d9), XIL(0xca80), XIL(0x2b758a56e8544000), XIL(0), XIL(0x555556772ff5), XIL(0x7fffffffde50), XIL(0x5555556f57d8), XIL(0x555555c982e0), XIL(0), XIL(0), XIL(0x7fffffffde50), XIL(0x5555556e3efd), XIL(0x1dcd30d9), XIL(0x7fffffffde90), XIL(0x5555556f3b29)}
        n = 0
#9  0x00005555556f2236 in read_char (commandflag=1, map=XIL(0x5555567756e3), prev_event=XIL(0), used_mouse_menu=0x7fffffffe235, end_time=0x0) at keyboard.c:2810
        c = XIL(0)
        jmpcount = 3
        local_getcjmp = {{
            __jmpbuf = {0, -245616553674816983, 93825011232757, 51840, 0, 0, -245616553305718231, -6214351577293929943}, 
            __mask_was_saved = 0, 
            __saved_mask = {
              __val = {0, 0, 140737488347312, 93824993869565, 0, 140737488347424, 93824994673374, 93825011242707, 3, 0, 0, 140737488347424, 93824994446493, 93824999850720, 0, 0}
            }
          }}
        save_jump = {{
            __jmpbuf = {93824993869565, 0, 140737488347584, 93824994020698, 0, 51840, -1, 0}, 
            __mask_was_saved = 1450661587, 
            __saved_mask = {
              __val = {93825011242723, 140737488347520, 93824993870282, 93825011242707, 93825011242723, 140737488347584, 93824994446493, 93824999885184, 34464, 34464, 140737488347584, 93824993869565, 40105367251, 93825004306304, 140737488347624, 93824999850720}
            }
          }}
        tem = XIL(0x2aaa9b29c970)
        save = XIL(0)
        previous_echo_area_message = XIL(0)
        also_record = XIL(0)
        reread = false
        recorded = false
        polling_stopped_here = true
        orig_kboard = 0x555555dfa570
#10 0x00005555556ff6c8 in read_key_sequence (keybuf=0x7fffffffe440, prompt=XIL(0), dont_downcase_last=false, can_return_switch_frame=true, fix_current_buffer=true, prevent_redisplay=false) at keyboard.c:9124
        interrupted_kboard = 0x555555dfa570
        interrupted_frame = 0x5555560d7f80
        key = XIL(0x5555557a7eff)
        used_mouse_menu = false
        echo_local_start = 0
        last_real_key_start = 0
        keys_local_start = 0
        new_binding = XIL(0x7fffffffe310)
        count = 3
        t = 0
        echo_start = 0
        keys_start = 0
        current_binding = XIL(0x5555567756e3)
        first_unbound = 31
        mock_input = 0
        used_mouse_menu_history = {false <repeats 30 times>}
        fkey = {
          parent = XIL(0x555555d6ce03), 
          map = XIL(0x555555d6ce03), 
          start = 0, 
          end = 0
        }
        keytran = {
          parent = XIL(0x7ffff14a7dbb), 
          map = XIL(0x7ffff14a7dbb), 
          start = 0, 
          end = 0
        }
        indec = {
          parent = XIL(0x555555d6cdf3), 
          map = XIL(0x555555d6cdf3), 
          start = 0, 
          end = 0
        }
        shift_translated = false
        delayed_switch_frame = XIL(0)
        original_uppercase = XIL(0)
        original_uppercase_position = -1
        dummyflag = false
        starting_buffer = 0x7ffff0e1f6f0
        fake_prefixed_keys = XIL(0)
        first_event = XIL(0)
        second_event = XIL(0)
#11 0x00005555556ee5fb in command_loop_1 () at keyboard.c:1348
        cmd = XIL(0)
        keybuf = {XIL(0x7fffffffe490), XIL(0x5555557a804c), make_fixnum(11545945833472), XIL(0x7fffffffe4c0), XIL(0x555555c982e0), XIL(0), XIL(0), XIL(0x7fffffffe490), XIL(0x5555556e3efd), XIL(0xf0e1f6f5), XIL(0x7fffffffe500), make_fixnum(23456248668343), XIL(0x555555d18953), XIL(0x3), XIL(0x555555c982e0), XIL(0), XIL(0), XIL(0x7fffffffe4e0), XIL(0x5555556e3efd), XIL(0xf0e1f6f5), XIL(0x7fffffffe520), XIL(0x5555557a2e69), XIL(0x1556e3efd), XIL(0x5760), XIL(0x7fffffffe540), XIL(0x555555d53470), XIL(0), XIL(0), XIL(0x7fffffffe550), make_fixnum(23456248662876)}
        i = 32767
        prev_modiff = 0
        prev_buffer = 0x0
        already_adjusted = false
#12 0x00005555557a2a64 in internal_condition_case (bfun=0x5555556ee1b3 <command_loop_1>, handlers=XIL(0x5760), hfun=0x5555556ed968 <cmd_error>) at eval.c:1351
        val = XIL(0x5555556e3efd)
        c = 0x555555d53470
#13 0x00005555556ede9b in command_loop_2 (ignore=XIL(0)) at keyboard.c:1091
        val = XIL(0)
#14 0x00005555557a22d5 in internal_catch (tag=XIL(0xd110), func=0x5555556ede6e <command_loop_2>, arg=XIL(0)) at eval.c:1112
        val = XIL(0x45b00000000)
        c = 0x555555d53340
#15 0x00005555556ede39 in command_loop () at keyboard.c:1070
#16 0x00005555556ed537 in recursive_edit_1 () at keyboard.c:714
        count = 1
        val = make_fixnum(23456248667937)
#17 0x00005555556ed6bb in Frecursive_edit () at keyboard.c:786
        count = 0
        buffer = XIL(0)
#18 0x00005555556eb49d in main (argc=4, argv=0x7fffffffe8b8) at emacs.c:2086
        stack_bottom_variable = 0x20
        do_initial_setlocale = true
        no_loadup = false
        junk = 0x0
        dname_arg = 0x0
        ch_to_dir = 0x0
        original_pwd = 0x0
        dump_mode = 0x0
        skip_args = 0
        temacs = 0x0
        attempt_load_pdump = true
        rlim = {
          rlim_cur = 10022912, 
          rlim_max = 18446744073709551615
        }
        sockfd = -1

[thread-bug-2.el (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 07:48:02 GMT) Full text and rfc822 format available.

Message #8 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Politz <politza <at> hochschule-trier.de>
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 10:47:37 +0300
> From: Andreas Politz <politza <at> hochschule-trier.de>
> Date: Thu, 11 Jul 2019 22:51:10 +0200
> 
> I think there is a race-condition in the implementation of threads.

Not sure what you mean by "race condition".  Since only one Lisp
thread can run at any given time, where could this race happen, and
between what threads?

> I tried to find a minimal test-case, without success.  Thus, I've
> attached a lengthy source-file.  Loading that file should trigger
> this bug and may freeze your session.

FWIW, it doesn't freeze my Emacs here, neither on GNU/Linux nor on
MS-Windows (with today's master).  Are you getting freezes with 100%
reproducibility?  If so, please show all the details of your build, as
collected by report-emacs-bug.

> Indications:
> 
> 1. The main-thread has the name of one of created threads (XEmacs in
>    this case), instead of "emacs".

I don't think this is related to the problem.  I think we have a bug
in the implementation of the 'name' attribute of threads: we call
prctl(PR_SET_NAME), which AFAIU changes the name of the _calling_
thread, and the calling thread at that point is the main thread, see
sys_thread_create.  If I evaluate this:

  (make-thread 'ignore "XEmacs")

and then attach a debugger, I see that the name of the main thread has
changed to "XEmacs".

> 2. Emacs stops processing all keyboard/mouse input while looping in
>    wait_reading_process_output.  Sending commands via emacsclient still
>    works.
> 
> GDB output:
> 
> (gdb) info threads
>   Id   Target Id                                        Frame 
> * 1    Thread 0x7ffff17f5d40 (LWP 26264) "XEmacs"       0x000055555576eac0 in XPNTR (a=XIL(0x7ffff1312533)) at alloc.c:535
>   2    Thread 0x7ffff0ac4700 (LWP 26265) "gmain"        0x00007ffff50d1667 in poll () from /usr/lib/libc.so.6
>   3    Thread 0x7fffebd1a700 (LWP 26266) "gdbus"        0x00007ffff50d1667 in poll () from /usr/lib/libc.so.6
>   4    Thread 0x7fffeb519700 (LWP 26267) "dconf worker" 0x00007ffff50d1667 in poll () from /usr/lib/libc.so.6
> 
> (gdb) bt full

This "bt full" is not enough, because it shows the backtrace of one
thread only, the main thread.  Please show the backtrace of the other
3 threads by typing "thread apply all bt full" instead.

> #5  0x0000555555802a78 in wait_reading_process_output (time_limit=0, nsecs=0, read_kbd=-1, do_display=true, wait_for_cell=XIL(0), wait_proc=0x0, just_wait_proc=0) at process.c:5423
>         process_skipped = false
>         channel = 6
>         nfds = 1
>         Available = {
>           fds_bits = {32, 0 <repeats 15 times>}
>         }

Is the keyboard descriptor's bit in Available set or not?  What is the
contents of the fd_callback_info array at this point?

> ;; -*- lexical-binding: t -*-
> 
> (require 'threads)
> (require 'eieio)
> (require 'cl-lib)
> (require 'ring)
> 
> (defun debug (fmt &rest args)
>   (princ (apply #'format fmt args) #'external-debugging-output)
>   (terpri #'external-debugging-output))

Please describe what this program tries to accomplish, and how.  It's
not easy to reverse-engineer that from 200 lines of non-trivial code.
It's possible that the reason(s) for the freeze will be apparent from
describing your implementation ideas.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 08:06:02 GMT) Full text and rfc822 format available.

Message #11 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: politza <at> hochschule-trier.de
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 11:05:08 +0300
> Date: Fri, 12 Jul 2019 10:47:37 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 36609 <at> debbugs.gnu.org
> 
> Please describe what this program tries to accomplish, and how.  It's
> not easy to reverse-engineer that from 200 lines of non-trivial code.
> It's possible that the reason(s) for the freeze will be apparent from
> describing your implementation ideas.

Oh, and one more question: does the value of the global variable
last_thread_error tell something interesting at that point?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 09:04:01 GMT) Full text and rfc822 format available.

Message #14 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Andreas Politz <politza <at> hochschule-trier.de>
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 09:02:22 +0000
[Message part 1 (text/plain, inline)]
On Thu, Jul 11, 2019 at 8:52 PM Andreas Politz
<politza <at> hochschule-trier.de> wrote:
> I think there is a race-condition in the implementation of threads.  I
> tried to find a minimal test-case, without success.  Thus, I've attached
> a lengthy source-file.  Loading that file should trigger this bug and
> may freeze your session.

It does here, so I can provide further debugging information if
needed. On first glance, it appears that xgselect returns abnormally
with g_main_context acquired in one thread, and then other threads
fail to acquire it and loop endlessly.

Can you confirm that this very dirty patch avoids (it's not a fix by
any means!) the problem?
[Avoid-glib-issue.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 12:43:02 GMT) Full text and rfc822 format available.

Message #17 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 15:42:20 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 12 Jul 2019 09:02:22 +0000
> Cc: 36609 <at> debbugs.gnu.org
> 
> On Thu, Jul 11, 2019 at 8:52 PM Andreas Politz
> <politza <at> hochschule-trier.de> wrote:
> > I think there is a race-condition in the implementation of threads.  I
> > tried to find a minimal test-case, without success.  Thus, I've attached
> > a lengthy source-file.  Loading that file should trigger this bug and
> > may freeze your session.
> 
> It does here, so I can provide further debugging information if
> needed.

Thanks, can you provide the info I asked for?

> On first glance, it appears that xgselect returns abnormally with
> g_main_context acquired in one thread, and then other threads fail
> to acquire it and loop endlessly.

If you can describe what causes this to happen, I think we might be
half-way to a solution.

> +  ptrdiff_t count = SPECPDL_INDEX ();

I don't think we should do that at this low level.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 12:46:01 GMT) Full text and rfc822 format available.

Message #20 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Andreas Politz <politza <at> hochschule-trier.de>
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 12:44:35 +0000
[Message part 1 (text/plain, inline)]
On Fri, Jul 12, 2019 at 9:02 AM Pip Cet <pipcet <at> gmail.com> wrote:
>
> On Thu, Jul 11, 2019 at 8:52 PM Andreas Politz
> <politza <at> hochschule-trier.de> wrote:
> > I think there is a race-condition in the implementation of threads.  I
> > tried to find a minimal test-case, without success.  Thus, I've attached
> > a lengthy source-file.  Loading that file should trigger this bug and
> > may freeze your session.
>
> It does here, so I can provide further debugging information if
> needed. On first glance, it appears that xgselect returns abnormally
> with g_main_context acquired in one thread, and then other threads
> fail to acquire it and loop endlessly.

Okay, I'm still not sure this is really the problem Andreas was
seeing, but this code fails to work with xg_select:

(let ((thread (make-thread (lambda ()
                 (sleep-for 3)))))
  (thread-yield)
  (thread-signal thread 'error nil))

Proposed patch attached.
[0001-Protect-against-abnormal-exit-in-xg_select.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 12:56:01 GMT) Full text and rfc822 format available.

Message #23 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 15:55:22 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 12 Jul 2019 12:44:35 +0000
> Cc: 36609 <at> debbugs.gnu.org
> 
> Okay, I'm still not sure this is really the problem Andreas was
> seeing, but this code fails to work with xg_select:
> 
> (let ((thread (make-thread (lambda ()
>                  (sleep-for 3)))))
>   (thread-yield)
>   (thread-signal thread 'error nil))

What do you mean by "fails"?  What do you see when you eval this?

> Proposed patch attached.

As I said earlier, I don't think it's right to use stack unwinding at
this level.  (Did you try this in a -nw session?)

I'd appreciate a description of what you think happens here, and why.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 12:59:02 GMT) Full text and rfc822 format available.

Message #26 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 12:57:51 +0000
On Fri, Jul 12, 2019 at 12:42 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
>
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Fri, 12 Jul 2019 09:02:22 +0000
> > Cc: 36609 <at> debbugs.gnu.org
> >
> > On Thu, Jul 11, 2019 at 8:52 PM Andreas Politz
> > <politza <at> hochschule-trier.de> wrote:
> > > I think there is a race-condition in the implementation of threads.  I
> > > tried to find a minimal test-case, without success.  Thus, I've attached
> > > a lengthy source-file.  Loading that file should trigger this bug and
> > > may freeze your session.
> >
> > It does here, so I can provide further debugging information if
> > needed.
>
> Thanks, can you provide the info I asked for?

Yes, albeit not right now.

> > On first glance, it appears that xgselect returns abnormally with
> > g_main_context acquired in one thread, and then other threads fail
> > to acquire it and loop endlessly.
>
> If you can describe what causes this to happen, I think we might be
> half-way to a solution.

Here's the backtrace of the abnormal exit I see with the patch attached:


(gdb) bt full
#0  0x00000000006bf987 in release_g_main_context (ptr=0xc1d070) at xgselect.c:36
        context = 0x7fffedf79710
#1  0x0000000000616f03 in do_one_unbind
    (this_binding=0x7fffedf79770, unwinding=true, bindflag=SET_INTERNAL_UNBIND)
    at eval.c:3446
#2  0x0000000000617245 in unbind_to (count=0, value=XIL(0)) at eval.c:3567
        this_binding = {
          kind = SPECPDL_UNWIND_PTR,
          unwind = {
            kind = SPECPDL_UNWIND_PTR,
            func = 0x6bf97b <release_g_main_context>,
            arg = XIL(0xc1d070),
            eval_depth = 0
          },
          unwind_array = {
            kind = SPECPDL_UNWIND_PTR,
            nelts = 7076219,
            array = 0xc1d070
          },
          unwind_ptr = {
            kind = SPECPDL_UNWIND_PTR,
            func = 0x6bf97b <release_g_main_context>,
            arg = 0xc1d070
          },
          unwind_int = {
            kind = SPECPDL_UNWIND_PTR,
            func = 0x6bf97b <release_g_main_context>,
            arg = 12701808
          },
          unwind_excursion = {
            kind = SPECPDL_UNWIND_PTR,
            marker = XIL(0x6bf97b),
            window = XIL(0xc1d070)
          },
          unwind_void = {
            kind = SPECPDL_UNWIND_PTR,
            func = 0x6bf97b <release_g_main_context>
          },
          let = {
            kind = SPECPDL_UNWIND_PTR,
            symbol = XIL(0x6bf97b),
            old_value = XIL(0xc1d070),
            where = XIL(0),
            saved_value = XIL(0xef26a0)
          },
          bt = {
            kind = SPECPDL_UNWIND_PTR,
            debug_on_exit = false,
            function = XIL(0x6bf97b),
            args = 0xc1d070,
            nargs = 0
          }
        }
        quitf = XIL(0)
#3  0x00000000006116df in unwind_to_catch
    (catch=0x7fffd8000c50, type=NONLOCAL_EXIT_SIGNAL, value=XIL(0x14d3653))
    at eval.c:1162
        last_time = false
#4  0x00000000006126d9 in signal_or_quit
    (error_symbol=XIL(0x90), data=XIL(0), keyboard_quit=false) at eval.c:1674
        unwind_data = XIL(0x14d3653)
        conditions = XIL(0x7ffff05d676b)
        string = XIL(0x5f5e77)
        real_error_symbol = XIL(0x90)
        clause = XIL(0x30)
        h = 0x7fffd8000c50
#5  0x00000000006122e9 in Fsignal (error_symbol=XIL(0x90), data=XIL(0))
    at eval.c:1564
#6  0x0000000000698901 in post_acquire_global_lock (self=0xe09db0) at
thread.c:115
        sym = XIL(0x90)
        data = XIL(0)
        prev_thread = 0xa745c0 <main_thread>
#7  0x000000000069892b in acquire_global_lock (self=0xe09db0) at thread.c:123
#8  0x0000000000699303 in really_call_select (arg=0x7fffedf79a70) at
thread.c:596
        sa = 0x7fffedf79a70
        self = 0xe09db0
        oldset = {
          __val = {0, 0, 7, 0, 80, 140736817269952, 2031, 2080,
18446744073709550952, 32, 343597383808, 4, 0, 472446402655,
511101108348, 0}
        }
#9  0x00000000005e5ee0 in flush_stack_call_func
    (func=0x699239 <really_call_select>, arg=0x7fffedf79a70) at alloc.c:4969
        end = 0x7fffedf79a30
        self = 0xe09db0
        sentry = {
          o = {
            __max_align_ll = 0,
            __max_align_ld = <invalid float value>
          }
        }
#10 0x0000000000699389 in thread_select
    (func=0x419320 <pselect <at> plt>, max_fds=9, rfds=0x7fffedf79fa0,
wfds=0x7fffedf79f20, efds=0x0, timeout=0x7fffedf7a260, sigmask=0x0) at
thread.c:616
        sa = {
          func = 0x419320 <pselect <at> plt>,
          max_fds = 9,
          rfds = 0x7fffedf79fa0,
          wfds = 0x7fffedf79f20,
          efds = 0x0,
          timeout = 0x7fffedf7a260,
          sigmask = 0x0,
          result = 1
        }
#11 0x00000000006bfef5 in xg_select
    (fds_lim=9, rfds=0x7fffedf7a300, wfds=0x7fffedf7a280, efds=0x0,
timeout=0x7fffedf7a260, sigmask=0x0) at xgselect.c:130
        all_rfds = {
          fds_bits = {8, 0 <repeats 15 times>}
        }
        all_wfds = {
          fds_bits = {0 <repeats 16 times>}
        }
        tmo = {
          tv_sec = 0,
          tv_nsec = 0
        }
        tmop = 0x7fffedf7a260
        context = 0xc1d070
        have_wfds = true
        gfds_buf = {{
            fd = 5,
            events = 1,
            revents = 0
          }, {
            fd = 6,
            events = 1,
            revents = 0
          }, {
            fd = 8,
            events = 1,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          } <repeats 125 times>}
        gfds = 0x7fffedf79b10
        gfds_size = 128
        n_gfds = 3
        retval = 0
        our_fds = 0
        max_fds = 8
        context_acquired = true
        i = 3
        nfds = 0
        tmo_in_millisec = -1
        must_free = 0
        need_to_dispatch = false
        count = 3
#12 0x000000000066b757 in wait_reading_process_output
    (time_limit=3, nsecs=0, read_kbd=0, do_display=false,
wait_for_cell=XIL(0), wait_proc=0x0, just_wait_proc=0) at
process.c:5423
        process_skipped = false
        channel = 0
        nfds = 0
        Available = {
          fds_bits = {8, 0 <repeats 15 times>}
        }
        Writeok = {
          fds_bits = {0 <repeats 16 times>}
        }
        check_write = true
        check_delay = 0
        no_avail = false
        xerrno = 0
        proc = XIL(0x7fffedf7a440)
        timeout = {
          tv_sec = 3,
          tv_nsec = 0
        }
        end_time = {
          tv_sec = 1562935633,
          tv_nsec = 911868453
        }
        timer_delay = {
          tv_sec = 0,
          tv_nsec = -1
        }
        got_output_end_time = {
          tv_sec = 0,
          tv_nsec = -1
        }
        wait = TIMEOUT
        got_some_output = -1
        prev_wait_proc_nbytes_read = 0
        retry_for_async = false
        count = 2
        now = {
          tv_sec = 0,
          tv_nsec = -1
        }
#13 0x0000000000429bf6 in Fsleep_for (seconds=make_fixnum(3),
milliseconds=XIL(0))
    at dispnew.c:5825
        t = {
          tv_sec = 3,
          tv_nsec = 0
        }
        tend = {
          tv_sec = 1562935633,
          tv_nsec = 911868112
        }
        duration = 3
#14 0x0000000000613e99 in eval_sub (form=XIL(0xf6df73)) at eval.c:2273
        i = 2
        maxargs = 2
        args_left = XIL(0)
        numargs = 1
        original_fun = XIL(0x7fffefa9fb98)
        original_args = XIL(0xf6df83)
        count = 1
        fun = XIL(0xa756a5)
        val = XIL(0)
        funcar = make_fixnum(35184372085343)
        argvals =
          {make_fixnum(3), XIL(0), XIL(0), XIL(0), XIL(0), XIL(0),
XIL(0), XIL(0)}
#15 0x0000000000610032 in Fprogn (body=XIL(0)) at eval.c:462
        form = XIL(0xf6df73)
        val = XIL(0)
#16 0x0000000000616102 in funcall_lambda
    (fun=XIL(0xf6da43), nargs=0, arg_vector=0xe09dd8) at eval.c:3065
        val = XIL(0xc0)
        syms_left = XIL(0)
        next = XIL(0x3400000013)
        lexenv = XIL(0)
        count = 1
        i = 0
        optional = false
        rest = false
#17 0x0000000000615542 in Ffuncall (nargs=1, args=0xe09dd0) at eval.c:2813
        fun = XIL(0xf6da43)
        original_fun = XIL(0xf6da43)
        funcar = XIL(0xc0)
        numargs = 0
        val = XIL(0xaf72e0)
        count = 0
#18 0x000000000069956f in invoke_thread_function () at thread.c:702
        count = 0
#19 0x0000000000611d61 in internal_condition_case
    (bfun=0x69953e <invoke_thread_function>, handlers=XIL(0x30),
hfun=0x699596 <record_thread_error>) at eval.c:1351
        val = make_fixnum(1405386)
        c = 0x7fffd8000c50
#20 0x0000000000699697 in run_thread (state=0xe09db0) at thread.c:741
        stack_pos = {
          __max_align_ll = 0,
          __max_align_ld = 0
        }
        self = 0xe09db0
        iter = 0x0
        c = 0x7fffd8000b20
#21 0x00007ffff4b38fa3 in start_thread (arg=<optimized out>)
    at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {
          cancel_jmp_buf = {{
              jmp_buf = {140737185822464, -1249422724209328276,
140737488341374, 140737488341375, 140737185822464, 0,
1249453444682727276, 1249398985402204012},
              mask_was_saved = 0
            }},
          priv = {
            pad = {0x0, 0x0, 0x0, 0x0},
            data = {
              prev = 0x0,
              cleanup = 0x0,
              canceltype = 0
            }
          }
        }
        not_first_call = <optimized out>
#22 0x00007ffff49724cf in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Lisp Backtrace:
"sleep-for" (0xedf7a530)
0xf6da40 Lisp type 3

post_acquire_global_lock () can return abnormally (I didn't know
that), so really_call_select() can, too, so thread_select() can, too.

> > +  ptrdiff_t count = SPECPDL_INDEX ();
>
> I don't think we should do that at this low level.

You're right, it does stick out. I think we're safe because we're
calling Fsignal with the global lock held, but it's not a pretty or
well-documented situation.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 13:32:02 GMT) Full text and rfc822 format available.

Message #29 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 16:31:24 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 12 Jul 2019 12:57:51 +0000
> Cc: politza <at> hochschule-trier.de, 36609 <at> debbugs.gnu.org
> 
> Lisp Backtrace:
> "sleep-for" (0xedf7a530)
> 0xf6da40 Lisp type 3
> 
> post_acquire_global_lock () can return abnormally (I didn't know
> that), so really_call_select() can, too, so thread_select() can, too.

Do you know which code sets current_thread->error_symbol, and what is
that error symbol?

> > > +  ptrdiff_t count = SPECPDL_INDEX ();
> >
> > I don't think we should do that at this low level.
> 
> You're right, it does stick out. I think we're safe because we're
> calling Fsignal with the global lock held, but it's not a pretty or
> well-documented situation.

Is this the main thread which does that?  if so, there should be no
problem with holding the global lock in this case, is there?

If this is not the main thread, then the error handlers should be set
so as to log the error in last_thread_error, and then terminate the
thread (via exiting the thread function, AFAIR).

If this is not what happens, I'd like to know what does happen here,
and why.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 13:42:02 GMT) Full text and rfc822 format available.

Message #32 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 13:40:15 +0000
On Fri, Jul 12, 2019 at 12:55 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Fri, 12 Jul 2019 12:44:35 +0000
> > Cc: 36609 <at> debbugs.gnu.org
> >
> > Okay, I'm still not sure this is really the problem Andreas was
> > seeing, but this code fails to work with xg_select:
> >
> > (let ((thread (make-thread (lambda ()
> >                  (sleep-for 3)))))
> >   (thread-yield)
> >   (thread-signal thread 'error nil))
>
> What do you mean by "fails"?  What do you see when you eval this?

With `emacs -Q', a blinking cursor and an unresponsive emacs (no
response to key presses, not even triple C-g). Sometimes, at least.

Here's a full backtrace of a session where the hang did happen:

Thread 4 (Thread 0x7fffeeb8a700 (LWP 26117)):
#0  0x00007ffff4967819 in __GI___poll (fds=0xd93070, nfds=1, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
        resultvar = 18446744073709551100
        sc_cancel_oldtype = 0
#1  0x00007ffff698e136 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007ffff698e25c in g_main_context_iteration ()
    at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007fffeebd7ffd in  ()
    at /usr/lib/x86_64-linux-gnu/gio/modules/libdconfsettings.so
#4  0x00007ffff69b6415 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x00007ffff4b38fa3 in start_thread (arg=<optimized out>)
    at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {
          cancel_jmp_buf = {{
              jmp_buf = {140737198466816, -2332898995959237690,
140737488337566, 140737488337567, 140737198466816, 13933232,
2332932536188793798, 2332875456844002246},
              mask_was_saved = 0
            }},
          priv = {
            pad = {0x0, 0x0, 0x0, 0x0},
            data = {
              prev = 0x0,
              cleanup = 0x0,
              canceltype = 0
            }
          }
        }
        not_first_call = <optimized out>
#6  0x00007ffff49724cf in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7fffef59e700 (LWP 26116)):
#0  0x00007ffff4967819 in __GI___poll (fds=0xcb7a20, nfds=2, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
        resultvar = 18446744073709551100
        sc_cancel_oldtype = 0
#1  0x00007ffff698e136 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007ffff698e4c2 in g_main_loop_run ()
    at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007ffff6bba0d6 in  () at /lib/x86_64-linux-gnu/libgio-2.0.so.0
#4  0x00007ffff69b6415 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x00007ffff4b38fa3 in start_thread (arg=<optimized out>)
    at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {
          cancel_jmp_buf = {{
              jmp_buf = {140737209034496, -2332898995959237690,
140737488336926, 140737488336927, 140737209034496, 13323152,
2332932821804118982, 2332875456844002246},
              mask_was_saved = 0
            }},
          priv = {
            pad = {0x0, 0x0, 0x0, 0x0},
            data = {
              prev = 0x0,
              cleanup = 0x0,
              canceltype = 0
            }
          }
        }
        not_first_call = <optimized out>
#6  0x00007ffff49724cf in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7ffff017d700 (LWP 26115)):
#0  0x00007ffff4967819 in __GI___poll (fds=0xb400d0, nfds=1, timeout=-1)
    at ../sysdeps/unix/sysv/linux/poll.c:29
        resultvar = 18446744073709551100
        sc_cancel_oldtype = 0
#1  0x00007ffff698e136 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007ffff698e25c in g_main_context_iteration ()
    at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007ffff698e2a1 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#4  0x00007ffff69b6415 in  () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#5  0x00007ffff4b38fa3 in start_thread (arg=<optimized out>)
    at pthread_create.c:486
        ret = <optimized out>
        pd = <optimized out>
        now = <optimized out>
        unwind_buf = {
          cancel_jmp_buf = {{
              jmp_buf = {140737221482240, -2332898995959237690,
140737488347710, 140737488347711, 140737221482240, 0,
2332865253915489222, 2332875456844002246},
              mask_was_saved = 0
            }},
          priv = {
            pad = {0x0, 0x0, 0x0, 0x0},
            data = {
              prev = 0x0,
              cleanup = 0x0,
              canceltype = 0
            }
          }
        }
        not_first_call = <optimized out>
#6  0x00007ffff49724cf in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7ffff0e30e00 (LWP 26108)):
#0  0x00007ffff4b404c6 in pthread_sigmask
    (how=2, newmask=<optimized out>, oldmask=0x0)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:48
        local_newmask = {
          __val = {7, 17697270867008781056, 11498208, 5790296, 0,
140737488342800, 2, 0, 0, 0, 0, 0, 0, 0, 0, 6922490}
        }
        result = 0
#1  0x0000000000585a7d in restore_signal_mask (oldset=0x7fffffffcf10)
    at sysdep.c:876
#2  0x0000000000699292 in really_call_select (arg=0x7fffffffd060) at
thread.c:584
        sa = 0x7fffffffd060
        self = 0xa745c0 <main_thread>
        oldset = {
          __val = {0, 5710159, 8192, 126116988112, 140737488344608,
140737298839344, 7, 0, 11381952, 0, 8192, 1562936528, 140737488342720,
5968906924941129, 0, 16538645}
        }
#3  0x00000000005e5ee0 in flush_stack_call_func
    (func=0x699239 <really_call_select>, arg=0x7fffffffd060) at alloc.c:4969
        end = 0x7fffffffd020
        self = 0xa745c0 <main_thread>
        sentry = {
          o = {
            __max_align_ll = 0,
            __max_align_ld = <invalid float value>
          }
        }
#4  0x0000000000699389 in thread_select
    (func=0x419320 <pselect <at> plt>, max_fds=6, rfds=0x7fffffffd590,
wfds=0x7fffffffd510, efds=0x0, timeout=0x7fffffffd840, sigmask=0x0) at
thread.c:616
        sa = {
          func = 0x419320 <pselect <at> plt>,
          max_fds = 6,
          rfds = 0x7fffffffd590,
          wfds = 0x7fffffffd510,
          efds = 0x0,
          timeout = 0x7fffffffd840,
          sigmask = 0x0,
          result = -157757095
        }
#5  0x00000000006bfeac in xg_select
    (fds_lim=6, rfds=0x7fffffffd8e0, wfds=0x7fffffffd860, efds=0x0,
timeout=0x7fffffffd840, sigmask=0x0) at xgselect.c:117
        all_rfds = {
          fds_bits = {32, 0 <repeats 15 times>}
        }
        all_wfds = {
          fds_bits = {0 <repeats 16 times>}
        }
        tmo = {
          tv_sec = 5621546,
          tv_nsec = 15785445
        }
        tmop = 0x7fffffffd840
        context = 0xc1c200
        have_wfds = true
        gfds_buf = {{
            fd = 895,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 8096,
            events = 65535,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 16,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          } <repeats 11 times>, {
            fd = -2057698731,
            events = 39706,
            revents = 28662
          }, {
            fd = 3,
            events = 0,
            revents = 0
          }, {
            fd = -2057698731,
            events = 39706,
            revents = 28662
          }, {
            fd = -161788823,
            events = 32767,
            revents = 0
          }, {
            fd = -2057698731,
            events = 39642,
            revents = 28662
          }, {
            fd = -190624704,
            events = 32767,
            revents = 0
          }, {
            fd = 16,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = -664,
            events = 65535,
            revents = 65535
          }, {
            fd = -504530176,
            events = 22038,
            revents = 62873
          }, {
            fd = 12553216,
            events = 0,
            revents = 0
          }, {
            fd = -1252392363,
            events = 44399,
            revents = 28662
          }, {
            fd = 32,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 12553216,
            events = 0,
            revents = 0
          }, {
            fd = -191458327,
            events = 32767,
            revents = 0
          }, {
            fd = -190624704,
            events = 32767,
            revents = 0
          }, {
            fd = 139264,
            events = 0,
            revents = 0
          }, {
            fd = 17472,
            events = 0,
            revents = 0
          }, {
            fd = -191893943,
            events = 32767,
            revents = 0
          }, {
            fd = 128,
            events = 0,
            revents = 0
          }, {
            fd = -191909562,
            events = 32767,
            revents = 0
          }, {
            fd = 48,
            events = 0,
            revents = 0
          }, {
            fd = 1,
            events = 0,
            revents = 0
          }, {
            fd = 7,
            events = 0,
            revents = 0
          }, {
            fd = 64,
            events = 0,
            revents = 0
          }, {
            fd = 4,
            events = 32767,
            revents = 0
          }, {
            fd = 11649056,
            events = 0,
            revents = 0
          }, {
            fd = 47,
            events = 0,
            revents = 0
          }, {
            fd = 96,
            events = 0,
            revents = 0
          }, {
            fd = -664,
            events = 65535,
            revents = 65535
          }, {
            fd = 1,
            events = 0,
            revents = 0
          }, {
            fd = 4,
            events = 49,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 91,
            events = 110,
            revents = 0
          }, {
            fd = 124,
            events = 119,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 17456,
            events = 0,
            revents = 0
          }, {
            fd = -190624704,
            events = 32767,
            revents = 0
          }, {
            fd = 48,
            events = 0,
            revents = 0
          }, {
            fd = 2,
            events = 0,
            revents = 0
          }, {
            fd = -664,
            events = 65535,
            revents = 65535
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = -191900310,
            events = 32767,
            revents = 0
          }, {
            fd = -664,
            events = 65535,
            revents = 65535
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 48,
            events = 0,
            revents = 0
          }, {
            fd = 187264016,
            events = 0,
            revents = 0
          }, {
            fd = 11498208,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = -11392,
            events = 32767,
            revents = 0
          }, {
            fd = 5621546,
            events = 0,
            revents = 0
          }, {
            fd = 187264016,
            events = 0,
            revents = 0
          }, {
            fd = -11360,
            events = 32767,
            revents = 0
          }, {
            fd = 187280083,
            events = 0,
            revents = 0
          }, {
            fd = -11344,
            events = 32767,
            revents = 0
          }, {
            fd = 5622263,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 187280083,
            events = 0,
            revents = 0
          }, {
            fd = -11280,
            events = 32767,
            revents = 0
          }, {
            fd = 6170867,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 5627325,
            events = 0,
            revents = 0
          }, {
            fd = 6,
            events = 0,
            revents = 0
          }, {
            fd = 1,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 1,
            revents = 0
          }, {
            fd = 11498208,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = -11232,
            events = 32767,
            revents = 0
          }, {
            fd = 5621546,
            events = 0,
            revents = 0
          }, {
            fd = -134398935,
            events = 0,
            revents = 0
          }, {
            fd = -11200,
            events = 32767,
            revents = 0
          }, {
            fd = 187280099,
            events = 0,
            revents = 0
          }, {
            fd = -11184,
            events = 32767,
            revents = 0
          }, {
            fd = 5622263,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 187280099,
            events = 0,
            revents = 0
          }, {
            fd = -11120,
            events = 32767,
            revents = 0
          }, {
            fd = 6170867,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 5627325,
            events = 0,
            revents = 0
          }, {
            fd = 6,
            events = 0,
            revents = 0
          }, {
            fd = 1,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 1,
            revents = 0
          }, {
            fd = 11498208,
            events = 0,
            revents = 0
          }, {
            fd = 187280099,
            events = 0,
            revents = 0
          }, {
            fd = -11072,
            events = 32767,
            revents = 0
          }, {
            fd = 5622263,
            events = 0,
            revents = 0
          }, {
            fd = -11024,
            events = 32767,
            revents = 0
          }, {
            fd = -134398935,
            events = 32767,
            revents = 0
          }, {
            fd = -10944,
            events = 32767,
            revents = 0
          }, {
            fd = 15785445,
            events = 0,
            revents = 0
          }, {
            fd = -11016,
            events = 32767,
            revents = 0
          }, {
            fd = 5623097,
            events = 0,
            revents = 0
          }, {
            fd = 11498208,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = 0,
            events = 0,
            revents = 0
          }, {
            fd = -10992,
            events = 32767,
            revents = 0
          }}
        gfds = 0x7fffffffd100
        gfds_size = 128
        n_gfds = -1
        retval = 0
        our_fds = 0
        max_fds = 5
        context_acquired = false
        i = 0
        nfds = 0
        tmo_in_millisec = 0
        must_free = 0
        need_to_dispatch = false
#6  0x000000000066b757 in wait_reading_process_output
    (time_limit=0, nsecs=0, read_kbd=-1, do_display=true,
wait_for_cell=XIL(0), wait_proc=0x0, just_wait_proc=0) at
process.c:5423
        process_skipped = false
        channel = 6
        nfds = 1
        Available = {
          fds_bits = {32, 0 <repeats 15 times>}
        }
        Writeok = {
          fds_bits = {0 <repeats 16 times>}
        }
        check_write = true
        check_delay = 0
        no_avail = false
        xerrno = 11
        proc = XIL(0xcf7c00)
        timeout = {
          tv_sec = 100000,
          tv_nsec = 0
        }
        end_time = {
          tv_sec = 0,
          tv_nsec = 140737488345168
        }
        timer_delay = {
          tv_sec = 0,
          tv_nsec = -1
        }
        got_output_end_time = {
          tv_sec = 0,
          tv_nsec = -1
        }
        wait = FOREVER
        got_some_output = -1
        prev_wait_proc_nbytes_read = 0
        retry_for_async = false
        count = 4
        now = {
          tv_sec = 0,
          tv_nsec = -1
        }
#7  0x000000000056c31e in kbd_buffer_get_event
    (kbp=0x7fffffffdbd8, used_mouse_menu=0x7fffffffe1bf, end_time=0x0)
    at keyboard.c:3836
        do_display = true
        obj = XIL(0x14bfd3fb)
#8  0x0000000000568706 in read_event_from_main_queue
    (end_time=0x0, local_getcjmp=0x7fffffffdf80, used_mouse_menu=0x7fffffffe1bf)
    at keyboard.c:2138
        c = XIL(0)
        save_jump = {{
            __jmpbuf = {0, 0, 0, 0, 0, 0, 0, 0},
            __mask_was_saved = 0,
            __saved_mask = {
              __val = {0 <repeats 16 times>}
            }
          }}
        kb = 0x55c9f7 <XSETCDR+28>
        count = 3
#9  0x0000000000568970 in read_decoded_event_from_main_queue
    (end_time=0x0, local_getcjmp=0x7fffffffdf80, prev_event=XIL(0),
used_mouse_menu=0x7fffffffe1bf) at keyboard.c:2202
        nextevt = XIL(0x7fffffffde30)
        frame = 0x0
        terminal = 0x0
        events =
          {XIL(0x7fffffffddb0), make_fixnum(1422534), XIL(0xaf72e0),
XIL(0), XIL(0), XIL(0x7fffffffddb0), make_fixnum(1405386),
XIL(0x135b623), XIL(0x7fffffffddf0), make_fixnum(1420782),
XIL(0xaf72e0), XIL(0x100000000), XIL(0), XIL(0x7fffffffddf0),
make_fixnum(1405386), XIL(0)}
        n = 0
#10 0x000000000056a002 in read_char
    (commandflag=1, map=XIL(0xfbaa33), prev_event=XIL(0),
used_mouse_menu=0x7fffffffe1bf, end_time=0x0) at keyboard.c:2810
        c = XIL(0)
        jmpcount = 3
        local_getcjmp = {{
            __jmpbuf = {0, 2332898996579464134, 16538645, 48, 0, 0,
2332898994838827974, -2332899704998988858},
            __mask_was_saved = 0,
            __saved_mask = {
              __val = {140737214405912, 0, 140737225904168, 11498208,
0, 0, 140737488347152, 5621546, 4032516088, 11498208, 0, 0,
140737488347200, 5621546, 12363987, 140737488347296}
            }
          }}
        save_jump = {{
            __jmpbuf = {0, 0, 140737224472304, 0, 0, 11529984, 5621546, 0},
            __mask_was_saved = -8336,
            __saved_mask = {
              __val = {6254137, 140737231486936, 12884893472, 0,
31776, 140737224472304, 140737231486936, 5623453, 51539607552,
140737224472309, 140737224472304, 5632971, 11529984, 11529984, 0,
11498208}
            }
          }}
        tem = XIL(0x7fffefa759f0)
        save = XIL(0)
        previous_echo_area_message = XIL(0)
        also_record = XIL(0)
        reread = false
        recorded = false
        polling_stopped_here = true
        orig_kboard = 0xc863c0
#11 0x0000000000576842 in read_key_sequence
    (keybuf=0x7fffffffe3a0, prompt=XIL(0), dont_downcase_last=false,
can_return_switch_frame=true, fix_current_buffer=true,
prevent_redisplay=false)
    at keyboard.c:9124
        interrupted_kboard = 0xc863c0
        interrupted_frame = 0xf976a0
        key = XIL(0xf97ac0)
        used_mouse_menu = false
        echo_local_start = 0
        last_real_key_start = 0
        keys_local_start = 0
        new_binding = XIL(0xf97ac0)
        count = 3
        t = 0
        echo_start = 0
        keys_start = 0
        current_binding = XIL(0xfbaa33)
        first_unbound = 31
        mock_input = 0
        used_mouse_menu_history = {false <repeats 30 times>}
        fkey = {
          parent = XIL(0xbe3cd3),
          map = XIL(0xbe3cd3),
          start = 0,
          end = 0
        }
        keytran = {
          parent = XIL(0x7ffff0ae0eeb),
          map = XIL(0x7ffff0ae0eeb),
          start = 0,
          end = 0
        }
        indec = {
          parent = XIL(0xbe3cc3),
          map = XIL(0xbe3cc3),
          start = 0,
          end = 0
        }
        shift_translated = false
        delayed_switch_frame = XIL(0)
        original_uppercase = XIL(0)
        original_uppercase_position = -1
        dummyflag = false
        starting_buffer = 0x7ffff04576f0
        fake_prefixed_keys = XIL(0)
        first_event = XIL(0)
        second_event = XIL(0)
#12 0x00000000005667ae in command_loop_1 () at keyboard.c:1348
        cmd = XIL(0x7fffffffe4f0)
        keybuf =
          {XIL(0x7fffffffe420), XIL(0x7ffff0b08008), XIL(0x100000007),
XIL(0), XIL(0), XIL(0x80a0), XIL(0x11f), XIL(0xaff380), XIL(0xaff380),
XIL(0), XIL(0x7fffffffe440), XIL(0x617021), make_fixnum(1073741824),
XIL(0x7fffffffe460), XIL(0xaf72e0), XIL(0), XIL(0),
XIL(0x7fffffffe440), make_fixnum(1405386), XIL(0xf04576f5),
XIL(0x7fffffffe4a0), XIL(0x61729f), XIL(0xaf72e0), XIL(0), XIL(0),
XIL(0x7fffffffe480), make_fixnum(1405386), XIL(0xf04576f5),
XIL(0x7fffffffe4c0), make_fixnum(1591385)}
        i = 0
        prev_modiff = 0
        prev_buffer = 0x0
        already_adjusted = false
#13 0x0000000000611d61 in internal_condition_case
    (bfun=0x566384 <command_loop_1>, handlers=XIL(0x90), hfun=0x565b7d
<cmd_error>) at eval.c:1351
        val = make_fixnum(1405386)
        c = 0xb99e70
#14 0x000000000056607a in command_loop_2 (ignore=XIL(0)) at keyboard.c:1091
        val = XIL(0)
#15 0x000000000061161a in internal_catch
    (tag=XIL(0xd140), func=0x566051 <command_loop_2>, arg=XIL(0)) at eval.c:1112
        val = XIL(0x7fffffffe5c0)
        c = 0xb94fd0
#16 0x000000000056601c in command_loop () at keyboard.c:1070
#17 0x0000000000565752 in recursive_edit_1 () at keyboard.c:714
        count = 1
        val = XIL(0x7fffffffe620)
#18 0x00000000005658d4 in Frecursive_edit () at keyboard.c:786
        count = 0
        buffer = XIL(0)
#19 0x0000000000563961 in main (argc=4, argv=0x7fffffffe878) at emacs.c:2086
        stack_bottom_variable = 0x5e00
        do_initial_setlocale = true
        no_loadup = false
        junk = 0x0
        dname_arg = 0x0
        ch_to_dir = 0x0
        original_pwd = 0x0
        dump_mode = 0x0
        skip_args = 0
        temacs = 0x0
        attempt_load_pdump = true
        rlim = {
          rlim_cur = 10022912,
          rlim_max = 18446744073709551615
        }
        sockfd = -1

i thr
  Id   Target Id                                        Frame
* 1    Thread 0x7ffff0e30e00 (LWP 26108) "emacs"        pthread_sigmask (how=0,
    newmask=<optimized out>, oldmask=0x7fffffffcf10)
    at ../sysdeps/unix/sysv/linux/pthread_sigmask.c:48
  2    Thread 0x7ffff017d700 (LWP 26115) "gmain"
0x00007ffff4967819 in __GI___poll (fds=0xb400d0, nfds=1, timeout=-1)
at ../sysdeps/unix/sysv/linux/poll.c:29
  3    Thread 0x7fffef59e700 (LWP 26116) "gdbus"
0x00007ffff4967819 in __GI___poll (fds=0xcb7a20, nfds=2, timeout=-1)
at ../sysdeps/unix/sysv/linux/poll.c:29
  4    Thread 0x7fffeeb8a700 (LWP 26117) "dconf worker"
0x00007ffff4967819 in __GI___poll (fds=0xd93070, nfds=1, timeout=-1)
at ../sysdeps/unix/sysv/linux/poll.c:29

> > Proposed patch attached.
>
> As I said earlier, I don't think it's right to use stack unwinding at
> this level.  (Did you try this in a -nw session?)

Sorry, I didn't see that email until after I'd sent mine. I did try
with an -nw session, now, but I really think thread_select should not
return abnormally.

> I'd appreciate a description of what you think happens here, and why.

When a thread is signalled (by thread-signal, which sets another
thread's error_symbol) while the signalled thread is inside a
select(), thread_select() will return non-locally for that thread, and
we fail to release an internal GLib lock through
g_main_context_release(). That's the first bug.

When xg_select() fails to acquire the internal GLib lock, it simply
does a select() on the remaining file descriptors:

  context_acquired = g_main_context_acquire (context);
  /* FIXME: If we couldn't acquire the context, we just silently proceed
     because this function handles more than just glib file descriptors.
     Note that, as implemented, this failure is completely silent: there is
     no feedback to the caller.  */

This seems like a second, albeit documented, bug to me. I think we're
risking not waking up from the actual select because another
(non-Emacs) thread happened to hold the main context at the time.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 13:52:01 GMT) Full text and rfc822 format available.

Message #35 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 16:51:22 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 12 Jul 2019 13:40:15 +0000
> Cc: politza <at> hochschule-trier.de, 36609 <at> debbugs.gnu.org
> 
> When a thread is signalled (by thread-signal, which sets another
> thread's error_symbol) while the signalled thread is inside a
> select(), thread_select() will return non-locally for that thread, and
> we fail to release an internal GLib lock through
> g_main_context_release(). That's the first bug.

We should either release the global lock before the thread exits, or
defer the acting upon the signal until later.  We cannot disable the
signal handling altogether because it is entirely legitimate to signal
another thread, and when we do, that other thread will _always_ be
inside thread_select.

For the main thread, handling the signal in that situation shouldn't
be a problem, because it is not going to exit.  Right?

> When xg_select() fails to acquire the internal GLib lock, it simply
> does a select() on the remaining file descriptors:

Why does it fail to acquire that lock?

>   context_acquired = g_main_context_acquire (context);
>   /* FIXME: If we couldn't acquire the context, we just silently proceed
>      because this function handles more than just glib file descriptors.
>      Note that, as implemented, this failure is completely silent: there is
>      no feedback to the caller.  */
> 
> This seems like a second, albeit documented, bug to me. I think we're
> risking not waking up from the actual select because another
> (non-Emacs) thread happened to hold the main context at the time.

So what is the proposal for that? spin waiting for the lock?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 13:53:02 GMT) Full text and rfc822 format available.

Message #38 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 13:51:48 +0000
On Fri, Jul 12, 2019 at 1:31 PM Eli Zaretskii <eliz <at> gnu.org> wrote:>
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Fri, 12 Jul 2019 12:57:51 +0000
> > Cc: politza <at> hochschule-trier.de, 36609 <at> debbugs.gnu.org
> >
> > Lisp Backtrace:
> > "sleep-for" (0xedf7a530)
> > 0xf6da40 Lisp type 3
> >
> > post_acquire_global_lock () can return abnormally (I didn't know
> > that), so really_call_select() can, too, so thread_select() can, too.
>
> Do you know which code sets current_thread->error_symbol, and what is
> that error symbol?

thread-signal, which my example code calls directly. I passed 'error,
and that's what error_symbol is set to.

last_thread_error is (error) when the Emacs session freezes.

> > > > +  ptrdiff_t count = SPECPDL_INDEX ();
> > >
> > > I don't think we should do that at this low level.
> >
> > You're right, it does stick out. I think we're safe because we're
> > calling Fsignal with the global lock held, but it's not a pretty or
> > well-documented situation.
>
> Is this the main thread which does that?  if so, there should be no
> problem with holding the global lock in this case, is there?
>
> If this is not the main thread, then the error handlers should be set
> so as to log the error in last_thread_error, and then terminate the
> thread (via exiting the thread function, AFAIR).

Indeed, that's what happens; but the thread now fails to release the
GLib lock it had previously acquired, so other threads cannot acquire
it again, ever.

> If this is not what happens, I'd like to know what does happen here,
> and why.

Sure, we should understand what's going on here; even if the fix turns
out to be simple.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 14:36:01 GMT) Full text and rfc822 format available.

Message #41 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 14:34:44 +0000
On Fri, Jul 12, 2019 at 1:51 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Fri, 12 Jul 2019 13:40:15 +0000
> > Cc: politza <at> hochschule-trier.de, 36609 <at> debbugs.gnu.org
> >
> > When a thread is signalled (by thread-signal, which sets another
> > thread's error_symbol) while the signalled thread is inside a
> > select(), thread_select() will return non-locally for that thread, and
> > we fail to release an internal GLib lock through
> > g_main_context_release(). That's the first bug.
>
> We should either release the global lock before the thread exits, or
> defer the acting upon the signal until later.  We cannot disable the
> signal handling altogether because it is entirely legitimate to signal
> another thread, and when we do, that other thread will _always_ be
> inside thread_select.

Really? What about thread-yield?

> For the main thread, handling the signal in that situation shouldn't
> be a problem, because it is not going to exit.  Right?

I think the main thread can still fail to release the lock...

> > When xg_select() fails to acquire the internal GLib lock, it simply
> > does a select() on the remaining file descriptors:
>
> Why does it fail to acquire that lock?

Because another thread holds it, either an Emacs or a non-Emacs
thread. In both cases, I think we might miss events unless we return
with errno == EINTR.

> >   context_acquired = g_main_context_acquire (context)
> >   /* FIXME: If we couldn't acquire the context, we just silently proceed
> >      because this function handles more than just glib file descriptors.
> >      Note that, as implemented, this failure is completely silent: there is
> >      no feedback to the caller.  */
> >
> > This seems like a second, albeit documented, bug to me. I think we're
> > risking not waking up from the actual select because another
> > (non-Emacs) thread happened to hold the main context at the time.
>
> So what is the proposal for that? spin waiting for the lock?

I don't know, to be honest, but I'm afraid there's currently no better
way. There's a way to take the lock without spinning, but it's been
deprecated.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 15:04:01 GMT) Full text and rfc822 format available.

Message #44 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 18:02:52 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 12 Jul 2019 14:34:44 +0000
> Cc: politza <at> hochschule-trier.de, 36609 <at> debbugs.gnu.org
> 
> > > When a thread is signalled (by thread-signal, which sets another
> > > thread's error_symbol) while the signalled thread is inside a
> > > select(), thread_select() will return non-locally for that thread, and
> > > we fail to release an internal GLib lock through
> > > g_main_context_release(). That's the first bug.
> >
> > We should either release the global lock before the thread exits, or
> > defer the acting upon the signal until later.  We cannot disable the
> > signal handling altogether because it is entirely legitimate to signal
> > another thread, and when we do, that other thread will _always_ be
> > inside thread_select.
> 
> Really? What about thread-yield?

What about it?

You are asking whether, when thread-signal is executed, the thread
which we are signaling is necessarily parked inside thread_select?  If
so, I don't understand your surprise: only one thread can ever be
running, and that is by definition the thread which calls
thread-signal.  All the other threads cannot be running, which means
they are parked either in thread_select or in sys_mutex_lock called
from acquire_global_lock.  Right?

As for thread-yield, I'm not sure I understand how is it related to
the issue we are discussing.

> > For the main thread, handling the signal in that situation shouldn't
> > be a problem, because it is not going to exit.  Right?
> 
> I think the main thread can still fail to release the lock...

Since the main thread doesn't exit, but just longjmp's to top-level,
and then continues to run, why does it need to release the lock?  Any
thread should hold the lock for as long as it runs.

> > > When xg_select() fails to acquire the internal GLib lock, it simply
> > > does a select() on the remaining file descriptors:
> >
> > Why does it fail to acquire that lock?
> 
> Because another thread holds it, either an Emacs or a non-Emacs
> thread.

OK, and why is it a problem that we continue calling thread_select
regardless of the failure to acquire the Glib lock?  is the problem
this one:

> In both cases, I think we might miss events unless we return
> with errno == EINTR.

?  Or is there something else?  If the problem with missing events,
then which events are those, and what bad things will happen if we
miss them?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 15:07:02 GMT) Full text and rfc822 format available.

Message #47 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 18:05:46 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 12 Jul 2019 13:51:48 +0000
> Cc: politza <at> hochschule-trier.de, 36609 <at> debbugs.gnu.org
> 
> > If this is not the main thread, then the error handlers should be set
> > so as to log the error in last_thread_error, and then terminate the
> > thread (via exiting the thread function, AFAIR).
> 
> Indeed, that's what happens; but the thread now fails to release the
> GLib lock it had previously acquired, so other threads cannot acquire
> it again, ever.

Then we should make sure we release it when the thread function exits,
or before we call Fsignal from post_acquire_global_lock.  There's no
reason to use the unwind-protect machinery for that, I think.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 18:08:02 GMT) Full text and rfc822 format available.

Message #50 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 18:06:29 +0000
On Fri, Jul 12, 2019 at 3:06 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > From: Pip Cet <pipcet <at> gmail.com>
> > Date: Fri, 12 Jul 2019 13:51:48 +0000
> > Cc: politza <at> hochschule-trier.de, 36609 <at> debbugs.gnu.org
> >
> > > If this is not the main thread, then the error handlers should be set
> > > so as to log the error in last_thread_error, and then terminate the
> > > thread (via exiting the thread function, AFAIR).
> >
> > Indeed, that's what happens; but the thread now fails to release the
> > GLib lock it had previously acquired, so other threads cannot acquire
> > it again, ever.
>
> Then we should make sure we release it when the thread function exits,

That's too late, it's legitimate for the thread to have a signal
handler. We need to release the lock at precisely the time that
unbind_to would release it.

> or before we call Fsignal from post_acquire_global_lock.  There's no
> reason to use the unwind-protect machinery for that, I think.

If we call Fsignal, surely the unwind-protect machinery is already set
up and we can safely call it? So unbind_to would work again...

I guess I've changed my mind: the unwind-protect machinery does
precisely what we want, we should simply document that thread_select
can exit nonlocally and that the select functions need to be written
to cater to that.

Two places call xg_select, and they both run long after unbind_to works.

Doing this without the unwind-protect machinery is difficult: for
example, a comment in post_acquire_global_lock claims
unbind_for_thread_switch can exit nonlocally, though I don't think
that actually happens or we would have seen the bug report.

What's your proposal for doing this?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 18:28:01 GMT) Full text and rfc822 format available.

Message #53 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 21:27:40 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 12 Jul 2019 18:06:29 +0000
> Cc: politza <at> hochschule-trier.de, 36609 <at> debbugs.gnu.org
> 
> > Then we should make sure we release it when the thread function exits,
> 
> That's too late, it's legitimate for the thread to have a signal
> handler. We need to release the lock at precisely the time that
> unbind_to would release it.

I had also another proposal:

> > or before we call Fsignal from post_acquire_global_lock.  There's no
> > reason to use the unwind-protect machinery for that, I think.
> 
> If we call Fsignal, surely the unwind-protect machinery is already set
> up and we can safely call it? So unbind_to would work again...

Sorry, I don't want to call unwind-protect there.  Call me paranoid,
if you want.

> I guess I've changed my mind: the unwind-protect machinery does
> precisely what we want, we should simply document that thread_select
> can exit nonlocally and that the select functions need to be written
> to cater to that.
> 
> Two places call xg_select, and they both run long after unbind_to works.
> 
> Doing this without the unwind-protect machinery is difficult: for
> example, a comment in post_acquire_global_lock claims
> unbind_for_thread_switch can exit nonlocally, though I don't think
> that actually happens or we would have seen the bug report.
> 
> What's your proposal for doing this?

Like I said: release the lock before calling Fsignal.  We could do
that before calling unbind_for_thread_switch, if needed.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 18:35:02 GMT) Full text and rfc822 format available.

Message #56 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: pipcet <at> gmail.com
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 21:34:29 +0300
> Date: Fri, 12 Jul 2019 21:27:40 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
> 
> Sorry, I don't want to call unwind-protect there.  Call me paranoid,
> if you want.

Maybe I should explain the rationale behind that paranoia.  The main
reason is that you are proposing to do that inside code that can
switch threads.  Switching threads means switching to another stack
and also to another set of handlers.  So using the unwind-protect
machinery in this situation is IMO asking for trouble.

And then there's the TTY frame case, where C-g triggers SIGINT, and we
actually longjmp from wherever we were...




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 19:26:01 GMT) Full text and rfc822 format available.

Message #59 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 19:24:38 +0000
[Message part 1 (text/plain, inline)]
On Fri, Jul 12, 2019 at 6:34 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > Sorry, I don't want to call unwind-protect there.  Call me paranoid,
> > if you want.
>
> Maybe I should explain the rationale behind that paranoia.  The main

I don't think it's paranoia, I just wasn't sure there was a good way
to avoid it.

I'm now convinced that there simply is no safe way to call select()
from two threads at once when using glib.

I think our options are hacking around it and hoping nothing breaks
(this is what the attached patch does; it releases the main context
glib lock from the wrong thread soon "after" the other thread called
select, but there's actually no way to ensure that "after" is
accurate), or rewriting things so we have a single thread that does
all the select()ing.

> reason is that you are proposing to do that inside code that can
> switch threads.  Switching threads means switching to another stack
> and also to another set of handlers.  So using the unwind-protect
> machinery in this situation is IMO asking for trouble.

Thanks for explaining.
[glib-hack.diff (application/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 19:32:01 GMT) Full text and rfc822 format available.

Message #62 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 19:30:34 +0000
> > > We should either release the global lock before the thread exits, or
> > > defer the acting upon the signal until later.  We cannot disable the
> > > signal handling altogether because it is entirely legitimate to signal
> > > another thread, and when we do, that other thread will _always_ be
> > > inside thread_select.
> >
> > Really? What about thread-yield?
>
> What about it?
>
> You are asking whether, when thread-signal is executed, the thread
> which we are signaling is necessarily parked inside thread_select?  If
> so, I don't understand your surprise: only one thread can ever be
> running, and that is by definition the thread which calls
> thread-signal.  All the other threads cannot be running, which means
> they are parked either in thread_select or in sys_mutex_lock called
> from acquire_global_lock.  Right?

No, they might also be in the sys_thread_yield syscall, having
released the global lock but not yet reacquired it:

  release_global_lock ();
  sys_thread_yield (); <<<<< here
  acquire_global_lock (self);

> As for thread-yield, I'm not sure I understand how is it related to
> the issue we are discussing.

I'm not sure how it's relevant to assert that "that other thread will
_always_ be inside thread_select". I have an idea where you might be
going with that, but that idea wouldn't work (to release the lock from
the signalling thread, not the signalled thread that holds it).

> If the problem with missing events,
> then which events are those, and what bad things will happen if we
> miss them?

All events that glib knows about but Emacs doesn't. For example, a
glib timeout is apparently used to achieve some kind of scroll effect
on GTK menus, which is why we call xg_select from xmenu.c.

I don't know which libraries use glib-based threads, but I think dbus does, too.

I believe, but am not certain, that this also includes X events when
using GTK. That would explain the frozen sessions.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 12 Jul 2019 19:58:01 GMT) Full text and rfc822 format available.

Message #65 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 12 Jul 2019 22:57:05 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 12 Jul 2019 19:24:38 +0000
> Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
> 
> I'm now convinced that there simply is no safe way to call select()
> from two threads at once when using glib.

I hope not, although GTK with its idiosyncrasies caused a lot of pain
for the threads implementation in Emacs.

> I think our options are hacking around it and hoping nothing breaks
> (this is what the attached patch does; it releases the main context
> glib lock from the wrong thread soon "after" the other thread called
> select, but there's actually no way to ensure that "after" is
> accurate), or rewriting things so we have a single thread that does
> all the select()ing.

Hmm... how would this work with your patch?  Suppose one thread calls
xg_select, acquires the Glib lock, sets its holding_glib_lock flag,
then releases the global Lisp lock and calls pselect.  Since the
global Lisp lock is now up for grabs, some other Lisp thread can
acquire it and start running.  If that other thread then calls
xg_select, it will hang forever trying to acquire the Glib lock,
because the first thread that holds it is stuck in pselect.

Am I missing something?

I know very little about GTK and the Glib context lock, but AFAIR we
really must treat that lock as a global one, not a thread-local one.
So I think it's okay for one thread to take the Glib lock, and another
to release it, because Glib just wants to know whether the "rest of
the program" has it, it doesn't care which thread is that which holds
the lock.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sat, 13 Jul 2019 06:51:02 GMT) Full text and rfc822 format available.

Message #68 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Sat, 13 Jul 2019 09:50:02 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Fri, 12 Jul 2019 19:30:34 +0000
> Cc: politza <at> hochschule-trier.de, 36609 <at> debbugs.gnu.org
> 
> > > > We should either release the global lock before the thread exits, or
> > > > defer the acting upon the signal until later.  We cannot disable the
> > > > signal handling altogether because it is entirely legitimate to signal
> > > > another thread, and when we do, that other thread will _always_ be
> > > > inside thread_select.
> > >
> > > Really? What about thread-yield?
> >
> > What about it?
> >
> > You are asking whether, when thread-signal is executed, the thread
> > which we are signaling is necessarily parked inside thread_select?  If
> > so, I don't understand your surprise: only one thread can ever be
> > running, and that is by definition the thread which calls
> > thread-signal.  All the other threads cannot be running, which means
> > they are parked either in thread_select or in sys_mutex_lock called
> > from acquire_global_lock.  Right?
> 
> No, they might also be in the sys_thread_yield syscall, having
> released the global lock but not yet reacquired it:
> 
>   release_global_lock ();
>   sys_thread_yield (); <<<<< here
>   acquire_global_lock (self);

OK, but that, too, means the thread being signaled is not running,
right?  And I still think that a very frequent case, perhaps the most
frequent, is that the thread being signaled is inside thread_select.

> I'm not sure how it's relevant to assert that "that other thread will
> _always_ be inside thread_select".

OK, we've now established that the other thread could also be in
acquire_global_lock or (for a very short time) in sys_thread_yield.

> I have an idea where you might be going with that

I was merely pointing out that we cannot disable the signal handling
as a means to solve the problem.

> but that idea wouldn't work (to release the lock from the signalling
> thread, not the signalled thread that holds it).

Maybe we have a misunderstanding here.  I was talking about this part
of post_acquire_global_lock:

   /* We could have been signaled while waiting to grab the global lock
      for the first time since this thread was created, in which case
      we didn't yet have the opportunity to set up the handlers.  Delay
      raising the signal in that case (it will be actually raised when
      the thread comes here after acquiring the lock the next time).  */
  if (!NILP (current_thread->error_symbol) && handlerlist)
    {
      Lisp_Object sym = current_thread->error_symbol;
      Lisp_Object data = current_thread->error_data;

      current_thread->error_symbol = Qnil;
      current_thread->error_data = Qnil;
      Fsignal (sym, data);
    }

In this part, we have already switched to the thread that has been
signaled, so we are in the signaled thread, not in the signaling
thread.  I meant to release the lock before the call to Fsignal here.

> > If the problem with missing events,
> > then which events are those, and what bad things will happen if we
> > miss them?
> 
> All events that glib knows about but Emacs doesn't. For example, a
> glib timeout is apparently used to achieve some kind of scroll effect
> on GTK menus, which is why we call xg_select from xmenu.c.
> 
> I don't know which libraries use glib-based threads, but I think dbus does, too.
> 
> I believe, but am not certain, that this also includes X events when
> using GTK. That would explain the frozen sessions.

So is the problem that the Glib context is locked "forever", or is the
problem that it's locked by another Lisp thread, even if this lock is
short-lived?  If the former, then arranging for the release of that
lock when the signaled thread exits would solve the problem, right?
And if the problem is the latter one, then why didn't we hear about
this much earlier?  Can you show the bad effect from missing these
events without signaling a thread?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sat, 13 Jul 2019 14:39:02 GMT) Full text and rfc822 format available.

Message #71 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Sat, 13 Jul 2019 14:37:25 +0000
[Message part 1 (text/plain, inline)]
On Fri, Jul 12, 2019 at 7:57 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > I'm now convinced that there simply is no safe way to call select()
> > from two threads at once when using glib.
>
> I hope not, although GTK with its idiosyncrasies caused a lot of pain
> for the threads implementation in Emacs.

Well, I think we're going to have to do one or more of the following:

- have a race condition
- access glib-locked data from the "wrong" thread (another Emacs thread)
- release the glib lock from the "wrong" thread (another Emacs thread)

Of these, the second is the best alternative, I think: we simply grab
the g_main_context lock globally, acting for all Emacs threads, and
the last thread to require it releases it when it leaves xg_select. As
long as there's at least one thread in the critical section of
xg_select, we hold the lock, but access to the context isn't
necessarily from the thread which locked it.

> > I think our options are hacking around it and hoping nothing breaks
> > (this is what the attached patch does; it releases the main context
> > glib lock from the wrong thread soon "after" the other thread called
> > select, but there's actually no way to ensure that "after" is
> > accurate), or rewriting things so we have a single thread that does
> > all the select()ing.
>
> Hmm... how would this work with your patch?  Suppose one thread calls
> xg_select, acquires the Glib lock, sets its holding_glib_lock flag,
> then releases the global Lisp lock and calls pselect.  Since the
> global Lisp lock is now up for grabs, some other Lisp thread can
> acquire it and start running.

And when it starts running, it releases the Glib lock.

> If that other thread then calls
> xg_select, it will hang forever trying to acquire the Glib lock,
> because the first thread that holds it is stuck in pselect.

The first thread no longer holds the Glib lock, it was released when
we switched threads.

> I know very little about GTK and the Glib context lock, but AFAIR we
> really must treat that lock as a global one, not a thread-local one.
> So I think it's okay for one thread to take the Glib lock, and another
> to release it, because Glib just wants to know whether the "rest of
> the program" has it, it doesn't care which thread is that which holds
> the lock.

Okay, that sounds like option #2 above. The attached patch exposes
glib externals to the generic code, but it appears to work. If you
think the approach is okay, I'll move the glib-specific parts to
xgselect.c (if that's okay).
[glib-hack-002.diff (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sat, 13 Jul 2019 15:04:01 GMT) Full text and rfc822 format available.

Message #74 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Sat, 13 Jul 2019 18:03:35 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Sat, 13 Jul 2019 14:37:25 +0000
> Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
> 
> Well, I think we're going to have to do one or more of the following:
> 
> - have a race condition
> - access glib-locked data from the "wrong" thread (another Emacs thread)
> - release the glib lock from the "wrong" thread (another Emacs thread)
> 
> Of these, the second is the best alternative, I think: we simply grab
> the g_main_context lock globally, acting for all Emacs threads, and
> the last thread to require it releases it when it leaves xg_select.

I agree.

> As long as there's at least one thread in the critical section of
> xg_select, we hold the lock, but access to the context isn't
> necessarily from the thread which locked it.

Hmm... wouldn't this cause us always hold that lock when there's more
than one Lisp thread?  And if so, what will that do to GTK and its own
threads, which also want to acquire the Glib context?

IOW, I guess I don't understand what problem would the proposed
changes solve that the current code doesn't.  Can you tell?

> > Hmm... how would this work with your patch?  Suppose one thread calls
> > xg_select, acquires the Glib lock, sets its holding_glib_lock flag,
> > then releases the global Lisp lock and calls pselect.  Since the
> > global Lisp lock is now up for grabs, some other Lisp thread can
> > acquire it and start running.
> 
> And when it starts running, it releases the Glib lock.

But only if its holding_glib_lock flag is set, which is not
necessarily true.  For example, what about a thread that was just
created, and never called xg_select yet?

> Okay, that sounds like option #2 above. The attached patch exposes
> glib externals to the generic code, but it appears to work. If you
> think the approach is okay, I'll move the glib-specific parts to
> xgselect.c (if that's okay).

Yes, we should have accessor functions in xgselect.c instead of
letting thread.c etc. access the GTK/Glib data directly.

But I still would like to understand why we need to maintain a flag
per thread.

Also, I think I lost the context: does this attempt to solve the
original problem reported by Andreas, or only the problem reported by
you later in the discussion?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sat, 13 Jul 2019 15:05:02 GMT) Full text and rfc822 format available.

Message #77 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Politz <politza <at> hochschule-trier.de>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Sat, 13 Jul 2019 17:04:23 +0200
I ran my problematic code a couple thousand times on this patch without
freezes, while producing the expected results.

Thanks for figuring this out.

Andreas




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sat, 13 Jul 2019 15:14:02 GMT) Full text and rfc822 format available.

Message #80 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: pipcet <at> gmail.com
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Sat, 13 Jul 2019 18:13:30 +0300
> Date: Sat, 13 Jul 2019 18:03:35 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
> 
> Also, I think I lost the context: does this attempt to solve the
> original problem reported by Andreas, or only the problem reported by
> you later in the discussion?

I guess Andreas just answered this question.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sat, 13 Jul 2019 15:55:02 GMT) Full text and rfc822 format available.

Message #83 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: pipcet <at> gmail.com
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Sat, 13 Jul 2019 18:54:40 +0300
> Date: Sat, 13 Jul 2019 18:03:35 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
> 
> But I still would like to understand why we need to maintain a flag
> per thread.

Specifically, why not make this flag thread-independent, and have any
thread release the context lock if that variable is non-zero?  I don't
think it matters which thread locked the Glib context, we just need to
release it once we emerge from thread_select, and do it before we
might call Fsignal.  That the lock might be released by a thread other
than the one which locked it shouldn't matter, I think, as long as we
consider all the Lisp threads acting together on behalf of Emacs.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sat, 13 Jul 2019 15:58:02 GMT) Full text and rfc822 format available.

Message #86 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Sat, 13 Jul 2019 15:57:17 +0000
On Sat, Jul 13, 2019 at 3:54 PM Eli Zaretskii <eliz <at> gnu.org> wrote:
> > But I still would like to understand why we need to maintain a flag
> > per thread.

We don't. I had originally planned to make reacquiring the lock fail
gracefully if we can't spin-lock, but that didn't make it into the
patch, so it should be simplified as you suggest.

> Specifically, why not make this flag thread-independent, and have any
> thread release the context lock if that variable is non-zero?  I don't
> think it matters which thread locked the Glib context, we just need to
> release it once we emerge from thread_select, and do it before we
> might call Fsignal.  That the lock might be released by a thread other
> than the one which locked it shouldn't matter, I think, as long as we
> consider all the Lisp threads acting together on behalf of Emacs.

I agree entirely.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sat, 13 Jul 2019 16:04:02 GMT) Full text and rfc822 format available.

Message #89 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Sat, 13 Jul 2019 19:02:37 +0300
> From: Pip Cet <pipcet <at> gmail.com>
> Date: Sat, 13 Jul 2019 15:57:17 +0000
> Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
> 
> > Specifically, why not make this flag thread-independent, and have any
> > thread release the context lock if that variable is non-zero?  I don't
> > think it matters which thread locked the Glib context, we just need to
> > release it once we emerge from thread_select, and do it before we
> > might call Fsignal.  That the lock might be released by a thread other
> > than the one which locked it shouldn't matter, I think, as long as we
> > consider all the Lisp threads acting together on behalf of Emacs.
> 
> I agree entirely.

Great, then I think we are in agreement.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sat, 13 Jul 2019 18:19:01 GMT) Full text and rfc822 format available.

Message #92 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Pip Cet <pipcet <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org, politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Sat, 13 Jul 2019 18:17:41 +0000
[Message part 1 (text/plain, inline)]
Patch attached.

It's possible this'll livelock certain glib libraries that want to
modify the glib main context while another thread is running. I don't
see a good way to fix that, but a bad way would be to apply some
heuristic (such as a 10 ms delay between calling select() and assuming
it actually was called).
[0001-Fix-lock-failures-in-xg_select-bug-36609.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 21 Aug 2020 12:58:01 GMT) Full text and rfc822 format available.

Message #95 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Pip Cet <pipcet <at> gmail.com>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 36609 <at> debbugs.gnu.org,
 politza <at> hochschule-trier.de
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Fri, 21 Aug 2020 14:57:29 +0200
Pip Cet <pipcet <at> gmail.com> writes:

> Patch attached.
>
> It's possible this'll livelock certain glib libraries that want to
> modify the glib main context while another thread is running. I don't
> see a good way to fix that, but a bad way would be to apply some
> heuristic (such as a 10 ms delay between calling select() and assuming
> it actually was called).

Skimming this thread, it seems that Eli was in agreement with applying
it, and Andreas has tried it extensively and it fixes the problem, so
I've applied it to Emacs 28 now. 

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Added tag(s) fixed. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Fri, 21 Aug 2020 12:58:01 GMT) Full text and rfc822 format available.

bug marked as fixed in version 28.1, send any further explanations to 36609 <at> debbugs.gnu.org and Andreas Politz <politza <at> hochschule-trier.de> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Fri, 21 Aug 2020 12:58:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 19 Sep 2020 11:24:05 GMT) Full text and rfc822 format available.

bug unarchived. Request was from dick.r.chiang <at> gmail.com to control <at> debbugs.gnu.org. (Sun, 06 Jun 2021 15:43:01 GMT) Full text and rfc822 format available.

bug No longer marked as fixed in versions 28.1 and reopened. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 06 Jun 2021 15:43:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sun, 06 Jun 2021 15:51:01 GMT) Full text and rfc822 format available.

Message #108 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: dick.r.chiang <at> gmail.com
To: Andreas Politz <politza <at> hochschule-trier.de>
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Sun, 06 Jun 2021 11:50:06 -0400
[Message part 1 (text/plain, inline)]
Four attachments:

#1. Want to revert commit 9c62ffb
[0001-Revert-Fix-lock-failures-in-xg_select.patch (text/x-diff, attachment)]
[Message part 3 (text/plain, inline)]
#2. Fails on tip of master, succeeds after patch in #1.
[42.el (application/emacs-lisp, attachment)]
[Message part 5 (text/plain, inline)]
#3. Abridged original example, fails after patch in #1.
[report.el (application/emacs-lisp, attachment)]
[Message part 7 (text/plain, inline)]
Fails not necessarily because xgselect.c is wrong, but rather because
channel-recv blocks on a mutex before channel-send can get its act together.
This was hard for all to discern because OP seemed to have gone out of his way
to obfuscate his "minimum" example.

#4. What #3 probably intended, succeeds after patch in #1.
[report-2.el (application/emacs-lisp, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sun, 06 Jun 2021 16:36:02 GMT) Full text and rfc822 format available.

Message #111 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dick.r.chiang <at> gmail.com
Cc: larsi <at> gnus.org, pipcet <at> gmail.com, politza <at> hochschule-trier.de,
 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Sun, 06 Jun 2021 19:35:31 +0300
> From: dick.r.chiang <at> gmail.com
> Cc: Pip Cet <pipcet <at> gmail.com>, larsi <at> gnus.org, 36609 <at> debbugs.gnu.org,
>     Eli Zaretskii <eliz <at> gnu.org>
> Date: Sun, 06 Jun 2021 11:25:01 -0400
> 
> #1. Want to revert commit 9c62ffb

This will bring back bug#36609, so we cannot do that without
discussing first why you think that commit was wrong.

> #2. Fails on tip of master, succeeds after patch in #1.

Please explain what does "fails" mean, and why do you think the above
commit is the culprit.  (A much simpler test case will be
appreciated, btw.)

>     Fails not necessarily because xgselect.c is wrong, but rather because
>     channel-recv blocks on a mutex before channel-send can get its act together.

You mean, in this code:

   (let ((channel (make-channel 1)))
     (make-thread
      (lambda nil
	(channel-send (car channel) 42))
      "produce")
     (channel-recv (cdr channel))
     (ignore-errors (enable-command 'list-threads))
     (call-interactively #'list-threads))

?  Here, channel-send is called by a new thread, created by
make-thread.  In this code, it is _expected_ that channel-recv will be
called (by the main thread) _before_ channel-send is called by the new
thread, because make-thread creates a thread, but the newly created
thread doesn't run until it can acquire the global lock.  Meanwhile,
the main thread continues running and calls channel-recv.  The new
thread will not begin running, AFAIU, until the main thread calls
condition-wait inside channel-recv.

By "blocks on a mutex", did you mean that channel-recv blocks trying
to acquire the mutex here:

   (cl-defgeneric channel-recv ((sink channel-terminal))
     (with-mutex (oref sink mutex) <<<<<<<<<<<<<<<<<<<<<<<<<<
       (with-slots (condition msg-queue) sink

If so, which thread holds that mutex at this point?

> #4. What #3 probably intended, succeeds after patch in #1.

Yes, race conditions can be solved by using sleep-for, but that's not
really a clean solution, at least not in my book.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sun, 06 Jun 2021 19:11:02 GMT) Full text and rfc822 format available.

Message #114 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: dick.r.chiang <at> gmail.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: larsi <at> gnus.org, politza <at> hochschule-trier.de, pipcet <at> gmail.com,
 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Sun, 06 Jun 2021 15:10:43 -0400
EZ> This will bring back bug#36609, so we cannot do that without
EZ> discussing first why you think that commit was wrong.

By enforcing bijection between acquires and releases in xgselect.c --
cavalierly so as the releases now come out-of-band from another module
thread.c -- commit 9c62ffb admits deadlock.  The conditions under which that
happens are not at all rarefied as evidenced by `for i in 1 2 3 ; do src/emacs
-Q -l ./42.el & done ;`

Generally, a commit made to remedy a phantom problem should always be
suspect.  In this case OP was squatting on the main thread, so he deserves
what he got.

EZ> Yes, race conditions can be solved by using sleep-for, but that's not
EZ> really a clean solution, at least not in my book.

Case #4 was not meant to solve anything, only to set OP straight on what he
really wanted.  You can remove the sleep-for.  Still works under patch #1.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sun, 06 Jun 2021 19:28:02 GMT) Full text and rfc822 format available.

Message #117 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dick.r.chiang <at> gmail.com
Cc: larsi <at> gnus.org, politza <at> hochschule-trier.de, pipcet <at> gmail.com,
 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Sun, 06 Jun 2021 22:27:45 +0300
> From: dick.r.chiang <at> gmail.com
> Cc: larsi <at> gnus.org,  pipcet <at> gmail.com,  politza <at> hochschule-trier.de,
>   36609 <at> debbugs.gnu.org
> Date: Sun, 06 Jun 2021 15:10:43 -0400
> 
> EZ> This will bring back bug#36609, so we cannot do that without
> EZ> discussing first why you think that commit was wrong.
> 
> By enforcing bijection between acquires and releases in xgselect.c --
> cavalierly so as the releases now come out-of-band from another module
> thread.c -- commit 9c62ffb admits deadlock.  The conditions under which that
> happens are not at all rarefied as evidenced by `for i in 1 2 3 ; do src/emacs
> -Q -l ./42.el & done ;`

You will have to provide a more detailed analysis, sorry.  The fact
that some Lisp can cause deadlock does not prove anything, there are
too many ways to write such deadlocking code.

> Generally, a commit made to remedy a phantom problem should always be
> suspect.  In this case OP was squatting on the main thread, so he deserves
> what he got.

That was not the analysis of that test case.  See the discussion of
the bug.  The problems with the GTK lock are real, not phantom.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Wed, 09 Jun 2021 21:41:01 GMT) Full text and rfc822 format available.

Message #120 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: dick.r.chiang <at> gmail.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: larsi <at> gnus.org, pipcet <at> gmail.com, politza <at> hochschule-trier.de,
 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Wed, 09 Jun 2021 17:40:28 -0400
[Message part 1 (text/plain, inline)]
EZ> You will have to provide a more detailed analysis, sorry.

There is no guarantee the static variable `threads_holding_glib_lock`
introduced in xgselect.c in commit 9c62ffb is synchronized across threads.  As
such, relying on it becoming zero in time for release_select_lock() is fraught.
(If you add print statements, this particular heisenbug disappears, at least
on Linux kernel 4.15.0-99-generic).

Four attachments:

1. Desired patch to master (reverts 9c62ffb, adds main-thread-p check).
[0001-Only-acquire-glib-context-if-main-thread.patch (text/x-diff, attachment)]
[Message part 3 (text/plain, inline)]
2. OP's original "my code doesn't work, here you figure it out", obfuscated MRE
Run as:
for i in 1 2 3 ; do src/emacs -Q -l ./report-orig.el & done
Fails if:
One of the emacsen stops responding
[report-orig.el (application/emacs-lisp, attachment)]
[Message part 5 (text/plain, inline)]
3. "Affine" redaction of OP's MRE *infrequently* breaks on tip of master (succeeds
after patch)
Run as:
for i in 1 2 3 ; do src/emacs -Q -l ./report.el & done
Fails if:
One of the emacsen stops responding
[report.el (application/emacs-lisp, attachment)]
[Message part 7 (text/plain, inline)]
4. My MRE *frequently* breaks tip of master (succeeds after patch)
[42.el (application/emacs-lisp, attachment)]
[Message part 9 (text/plain, inline)]
Run as:
for i in 1 2 3 ; do src/emacs -Q -l ./42.el & done
Fails if:
One of the emacsen stops responding

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Thu, 10 Jun 2021 06:47:01 GMT) Full text and rfc822 format available.

Message #123 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dick.r.chiang <at> gmail.com
Cc: larsi <at> gnus.org, pipcet <at> gmail.com, politza <at> hochschule-trier.de,
 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Thu, 10 Jun 2021 09:46:24 +0300
> From: dick.r.chiang <at> gmail.com
> Cc: larsi <at> gnus.org,  politza <at> hochschule-trier.de,  pipcet <at> gmail.com,
>   36609 <at> debbugs.gnu.org
> Date: Wed, 09 Jun 2021 17:40:28 -0400
> 
> There is no guarantee the static variable `threads_holding_glib_lock`
> introduced in xgselect.c in commit 9c62ffb is synchronized across threads.

How can that happen?  It's a single word, and its change should
therefore be atomic.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Thu, 10 Jun 2021 11:53:02 GMT) Full text and rfc822 format available.

Message #126 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: dick.r.chiang <at> gmail.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Thu, 10 Jun 2021 07:52:45 -0400
[Standard disclaimer about not being a compiler semanticist]

> a single word...  change should ... be atomic.

I've never heard that.  I googled around; one helpful and only
mildly sententious monograph on the topic is

https://stefansf.de/post/qualifier-volatile-c/

Adding the volatile qualifier to `threads_holding_glib_lock` did not 
resolve my MRE on my particular architecture (whatever that may be).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Thu, 10 Jun 2021 14:20:02 GMT) Full text and rfc822 format available.

Message #129 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dick.r.chiang <at> gmail.com
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Thu, 10 Jun 2021 17:18:54 +0300
> From: dick.r.chiang <at> gmail.com
> Cc: 36609 <at> debbugs.gnu.org
> Date: Thu, 10 Jun 2021 07:52:45 -0400
> 
> > a single word...  change should ... be atomic.
> 
> I've never heard that.

The idea is that a variable that is modified in a single machine
instruction will always have the value assigned by the last thread
which set that.  You cannot have, for example, half of the bits of the
variable set by one thread and the other half by another thread.

> Adding the volatile qualifier to `threads_holding_glib_lock` did not 
> resolve my MRE on my particular architecture (whatever that may be).

So can you describe what you think happens in your scenario that the
offending change that added threads_holding_glib_lock causes?  Because
I still don't see the connection, to tell the truth.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Thu, 10 Jun 2021 14:57:01 GMT) Full text and rfc822 format available.

Message #132 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: dick.r.chiang <at> gmail.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Thu, 10 Jun 2021 10:55:53 -0400
> You cannot have, for example, half of the bits of the variable set by one thread
> and the other half by another thread.

I dunno, this seems like a sophomoric, rose-colored view of variable sharing
without regard to caching, or really any generally accepted guarantees about
POSIX threads.

> So can you describe what you think happens in your scenario that the
> offending change that added threads_holding_glib_lock causes?

Two threads see `threads_holding_glib_lock` is "0" and both attempt
acquiring the lock (line 43, xgselect.c)

Or, two threads see `threads_holding_glib_lock` as "2" and glib
lock is never released. (line 37, xgselect.c)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Thu, 10 Jun 2021 15:06:02 GMT) Full text and rfc822 format available.

Message #135 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dick.r.chiang <at> gmail.com
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Thu, 10 Jun 2021 18:04:55 +0300
> From: dick.r.chiang <at> gmail.com
> Cc: 36609 <at> debbugs.gnu.org
> Date: Thu, 10 Jun 2021 10:55:53 -0400
> 
> > You cannot have, for example, half of the bits of the variable set by one thread
> > and the other half by another thread.
> 
> I dunno, this seems like a sophomoric, rose-colored view of variable sharing
> without regard to caching, or really any generally accepted guarantees about
> POSIX threads.

Is it, really?

> > So can you describe what you think happens in your scenario that the
> > offending change that added threads_holding_glib_lock causes?
> 
> Two threads see `threads_holding_glib_lock` is "0" and both attempt
> acquiring the lock (line 43, xgselect.c)
> 
> Or, two threads see `threads_holding_glib_lock` as "2" and glib
> lock is never released. (line 37, xgselect.c)

And you can show a GDB backtrace with values of the variable that
supports that?  And tell how that explains the hang that you see on
the Lisp level?  These are the details that I was asking from the
first message you posted here.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Thu, 10 Jun 2021 15:36:01 GMT) Full text and rfc822 format available.

Message #138 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dick.r.chiang <at> gmail.com, 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Thu, 10 Jun 2021 17:35:54 +0200
On Jun 10 2021, Eli Zaretskii wrote:

> The idea is that a variable that is modified in a single machine
> instruction will always have the value assigned by the last thread
> which set that.

You cannot have that without explicit atomic operations.  An ordinary
RMW access is never atomic across cpus.

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Thu, 10 Jun 2021 15:41:01 GMT) Full text and rfc822 format available.

Message #141 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Andreas Schwab <schwab <at> linux-m68k.org>
Cc: dick.r.chiang <at> gmail.com, 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Thu, 10 Jun 2021 18:39:50 +0300
> From: Andreas Schwab <schwab <at> linux-m68k.org>
> Cc: dick.r.chiang <at> gmail.com,  36609 <at> debbugs.gnu.org
> Date: Thu, 10 Jun 2021 17:35:54 +0200
> 
> On Jun 10 2021, Eli Zaretskii wrote:
> 
> > The idea is that a variable that is modified in a single machine
> > instruction will always have the value assigned by the last thread
> > which set that.
> 
> You cannot have that without explicit atomic operations.

We can do that if it turns out to be an issue.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Thu, 10 Jun 2021 21:37:01 GMT) Full text and rfc822 format available.

Message #144 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: dick.r.chiang <at> gmail.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Thu, 10 Jun 2021 17:36:13 -0400
EZ> Is it, really?

Yes.  While I can't claim to be any more knowledgeable, the naivete of someone
protesting thread atomicity based on word-size is stunningly apparent.

EZ> And you can show a GDB backtrace with values of the variable that supports
EZ> that?

Alas I cannot.  My gdb skills are less impressive in the presence of threads,
and heisenbugs of this nature tend to mysteriously disappear when gdb's
leisurely pace gives caches ample time to flush.

I mentioned earlier I could not reproduce my MRE failure when I merely inserted
print statements.

My patch isn't quite as studied as Pip Cet's (my patch merely skirts locking
the glib unless you're main-thread-p), but it does fix OP and, unlike the
current state of the code, doesn't admit glaring theoretical flaws, even
though you seem to think those flaws could never materialize in practice
despite evidence (gdb-less as it is) to the contrary.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 11 Jun 2021 06:01:01 GMT) Full text and rfc822 format available.

Message #147 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dick.r.chiang <at> gmail.com
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Fri, 11 Jun 2021 09:00:07 +0300
> From: dick.r.chiang <at> gmail.com
> Cc: 36609 <at> debbugs.gnu.org
> Date: Thu, 10 Jun 2021 17:36:13 -0400
> 
> EZ> Is it, really?
> 
> Yes.  While I can't claim to be any more knowledgeable, the naivete of someone
> protesting thread atomicity based on word-size is stunningly apparent.

I'm okay with replacing the simple increments and decrements of that
variable with atomic ones, using the likes of __atomic_fetch_add
intrinsics of GCC, if that solves the problem.  Did you try that?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sat, 19 Jun 2021 17:54:01 GMT) Full text and rfc822 format available.

Message #150 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dick.r.chiang <at> gmail.com
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Sat, 19 Jun 2021 20:53:15 +0300
> Date: Fri, 11 Jun 2021 09:00:07 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 36609 <at> debbugs.gnu.org
> 
> I'm okay with replacing the simple increments and decrements of that
> variable with atomic ones, using the likes of __atomic_fetch_add
> intrinsics of GCC, if that solves the problem.  Did you try that?

Ping!

Did you try that, and if so, did using the atomics solve the original
problem?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sat, 19 Jun 2021 19:15:02 GMT) Full text and rfc822 format available.

Message #153 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: dick.r.chiang <at> gmail.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Sat, 19 Jun 2021 15:14:19 -0400
I enter a plea of lacking the programming know-how for atomics.

I stand by everything I've said thus far: an obfuscated MRE led to a
misdiagnosis, led to an unnecessary and highly thread-unsafe fix, which can be
remedied by the vulgar "end-run" patch I submitted, and which I test every day
with parallel-thread Gnus.

Unfortunately in this and every urological contest of wills I seem to have with
project maintainers, I, by construction, must always lose.

>>>>> Eli Zaretskii <eliz <at> gnu.org> writes:

>> Date: Fri, 11 Jun 2021 09:00:07 +0300
>> From: Eli Zaretskii <eliz <at> gnu.org>
>> Cc: 36609 <at> debbugs.gnu.org
>> 
>> I'm okay with replacing the simple increments and decrements of that
>> variable with atomic ones, using the likes of __atomic_fetch_add
>> intrinsics of GCC, if that solves the problem.  Did you try that?

> Ping!

> Did you try that, and if so, did using the atomics solve the original
> problem?

> Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sat, 19 Jun 2021 19:19:01 GMT) Full text and rfc822 format available.

Message #156 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dick.r.chiang <at> gmail.com
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Sat, 19 Jun 2021 22:18:43 +0300
> From: dick.r.chiang <at> gmail.com
> Cc: 36609 <at> debbugs.gnu.org
> Date: Sat, 19 Jun 2021 15:14:19 -0400
> 
> I enter a plea of lacking the programming know-how for atomics.

If I suggest a patch, would you be willing to give it a try?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sat, 19 Jun 2021 21:13:01 GMT) Full text and rfc822 format available.

Message #159 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: dick.r.chiang <at> gmail.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Sat, 19 Jun 2021 17:12:32 -0400
"Suggest" or "Provide"?

If the former, well, my recalcitrance should be your answer.

If the latter, sure, although if you've written the patch, then running my
MREs would be epsilon more effort.

>>>>> Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: dick.r.chiang <at> gmail.com
>> Cc: 36609 <at> debbugs.gnu.org
>> Date: Sat, 19 Jun 2021 15:14:19 -0400
>> 
>> I enter a plea of lacking the programming know-how for atomics.

> If I suggest a patch, would you be willing to give it a try?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sun, 20 Jun 2021 11:40:02 GMT) Full text and rfc822 format available.

Message #162 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dick.r.chiang <at> gmail.com
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Sun, 20 Jun 2021 14:39:31 +0300
> From: dick.r.chiang <at> gmail.com
> Cc: 36609 <at> debbugs.gnu.org
> Date: Sat, 19 Jun 2021 17:12:32 -0400
> 
> "Suggest" or "Provide"?
> 
> If the former, well, my recalcitrance should be your answer.
> 
> If the latter, sure

Thanks, please try the patch below.

> although if you've written the patch, then running my MREs would be
> epsilon more effort.

You have a use case and wrote code with which you are familiar, so you
are in a better position to test it.  Moreover, I don't have access to
a system where these problems could happen, so it's far from epsilon
effort for me.

diff --git a/src/xgselect.c b/src/xgselect.c
index 0d91d55bad..9a90670b0f 100644
--- a/src/xgselect.c
+++ b/src/xgselect.c
@@ -34,12 +34,27 @@ static GMainContext *glib_main_context;
 
 void release_select_lock (void)
 {
+#if GNUC_PREREQ (4, 7, 0)
+  if (__atomic_sub_fetch (&threads_holding_glib_lock, 1, __ATOMIC_ACQ_REL))
+    g_main_context_release (glib_main_context);
+#else
   if (--threads_holding_glib_lock == 0)
     g_main_context_release (glib_main_context);
+#endif
 }
 
 static void acquire_select_lock (GMainContext *context)
 {
+#if GNUC_PREREQ (4, 7, 0)
+  if (__atomic_fetch_add (&threads_holding_glib_lock, 1, __ATOMIC_ACQ_REL) == 0)
+    {
+      glib_main_context = context;
+      while (!g_main_context_acquire (context))
+	{
+	  /* Spin. */
+	}
+    }
+#else
   if (threads_holding_glib_lock++ == 0)
     {
       glib_main_context = context;
@@ -48,6 +63,7 @@ static void acquire_select_lock (GMainContext *context)
 	  /* Spin. */
 	}
     }
+#endif
 }
 
 /* `xg_select' is a `pselect' replacement.  Why do we need a separate function?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sun, 20 Jun 2021 14:02:02 GMT) Full text and rfc822 format available.

Message #165 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: dick.r.chiang <at> gmail.com
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Sun, 20 Jun 2021 10:01:37 -0400
Yes!  If you just add a negation to the __atomic_sub_fetch call, this passes
all the MREs.

It's not clear how you want to deal with the #else GNUC_PREREQ (4, 7, 0).

>>>>> "EZ" == Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: dick.r.chiang <at> gmail.com
>> Cc: 36609 <at> debbugs.gnu.org
>> Date: Sat, 19 Jun 2021 17:12:32 -0400
>>
>> "Suggest" or "Provide"?
>>
>> If the former, well, my recalcitrance should be your answer.
>>
>> If the latter, sure

EZ> Thanks, please try the patch below.

>> although if you've written the patch, then running my MREs would be epsilon
>> more effort.

EZ> You have a use case and wrote code with which you are familiar, so you are
EZ> in a better position to test it.  Moreover, I don't have access to a
EZ> system where these problems could happen, so it's far from epsilon effort
EZ> for me.

EZ> diff --git a/src/xgselect.c b/src/xgselect.c
EZ> index 0d91d55bad..9a90670b0f 100644
EZ> --- a/src/xgselect.c
EZ> +++ b/src/xgselect.c
EZ> @@ -34,12 +34,27 @@ static GMainContext *glib_main_context;

EZ>  void release_select_lock (void)
EZ>  {
EZ> +#if GNUC_PREREQ (4, 7, 0)
EZ> +  if (__atomic_sub_fetch (&threads_holding_glib_lock, 1, __ATOMIC_ACQ_REL))
EZ> +    g_main_context_release (glib_main_context);
EZ> +#else
EZ>    if (--threads_holding_glib_lock == 0)
EZ>      g_main_context_release (glib_main_context);
EZ> +#endif
EZ>  }

EZ>  static void acquire_select_lock (GMainContext *context)
EZ>  {
EZ> +#if GNUC_PREREQ (4, 7, 0)
EZ> +  if (__atomic_fetch_add (&threads_holding_glib_lock, 1, __ATOMIC_ACQ_REL) == 0)
EZ> +    {
EZ> +      glib_main_context = context;
EZ> +      while (!g_main_context_acquire (context))
EZ> +	{
EZ> +	  /* Spin. */
EZ> +	}
EZ> +    }
EZ> +#else
EZ>    if (threads_holding_glib_lock++ == 0)
EZ>      {
EZ>        glib_main_context = context;
EZ> @@ -48,6 +63,7 @@ static void acquire_select_lock (GMainContext *context)
EZ>  	  /* Spin. */
EZ>  	}
EZ>      }
EZ> +#endif
EZ>  }

EZ>  /* `xg_select' is a `pselect' replacement.  Why do we need a separate
EZ> function?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Sun, 20 Jun 2021 15:54:01 GMT) Full text and rfc822 format available.

Message #168 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dick.r.chiang <at> gmail.com
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50; Possible race-condition in threading
 implementation
Date: Sun, 20 Jun 2021 18:53:40 +0300
> From: dick.r.chiang <at> gmail.com
> Cc: 36609 <at> debbugs.gnu.org
> Date: Sun, 20 Jun 2021 10:01:37 -0400
> 
> Yes!  If you just add a negation to the __atomic_sub_fetch call

You mean, use the patch below instead?

> this passes all the MREs.

Thanks, will install soon.

> It's not clear how you want to deal with the #else GNUC_PREREQ (4, 7, 0).

By hoping no one uses this and expects threads to be stable enough
under GTK.

diff --git a/src/xgselect.c b/src/xgselect.c
index 0d91d55bad..92b118b955 100644
--- a/src/xgselect.c
+++ b/src/xgselect.c
@@ -34,12 +34,27 @@ static GMainContext *glib_main_context;
 
 void release_select_lock (void)
 {
+#if GNUC_PREREQ (4, 7, 0)
+  if (__atomic_sub_fetch (&threads_holding_glib_lock, 1, __ATOMIC_ACQ_REL) == 0)
+    g_main_context_release (glib_main_context);
+#else
   if (--threads_holding_glib_lock == 0)
     g_main_context_release (glib_main_context);
+#endif
 }
 
 static void acquire_select_lock (GMainContext *context)
 {
+#if GNUC_PREREQ (4, 7, 0)
+  if (__atomic_fetch_add (&threads_holding_glib_lock, 1, __ATOMIC_ACQ_REL) == 0)
+    {
+      glib_main_context = context;
+      while (!g_main_context_acquire (context))
+	{
+	  /* Spin. */
+	}
+    }
+#else
   if (threads_holding_glib_lock++ == 0)
     {
       glib_main_context = context;
@@ -48,6 +63,7 @@ static void acquire_select_lock (GMainContext *context)
 	  /* Spin. */
 	}
     }
+#endif
 }
 
 /* `xg_select' is a `pselect' replacement.  Why do we need a separate function?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36609; Package emacs. (Fri, 25 Jun 2021 13:55:02 GMT) Full text and rfc822 format available.

Message #171 received at 36609 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: dick.r.chiang <at> gmail.com
Cc: 36609 <at> debbugs.gnu.org
Subject: Re: bug#36609: 27.0.50;
 Possible race-condition in threading implementation
Date: Fri, 25 Jun 2021 16:54:15 +0300
> Date: Sun, 20 Jun 2021 18:53:40 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 36609 <at> debbugs.gnu.org
> 
> > From: dick.r.chiang <at> gmail.com
> > Cc: 36609 <at> debbugs.gnu.org
> > Date: Sun, 20 Jun 2021 10:01:37 -0400
> > 
> > Yes!  If you just add a negation to the __atomic_sub_fetch call
> 
> You mean, use the patch below instead?
> 
> > this passes all the MREs.
> 
> Thanks, will install soon.

Now done.




bug marked as fixed in version 28.1, send any further explanations to 36609 <at> debbugs.gnu.org and Andreas Politz <politza <at> hochschule-trier.de> Request was from dick <dick.r.chiang <at> gmail.com> to control <at> debbugs.gnu.org. (Fri, 25 Jun 2021 14:34:03 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sat, 24 Jul 2021 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 2 years and 266 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.