GNU bug report logs - #13086
24.2.50; Emacs seems to hang at w32proc.c:1126

Previous Next

Package: emacs;

Reported by: Stephen Powell <stephen_powell <at> optusnet.com.au>

Date: Wed, 5 Dec 2012 07:23:01 UTC

Severity: normal

Merged with 13157

Found in versions 24.2.50, 24.3.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 13086 in the body.
You can then email your comments to 13086 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Wed, 05 Dec 2012 07:23:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Stephen Powell <stephen_powell <at> optusnet.com.au>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 05 Dec 2012 07:23:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Stephen Powell <stephen_powell <at> optusnet.com.au>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Wed, 05 Dec 2012 07:22:29 +0000
I'm using a checkout of the latest trunk.  While reading mail in gnus
emacs seems to hang using 100% cpu.  It is easy to reproduce.  I have a
live gdb session with the hang.

This is the gdb log:

(gdb) info threads

  Id   Target Id         Frame
* 8    Thread 7752.0x41c 0x776a000d in ntdll!LdrFindResource_U ()
   from C:\Windows\SysWOW64\ntdll.dll
  7    Thread 7752.0x1b24 0x776af8b1 in ntdll!RtlUpdateClonedSRWLock ()
   from C:\Windows\SysWOW64\ntdll.dll
  6    Thread 7752.0x18dc 0x776af8e5 in ntdll!RtlUpdateClonedSRWLock ()
   from C:\Windows\SysWOW64\ntdll.dll
  5    Thread 7752.0x1804 0x776af8b1 in ntdll!RtlUpdateClonedSRWLock ()
   from C:\Windows\SysWOW64\ntdll.dll
  4    Thread 7752.0x16cc 0x751078d7 in USER32!IsDialogMessage ()
   from C:\Windows\syswow64\user32.dll
  3    Thread 7752.0x11d0 0x776afd71 in ntdll!RtlFindSetBits ()
   from C:\Windows\SysWOW64\ntdll.dll
  2    Thread 7752.0x1784 0x776b013d in
ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
   from C:\Windows\SysWOW64\ntdll.dll
  1    Thread 7752.0x1bb0 waitpid (pid=5528, status=0x0, options=1) at
w32proc.c:1126

(gdb) thread apply all bt full

Thread 8 (Thread 7752.0x41c):
#0  0x776a000d in ntdll!LdrFindResource_U () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#1  0x7772f896 in ntdll!RtlQueryTimeZoneInformation () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#2  0x00960fb2 in ?? ()
No symbol table info available.
#3  0x752033aa in KERNEL32!BaseCleanupAppcompatCacheSupport ()
   from C:\Windows\syswow64\kernel32.dll
No symbol table info available.
#4  0x6d4bffd4 in ?? ()
No symbol table info available.
#5  0x776c9ef2 in ntdll!RtlpNtSetValueKey () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#6  0x00000000 in ?? ()
No symbol table info available.

Thread 7 (Thread 7752.0x1b24):
#0  0x776af8b1 in ntdll!RtlUpdateClonedSRWLock () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#1  0x776af8b1 in ntdll!RtlUpdateClonedSRWLock () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#2  0x74686f0f in ?? () from C:\Windows\System32\mswsock.dll
No symbol table info available.
#3  0x746876a6 in ?? () from C:\Windows\System32\mswsock.dll
No symbol table info available.
#4  0x755e6b87 in WSAAddressToStringW () from C:\Windows\syswow64\ws2_32.dll
No symbol table info available.
#5  0x0105f78e in _sys_read_ahead (fd=5) at w32.c:6115
        nblock = 0
        cp = 0x144aaf0
        rc = 0
#6  0x01022520 in reader_thread (arg=0x144aaf0) at w32proc.c:952
        rc = 3
        cp = 0x144aaf0
#7  0x752033aa in KERNEL32!BaseCleanupAppcompatCacheSupport ()
   from C:\Windows\syswow64\kernel32.dll
No symbol table info available.
#8  0x00c9ffd4 in ?? ()
No symbol table info available.
#9  0x776c9ef2 in ntdll!RtlpNtSetValueKey () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#10 0x0144aaf0 in child_procs ()
No symbol table info available.
#11 0x6d140fee in ?? ()
No symbol table info available.
#12 0x00000000 in ?? ()
No symbol table info available.

Thread 6 (Thread 7752.0x18dc):
#0  0x776af8e5 in ntdll!RtlUpdateClonedSRWLock () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#1  0x776af8e5 in ntdll!RtlUpdateClonedSRWLock () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#2  0x7539dd54 in ReadFile () from C:\Windows\syswow64\KernelBase.dll
No symbol table info available.
#3  0x00000314 in ?? ()
No symbol table info available.
#4  0x00000000 in ?? ()
No symbol table info available.

Thread 5 (Thread 7752.0x1804):
#0  0x776af8b1 in ntdll!RtlUpdateClonedSRWLock () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#1  0x776af8b1 in ntdll!RtlUpdateClonedSRWLock () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#2  0x753a149d in WaitForSingleObjectEx () from
C:\Windows\syswow64\KernelBase.dll
No symbol table info available.
#3  0x00000304 in ?? ()
No symbol table info available.
#4  0x00000000 in ?? ()
No symbol table info available.

Thread 4 (Thread 7752.0x16cc):
#0  0x751078d7 in USER32!IsDialogMessage () from
C:\Windows\syswow64\user32.dll
No symbol table info available.
#1  0x751078d7 in USER32!IsDialogMessage () from
C:\Windows\syswow64\user32.dll
No symbol table info available.
#2  0x7510790d in USER32!GetCursorPos () from C:\Windows\syswow64\user32.dll
No symbol table info available.
#3  0x6bb2fec0 in ?? ()
No symbol table info available.
#4  0x010e4717 in w32_msg_pump (msg_buf=0x6bb2ff28) at w32fns.c:2394
        msg = {
          hwnd = 0x40916,
          message = 49478,
          wParam = 0,
          lParam = 0,
          time = 447382693,
          pt = {
            x = 322,
            y = 1112
          }
        }
        result = 0
        focus_window = 0x403
#5  0x010e4955 in w32_msg_worker <at> 4 (arg=0x0) at w32fns.c:2620
        msg = {
          hwnd = 0x0,
          message = 0,
          wParam = 0,
          lParam = 0,
          time = 0,
          pt = {
            x = 0,
            y = 0
          }
        }
        dummy_buf = {
          next = 0x0,
          w32msg = {
            msg = {
              hwnd = 0x0,
              message = 0,
              wParam = 0,
              lParam = 0,
              time = 0,
              pt = {
                x = 0,
                y = 0
              }
            },
            dwModifiers = 0,
            rect = {
              left = 0,
              top = 0,
              right = 0,
              bottom = 0
            }
          },
          result = 0,
          completed = 0
        }
#6  0x752033aa in KERNEL32!BaseCleanupAppcompatCacheSupport ()
   from C:\Windows\syswow64\kernel32.dll
No symbol table info available.
#7  0x6bb2ffd4 in ?? ()
No symbol table info available.
#8  0x776c9ef2 in ntdll!RtlpNtSetValueKey () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#9  0x00000000 in ?? ()
No symbol table info available.

Thread 3 (Thread 7752.0x11d0):
#0  0x776afd71 in ntdll!RtlFindSetBits () from C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#1  0x753a3bc8 in SleepEx () from C:\Windows\syswow64\KernelBase.dll
No symbol table info available.
#2  0x00000000 in ?? ()
No symbol table info available.

Thread 2 (Thread 7752.0x1784):
#0  0x776b013d in ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
   from C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#1  0x776b013d in ntdll!RtlEnableEarlyCriticalSectionEventCreation ()
   from C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#2  0x776e2f51 in ntdll!RtlWeaklyEnumerateEntryHashTable () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#3  0x00000003 in ?? ()
No symbol table info available.
#4  0x00cfb2f8 in ?? ()
No symbol table info available.
#5  0x752033aa in KERNEL32!BaseCleanupAppcompatCacheSupport ()
   from C:\Windows\syswow64\kernel32.dll
No symbol table info available.
#6  0x6b17ffd4 in ?? ()
No symbol table info available.
#7  0x776c9ef2 in ntdll!RtlpNtSetValueKey () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#8  0x00cfb2c8 in ?? ()
No symbol table info available.
#9  0x776c9ec5 in ntdll!RtlpNtSetValueKey () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#10 0x776e2e25 in ntdll!RtlWeaklyEnumerateEntryHashTable () from
C:\Windows\SysWOW64\ntdll.dll
No symbol table info available.
#11 0x00cfb2c8 in ?? ()
No symbol table info available.
#12 0x00000000 in ?? ()
No symbol table info available.

Thread 1 (Thread 7752.0x1bb0):
#0  waitpid (pid=5528, status=0x0, options=1) at w32proc.c:1126
        active = 0
        retval = 1
        nh = 0
        cp = 0x144a9e8
        cps = {0x6d550152, 0x9, 0xcc01b4, 0x0, 0xcc0000, 0xd01cc0,
0x140027, 0x5a40,
          0x1a0026, 0x5a40, 0x33267da <__register_frame_info+53635034>,
          0x3315832 <__register_frame_info+53565490>, 0xe0028, 0xd01c18,
0x1, 0xd01cb8,
          0x88f13c, 0x776be023 <ntdll!LdrWx86FormatVirtualImage+679>,
0x9, 0x88f1a4,
          0xd01cc0, 0xd01cb8, 0x88f1d0, 0x753a161b
<WaitForMultipleObjectsEx+306>, 0xcc0000,
          0x0, 0x753a1654 <WaitForMultipleObjectsEx+363>, 0x279bba63,
0x9, 0xd01cb0,
          0xd01c90, 0x24}
        wait_hnd = {0x1, 0x88f0a8, 0x11d8ff1 <get_next_msg+550>,
0x144df8c, 0x140027, 0x5a40,
          0x33267d8 <__register_frame_info+53635032>,
          0x370522a <__register_frame_info+57692714>, 0xcc9438,
0x88f0a8, 0x24, 0x0, 0x0,
          0xce7128, 0xcc9128, 0xcc924c, 0xd01cc0, 0xcc9658, 0x11c37d6
<w32_read_socket+5951>,
          0x88f044, 0x0, 0x88f1c0, 0x777071d5
<ntdll!AlpcMaxAllowedMessageLength+36229>,
          0x1ab62262, 0xfffffffe, 0x776be38c
<ntdll!LdrWx86FormatVirtualImage+1552>,
          0x776be0f2 <ntdll!LdrWx86FormatVirtualImage+886>, 0x0, 0x9,
0xd01c90,
          0x33267d8 <__register_frame_info+53635032>,
          0x776be38c <ntdll!LdrWx86FormatVirtualImage+1552>}
        timeout_ms = 0
        dont_wait = 1
#1  0x010e04ab in get_child_status (child=5528, status=0x0, options=1,
interruptible=false)
    at sysdep.c:290
        pid = -1
#2  0x010e054b in child_status_changed (child=5528, status=0x0,
options=0) at sysdep.c:333
No locals.
#3  0x0101f1b9 in handle_child_signal (sig=18) at process.c:6236
        deleted_pid = 5528
        all_pids_are_fixnums = false
        xpid = 22112
        tail = 92692214
#4  0x010e10cf in deliver_process_signal (sig=18, handler=0x101f118
<handle_child_signal>)
    at sysdep.c:1579
        old_errno = 0
        on_main_thread = true
#5  0x0101f3a3 in deliver_child_signal (sig=18) at process.c:6277
No locals.
#6  0x01023acd in sys_select (nfds=7, rfds=0x88f674, wfds=0x0, efds=0x0,
timeout=0x88f664,
    ignored=0x0) at w32proc.c:2032
        orfds = {
          bits = {121, 0}
        }
        timeout_ms = 189
        start_time = 447287689
        i = 7
        nh = 6
        nc = 2
        nr = 1
        active = 6
        cp = 0x144ab48
        cps = {0x144ab48, 0x144aa98, 0x6d6, 0x15846c40, 0x0, 0xb43e940,
0x0,
          0x35e2750 <__register_frame_info+56502096>, 0x7, 0x1, 0x88f5b8,
          0x102f0ae <gobble_input+294>, 0x34c0ee0
<__register_frame_info+55316192>, 0x88f574,
          0x24, 0x31, 0x26165dc0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x5277405,
          0x0, 0x0, 0x0, 0x0, 0x0}
        wait_hnd = {0x1e4, 0x1e0, 0x2ec, 0x340, 0x318, 0x124, 0x198,
0x30c, 0x7, 0x1,
          0x88f418, 0x11d8ff1 <get_next_msg+550>, 0x144df8c, 0x0, 0x10,
0x0, 0x5b06f40,
          0x88f40c, 0x88f418, 0x102f4c6 <unblock_input+22>, 0x0,
0x279dd5f7, 0x88f438, 0x0,
          0x708, 0x0, 0x88f558, 0x11c37d6 <w32_read_socket+5951>,
0x88f4b8, 0x0, 0xa5,
          0x1040001 <Finsert_file_contents+10749>, 0x18,
          0x331581a <__register_frame_info+53565466>, 0x88f468, 0x0,
0x0, 0x0, 0x708, 0x0,
          0x0, 0x88f4d0, 0x88f498, 0x102b3d1 <decode_timer+156>, 0x6d6,
0x15846c40,
          0xd9e9a240, 0x15846c40, 0x31, 0x6d6, 0x88f498, 0x10e0200
<EMACS_TIME_LT+38>, 0x0,
          0x279dd577, 0x88f4b8, 0x1058a20 <gettimeofday+17>, 0x708, 0x0,
0x88f4d8,
          0x10aced1 <decode_time_components+543>, 0x708, 0x0, 0xa5,
          0x1040001 <Finsert_file_contents+10749>, 0x18,
          0x331581a <__register_frame_info+53565466>, 0x88f4e8, 0x0,
0x0, 0x0, 0x708, 0x0,
          0x0, 0x88f550, 0x88f518, 0x102b3d1 <decode_timer+156>, 0x6d6,
0x15846c40,
          0xd9e9a240, 0x15846c40, 0x31, 0x6d6, 0x88f518, 0x10e0200
<EMACS_TIME_LT+38>, 0x0,
          0xb43e940, 0x6d6, 0x15846c40, 0x7, 0x1, 0x88f5a8, 0x102b714
<timer_check_2+828>,
          0x0, 0xb43e940, 0x6d6, 0x144df00}
        fdindex = {-1, 0, 3, 4, 5, 6, 0, 9, 53565466, 64, 8975112,
17756670, 0, 95461478,
          86471685, 53565490, 57693122, 0, 1000000, 95448640, 0, 0,
8975160, 1992716691,
          664654535, 0, 8975396, 7, 8975396, 839487998, 1992788202,
839487998, 3154135,
          10000000, 0, 7, 64, 1992788219, 216652739, 122163710,
30266038, 0, 95297760,
          95448896, 8975252, 2003599699, 8975256, 16854003, 95448832,
13641448, 8975248,
          2003558435, 1, 95448640, 95448672, 1992716691, 664654407, 0,
8975524, 7, 8975524,
          839487998, 1992788202, 839487998}
#7  0x0101cab9 in wait_reading_process_output (time_limit=0, nsecs=0,
read_kbd=-1,
    do_display=true, wait_for_cell=53565466, wait_proc=0x0,
just_wait_proc=0)
    at process.c:4544
        timeout_reduced_for_timers = 1
        channel = 7
        nfds = 1
        Available = {
          bits = {64, 0}
        }
        Writeok = {
          bits = {0, 0}
        }
        check_write = 0
        check_delay = 2
        no_avail = 0
        xerrno = 0
        proc = 93645125
        timeout = {
          tv_sec = 0,
          tv_nsec = 189000000
        }
        end_time = {
          tv_sec = 310000000,
          tv_nsec = 1800
        }
        wait_channel = -1
        got_some_input = 1
        count = 2
#8  0x0102aa7e in kbd_buffer_get_event (kbp=0x88f7bc,
used_mouse_menu=0x88fa53, end_time=0x0)
    at keyboard.c:3784
        obj = 16949402
#9  0x01028e14 in read_char (commandflag=1, nmaps=8, maps=0x88f960,
prev_event=53565466,
    used_mouse_menu=0x88fa53, end_time=0x0) at keyboard.c:2756
        kb = 0x1
        c = 53565466
        jmpcount = 2
        local_getcjmp = {8976632, 0, 0, 0, 8976268, 16942778, 8978372,
0, 8976676, 16800372,
          53670346, 53565466, 53593690, 53565442, 8976800, 53670466}
        save_jump = {0 <repeats 16 times>}
        tem = 18913693
        save = 84664620
        previous_echo_area_message = 53565466
        also_record = 53565466
        reread = false
        gcpro1 = {
          next = 0x88f800,
          var = 0x331c65a <__register_frame_info+53593690>,
          nvars = 8976376
        }
        gcpro2 = {
          next = 0x3335eea <__register_frame_info+53698282>,
          var = 0x331581a <__register_frame_info+53565466>,
          nvars = 53565466
        }
        polling_stopped_here = true
        orig_kboard = 0x3318680 <__register_frame_info+53577344>
#10 0x01032c44 in read_key_sequence (keybuf=0x88fbd0, bufsize=30,
prompt=53565466,
    dont_downcase_last=false, can_return_switch_frame=true,
fix_current_buffer=true)
    at keyboard.c:9201
        interrupted_kboard = 0x3318680 <__register_frame_info+53577344>
        interrupted_frame = 0x35e2750 <__register_frame_info+56502096>
        key = 53565466
        used_mouse_menu = false
        echo_local_start = 0
        last_real_key_start = 0
        keys_local_start = 0
        local_first_binding = 0
        from_string = 53565466
        count = 2
        t = 0
        echo_start = 0
        keys_start = 0
        nmaps = 8
        nmaps_allocated = 8
        defs = 0x88f920
        submaps = 0x88f960
        orig_local_map = 81398110
        orig_keymap = 53565466
        localized_local_map = 0
        first_binding = 0
        first_unbound = 31
        mock_input = 0
        fkey = {
          parent = 57361686,
          map = 57361686,
          start = 0,
          end = 0
        }
        keytran = {
          parent = 53554886,
          map = 53554886,
          start = 0,
          end = 0
        }
        indec = {
          parent = 57361678,
          map = 57361678,
          start = 0,
          end = 0
        }
        shift_translated = false
        delayed_switch_frame = 53565466
        original_uppercase = 53635034
        original_uppercase_position = -1
        dummyflag = false
        starting_buffer = 0x5a24600
        fake_prefixed_keys = 53565466
        gcpro1 = {
          next = 0x88faa8,
          var = 0x10056fe <set_internal+945>,
          nvars = 20976016
        }
#11 0x01026a27 in command_loop_1 () at keyboard.c:1448
        cmd = 81168962
        keybuf = {440, 456, 432, 20757259, 56502096, 53565466, 53565466,
18921393, 8977624,
          8977628, 8977448, 16933724, 90330742, 53565490, 8977487,
53565466, 53670346,
          53565466, 8977448, 56502096, 53577344, 2130567168, 8977544,
16933296, 90330742,
          8977487, 8977512, 16829167, 2, 55249086}
        i = 1
        prev_modiff = 99
        prev_buffer = 0x595a200
        already_adjusted = false
#12 0x0100b9ee in internal_condition_case (bfun=0x10266c4 <command_loop_1>,
    handlers=53615762, hfun=0x1026093 <cmd_error>) at eval.c:1192
        val = 55249086
        c = {
          tag = 53565466,
          val = 53565466,
          next = 0x88fd74,
          gcpro = 0x0,
          jmp = {8977720, 0, 0, 0, 8977548, 16824731, 8978372, 0, 2, 2,
1992732168, 8977624,
            8977704, 8977768, 1992740619, 8977704},
          backlist = 0x0,
          handlerlist = 0x0,
          lisp_eval_depth = 0,
          pdlcount = 2,
          poll_suppress_count = 0,
          interrupt_input_blocked = 0,
          byte_stack = 0x0
        }
        h = {
          handler = 53615762,
          var = 53565466,
          chosen_clause = 53565490,
          tag = 0x88fcc0,
          next = 0x0
        }
#13 0x0102643c in command_loop_2 (ignore=53565466) at keyboard.c:1163
        val = 0
#14 0x0100b565 in internal_catch (tag=53605618, func=0x1026418
<command_loop_2>, arg=53565466)
    at eval.c:963
        c = {
          tag = 53605618,
          val = 53565466,
          next = 0x0,
          gcpro = 0x0,
          jmp = {8977896, 2130567168, 0, 0, 8977756, 16823638, 8978372,
0, 53565466,
            53591552, 10623408, 10618832, 2130567168, 8977896, 16798209,
21265788},
          backlist = 0x0,
          handlerlist = 0x0,
          lisp_eval_depth = 0,
          pdlcount = 2,
          poll_suppress_count = 0,
          interrupt_input_blocked = 0,
          byte_stack = 0x0
        }
#15 0x010263f4 in command_loop () at keyboard.c:1142
No locals.
#16 0x01025cd8 in recursive_edit_1 () at keyboard.c:774
        count = 1
        val = 1992664130
#17 0x01025e29 in Frecursive_edit () at keyboard.c:838
        count = 0
        buffer = 53565466
#18 0x01002827 in main (argc=2, argv=0xa22ca0) at emacs.c:1560
        dummy = 8978152
        stack_bottom_variable = 0 '\000'
        do_initial_setlocale = true
        dumping = false
        skip_args = 0
        no_loadup = false
        junk = 0x0
        dname_arg = 0x0
        ch_to_dir = 0x0





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Wed, 05 Dec 2012 16:43:02 GMT) Full text and rfc822 format available.

Message #8 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stephen Powell <stephen_powell <at> optusnet.com.au>
Cc: 13086 <at> debbugs.gnu.org
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Wed, 05 Dec 2012 18:42:05 +0200
> Date: Wed, 05 Dec 2012 07:22:29 +0000
> From: Stephen Powell <stephen_powell <at> optusnet.com.au>
> 
> I'm using a checkout of the latest trunk.  While reading mail in gnus
> emacs seems to hang using 100% cpu.  It is easy to reproduce.  I have a
> live gdb session with the hang.

100% CPU usage is not a hang, it's probably an infloop.  Can you use
the 'finish' command, as described in etc/DEBUG, to find out where it
loops?

> Thread 1 (Thread 7752.0x1bb0):
> #0  waitpid (pid=5528, status=0x0, options=1) at w32proc.c:1126

Line 1126 of w32proc.c is the opening brace of the waitpid function.
I find it hard to believe it is looping there ;-)

So it's probably looping in the caller's loop, in get_child_status.
Can you see if that's the case, and if so, what value does waitpid
return to its caller?  Is that zero, by chance?

Also, is the process whose PID is 5528 still running?  If so, what
process is it?  Or is it a network connection?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Wed, 05 Dec 2012 19:29:02 GMT) Full text and rfc822 format available.

Message #11 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Stephen Powell <stephen_powell <at> optusnet.com.au>
To: eliz <at> gnu.org
Cc: 13086 <at> debbugs.gnu.org, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Wed, 05 Dec 2012 19:28:20 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:

> Can you use the 'finish' command, as described in etc/DEBUG, to find
> out where it loops?

(gdb) thread 1
[Switching to thread 1 (Thread 7752.0x1bb0)]
#0  waitpid (pid=5528, status=0x0, options=1) at w32proc.c:1126
1126    {
(gdb) finish
Run till exit from #0  waitpid (pid=5528, status=0x0, options=1) at
w32proc.c:1126
0x010e04ab in get_child_status (child=5528, status=0x0, options=1,
interruptible=false)
    at sysdep.c:290
290       while ((pid = waitpid (child, status, options)) < 0)
Value returned is $1 = -1
(gdb) finish
Run till exit from #0  0x010e04ab in get_child_status (child=5528,
status=0x0, options=1,
    interruptible=false) at sysdep.c:290

Hung here.

> So it's probably looping in the caller's loop, in get_child_status.
> Can you see if that's the case, and if so, what value does waitpid
> return to its caller?  Is that zero, by chance?

-1

> Also, is the process whose PID is 5528 still running?

process 5528 was not found on 88RCB2S




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Wed, 05 Dec 2012 21:27:01 GMT) Full text and rfc822 format available.

Message #14 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stephen Powell <stephen_powell <at> optusnet.com.au>,
	Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 13086 <at> debbugs.gnu.org
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Wed, 05 Dec 2012 23:25:54 +0200
> Date: Wed, 05 Dec 2012 19:28:20 +0000
> From: Stephen Powell <stephen_powell <at> optusnet.com.au>
> CC: stephen_powell <at> optusnet.com.au, 13086 <at> debbugs.gnu.org
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > Can you use the 'finish' command, as described in etc/DEBUG, to find
> > out where it loops?
> 
> (gdb) thread 1
> [Switching to thread 1 (Thread 7752.0x1bb0)]
> #0  waitpid (pid=5528, status=0x0, options=1) at w32proc.c:1126
> 1126    {
> (gdb) finish
> Run till exit from #0  waitpid (pid=5528, status=0x0, options=1) at
> w32proc.c:1126
> 0x010e04ab in get_child_status (child=5528, status=0x0, options=1,
> interruptible=false)
>     at sysdep.c:290
> 290       while ((pid = waitpid (child, status, options)) < 0)
> Value returned is $1 = -1
> (gdb) finish
> Run till exit from #0  0x010e04ab in get_child_status (child=5528,
> status=0x0, options=1,
>     interruptible=false) at sysdep.c:290
> 
> Hung here.

I see.  I wonder why it returned -1 the first time, when the process
was still running or just exited.  Can you try setting breakpoints in
a way that will be able to explain that?

I also don't understand the logic of the loop in get_child_status.
Paul, can you tell why it loops if waitpid is called with WNOHANG and
returns -1?  Won't that cause a busy-wait loop that pegs the CPU?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 02:05:02 GMT) Full text and rfc822 format available.

Message #17 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13086 <at> debbugs.gnu.org, Stephen Powell <stephen_powell <at> optusnet.com.au>
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Wed, 05 Dec 2012 18:04:04 -0800
On 12/05/2012 01:25 PM, Eli Zaretskii wrote:
> Paul, can you tell why it loops if waitpid is called with WNOHANG and
> returns -1?  Won't that cause a busy-wait loop that pegs the CPU?

If waitpid is called with WNOHANG and returns -1, then
errno must equal EINTR.  (There's an assertion for this.)
Since the wait was interrupted by a signal, it's supposed
to be retried.

Presumably Emacs is confused, and is waiting for a process
that it already waited for.  That's a bug and we should fix it.
I can't tell from the info so far whether the bug is in
the w32 implementation of waitpid, or in Emacs proper.

Was this emacs compiled with error-checking turned on
(./configure --enable-checking, or whatever equivalent is
use in the Microsoft Windows world)?
If so, I'm puzzled as to why the 'eassert (errno == EINTR)'
didn't fire.  If not, can you reproduce the bug with
error-checking enabled?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 03:52:02 GMT) Full text and rfc822 format available.

Message #20 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 13086 <at> debbugs.gnu.org, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 05:51:30 +0200
> Date: Wed, 05 Dec 2012 18:04:04 -0800
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: Stephen Powell <stephen_powell <at> optusnet.com.au>, 
>  13086 <at> debbugs.gnu.org
> 
> On 12/05/2012 01:25 PM, Eli Zaretskii wrote:
> > Paul, can you tell why it loops if waitpid is called with WNOHANG and
> > returns -1?  Won't that cause a busy-wait loop that pegs the CPU?
> 
> If waitpid is called with WNOHANG and returns -1, then
> errno must equal EINTR.  (There's an assertion for this.)

Is that necessarily the case?  I don't see that in the Posix
specification, which says:

  RETURN VALUE

      If wait() or waitpid() returns because the status of a child
      process is available, these functions will return a value equal to
      the process ID of the child process for which status is
      reported. If wait() or waitpid() returns due to the delivery of a
      signal to the calling process, -1 will be returned and errno will
      be set to [EINTR]. If waitpid() was invoked with WNOHANG set in
      options, it has at least one child process specified by pid for
      which status is not available, and status is not available for any
      process specified by pid, 0 will be returned. Otherwise, (pid_t)-1
      will be returned, and errno will be set to indicate the error. 

   ERRORS

      The waitpid() function will fail if:

      [ECHILD]
	  The process or process group specified by pid does not exist
	  or is not a child of the calling process.
      [EINTR]
	  The function was interrupted by a signal. The value of the
	  location pointed to by stat_loc is undefined.
      [EINVAL]
	  The options argument is not valid. 

My reading of this, and specifically of the last sentence under
"RETURN VALUE", is that errno could also be ECHILD.  Am I missing
something?

> Presumably Emacs is confused, and is waiting for a process
> that it already waited for.  That's a bug and we should fix it.

Perhaps so, but inflooping in that case is hardly a Good Thing, is it?
And neither is aborting when asserts are enabled.  Perhaps signaling
an error would be better.

> I can't tell from the info so far whether the bug is in
> the w32 implementation of waitpid, or in Emacs proper.

I don't know yet either, but I thought I need to understand the logic
of that loop before I make up my mind, because the w32 waitpid
implementation should behave according to the expectations of its
callers.

> Was this emacs compiled with error-checking turned on
> (./configure --enable-checking, or whatever equivalent is
> use in the Microsoft Windows world)?
> If so, I'm puzzled as to why the 'eassert (errno == EINTR)'
> didn't fire.  If not, can you reproduce the bug with
> error-checking enabled?

Stephen, can you do that, please?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 05:08:01 GMT) Full text and rfc822 format available.

Message #23 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13086 <at> debbugs.gnu.org, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Wed, 05 Dec 2012 21:07:33 -0800
On 12/05/2012 07:51 PM, Eli Zaretskii wrote:
> My reading of this, and specifically of the last sentence under
> "RETURN VALUE", is that errno could also be ECHILD.

Yes, waitpid can return -1 with errno == ECHILD, just as
it can return -1 with errno == EINVAL.  But if Emacs is
invoking waitpid that way, then Emacs has a bug.

Emacs cannot invoke waitpid on any argument such that waitpid
could return -1 with errno == ECHILD, because if it does so
it will allow race conditions such as the races that led to
Bug#8855.

> inflooping in that case is hardly a Good Thing, is it?
> And neither is aborting when asserts are enabled.  Perhaps signaling
> an error would be better.

If we can't fix the bug, perhaps signaling an error is the
best we can do, but I'd rather fix the bug.  Generally speaking,
if there's an internal programming error, Emacs aborts
rather than signaling an error.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 05:08:02 GMT) Full text and rfc822 format available.

Message #26 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Stephen Powell <stephen_powell <at> optusnet.com.au>
To: eliz <at> gnu.org
Cc: 13086 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 05:07:23 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:
>> Was this emacs compiled with error-checking turned on
>> (./configure --enable-checking, or whatever equivalent is
>> use in the Microsoft Windows world)?
>> If so, I'm puzzled as to why the 'eassert (errno =3D=3D EINTR)'
>> didn't fire.  If not, can you reproduce the bug with
>> error-checking enabled?
>
> Stephen, can you do that, please?

No it wasn't enabled, done now.  After reading mail in gnus for a few
dozen messages this is the new gdb log:

Thread 1 (Thread 7928.0x1010):
#0  terminate_due_to_signal (sig=3D22, backtrace_limit=3D2147483647) at ema=
cs.c:314
No locals.
#1  0x0102232f in die (msg=3D0x156df4c "assertion failed: errno =3D=3D EINT=
R", file=3D0x156df26 "sysdep.c", line=3D294) at alloc.c:6499
No locals.
#2  0x0114a3bb in get_child_status (child=3D6436, status=3D0x0, options=3D1=
, interruptible=3Dfalse) at sysdep.c:294
        pid =3D -1
#3  0x0114a4b3 in child_status_changed (child=3D6436, status=3D0x0, options=
=3D0) at sysdep.c:333
No locals.
#4  0x0102ffaa in handle_child_signal (sig=3D18) at process.c:6236
        deleted_pid =3D 6436
        all_pids_are_fixnums =3D false
        xpid =3D 25744
        tail =3D 95266870
#5  0x0114b4d7 in deliver_process_signal (sig=3D18, handler=3D0x102fe9b <ha=
ndle_child_signal>) at sysdep.c:1579
        old_errno =3D 0
        on_main_thread =3D true
#6  0x01030320 in deliver_child_signal (sig=3D18) at process.c:6277
No locals.
#7  0x01035340 in sys_select (nfds=3D7, rfds=3D0x88f6b4, wfds=3D0x0, efds=
=3D0x0, timeout=3D0x88f6a4, ignored=3D0x0) at w32proc.c:2032
        orfds =3D {
          bits =3D {121, 0}
        }
        timeout_ms =3D 78
        start_time =3D 526288248
        i =3D 7
        nh =3D 6
        nc =3D 2
        nr =3D 1
        active =3D 6
        cp =3D 0x1676b48
        cps =3D {0x1676b48, 0x1676a98, 0x0, 0x4a62f80, 0x5, 0x396636c0, 0x0=
, 0x3867750 <__register_frame_info+59144016>, 0x7, 0x1, 0x88f5f8, 0x10486dc=
 <gobble_input+415>, 0x3745ee0 <__register_frame_info+57958112>, 0x88f5b4, =
0x24, 0x0, 0x19273580, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x58637=
b5, 0x0, 0x0, 0x0, 0x0, 0x0}
        wait_hnd =3D {0x1e8, 0x1e4, 0x2e8, 0x304, 0x31c, 0x364, 0x380, 0x32=
0, 0x7, 0x1, 0x88f458, 0x12ecfac <get_next_msg+550>, 0x1679f8c, 0x0, 0x10, =
0x0, 0x88f418, 0x88f44c, 0x88f458, 0x1048af4 <unblock_input+22>, 0x0, 0xa24=
811c3, 0x88f478, 0x0, 0x0, 0x1dcd6500, 0x88f598, 0x12c61ad <w32_read_socket=
+7705>, 0x88f4f8, 0x0, 0x2d9, 0x1060001 <Finsert_file_contents+65>, 0x18, 0=
x359a81a <__register_frame_info+56207386>, 0x88f4a8, 0x0, 0x0, 0x7a120, 0x0=
, 0x0, 0x0, 0x88f510, 0x88f4d8, 0x1041005 <decode_timer+322>, 0x0, 0x4a62f8=
0, 0x4a62f80, 0x76c67193 <msvcrt!_itoa_s+1347>, 0xa2481153, 0x0, 0x88f5c4, =
0x7, 0x1, 0xa2481143, 0x88f4f8, 0x108a38f <gettimeofday+17>, 0x0, 0x1dcd650=
0, 0x88f518, 0x11097fb <decode_time_components+543>, 0x0, 0x1dcd6500, 0x2d9=
, 0x1060001 <Finsert_file_contents+65>, 0x18, 0x359a81a <__register_frame_i=
nfo+56207386>, 0x88f528, 0x0, 0x0, 0x7a120, 0x0, 0x0, 0x0, 0x88f590, 0x88f5=
58, 0x1041005 <decode_timer+322>, 0x0, 0x4a62f80, 0x4a62f80, 0x4a62f80, 0x0=
, 0x0, 0x88f558, 0x114a0f8 <EMACS_TIME_LT+38>, 0x5, 0x396636c0, 0x0, 0x4a62=
f80, 0x7, 0x1, 0x88f5e8, 0x1041502 <timer_check_2+1270>, 0x5, 0x396636c0, 0=
x0, 0x1679f00}
        fdindex =3D {-1, 0, 3, 4, 5, 6, 14089224, 6, 56207386, 64, 8975176,=
 18203154, 0, 525208227, -2, 2003792923, 4, 13369344, 1342179425, 97975520,=
 8975356, 2004228145, 8975224, 1992716691, -1572333837, 0, 8975460, 7, 8975=
460, 571139213, 1992788202, 571139213, 3154319, 10000000, 0, 7, 64, 1992788=
219, 216654055, -146185075, 30266221, 0, 96273952, 97976032, 8975368, 20038=
57877, 8975320, 16895143, 97976096, 2003580050, 14089216, 14089224, 1, 9797=
5520, 97975584, 1992716691, -1572333965, 0, 8975588, 7, 8975588, 571139213,=
 1992788202, 571139213}
#8  0x0102c033 in wait_reading_process_output (time_limit=3D28, nsecs=3D0, =
read_kbd=3D-1, do_display=3Dtrue, wait_for_cell=3D56207386, wait_proc=3D0x0=
, just_wait_proc=3D0) at process.c:4544
        timeout_reduced_for_timers =3D 1
        channel =3D 7
        nfds =3D 1
        Available =3D {
          bits =3D {64, 0}
        }
        Writeok =3D {
          bits =3D {0, 0}
        }
        check_write =3D 0
        check_delay =3D 2
        no_avail =3D 0
        xerrno =3D 0
        proc =3D 95467533
        timeout =3D {
          tv_sec =3D 0,=20
          tv_nsec =3D 78000000
        }
        end_time =3D {
          tv_sec =3D 1354769779,=20
          tv_nsec =3D 307000000
        }
        wait_channel =3D -1
        got_some_input =3D 1
        count =3D 2
#9  0x01100ac5 in sit_for (timeout=3D112, reading=3Dtrue, display_option=3D=
1) at dispnew.c:5907
        sec =3D 28
        nsec =3D 0
        do_display =3D true
#10 0x0103c823 in read_char (commandflag=3D1, nmaps=3D7, maps=3D0x88f960, p=
rev_event=3D56207386, used_mouse_menu=3D0x88fa53, end_time=3D0x0) at keyboa=
rd.c:2656
        tem0 =3D 56207386
        timeout =3D 28
        delay_level =3D 4
        buffer_size =3D 55
        c =3D 56207386
        jmpcount =3D 2
        local_getcjmp =3D {8976632, 0, 0, 0, 8976268, 17023232, 8978372, 0,=
 8976676, 16808689, 56312266, 56207386, 56235610, 56207362, 8976800, 563123=
86}
        save_jump =3D {0 <repeats 16 times>}
        tem =3D 20056477
        save =3D 87159004
        previous_echo_area_message =3D 56207386
        also_record =3D 56207386
        reread =3D false
        gcpro1 =3D {
          next =3D 0x88f800,=20
          var =3D 0x35a165a <__register_frame_info+56235610>,=20
          nvars =3D 8976376
        }
        gcpro2 =3D {
          next =3D 0x35baeea <__register_frame_info+56340202>,=20
          var =3D 0x359a81a <__register_frame_info+56207386>,=20
          nvars =3D 56207386
        }
        polling_stopped_here =3D false
        orig_kboard =3D 0x359d280 <__register_frame_info+56218240>
#11 0x010503ee in read_key_sequence (keybuf=3D0x88fbd0, bufsize=3D30, promp=
t=3D56207386, dont_downcase_last=3Dfalse, can_return_switch_frame=3Dtrue, f=
ix_current_buffer=3Dtrue) at keyboard.c:9201
        interrupted_kboard =3D 0x359d280 <__register_frame_info+56218240>
        interrupted_frame =3D 0x3867750 <__register_frame_info+59144016>
        key =3D 1256
        used_mouse_menu =3D false
        echo_local_start =3D 0
        last_real_key_start =3D 0
        keys_local_start =3D 0
        local_first_binding =3D 0
        from_string =3D 56207386
        count =3D 2
        t =3D 0
        echo_start =3D 0
        keys_start =3D 0
        nmaps =3D 7
        nmaps_allocated =3D 7
        defs =3D 0x88f920
        submaps =3D 0x88f960
        orig_local_map =3D 83881006
        orig_keymap =3D 56207386
        localized_local_map =3D 0
        first_binding =3D 0
        first_unbound =3D 31
        mock_input =3D 0
        fkey =3D {
          parent =3D 60004134,=20
          map =3D 60004134,=20
          start =3D 0,=20
          end =3D 0
        }
        keytran =3D {
          parent =3D 56196806,=20
          map =3D 56196806,=20
          start =3D 0,=20
          end =3D 0
        }
        indec =3D {
          parent =3D 60004126,=20
          map =3D 60004126,=20
          start =3D 0,=20
          end =3D 0
        }
        shift_translated =3D false
        delayed_switch_frame =3D 56207386
        original_uppercase =3D 56276954
        original_uppercase_position =3D -1
        dummyflag =3D false
        starting_buffer =3D 0x38db800 <__register_frame_info+59619328>
        fake_prefixed_keys =3D 56207386
        gcpro1 =3D {
          next =3D 0x3867750 <__register_frame_info+59144016>,=20
          var =3D 0x0,=20
          nvars =3D 56257586
        }
#12 0x010391e7 in command_loop_1 () at keyboard.c:1448
        cmd =3D 94126946
        keybuf =3D {128, 196, 84139489, 22412922, 8812368, 56207386, 562073=
86, 93540705, 8977624, 8977628, 8977448, 17008771, 96967310, 56207410, 8977=
487, 84271466, 56312266, 56207386, 8977448, 59144016, 56218240, 2130567168,=
 8977544, 17008128, 96967366, 8977487, 8977512, 16853932, 2, 57890958}
        i =3D 1
        prev_modiff =3D 7108
        prev_buffer =3D 0x38db800 <__register_frame_info+59619328>
        already_adjusted =3D false
#13 0x01011174 in internal_condition_case (bfun=3D0x1038cf7 <command_loop_1=
>, handlers=3D56257682, hfun=3D0x10384e3 <cmd_error>) at eval.c:1192
        val =3D 57890958
        c =3D {
          tag =3D 56207386,=20
          val =3D 56207386,=20
          next =3D 0x88fd74,=20
          gcpro =3D 0x0,=20
          jmp =3D {8977720, 0, 0, 0, 8977548, 16847137, 8978372, 0, 2, 2, 1=
992732168, 8977624, 8977704, 8977768, 1992740619, 8977704},=20
          backlist =3D 0x0,=20
          handlerlist =3D 0x0,=20
          lisp_eval_depth =3D 0,=20
          pdlcount =3D 2,=20
          poll_suppress_count =3D 0,=20
          interrupt_input_blocked =3D 0,=20
          byte_stack =3D 0x0
        }
        h =3D {
          handler =3D 56257682,=20
          var =3D 56207386,=20
          chosen_clause =3D 56207410,=20
          tag =3D 0x88fcc0,=20
          next =3D 0x0
        }
#14 0x01038963 in command_loop_2 (ignore=3D56207386) at keyboard.c:1163
        val =3D 0
#15 0x01010bc2 in internal_catch (tag=3D56247538, func=3D0x103893f <command=
_loop_2>, arg=3D56207386) at eval.c:963
        c =3D {
          tag =3D 56247538,=20
          val =3D 56207386,=20
          next =3D 0x0,=20
          gcpro =3D 0x0,=20
          jmp =3D {8977896, 2130567168, 0, 0, 8977756, 16845747, 8978372, 0=
, 56207386, 56233472, 10623464, 10618848, 2130567168, 8977896, 16803841, 23=
543164},=20
          backlist =3D 0x0,=20
          handlerlist =3D 0x0,=20
          lisp_eval_depth =3D 0,=20
          pdlcount =3D 2,=20
          poll_suppress_count =3D 0,=20
          interrupt_input_blocked =3D 0,=20
          byte_stack =3D 0x0
        }
#16 0x0103891b in command_loop () at keyboard.c:1142
No locals.
#17 0x01037e98 in recursive_edit_1 () at keyboard.c:774
        count =3D 1
        val =3D 1992664130
#18 0x010381d3 in Frecursive_edit () at keyboard.c:838
        count =3D 0
        buffer =3D 56207386
#19 0x01002a0a in main (argc=3D2, argv=3D0xa22f30) at emacs.c:1560
        dummy =3D 8978152
        stack_bottom_variable =3D 0 '\000'
        do_initial_setlocale =3D true
        dumping =3D false
        skip_args =3D 0
        no_loadup =3D false
        junk =3D 0x0
        dname_arg =3D 0x0
        ch_to_dir =3D 0x0







Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 07:37:02 GMT) Full text and rfc822 format available.

Message #29 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13086 <at> debbugs.gnu.org, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Wed, 05 Dec 2012 23:35:55 -0800
I found a race condition in delete-process that
looks like it might be relevant, and installed
a patch as trunk bzr 111122.  Could you please
give that a try?  If it doesn't fix the problem,
it might help to see whether the problem is related
to delete-process.  E.g., you might plant a breakpoint
on Fdelete_process and see what sort of calls are
being made to it.  Another possibility is to put
a breakpoint on waitpid, to see what arguments are
being passed to it and what it returns -- maybe
you can insert some code to print out that info
to a file.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 18:29:01 GMT) Full text and rfc822 format available.

Message #32 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 13086 <at> debbugs.gnu.org, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 20:28:02 +0200
> Date: Wed, 05 Dec 2012 21:07:33 -0800
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: stephen_powell <at> optusnet.com.au, 13086 <at> debbugs.gnu.org
> 
> > inflooping in that case is hardly a Good Thing, is it?
> > And neither is aborting when asserts are enabled.  Perhaps signaling
> > an error would be better.
> 
> If we can't fix the bug, perhaps signaling an error is the
> best we can do, but I'd rather fix the bug.  Generally speaking,
> if there's an internal programming error, Emacs aborts
> rather than signaling an error.

Yes, but we usually do that only if Emacs cannot possibly recover from
that internal error.  If Emacs _can_ continue, then we only abort via
eassert, so that a production version won't crash.  In this case, any
errno except EINTR can simply be ignored.  E.g., if waitpid was
somehow called to wait for a non-existing or wrong process, we don't
care about such a process anyway.

So how about the following change?

=== modified file 'src/sysdep.c'
--- src/sysdep.c	2012-12-03 21:42:12 +0000
+++ src/sysdep.c	2012-12-06 18:25:22 +0000
@@ -287,17 +287,20 @@ get_child_status (pid_t child, int *stat
      so that another thread running glib won't find them.  */
   eassert (0 < child);
 
-  while ((pid = waitpid (child, status, options)) < 0)
-    {
-      /* CHILD must be a child process that has not been reaped, and
-	 STATUS and OPTIONS must be valid.  */
-      eassert (errno == EINTR);
-
-      /* Note: the MS-Windows emulation of waitpid calls QUIT
-	 internally.  */
-      if (interruptible)
-	QUIT;
-    }
+  do {
+    pid = waitpid (child, status, options);
+    if (pid < 0)
+      {
+	/* CHILD must be a child process that has not been reaped, and
+	   STATUS and OPTIONS must be valid.  */
+	eassert (errno == EINTR);
+
+	/* Note: the MS-Windows emulation of waitpid calls QUIT
+	   internally.  */
+	if (errno == EINTR && interruptible)
+	  QUIT;
+      }
+  } while (pid < 0 && errno == EINTR);
 
   /* If successful and status is requested, tell wait_reading_process_output
      that it needs to wake up and look around.  */






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 18:42:02 GMT) Full text and rfc822 format available.

Message #35 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 13086 <at> debbugs.gnu.org, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 20:41:30 +0200
> Date: Wed, 05 Dec 2012 23:35:55 -0800
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: stephen_powell <at> optusnet.com.au, 13086 <at> debbugs.gnu.org
> 
> I found a race condition in delete-process that
> looks like it might be relevant, and installed
> a patch as trunk bzr 111122.  Could you please
> give that a try?

I also found some potential problem in w32proc.c, and fixed that in
revision 111132 on the trunk.  Stephen, please try the latest code and
see if the problem persists.

> If it doesn't fix the problem,
> it might help to see whether the problem is related
> to delete-process.  E.g., you might plant a breakpoint
> on Fdelete_process and see what sort of calls are
> being made to it.  Another possibility is to put
> a breakpoint on waitpid, to see what arguments are
> being passed to it and what it returns -- maybe
> you can insert some code to print out that info
> to a file.

Yes, if Emacs still loops where it did in your original report, please
tell what does waitpid set errno to, and through which 'return' line
does it return to its caller.

I find DebPrint useful for this -- is fast and causes GDB to display a
string when you run Emacs under GDB.  You will see examples of its
usage in w32proc.c and elsewhere.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 19:12:01 GMT) Full text and rfc822 format available.

Message #38 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Stephen Powell <stephen_powell <at> optusnet.com.au>
To: eliz <at> gnu.org
Cc: 13086 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 19:10:54 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:

> I also found some potential problem in w32proc.c, and fixed that in
> revision 111132 on the trunk.  Stephen, please try the latest code and
> see if the problem persists.

OK, I tried that revision with the same problem.

I don't really know what I'm doing, but one odd thing I noticed while
playing around with Paul's suggestion to set a breakpoint on
delete_process:

1. Set a breakpoint on process.c:808. Print p.pid and do a xbacktrace.
2. Run gnus.
3. Gnus uses imap.el to get mail from my host.
4. Imap.el calls delete-process from imap-close

$1 = 1356
"delete-process" (0x88bf48)
"imap-close" (0x88c248)
"mail-source-fetch-imap" (0x88c584)

5. Imap.el calls delete-process to delete the same pid from
imap-sentinel

$2 = 1356
"delete-process" (0x88b928)
"imap-sentinel" (0x88bc24)
"delete-process" (0x88bf48)
"imap-close" (0x88c248)
"mail-source-fetch-imap" (0x88c584)

6. Let gnus sit for about a minute.  The error is signalled with the
same pid

Breakpoint 1, terminate_due_to_signal (sig=22,
backtrace_limit=2147483647) at emacs.c:314
314	  signal (sig, SIG_DFL);
(gdb) up 2
#2  0x0114a8db in get_child_status (child=1356, status=0x0, options=1,
interruptible=false) at sysdep.c:294
294	      eassert (errno == EINTR);





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 19:37:01 GMT) Full text and rfc822 format available.

Message #41 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stephen Powell <stephen_powell <at> optusnet.com.au>
Cc: 13086 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 21:36:15 +0200
> Date: Thu, 06 Dec 2012 19:10:54 +0000
> From: Stephen Powell <stephen_powell <at> optusnet.com.au>
> CC: eggert <at> cs.ucla.edu, stephen_powell <at> optusnet.com.au, 13086 <at> debbugs.gnu.org
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > I also found some potential problem in w32proc.c, and fixed that in
> > revision 111132 on the trunk.  Stephen, please try the latest code and
> > see if the problem persists.
> 
> OK, I tried that revision with the same problem.
> 
> I don't really know what I'm doing, but one odd thing I noticed while
> playing around with Paul's suggestion to set a breakpoint on
> delete_process:
> 
> 1. Set a breakpoint on process.c:808. Print p.pid and do a xbacktrace.
> 2. Run gnus.
> 3. Gnus uses imap.el to get mail from my host.
> 4. Imap.el calls delete-process from imap-close
> 
> $1 = 1356
> "delete-process" (0x88bf48)
> "imap-close" (0x88c248)
> "mail-source-fetch-imap" (0x88c584)
> 
> 5. Imap.el calls delete-process to delete the same pid from
> imap-sentinel
> 
> $2 = 1356
> "delete-process" (0x88b928)
> "imap-sentinel" (0x88bc24)
> "delete-process" (0x88bf48)
> "imap-close" (0x88c248)
> "mail-source-fetch-imap" (0x88c584)
> 
> 6. Let gnus sit for about a minute.  The error is signalled with the
> same pid
> 
> Breakpoint 1, terminate_due_to_signal (sig=22,
> backtrace_limit=2147483647) at emacs.c:314
> 314	  signal (sig, SIG_DFL);
> (gdb) up 2
> #2  0x0114a8db in get_child_status (child=1356, status=0x0, options=1,
> interruptible=false) at sysdep.c:294
> 294	      eassert (errno == EINTR);

What is the value of errno in frame #2?  Also, can you tell through
which line does waitpid exit in this case?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 19:49:01 GMT) Full text and rfc822 format available.

Message #44 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: stephen_powell <at> optusnet.com.au
Cc: 13086 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 21:48:33 +0200
> > I don't really know what I'm doing, but one odd thing I noticed while
> > playing around with Paul's suggestion to set a breakpoint on
> > delete_process:
> > 
> > 1. Set a breakpoint on process.c:808. Print p.pid and do a xbacktrace.
> > 2. Run gnus.
> > 3. Gnus uses imap.el to get mail from my host.
> > 4. Imap.el calls delete-process from imap-close
> > 
> > $1 = 1356
> > "delete-process" (0x88bf48)
> > "imap-close" (0x88c248)
> > "mail-source-fetch-imap" (0x88c584)
> > 
> > 5. Imap.el calls delete-process to delete the same pid from
> > imap-sentinel
> > 
> > $2 = 1356
> > "delete-process" (0x88b928)
> > "imap-sentinel" (0x88bc24)
> > "delete-process" (0x88bf48)
> > "imap-close" (0x88c248)
> > "mail-source-fetch-imap" (0x88c584)

Can you see if the first of these calls to delete-process succeeds to
reap the process?  IOW, is it the second call to delete-process that
causes the trouble, e.g., because it asks Emacs to reap a process that
was already reaped?

To see this, put a breakpoint inside get_child_status, or maybe inside
waitpid, and define commands that just display the child PID and
continue, like this:

  (gdb) break waitpid
  (gdb) commands
   > p pid
   > continue
   > end
  (gdb)

Then see how many times waitpid is invoked for the same PID, and
whether it causes assertion violation on the first or the second time
(if there is a second time).

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 20:19:02 GMT) Full text and rfc822 format available.

Message #47 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Stephen Powell <stephen_powell <at> optusnet.com.au>
To: eliz <at> gnu.org
Cc: 13086 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 20:17:45 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:

Answering your first mail:

> What is the value of errno in frame #2?

(gdb) p errno
$25 = 10

> Also, can you tell through which line does waitpid exit in this case?

Breakpoint 5, waitpid (pid=4860, status=0x0, options=1) at w32proc.c:1190
1190		  return -1;
$24 = 4860


> Then see how many times waitpid is invoked for the same PID, and
> whether it causes assertion violation on the first or the second time
> (if there is a second time).

I started a new session for this question so the pids will be differant
from the answer to your first mail.

It only does a waitpid once for the offending pid:

Breakpoint 3, waitpid (pid=4880, status=0x0, options=1) at w32proc.c:1132

Breakpoint 1, terminate_due_to_signal (sig=22,
backtrace_limit=2147483647) at emacs.c:314
314	  signal (sig, SIG_DFL);
(gdb) up 2
#2  0x0114a8db in get_child_status (child=4880, status=0x0, options=1,
interruptible=false) at sysdep.c:294
294	      eassert (errno == EINTR);




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 20:19:02 GMT) Full text and rfc822 format available.

Message #50 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13086 <at> debbugs.gnu.org, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 12:18:36 -0800
On 12/06/12 10:28, Eli Zaretskii wrote:
> Yes, but we usually do that only if Emacs cannot possibly recover from
> that internal error.  If Emacs _can_ continue, then we only abort via
> eassert, so that a production version won't crash.

Well, it depends on what "recovery" means.  In this case, if
Emacs ignores the error and continues, it will have a confused
data structure that can cause it to kill unrelated innocent-victim
processes seemingly at random.  Such a bug would be rare and hard to
track down -- I would not be surprised if users have run across
this bug in the wild but have not known that Emacs was the culprit
and have assumed it was a Heisenbug in the innocent-victim program.

One possibility is for Emacs to fall back into recovery mode, such
as what it does now when it runs out of memory.  That is, Emacs
would tell you that it has had an internal error and that you should
save your work and exit as soon as you can.  This would lessen
the likelihood of the confused Emacs causing further damage.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 20:27:01 GMT) Full text and rfc822 format available.

Message #53 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 13086 <at> debbugs.gnu.org, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 22:25:29 +0200
> Date: Thu, 06 Dec 2012 12:18:36 -0800
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: stephen_powell <at> optusnet.com.au, 13086 <at> debbugs.gnu.org
> 
> On 12/06/12 10:28, Eli Zaretskii wrote:
> > Yes, but we usually do that only if Emacs cannot possibly recover from
> > that internal error.  If Emacs _can_ continue, then we only abort via
> > eassert, so that a production version won't crash.
> 
> Well, it depends on what "recovery" means.  In this case, if
> Emacs ignores the error and continues, it will have a confused
> data structure that can cause it to kill unrelated innocent-victim
> processes seemingly at random.

How can that happen, if PID is not our child process?

And even if it does happen, the hypothetical problem you envision
hardly justify losing an Emacs session.

> One possibility is for Emacs to fall back into recovery mode, such
> as what it does now when it runs out of memory.  That is, Emacs
> would tell you that it has had an internal error and that you should
> save your work and exit as soon as you can.  This would lessen
> the likelihood of the confused Emacs causing further damage.

It is much easier to remove the offending process object from the list
of those we expect to be dead.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 20:37:02 GMT) Full text and rfc822 format available.

Message #56 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13086 <at> debbugs.gnu.org, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 12:36:05 -0800
On 12/06/12 12:25, Eli Zaretskii wrote:
 
> How can that happen, if PID is not our child process?

Because Emacs regularly kills processes that it thinks are its
children.  If I'm running several applications, one of them
can kill the other even if the applications are otherwise
unrelated -- that's standard Unix semantics.  So if
Emacs mistakenly thinks that it has a child with PID 1234
and does the equivalent of "kill -9 1234", which is something
that delete-process does, Emacs can kill an innocent and
unrelated victim.

> the hypothetical problem you envision
> hardly justify losing an Emacs session.

It depends on how important the other process is to the user.
Sometimes, the other process is more important than Emacs,
and the user would rather lose this Emacs session than lose
the other process.

>> One possibility is for Emacs to fall back into recovery mode
> 
> It is much easier to remove the offending process object from the list
> of those we expect to be dead.

Sure, but there's a good case for being cautious here, even if
it's a bit harder to be cautious.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 20:38:01 GMT) Full text and rfc822 format available.

Message #59 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stephen Powell <stephen_powell <at> optusnet.com.au>
Cc: 13086 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 22:36:44 +0200
> Date: Thu, 06 Dec 2012 20:17:45 +0000
> From: Stephen Powell <stephen_powell <at> optusnet.com.au>
> CC: stephen_powell <at> optusnet.com.au, 13086 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> Answering your first mail:
> 
> > What is the value of errno in frame #2?
> 
> (gdb) p errno
> $25 = 10

That's ECHILD.

> > Also, can you tell through which line does waitpid exit in this case?
> 
> Breakpoint 5, waitpid (pid=4860, status=0x0, options=1) at w32proc.c:1190
> 1190		  return -1;
> $24 = 4860

It seems we are looking for a process that is not on our list of
watched child processes.  How could that happen?

Can you look at the contents of the child_procs array, and see if the
process we are looking for is there?  Also, what is the value of
child_proc_count when waitpid is called?

If the above doesn't show the light, can you please compile w32proc.c
with -DFULL_DEBUG, and see what Emacs says about processes it launches
and reaps when you run Emacs under GDB?

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 20:41:01 GMT) Full text and rfc822 format available.

Message #62 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 13086 <at> debbugs.gnu.org, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 22:39:33 +0200
> Date: Thu, 06 Dec 2012 12:36:05 -0800
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: stephen_powell <at> optusnet.com.au, 13086 <at> debbugs.gnu.org
> 
> On 12/06/12 12:25, Eli Zaretskii wrote:
>  
> > How can that happen, if PID is not our child process?
> 
> Because Emacs regularly kills processes that it thinks are its
> children.  If I'm running several applications, one of them
> can kill the other even if the applications are otherwise
> unrelated -- that's standard Unix semantics.  So if
> Emacs mistakenly thinks that it has a child with PID 1234
> and does the equivalent of "kill -9 1234", which is something
> that delete-process does, Emacs can kill an innocent and
> unrelated victim.

But that already happened when we are calling waitpid.  It cannot be
helped at that point.

> >> One possibility is for Emacs to fall back into recovery mode
> > 
> > It is much easier to remove the offending process object from the list
> > of those we expect to be dead.
> 
> Sure, but there's a good case for being cautious here, even if
> it's a bit harder to be cautious.

It doesn't make sense to be that cautious in an Emacs compiled without
assertions.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 20:50:01 GMT) Full text and rfc822 format available.

Message #65 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: eggert <at> cs.ucla.edu
Cc: 13086 <at> debbugs.gnu.org
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 22:48:52 +0200
> Date: Thu, 06 Dec 2012 22:39:33 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 13086 <at> debbugs.gnu.org, stephen_powell <at> optusnet.com.au
> 
> > Date: Thu, 06 Dec 2012 12:36:05 -0800
> > From: Paul Eggert <eggert <at> cs.ucla.edu>
> > CC: stephen_powell <at> optusnet.com.au, 13086 <at> debbugs.gnu.org
> > 
> > On 12/06/12 12:25, Eli Zaretskii wrote:
> >  
> > > How can that happen, if PID is not our child process?
> > 
> > Because Emacs regularly kills processes that it thinks are its
> > children.  If I'm running several applications, one of them
> > can kill the other even if the applications are otherwise
> > unrelated -- that's standard Unix semantics.  So if
> > Emacs mistakenly thinks that it has a child with PID 1234
> > and does the equivalent of "kill -9 1234", which is something
> > that delete-process does, Emacs can kill an innocent and
> > unrelated victim.
> 
> But that already happened when we are calling waitpid.  It cannot be
> helped at that point.
> 
> > >> One possibility is for Emacs to fall back into recovery mode
> > > 
> > > It is much easier to remove the offending process object from the list
> > > of those we expect to be dead.
> > 
> > Sure, but there's a good case for being cautious here, even if
> > it's a bit harder to be cautious.
> 
> It doesn't make sense to be that cautious in an Emacs compiled without
> assertions.

Just to make sure I'm clear: I'm saying that inflooping in these cases
will help nothing.  And that is exactly what we will do in an build
without assertions, if waitpid returns ECHILD.  That simply makes no
sense at all.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 20:59:01 GMT) Full text and rfc822 format available.

Message #68 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13086 <at> debbugs.gnu.org, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 12:57:54 -0800
On 12/06/12 12:39, Eli Zaretskii wrote:

> But that already happened when we are calling waitpid.

Not necessarily.  Sometimes Emacs does a waitpid before
killing the child.  For example, the waitpid of
call_process_cleanup might precede the EMACS_KILLPG
of call_process_kill.

> It doesn't make sense to be that cautious in an Emacs compiled without
> assertions.

Well, as I said, I'd rather that the bug got fixed.
But if it's a common bug on Windows platforms, and we can't
easily fix it, perhaps we should add some #ifdef WINDOWSNT
code that ignores the problem along the lines of your suggestion.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 21:38:02 GMT) Full text and rfc822 format available.

Message #71 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13086 <at> debbugs.gnu.org
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 13:36:53 -0800
On 12/06/12 12:48, Eli Zaretskii wrote:
> And that is exactly what we will do in an build
> without assertions, if waitpid returns ECHILD.

One possibility is to replace
"eassert (errno == EINTR);" with
"if (errno != EINTR) emacs_abort ();";
that would prevent the loop.

There are other possibilities,
for example, Emacs could report an error and
refuse to kill any processes thereafter;
this would prevent it from killing innocent-victim
processes.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Thu, 06 Dec 2012 22:52:02 GMT) Full text and rfc822 format available.

Message #74 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Stephen Powell <stephen_powell <at> optusnet.com.au>
To: eliz <at> gnu.org
Cc: 13086 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Thu, 06 Dec 2012 22:51:16 +0000
Eli Zaretskii <eliz <at> gnu.org> writes:

> If the above doesn't show the light, can you please compile w32proc.c
> with -DFULL_DEBUG, and see what Emacs says about processes it launches
> and reaps when you run Emacs under GDB?


OK, its 9:30 my time and I got to do stuff, but here is what I've found
so for.

There are two hosts gnus connects to, Yahoo and HotMail.

1. Gnus uses the command

`c:\cygwin\bin\bash.exe -c "openssl s_client -quiet -ssl3 -connect
imap.mail.yahoo.com:993"'

which runs for a three or four seconds and exits with 0 as the return
code.  Emacs keeps track of this process.  Gnus seems to wait for this
to finish.

From the gdb log:

warning: register_child registered fd 6 with pid 6360
warning: sys_kill.TerminateProcess returned 5 for pid 6360


2. Gnus uses the command

`c:\cygwin\bin\bash.exe -c "gnutls-cli --insecure -p 995 pop3.live.com"'

which runs for fifty seconds and exits with 1 as the return code.  Emacs
loses track of this process.  Gnus does not seem to wait for this to
finish but carries on quickly.

From the gdb log:

warning: register_child registered fd 6 with pid 7020

No terminate process but the process has finished.


I'll try to get back to his latter today.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Fri, 07 Dec 2012 00:11:01 GMT) Full text and rfc822 format available.

Message #77 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Stephen Powell <stephen_powell <at> optusnet.com.au>
To: stephen_powell <at> optusnet.com.au
Cc: 13086 <at> debbugs.gnu.org, eliz <at> gnu.org, eggert <at> cs.ucla.edu
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Fri, 07 Dec 2012 00:10:01 +0000
I've changed the mail-sources entry for hotmail to use the correct login
parameters so that now it actually works.  It connects quickly rather
than failing to connect and waiting for a timeout.

This has made the emacs abort go away but doesn't solve the problem of
emacs losing track of the child process.  I don't think I know enough to
be able to debug this problem.  Any suggestions about how to proceed are
welcome.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Fri, 07 Dec 2012 06:24:02 GMT) Full text and rfc822 format available.

Message #80 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 13086 <at> debbugs.gnu.org
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Fri, 07 Dec 2012 08:22:31 +0200
> Date: Thu, 06 Dec 2012 13:36:53 -0800
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 13086 <at> debbugs.gnu.org
> 
> On 12/06/12 12:48, Eli Zaretskii wrote:
> > And that is exactly what we will do in an build
> > without assertions, if waitpid returns ECHILD.
> 
> One possibility is to replace
> "eassert (errno == EINTR);" with
> "if (errno != EINTR) emacs_abort ();";
> that would prevent the loop.
> 
> There are other possibilities,
> for example, Emacs could report an error and
> refuse to kill any processes thereafter;
> this would prevent it from killing innocent-victim
> processes.

Any of these is better than iflooping.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Fri, 07 Dec 2012 06:45:02 GMT) Full text and rfc822 format available.

Message #83 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stephen Powell <stephen_powell <at> optusnet.com.au>
Cc: 13086 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Fri, 07 Dec 2012 08:43:20 +0200
> Date: Thu, 06 Dec 2012 22:51:16 +0000
> From: Stephen Powell <stephen_powell <at> optusnet.com.au>
> CC: stephen_powell <at> optusnet.com.au, 13086 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
> 
> OK, its 9:30 my time and I got to do stuff, but here is what I've found
> so for.

Thanks, we are making good progress, I think.

> 1. Gnus uses the command
> 
> `c:\cygwin\bin\bash.exe -c "openssl s_client -quiet -ssl3 -connect
> imap.mail.yahoo.com:993"'
> 
> which runs for a three or four seconds and exits with 0 as the return
> code.  Emacs keeps track of this process.  Gnus seems to wait for this
> to finish.
> 
> >From the gdb log:
> 
> warning: register_child registered fd 6 with pid 6360
> warning: sys_kill.TerminateProcess returned 5 for pid 6360

I'd like to see a backtrace from where this message comes (in
sys_kill).

> 2. Gnus uses the command
> 
> `c:\cygwin\bin\bash.exe -c "gnutls-cli --insecure -p 995 pop3.live.com"'
> 
> which runs for fifty seconds and exits with 1 as the return code.  Emacs
> loses track of this process.  Gnus does not seem to wait for this to
> finish but carries on quickly.

When you say "keeps track" and "loses track", what specifically do you
mean?  What is different in what Emacs does or displays in these two
cases, evidence do you have that led you to these conclusions?

> >From the gdb log:
> 
> warning: register_child registered fd 6 with pid 7020
> 
> No terminate process but the process has finished.

It would be good to see the series of calls to waitpid in this case,
and also the messages from here:

  #ifdef FULL_DEBUG
		  DebPrint (("select waiting on child %d fd %d\n",
			     cp-child_procs, i));
  #endif

in sys_select, while this process is running and after it exits or
dies.

Thanks again for working on this.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Fri, 07 Dec 2012 06:46:02 GMT) Full text and rfc822 format available.

Message #86 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stephen Powell <stephen_powell <at> optusnet.com.au>
Cc: 13086 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Fri, 07 Dec 2012 08:44:48 +0200
> Date: Fri, 07 Dec 2012 00:10:01 +0000
> From: Stephen Powell <stephen_powell <at> optusnet.com.au>
> CC: eliz <at> gnu.org, 13086 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
> 
> I've changed the mail-sources entry for hotmail to use the correct login
> parameters so that now it actually works.  It connects quickly rather
> than failing to connect and waiting for a timeout.
> 
> This has made the emacs abort go away but doesn't solve the problem of
> emacs losing track of the child process.  I don't think I know enough to
> be able to debug this problem.  Any suggestions about how to proceed are
> welcome.

I made some suggestions in another mail.  It could be that it will be
easier to debug this if you go back to the configuration where the
connection was timing out, because this will give you (and GDB) more
time to gather evidence.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Fri, 07 Dec 2012 06:48:02 GMT) Full text and rfc822 format available.

Message #89 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 13086 <at> debbugs.gnu.org, stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Fri, 07 Dec 2012 08:46:51 +0200
> Date: Thu, 06 Dec 2012 12:57:54 -0800
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: stephen_powell <at> optusnet.com.au, 13086 <at> debbugs.gnu.org
> 
> Well, as I said, I'd rather that the bug got fixed.
> But if it's a common bug on Windows platforms, and we can't
> easily fix it, perhaps we should add some #ifdef WINDOWSNT
> code that ignores the problem along the lines of your suggestion.

I wasn't thinking about Windows at all, as a matter of fact.  The
current bug on Windows will soon be solved, one way or another.  I was
thinking about Posix platforms inflooping when waitpid returns ECHILD,
when the build is without assertions (which is the default).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Fri, 07 Dec 2012 16:50:03 GMT) Full text and rfc822 format available.

Message #92 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13086 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>,
	stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Fri, 07 Dec 2012 11:48:29 -0500
On 12/7/2012 1:46 AM, Eli Zaretskii wrote:
>> Date: Thu, 06 Dec 2012 12:57:54 -0800
>> From: Paul Eggert <eggert <at> cs.ucla.edu>
>> CC: stephen_powell <at> optusnet.com.au, 13086 <at> debbugs.gnu.org
>>
>> Well, as I said, I'd rather that the bug got fixed.
>> But if it's a common bug on Windows platforms, and we can't
>> easily fix it, perhaps we should add some #ifdef WINDOWSNT
>> code that ignores the problem along the lines of your suggestion.
>
> I wasn't thinking about Windows at all, as a matter of fact.  The
> current bug on Windows will soon be solved, one way or another.  I was
> thinking about Posix platforms inflooping when waitpid returns ECHILD,
> when the build is without assertions (which is the default).

I haven't been following this thread closely, but could any of this 
discussion be relevant to the emacs-24 branch?

I'm asking because two Cygwin users have reported that Emacs 24.2 
sometimes "hangs", and that there is at least one zombie process when 
this happens; see

  http://cygwin.com/ml/cygwin/2012-11/msg00101.html
  http://cygwin.com/ml/cygwin/2012-12/msg00114.html
  http://cygwin.com/ml/cygwin/2012-12/msg00078.html

I've been trying to get more information from these users.  If you think 
this could be related to the current bug (which seems to be about the 
trunk in spite of the subject line), I'll send the information when I 
get it.  Otherwise I'll file a new bug report.

Ken




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Fri, 07 Dec 2012 17:44:02 GMT) Full text and rfc822 format available.

Message #95 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: 13086 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
	stephen_powell <at> optusnet.com.au
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Fri, 07 Dec 2012 09:43:23 -0800
On 12/07/2012 08:48 AM, Ken Brown wrote:
> could any of this discussion be relevant to the emacs-24 branch?

I suspect the emacs-24 Cygwin bug is not due to the particular code
we're talking about, which was briefly in the emacs-24 branch but
was reverted at the first sign of trouble.  So I suggest filing a
new bug report.

It could be that this discussion is relevant, as this area
is clearly buggy in the emacs-24 branch, and the bugs could cause
symptoms such as you describe.  So I suggest mentioning Bug#13086
in any new bug report for emacs-24.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Fri, 07 Dec 2012 17:55:02 GMT) Full text and rfc822 format available.

Message #98 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 13086 <at> debbugs.gnu.org
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Fri, 07 Dec 2012 09:54:28 -0800
On 12/06/2012 10:22 PM, Eli Zaretskii wrote:
> Any of these is better than iflooping.

OK, I changed it to use emacs_abort for now (trunk bzr 111150),
as that's simplest.  We can improve this later as needed.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Fri, 07 Dec 2012 18:59:02 GMT) Full text and rfc822 format available.

Message #101 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 13086 <at> debbugs.gnu.org
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Fri, 07 Dec 2012 20:57:09 +0200
> Date: Fri, 07 Dec 2012 09:54:28 -0800
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> CC: 13086 <at> debbugs.gnu.org
> 
> On 12/06/2012 10:22 PM, Eli Zaretskii wrote:
> > Any of these is better than iflooping.
> 
> OK, I changed it to use emacs_abort for now (trunk bzr 111150),
> as that's simplest.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Fri, 21 Dec 2012 11:58:02 GMT) Full text and rfc822 format available.

Message #104 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stephen Powell <stephen_powell <at> optusnet.com.au>
Cc: 13086 <at> debbugs.gnu.org, Fabrice Popineau <fabrice.popineau <at> gmail.com>
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Fri, 21 Dec 2012 13:56:38 +0200
Fabrice Popineau reported a similar problem on his machine, and
debugged it to find the source.  Based on his findings, I committed to
the emacs-24 branch (revision 111050) changes that should avoid losing
track of child processes, and hopefully (after they are merged to the
trunk) also resolve the problem reported here.

Thanks.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#13086; Package emacs. (Sun, 23 Dec 2012 17:14:02 GMT) Full text and rfc822 format available.

Message #107 received at 13086 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: stephen_powell <at> optusnet.com.au
Cc: 13086 <at> debbugs.gnu.org, fabrice.popineau <at> gmail.com
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Sun, 23 Dec 2012 19:12:32 +0200
> Date: Fri, 21 Dec 2012 13:56:38 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> CC: 13086 <at> debbugs.gnu.org, Fabrice Popineau <fabrice.popineau <at> gmail.com>
> 
> Fabrice Popineau reported a similar problem on his machine, and
> debugged it to find the source.  Based on his findings, I committed to
> the emacs-24 branch (revision 111050) changes that should avoid losing
> track of child processes, and hopefully (after they are merged to the
> trunk) also resolve the problem reported here.

Another commit (trunk revision 111311) should make handling of
subprocess shutdown on MS-Windows even more solid.  As a nice side
effect, the annoying "reader_thread.SetEvent failed with 6 for fd 3"
message no longer appears.





Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Tue, 22 Jan 2013 13:40:03 GMT) Full text and rfc822 format available.

Notification sent to Stephen Powell <stephen_powell <at> optusnet.com.au>:
bug acknowledged by developer. (Tue, 22 Jan 2013 13:40:04 GMT) Full text and rfc822 format available.

Message #112 received at 13086-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Stephen Powell <stephen_powell <at> optusnet.com.au>
Cc: 13086-done <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
Subject: Re: bug#13086: 24.2.50; Emacs seems to hang at w32proc.c:1126
Date: Tue, 22 Jan 2013 15:38:02 +0200
> Date: Fri, 07 Dec 2012 00:10:01 +0000
> From: Stephen Powell <stephen_powell <at> optusnet.com.au>
> CC: eliz <at> gnu.org, 13086 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
> 
> I've changed the mail-sources entry for hotmail to use the correct login
> parameters so that now it actually works.  It connects quickly rather
> than failing to connect and waiting for a timeout.
> 
> This has made the emacs abort go away but doesn't solve the problem of
> emacs losing track of the child process.  I don't think I know enough to
> be able to debug this problem.  Any suggestions about how to proceed are
> welcome.

No further reports about this for a long time, so I presume the
problem was solved by the related changes on the trunk.

I'm therefore closing this bug.  Feel free to reopen if something
similar surfaces.

Thanks.




Forcibly Merged 13086 13157. Request was from Glenn Morris <rgm <at> gnu.org> to control <at> debbugs.gnu.org. (Sun, 17 Feb 2013 03:00:01 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 17 Mar 2013 11:24:10 GMT) Full text and rfc822 format available.

This bug report was last modified 11 years and 47 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.