GNU bug report logs - #14569
24.3.50; bootstrap fails on Cygwin

Package: emacs;

Reported by: Katsumi Yamaoka <yamaoka <at> jpl.org>

Date: Fri, 7 Jun 2013 00:17:01 UTC

Severity: important

Found in version 24.3.50

Done: Ken Brown <kbrown <at> cornell.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 14569 in the body.
You can then email your comments to 14569 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Fri, 07 Jun 2013 00:17:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Katsumi Yamaoka <yamaoka <at> jpl.org>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Fri, 07 Jun 2013 00:17:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Katsumi Yamaoka <yamaoka <at> jpl.org>
To: bug-gnu-emacs <at> gnu.org
Subject: 24.3.50; bootstrap fails on Cygwin
Date: Fri, 07 Jun 2013 09:16:39 +0900

Hi,

Bootstrap has been failing on Cygwin since yesterday.  An error occurs
at least when performing batch-update-autoloads, as follows:

[...]
Wrote /Work/emacs/lisp/mh-e/mh-loaddefs.el
(No changes need to be saved)
EMACSLOADPATH=/Work/emacs/lisp LC_ALL=C /Work/emacs/src/bootstrap-emacs.exe \
 -batch --no-site-file --no-site-lisp -l autoload \
 --eval "(setq generate-autoload-cookie \";;;###tramp-autoload\")" \
 --eval "(setq generated-autoload-file (unmsys--file-name \"/Work/emacs/lisp/net/tramp-loaddefs.el\"))" \
 --eval "(setq make-backup-files nil)" \
 -f batch-update-autoloads /Work/emacs/lisp/net
GLib (gthread-posix.c): Unexpected error from C library during 'pthread_setspecific': Invalid argument.  Aborting.
Makefile:392: recipe for target `/Work/emacs/lisp/net/tramp-loaddefs.el' failed
make[3]: *** [/Work/emacs/lisp/net/tramp-loaddefs.el] Aborted
[...]
make: *** [bootstrap] Error 2

I can run bootstrap-emacs.exe with the -Q option, but I have no
idea how to investigate this.  Please help.

(The following is from the last version I built.)
In GNU Emacs 24.3.50.1 (i686-pc-cygwin, X toolkit, Xaw3d scroll bars)
 of 2013-06-05 on localhost
Bzr revision: 112848 eliz <at> gnu.org-20130604163346-bxz8tbdsd4zt5zm2
Windowing system distributor `The Cygwin/X Project', version 11.0.11401000
Configured using:
 `configure --verbose --with-x-toolkit=lucid --without-imagemagick
 --without-dbus --without-gconf --without-gsettings'

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 10 Jun 2013 13:55:01 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: bug-emacs <bug-gnu-emacs <at> gnu.org>
Cc: eggert <at> cs.ucla.edu
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 10 Jun 2013 15:54:10 +0200

Katsumi Yamaoka wrote:

> Bootstrap has been failing on Cygwin since yesterday.  An error occurs
> at least when performing batch-update-autoloads, as follows:

I did a similar report on emacs-devel:

http://lists.gnu.org/archive/html/emacs-devel/2013-06/msg00333.html

I have now found that trunk rev. 112858 builds fine, while rev.
112859 fails as described. These are the changes that broke the
bootstrap on Cygwin:

=================================
2013-06-05  Paul Eggert  <eggert <at> cs.ucla.edu>

        Chain glib's SIGCHLD handler from Emacs's (Bug#14474).
        * process.c (dummy_handler): New function.
        (lib_child_handler): New static var.
        (handle_child_signal): Invoke it.
        (catch_child_signal): If a library has set up a signal handler,
        save it into lib_child_handler.
        (init_process_emacs): If using glib and not on Windows, tickle glib's
        child-handling code so that it initializes its private SIGCHLD handler.
        * syssignal.h (SA_SIGINFO): Default to 0.
        * xterm.c (x_term_init): Remove D-bus hack that I installed on May
        31; it should no longer be needed now.
=================================

Ciao,
 Angelo.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 10 Jun 2013 16:28:02 GMT) Full text and rfc822 format available.

Message #11 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Jan Djärv <jan.h.d <at> swipnet.se>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: eggert <at> cs.ucla.edu, 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 10 Jun 2013 18:27:28 +0200

Hello.

This sounds like a bug in GLib.  Put a breakpoint at g_thread_abort to get a useful backtrace.

	Jan D.


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 10 Jun 2013 18:58:02 GMT) Full text and rfc822 format available.

Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Jan Djärv <jan.h.d <at> swipnet.se>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, eggert <at> cs.ucla.edu
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 10 Jun 2013 20:56:53 +0200

On 10/06/2013 18:27, Jan Djärv wrote:
> Hello.
>
> This sounds like a bug in GLib.  Put a breakpoint at g_thread_abort to get a useful backtrace.

I am afraid GDB is not for me... :(


Ciao,
 Angelo.


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 10 Jun 2013 20:11:02 GMT) Full text and rfc822 format available.

Message #17 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: Jan Djärv <jan.h.d <at> swipnet.se>, 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 10 Jun 2013 13:10:29 -0700

Are you linking with libxml2?  These URLs:

http://91r.net/ask/15791784.html

http://xmlsoft.org/threads.html

suggest that Emacs may not be initializing
libxml2 properly.

You should be able to tell whether you're linking
with libxml2 by looking at the 'make' output,
or by running 'ldd src/temacs'.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 10 Jun 2013 21:16:01 GMT) Full text and rfc822 format available.

Message #20 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>,
 Jan Djärv <jan.h.d <at> swipnet.se>,
 Ken Brown <kbrow1i <at> gmail.com>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 10 Jun 2013 23:15:17 +0200

On 10/06/2013 22:10, Paul Eggert wrote:
> Are you linking with libxml2?  These URLs:
>
> http://91r.net/ask/15791784.html
>
> http://xmlsoft.org/threads.html
>
> suggests that Emacs may not be initializing
> libxml2 properly.
>
> You should be able to tell whether you're linking
> with libxml2 by looking at the 'make' output,
> or by running 'ldd src/temacs'.
>

Hmm... I configure with:

$ "${source_dir}"/configure --prefix="${prefix_dir}"

and "configure" adds all it finds,

...
Configured for `i686-pc-cygwin'.

  Where should the build process find the source code?    /work/emacs
  What compiler should emacs be built with?               gcc -std=gnu99 -g3 -O2
  Should Emacs use the GNU version of malloc?             yes
  Should Emacs use a relocating allocator for buffers?    no
  Should Emacs use mmap(2) for buffer allocation?         yes
  What window system should Emacs use?                    x11
  What toolkit should Emacs use?                          GTK3
  Where do we find X Windows header files?                Standard dirs
  Where do we find X Windows libraries?                   Standard dirs
  Does Emacs use -lXaw3d?                                 no
  Does Emacs use -lXpm?                                   yes
  Does Emacs use -ljpeg?                                  yes
  Does Emacs use -ltiff?                                  yes
  Does Emacs use a gif library?                           yes -lgif
  Does Emacs use -lpng?                                   yes
  Does Emacs use -lrsvg-2?                                yes
  Does Emacs use imagemagick?                             yes
  Does Emacs use -lgpm?                                   no
  Does Emacs use -ldbus?                                  yes
  Does Emacs use -lgconf?                                 yes
  Does Emacs use GSettings?                               yes
  Does Emacs use a file notification library?             yes -lgio (gfile)
  Does Emacs use -lselinux?                               no
  Does Emacs use -lgnutls?                                yes
  Does Emacs use -lxml2?                                  yes
  Does Emacs use -lfreetype?                              yes
  Does Emacs use -lm17n-flt?                              yes
  Does Emacs use -lotf?                                   yes
  Does Emacs use -lxft?                                   yes
  Does Emacs use toolkit scroll bars?                     yes
...


Linking with xml2 is certainly not something I asked for:

$ ldd emacs/Work/src/temacs.exe | grep xml
	cygxml2-2.dll => /usr/bin/cygxml2-2.dll (0x45990000)


Ciao,
 Angelo.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 10 Jun 2013 21:53:01 GMT) Full text and rfc822 format available.

Message #23 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: Jan Djärv <jan.h.d <at> swipnet.se>, 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 10 Jun 2013 14:52:33 -0700

On 06/10/13 14:15, Angelo Graziosi wrote:
> I configure with:
> 
> $ "${source_dir}"/configure --prefix="${prefix_dir}"

What happens if you configure --without-xml2?
Something like this, say:

"${source_dir}"/configure --prefix="${prefix_dir}" --without-xml2
make clean
make

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 10 Jun 2013 22:08:02 GMT) Full text and rfc822 format available.

Message #26 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>,
 Jan Djärv <jan.h.d <at> swipnet.se>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 11 Jun 2013 00:06:18 +0200

On 10/06/2013 23:52, Paul Eggert wrote:
> On 06/10/13 14:15, Angelo Graziosi wrote:
>> I configure with:
>>
>> $ "${source_dir}"/configure --prefix="${prefix_dir}"
>
> What happens if you configure --without-xml2?
> Something like this, say:


Hmm... I have just verified that my builds on Kubuntu use xml2 too, and
the same rev. that fails to bootstrap on Cygwin builds fine there.

For example, rev. 112902 gives

...
Configured for `x86_64-unknown-linux-gnu'.

  Where should the build process find the source code?    /work/emacs
  What compiler should emacs be built with?               clang -O3
  Should Emacs use the GNU version of malloc?             yes
      (Using Doug Lea's new malloc from the GNU C Library.)
  Should Emacs use a relocating allocator for buffers?    no
  Should Emacs use mmap(2) for buffer allocation?         no
  What window system should Emacs use?                    x11
  What toolkit should Emacs use?                          GTK3
  Where do we find X Windows header files?                Standard dirs
  Where do we find X Windows libraries?                   Standard dirs
  Does Emacs use -lXaw3d?                                 no
  Does Emacs use -lXpm?                                   yes
  Does Emacs use -ljpeg?                                  yes
  Does Emacs use -ltiff?                                  yes
  Does Emacs use a gif library?                           yes -lgif
  Does Emacs use -lpng?                                   yes
  Does Emacs use -lrsvg-2?                                yes
  Does Emacs use imagemagick?                             yes
  Does Emacs use -lgpm?                                   yes
  Does Emacs use -ldbus?                                  yes
  Does Emacs use -lgconf?                                 yes
  Does Emacs use GSettings?                               yes
  Does Emacs use a file notification library?             yes -lgio (gfile)
  Does Emacs use -lselinux?                               no
  Does Emacs use -lgnutls?                                yes
  Does Emacs use -lxml2?                                  yes
  Does Emacs use -lfreetype?                              yes
  Does Emacs use -lm17n-flt?                              yes
  Does Emacs use -lotf?                                   yes
  Does Emacs use -lxft?                                   yes
  Does Emacs use toolkit scroll bars?                     yes
...

and

$ ldd /usr/local/emacs/bin/emacs | grep xml
        libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f4858e01000)


> "${source_dir}"/configure --prefix="${prefix_dir}" --without-xml2
> make clean
> make
>

tomorrow...

Good Night,
 Angelo.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 10 Jun 2013 23:25:01 GMT) Full text and rfc822 format available.

Message #29 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>,
 Jan Djärv <jan.h.d <at> swipnet.se>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 11 Jun 2013 01:23:25 +0200

On 10/06/2013 23:52, Paul Eggert wrote:
> On 06/10/13 14:15, Angelo Graziosi wrote:
>> I configure with:
>>
>> $ "${source_dir}"/configure --prefix="${prefix_dir}"
>
> What happens if you configure --without-xml2?
> Something like this, say:
>
> "${source_dir}"/configure --prefix="${prefix_dir}" --without-xml2
> make clean
> make
>

I am afraid that what you suggest fails in the same manner:

$ cd emacs
$ mkdir Work
$ ./autogen.sh
$ cd Work/
$ ../configure --prefix=/usr/local/emacs --without-xml2
...
Configured for `i686-pc-cygwin'.

  Where should the build process find the source code?    /work/emacs
  What compiler should emacs be built with?               gcc -std=gnu99 -g3 -O2
  Should Emacs use the GNU version of malloc?             yes
  Should Emacs use a relocating allocator for buffers?    no
  Should Emacs use mmap(2) for buffer allocation?         yes
  What window system should Emacs use?                    x11
  What toolkit should Emacs use?                          GTK3
  Where do we find X Windows header files?                Standard dirs
  Where do we find X Windows libraries?                   Standard dirs
  Does Emacs use -lXaw3d?                                 no
  Does Emacs use -lXpm?                                   yes
  Does Emacs use -ljpeg?                                  yes
  Does Emacs use -ltiff?                                  yes
  Does Emacs use a gif library?                           yes -lgif
  Does Emacs use -lpng?                                   yes
  Does Emacs use -lrsvg-2?                                yes
  Does Emacs use imagemagick?                             yes
  Does Emacs use -lgpm?                                   no
  Does Emacs use -ldbus?                                  yes
  Does Emacs use -lgconf?                                 yes
  Does Emacs use GSettings?                               yes
  Does Emacs use a file notification library?             yes -lgio (gfile)
  Does Emacs use -lselinux?                               no
  Does Emacs use -lgnutls?                                yes
  Does Emacs use -lxml2?                                  no
  Does Emacs use -lfreetype?                              yes
  Does Emacs use -lm17n-flt?                              yes
  Does Emacs use -lotf?                                   yes
  Does Emacs use -lxft?                                   yes
  Does Emacs use toolkit scroll bars?                     yes
...

(notice that now it says: "Does Emacs use -lxml2? no")

$ make
...
Compiling /work/emacs/src/../lisp/language/cham.el
GLib (gthread-posix.c): Unexpected error from C library during 'pthread_setspecific': Invalid argument.  Aborting.
Makefile:229: recipe for target `compile-onefile' failed
make[2]: *** [compile-onefile] Aborted
make[2]: uscita dalla directory "/work/emacs/Work/lisp"
Makefile:809: recipe for target `/work/emacs/src/../lisp/language/cham.elc' failed
make[1]: *** [/work/emacs/src/../lisp/language/cham.elc] Error 2
make[1]: uscita dalla directory "/work/emacs/Work/src"
Makefile:381: recipe for target `src' failed
make: *** [src] Error 2

I would have been surprised if it had worked. As I explained, on
GNU/Linux Kubuntu 12.04 Emacs trunk bootstraps fine with XML2 support.

and now, really, Good Night!!!

Ciao,
 Angelo.

(PS. When I reply, TB refuses to send the reply to the address
14569 <at> debbugs.gnu.org; I need to change it manually to
bug-gnu-emacs <at> gnu.org. Is there some trick I can use to avoid this? TIA, A.)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 11 Jun 2013 15:14:02 GMT) Full text and rfc822 format available.

Message #32 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>,
 Jan Djärv <jan.h.d <at> swipnet.se>, kbrown <at> cornell.edu
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 11 Jun 2013 17:13:00 +0200

On 10/06/2013 23:52, Paul Eggert wrote:
> On 06/10/13 14:15, Angelo Graziosi wrote:
>> I configure with:
>>
>> $ "${source_dir}"/configure --prefix="${prefix_dir}"
>
> What happens if you configure --without-xml2?
> Something like this, say:
>
> "${source_dir}"/configure --prefix="${prefix_dir}" --without-xml2
> make clean
> make
>

As you have seen, what you propose fails... but also this fails:

$ cd emacs
$ ./autogen.sh
$ mkdir Work
$ cd Work
$ ../configure --without-all
...
Configured for `i686-pc-cygwin'.

  Where should the build process find the source code?    /work/emacs
  What compiler should emacs be built with?               gcc -std=gnu99 -g3 -O2
  Should Emacs use the GNU version of malloc?             yes
  Should Emacs use a relocating allocator for buffers?    no
  Should Emacs use mmap(2) for buffer allocation?         yes
  What window system should Emacs use?                    x11
  What toolkit should Emacs use?                          GTK3
  Where do we find X Windows header files?                Standard dirs
  Where do we find X Windows libraries?                   Standard dirs
  Does Emacs use -lXaw3d?                                 no
  Does Emacs use -lXpm?                                   no
  Does Emacs use -ljpeg?                                  no
  Does Emacs use -ltiff?                                  no
  Does Emacs use a gif library?                           no
  Does Emacs use -lpng?                                   no
  Does Emacs use -lrsvg-2?                                no
  Does Emacs use imagemagick?                             no
  Does Emacs use -lgpm?                                   no
  Does Emacs use -ldbus?                                  no
  Does Emacs use -lgconf?                                 no
  Does Emacs use GSettings?                               no
  Does Emacs use a file notification library?             yes -lgio (gfile)
  Does Emacs use -lselinux?                               no
  Does Emacs use -lgnutls?                                no
  Does Emacs use -lxml2?                                  no
  Does Emacs use -lfreetype?                              no
  Does Emacs use -lm17n-flt?                              no
  Does Emacs use -lotf?                                   no
  Does Emacs use -lxft?                                   no
  Does Emacs use toolkit scroll bars?                     no

$ make
...
make[2]: uscita dalla directory "/work/emacs/Work/lisp"
make[2]: ingresso nella directory "/work/emacs/Work/lisp"
Compiling /work/emacs/src/../lisp/international/characters.el
Wrote /work/emacs/lisp/international/characters.elc
make[2]: uscita dalla directory "/work/emacs/Work/lisp"
make[2]: ingresso nella directory "/work/emacs/Work/lisp"
Compiling /work/emacs/src/../lisp/composite.el
Wrote /work/emacs/lisp/composite.elc
make[2]: uscita dalla directory "/work/emacs/Work/lisp"
make[2]: ingresso nella directory "/work/emacs/Work/lisp"
Compiling /work/emacs/src/../lisp/language/chinese.el
GLib (gthread-posix.c): Unexpected error from C library during 'pthread_setspecific': Invalid argument.  Aborting.
Makefile:229: recipe for target `compile-onefile' failed
make[2]: *** [compile-onefile] Aborted
make[2]: uscita dalla directory "/work/emacs/Work/lisp"
Makefile:809: recipe for target `/work/emacs/src/../lisp/language/chinese.elc' failed
make[1]: *** [/work/emacs/src/../lisp/language/chinese.elc] Error 2
make[1]: uscita dalla directory "/work/emacs/Work/src"
Makefile:381: recipe for target `src' failed
make: *** [src] Error 2

Ciao,
 Angelo.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 11 Jun 2013 15:40:01 GMT) Full text and rfc822 format available.

Message #35 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jan Djärv <jan.h.d <at> swipnet.se>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, eggert <at> cs.ucla.edu
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 11 Jun 2013 17:39:00 +0200

Hello.

On 10 Jun 2013 at 20:56, Angelo Graziosi <angelo.graziosi <at> alice.it> wrote:

> On 10/06/2013 18:27, Jan Djärv wrote:
>> Hello.
>> 
>> This sounds like a bug in GLib.  Put a breakpoint at g_thread_abort to get a useful backtrace.
> 
> I am afraid GDB is not for me... :(

The error is not even consistent; it only occurs sometimes.  There seems to be random memory corruption going on.  Sometimes bootstrap fails with a core dump, sometimes with

GLib (gthread-posix.c): Unexpected error from C library during 'pthread_setspecific': Invalid argument.  Aborting.

And this is while compiling .el files.  It does not crash in the same .el file each time; many files compile just fine before the crash happens (if it happens).  Redoing the make after a crash usually produces an Emacs executable.  It seems to run fine, but I haven't run it for very long.  Maybe the bug manifests itself only in bootstrap-emacs when using GLib?

I got one backtrace for the setspecific error:

Breakpoint 1, 0x610dcd26 in abort () from /usr/bin/cygwin1.dll
(gdb) bt
#0  0x610dcd26 in abort () from /usr/bin/cygwin1.dll
#1  0x6a90d066 in g_spawn_close_pid () from /usr/bin/cygglib-2.0-0.dll
#2  0x6a908e8c in g_private_set () from /usr/bin/cygglib-2.0-0.dll
#3  0x6a8f06ce in g_thread_self () from /usr/bin/cygglib-2.0-0.dll
#4  0x6a8ce250 in g_main_context_iteration () from /usr/bin/cygglib-2.0-0.dll
#5  0x6a8ce2aa in g_main_context_iteration () from /usr/bin/cygglib-2.0-0.dll
#6  0x6a8f017d in g_thread_proxy () from /usr/bin/cygglib-2.0-0.dll
#7  0x610ffe1a in pthread::thread_init_wrapper(void*) ()
   from /usr/bin/cygwin1.dll
#8  0x6108974c in thread_wrapper(void*) () from /usr/bin/cygwin1.dll


This is in a separate thread, Emacs is executing in another thread:

(gdb) info threads
  Id   Target Id         Frame
* 3    Thread 1564.0x2d4 0x610dcd26 in abort () from /usr/bin/cygwin1.dll
  2    Thread 1564.0xfd0 0x7c90e514 in ntdll!KiFastSystemCallRet ()
   from /cygdrive/c/WINDOWS/system32/ntdll.dll
  1    Thread 1564.0xa8c 0x0054b2d5 in oblookup (obarray=<optimized out>,
    ptr=<optimized out>, size=10, size_byte=<optimized out>) at lread.c:3905
(gdb) thr 1
[Switching to thread 1 (Thread 1564.0xa8c)]
#0  0x0054b2d5 in oblookup (obarray=<optimized out>, ptr=<optimized out>,
    size=10, size_byte=<optimized out>) at lread.c:3905
3905            if (SBYTES (SYMBOL_NAME (tail)) == size_byte

(gdb) thr 1
[Switching to thread 1 (Thread 1564.0xa8c)]
#0  0x0054b2d5 in oblookup (obarray=<optimized out>, ptr=<optimized out>,
    size=10, size_byte=<optimized out>) at lread.c:3905
3905            if (SBYTES (SYMBOL_NAME (tail)) == size_byte
(gdb) bt
#0  0x0054b2d5 in oblookup (obarray=<optimized out>, ptr=<optimized out>,
    size=10, size_byte=<optimized out>) at lread.c:3905
#1  0x0054b678 in intern_c_string_1 (
    str=0x779503 <targets.14003+4547> ":keepalive", len=10) at lread.c:3715
#2  0x0056b76b in intern_c_string (
    str=0x779503 <targets.14003+4547> ":keepalive") at lisp.h:3332
#3  init_process_emacs () at process.c:7144
#4  0x004bf335 in main (argc=<optimized out>, argv=0x22abc0) at emacs.c:1464
(gdb) fr3
Undefined command: "fr3".  Try "help".
(gdb) fr 3
#3  init_process_emacs () at process.c:7144
7144         subfeatures = pure_cons (intern_c_string (sopt->name), subfeatures);

As there seem to be no good memory debuggers for Cygwin, this will be hard to find.  I still think it is a GLib/Cygwin error.

	Jan D.
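
[Editorial sketch] The backtrace above ends in abort () reached via g_private_set, and the error text names pthread_setspecific with "Invalid argument". As background, the snippet below shows how a caller can check the status of pthread_setspecific and turn it into a readable message. It is only an illustration of the POSIX calling convention, not GLib's code; set_thread_value is a hypothetical helper, and the message wording is only loosely modeled on the log line quoted in this report.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not GLib's code: store VALUE under KEY for the
   calling thread.  pthread functions return the error number directly
   instead of setting errno, so the status itself is what gets passed
   to strerror.  Return the status (0 on success).  */
static int
set_thread_value (pthread_key_t key, const void *value)
{
  int status = pthread_setspecific (key, value);
  if (status != 0)
    fprintf (stderr,
             "Unexpected error from C library during "
             "'pthread_setspecific': %s.\n",
             strerror (status));
  return status;
}
```

With a key obtained from pthread_key_create, set_thread_value should return 0 and pthread_getspecific should hand back the stored pointer. A nonzero status such as EINVAL indicates that the C library rejected the key, which is consistent with (though it does not prove) the memory-corruption theory above.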

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 11 Jun 2013 16:59:01 GMT) Full text and rfc822 format available.

Message #38 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Jan Djärv <jan.h.d <at> swipnet.se>
Cc: eggert <at> cs.ucla.edu, 14569 <at> debbugs.gnu.org, angelo.graziosi <at> alice.it
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 11 Jun 2013 19:58:35 +0300

> From: Jan Djärv <jan.h.d <at> swipnet.se>
> Date: Tue, 11 Jun 2013 17:39:00 +0200
> Cc: eggert <at> cs.ucla.edu, 14569 <at> debbugs.gnu.org
> 
> Breakpoint 1, 0x610dcd26 in abort () from /usr/bin/cygwin1.dll
> (gdb) bt
> #0  0x610dcd26 in abort () from /usr/bin/cygwin1.dll
> #1  0x6a90d066 in g_spawn_close_pid () from /usr/bin/cygglib-2.0-0.dll
> #2  0x6a908e8c in g_private_set () from /usr/bin/cygglib-2.0-0.dll
> #3  0x6a8f06ce in g_thread_self () from /usr/bin/cygglib-2.0-0.dll
> #4  0x6a8ce250 in g_main_context_iteration () from /usr/bin/cygglib-2.0-0.dll
> #5  0x6a8ce2aa in g_main_context_iteration () from /usr/bin/cygglib-2.0-0.dll
> #6  0x6a8f017d in g_thread_proxy () from /usr/bin/cygglib-2.0-0.dll
> #7  0x610ffe1a in pthread::thread_init_wrapper(void*) ()
>    from /usr/bin/cygwin1.dll
> #8  0x6108974c in thread_wrapper(void*) () from /usr/bin/cygwin1.dll

Can you find out (by looking at the glib sources) when and why would
g_spawn_close_pid call 'abort'?  It might give us some clues.

Also, what process (or is it a thread?) did glib spawn here, and why?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 11 Jun 2013 18:12:02 GMT) Full text and rfc822 format available.

Message #41 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>,
 Jan Djärv <jan.h.d <at> swipnet.se>, kbrown <at> cornell.edu
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 11 Jun 2013 11:10:32 -0700

On 06/11/13 08:13, Angelo Graziosi wrote:
> $ ../configure --without-all
> ...
>   Does Emacs use a file notification library?             yes -lgio (gfile)

That's a bug; --without-all should disable file notification.

I installed a fix in trunk bzr 112928.
You'll also need --without-x (or some other non-glib X toolkit)
to suppress glib.

Please update to the trunk and then run:

   ./autogen.sh
   ./configure --without-all --without-x

This should build you a glib-less Emacs.  On my Fedora 17 host,
the shell command 'ldd src/temacs' reports:

	linux-vdso.so.1 =>  (0x00007fffcbffe000)
	libacl.so.1 => /lib64/libacl.so.1 (0x000000386cc00000)
	librt.so.1 => /lib64/librt.so.1 (0x000000385ea00000)
	libtinfo.so.5 => /lib64/libtinfo.so.5 (0x0000003868a00000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x000000385e600000)
	libm.so.6 => /lib64/libm.so.6 (0x000000385de00000)
	libc.so.6 => /lib64/libc.so.6 (0x000000385da00000)
	libattr.so.1 => /lib64/libattr.so.1 (0x000000386ba00000)
	/lib64/ld-linux-x86-64.so.2 (0x000000385d600000)

Arguably, --without-all should disable some of these
remaining libraries, too; but at least it now disables glib.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 11 Jun 2013 18:52:02 GMT) Full text and rfc822 format available.

Message #44 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Jan Djärv <jan.h.d <at> swipnet.se>, 14569 <at> debbugs.gnu.org,
 angelo.graziosi <at> alice.it
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 11 Jun 2013 11:50:37 -0700

On 06/11/13 09:58, Eli Zaretskii wrote:
> Can you find out (by looking at the glib sources) when and why would
> g_spawn_close_pid call 'abort'?  It might give us some clues.

On POSIX platforms, g_spawn_close_pid does nothing.
So apparently glib is compiled for Windows (i.e.,
glib/gspawn.c is not being compiled, but glib/gspawn-win32.c
is being compiled instead).

The Emacs code that tickles glib is written this way:

#if defined HAVE_GLIB && !defined WINDOWSNT
      /* Tickle glib's child-handling code.  Ask glib to wait for Emacs itself;
	 this should always fail, but is enough to initialize glib's
	 private SIGCHLD handler.  */
      g_source_unref (g_child_watch_source_new (getpid ()));
#endif

I did notice one problem: the code previously invoked
g_child_watch_source_new (0), which is not safe if Emacs
has already forked -- perhaps Cygwin was doing that?
So I changed it to g_child_watch_source_new (getpid ())
in trunk bzr 112929.

Another thought is that there may be a mismatch between
glib builds.  Since WINDOWSNT is not defined for Cygwin builds,
a Cygwin Emacs will call g_child_watch_source_new.  My reading of
the bleeding-edge glib source code is that a Cygwin glib should call
waitpid and mess with the SIGCHLD handler, just as a
POSIX glib would, so the above Emacs code is correct.
But if you're building under Cygwin and linking with
a mingw glib, the above code may well run into problems.
Is this a possibility that we should worry about?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 11 Jun 2013 19:28:02 GMT) Full text and rfc822 format available.

Message #47 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, 14569 <at> debbugs.gnu.org,
 angelo.graziosi <at> alice.it
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 11 Jun 2013 15:26:56 -0400

[Message part 1 (text/plain, inline)]

On 6/11/2013 2:50 PM, Paul Eggert wrote:
> On 06/11/13 09:58, Eli Zaretskii wrote:
>> Can you find out (by looking at the glib sources) when and why would
>> g_spawn_close_pid call 'abort'?  It might give us some clues.
>
> On POSIX platforms, g_spawn_close_pid does nothing.
> So apparently glib is compiled for Windows (i.e.,
> glib/gspawn.c is not being compiled, but glib/gspawn-win32.c
> is being compiled instead).

No, this is not the case.  I just replicated the glib build to make 
sure.  Cygwin is a POSIX platform, to the extent possible.

> The Emacs code that tickles glib is written this way:
>
> #if defined HAVE_GLIB && !defined WINDOWSNT
>        /* Tickle glib's child-handling code.  Ask glib to wait for Emacs itself;
> 	 this should always fail, but is enough to initialize glib's
> 	 private SIGCHLD handler.  */
>        g_source_unref (g_child_watch_source_new (getpid ()));
> #endif
>
> I did notice one problem: the code previously invoked
> g_child_watch_source_new (0), which is not safe if Emacs
> has already forked -- perhaps Cygwin was doing that?
> So I changed it to g_child_watch_source_new (getpid ())
> in trunk bzr 112929.
>
> Another thought is that there may be a mismatch between
> glib builds.  Since WINDOWSNT is not defined for Cygwin builds,
> a Cygwin Emacs will call g_child_watch_source_new.  My reading of
> the bleeding-edge glib source code is that a Cygwin glib should call
> waitpid and mess with the SIGCHLD handler, just as a
> POSIX glib would, so the above Emacs code is correct.
> But if you're building under Cygwin and linking with
> a mingw glib, the above code may well run into problems.
> Is this a possibility that we should worry about?

No.  This does not happen.  The Cygwin glib maintainer takes pains to 
patch the source if necessary to make sure that Cygwin is not treated 
like Windows.  See, for instance, the attached patch that is used in the 
Cygwin build.

Ken

[2.32.1-not-win32.patch (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 11 Jun 2013 19:54:02 GMT) Full text and rfc822 format available.

Message #50 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: eggert <at> cs.ucla.edu, 14569 <at> debbugs.gnu.org, angelo.graziosi <at> alice.it
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 11 Jun 2013 22:53:16 +0300

> Date: Tue, 11 Jun 2013 15:26:56 -0400
> From: Ken Brown <kbrown <at> cornell.edu>
> CC: Eli Zaretskii <eliz <at> gnu.org>, 14569 <at> debbugs.gnu.org,
>         angelo.graziosi <at> alice.it
> 
> No.  This does not happen.  The Cygwin glib maintainer takes pains to 
> patch the source if necessary to make sure that Cygwin is not treated 
> like Windows.  See, for instance, the attached patch that is used in the 
> Cygwin build.

So, in this patched glib, what does g_spawn_close_pid do, and under
what circumstances could it call 'abort'?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 11 Jun 2013 20:08:01 GMT) Full text and rfc822 format available.

Message #53 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: eggert <at> cs.ucla.edu, 14569 <at> debbugs.gnu.org, angelo.graziosi <at> alice.it
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 11 Jun 2013 16:06:30 -0400

On 6/11/2013 3:53 PM, Eli Zaretskii wrote:
>> Date: Tue, 11 Jun 2013 15:26:56 -0400
>> From: Ken Brown <kbrown <at> cornell.edu>
>> CC: Eli Zaretskii <eliz <at> gnu.org>, 14569 <at> debbugs.gnu.org,
>>          angelo.graziosi <at> alice.it
>>
>> No.  This does not happen.  The Cygwin glib maintainer takes pains to
>> patch the source if necessary to make sure that Cygwin is not treated
>> like Windows.  See, for instance, the attached patch that is used in the
>> Cygwin build.
>
> So, in this patched glib, what does g_spawn_close_pid do, and under
> what circumstances could it call 'abort'?

It does nothing.  So Jan's backtrace is suspect.  I don't know if that 
could result from optimization, but I'll build a non-optimized glib and 
see if I can get a more reliable backtrace.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 11 Jun 2013 20:14:02 GMT) Full text and rfc822 format available.

Message #56 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>,
 Jan Djärv <jan.h.d <at> swipnet.se>, kbrown <at> cornell.edu
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 11 Jun 2013 22:13:12 +0200

Ciao Paul,

On 11/06/2013 20.10, Paul Eggert wrote:
> On 06/11/13 08:13, Angelo Graziosi wrote:
>> $ ../configure --without-all
>> ...
>>    Does Emacs use a file notification library?             yes -lgio (gfile)
>
> That's a bug; --without-all should disable file notification.
>
> I installed a fix in trunk bzr 112928.
> You'll also need --without-x (or some other non-glib X toolkit)
> to suppress glib.
>
> Please update to the trunk and then run:
>
>     ./autogen.sh
>     ./configure --without-all --without-x



obviously the car without the wheels doesn't crash! ;)

Ciao,
 Angelo.


>
> This should build you a glib-less Emacs.  On my Fedora 17 host,
> the shell command 'ldd src/temacs' reports:
>
> 	linux-vdso.so.1 =>  (0x00007fffcbffe000)
> 	libacl.so.1 => /lib64/libacl.so.1 (0x000000386cc00000)
> 	librt.so.1 => /lib64/librt.so.1 (0x000000385ea00000)
> 	libtinfo.so.5 => /lib64/libtinfo.so.5 (0x0000003868a00000)
> 	libpthread.so.0 => /lib64/libpthread.so.0 (0x000000385e600000)
> 	libm.so.6 => /lib64/libm.so.6 (0x000000385de00000)
> 	libc.so.6 => /lib64/libc.so.6 (0x000000385da00000)
> 	libattr.so.1 => /lib64/libattr.so.1 (0x000000386ba00000)
> 	/lib64/ld-linux-x86-64.so.2 (0x000000385d600000)
>
> Arguably, --without-all should disable some of these
> remaining libraries, too; but at least it now disables glib.
>

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 11 Jun 2013 20:59:01 GMT) Full text and rfc822 format available.

Message #59 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Jan Djärv <jan.h.d <at> swipnet.se>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, eggert <at> cs.ucla.edu, 14569 <at> debbugs.gnu.org,
 angelo.graziosi <at> alice.it
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 11 Jun 2013 22:58:00 +0200

Hello.

On 11 Jun 2013, at 22:06, Ken Brown <kbrown <at> cornell.edu> wrote:

> On 6/11/2013 3:53 PM, Eli Zaretskii wrote:
>>> Date: Tue, 11 Jun 2013 15:26:56 -0400
>>> From: Ken Brown <kbrown <at> cornell.edu>
>>> CC: Eli Zaretskii <eliz <at> gnu.org>, 14569 <at> debbugs.gnu.org,
>>>         angelo.graziosi <at> alice.it
>>> 
>>> No.  This does not happen.  The Cygwin glib maintainer takes pains to
>>> patch the source if necessary to make sure that Cygwin is not treated
>>> like Windows.  See, for instance, the attached patch that is used in the
>>> Cygwin build.
>> 
>> So, in this patched glib, what does g_spawn_close_pid do, and under
>> what circumstances could it call 'abort'?
> 
> It does nothing.  So Jan's backtrace is suspect.  I don't know if that could result from optimization, but I'll build a non-optimized glib and see if I can get a more reliable backtrace.
> 

It is suspect.  The error message does belong to g_private_set (frame #2).  Frame #1 should have been g_thread_abort.  If there is indeed memory corruption, such as a stack overwrite, that might explain it.  Or it might just be that gdb is in error.

The build, BTW, was unoptimized.

	Jan D.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 11 Jun 2013 21:01:02 GMT) Full text and rfc822 format available.

Message #62 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Jan Djärv <jan.h.d <at> swipnet.se>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, eggert <at> cs.ucla.edu, 14569 <at> debbugs.gnu.org,
 angelo.graziosi <at> alice.it
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 11 Jun 2013 22:59:48 +0200

Hello.

On 11 Jun 2013, at 22:58, Jan Djärv <jan.h.d <at> swipnet.se> wrote:

> 
> It is suspect.  The error message does belong to g_private_set (frame #2).  Frame #1 should have been g_thread_abort.  If there is indeed memory corruption, such as a stack overwrite, that might explain it.  Or it might just be that gdb is in error.
> 
> The build, BTW, was unoptimized.

That is, the Emacs build was unoptimized; I did not build GLib, so that was probably optimized.
We have seen in Emacs that breakpoints in abort show strange backtraces.

	Jan D.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Wed, 12 Jun 2013 04:30:05 GMT) Full text and rfc822 format available.

Message #65 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Katsumi Yamaoka <yamaoka <at> jpl.org>
To: 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Wed, 12 Jun 2013 13:29:06 +0900

Katsumi Yamaoka wrote:
> Bootstrap got to fail on Cygwin since yesterday.  An error occurs
> at least when performing batch-update-autoloads as follows:

> [...]
> Wrote /Work/emacs/lisp/mh-e/mh-loaddefs.el
> (No changes need to be saved)
> EMACSLOADPATH=/Work/emacs/lisp LC_ALL=C /Work/emacs/src/bootstrap-emacs.exe \
>  -batch --no-site-file --no-site-lisp -l autoload \
>  --eval "(setq generate-autoload-cookie \";;;###tramp-autoload\")" \
>  --eval "(setq generated-autoload-file (unmsys--file-name
> \"/Work/emacs/lisp/net/tramp-loaddefs.el\"))" \
>  --eval "(setq make-backup-files nil)" \
>  -f batch-update-autoloads /Work/emacs/lisp/net
> GLib (gthread-posix.c): Unexpected error from C library during
> 'pthread_setspecific': Invalid argument.  Aborting.
> Makefile:392: recipe for target `/Work/emacs/lisp/net/tramp-loaddefs.el' failed
> make[3]: *** [/Work/emacs/lisp/net/tramp-loaddefs.el] Aborted
> [...]
> make: *** [bootstrap] Error 2

After that I ran `make -k' four times, and on the fourth run the
error no longer occurred.  Though this cannot be the right solution,
the resulting build works so far.  In the first three `make -k' runs,
the error happened while byte-compiling an *.el file, but the file
that triggered it varied each time.  So I guess this isn't due to
Lisp code.  For instance, one failure happened while compiling
nntp.el, but I can build Ma Gnus (from the Git master) with this
Emacs without hitting the error.  Anyway, I will report again if I
get a chance to obtain a C backtrace or similar.

In GNU Emacs 24.3.50.1 (i686-pc-cygwin, X toolkit, Xaw3d scroll bars)
 of 2013-06-12 on localhost
Bzr revision: 112936 yamaoka <at> jpl.org-20130612013823-xw8ar9emw320nl12
Windowing system distributor `The Cygwin/X Project', version 11.0.11401000
Configured using:
 `configure --verbose --with-x-toolkit=lucid --without-imagemagick
 --without-dbus --without-gconf --without-gsettings'

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Wed, 12 Jun 2013 07:01:04 GMT) Full text and rfc822 format available.

Message #68 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Jan Djärv <jan.h.d <at> swipnet.se>
To: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>
Cc: Eli Zaretskii <eliz <at> gnu.org>, "eggert <at> cs.ucla.edu" <eggert <at> cs.ucla.edu>,
 Ken Brown <kbrown <at> cornell.edu>,
 "angelo.graziosi <at> alice.it" <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Wed, 12 Jun 2013 09:00:13 +0200

[Message part 1 (text/plain, inline)]

Hi.

Paul Eggert wrote:

> I did notice one problem: the code previously invoked g_child_watch_source_new (0), which is not safe if Emacs has already forked -- perhaps Cygwin was doing that? So I changed it to g_child_watch_source_new (getpid ()) in trunk bzr 112929.

It crashes much less with this change, about one in three builds.  Previously it crashed on every build.
Some sort of race condition, perhaps?

As for what the threads do, I don't know. There are five threads created when Emacs byte-compiles one file.

      Jan D.

[Message part 2 (text/html, inline)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Wed, 12 Jun 2013 18:34:01 GMT) Full text and rfc822 format available.

Message #71 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jan Djärv <jan.h.d <at> swipnet.se>
Cc: Eli Zaretskii <eliz <at> gnu.org>,
 "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Ken Brown <kbrown <at> cornell.edu>,
 "angelo.graziosi <at> alice.it" <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Wed, 12 Jun 2013 11:33:34 -0700

On 06/12/13 00:00, Jan Djärv wrote:
 
> It crashes much less with this change

That's surprising -- I'd expect it to either crash not
at all, or to crash just as often as before.

Can you strace a failing instance?  The syscall pattern
may help explain things.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Wed, 12 Jun 2013 20:12:02 GMT) Full text and rfc822 format available.

Message #74 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>,
 Jan Djärv <jan.h.d <at> swipnet.se>,
 Eli Zaretskii <eliz <at> gnu.org>, Ken Brown <kbrown <at> cornell.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Wed, 12 Jun 2013 22:11:25 +0200

Just for completeness,

On 12/06/2013 20.33, Paul Eggert wrote:
> On 06/12/13 00:00, Jan Djärv wrote:
>
>> It crashes much less with this change
>
> That's surprising -- I'd expect it to either crash not
> at all, or to crash just as often as before.
>
> Can you strace a failing instance?  The syscall pattern
> may help explain things.
>

here it crashes (on different .el files) every time I try to build.


Ciao,
 Angelo.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Thu, 13 Jun 2013 07:09:02 GMT) Full text and rfc822 format available.

Message #77 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jan Djärv <jan.h.d <at> swipnet.se>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>,
 Eli Zaretskii <eliz <at> gnu.org>, Ken Brown <kbrown <at> cornell.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Thu, 13 Jun 2013 09:08:26 +0200

Hello. 

On 12 Jun 2013, at 22:11, Angelo Graziosi <angelo.graziosi <at> alice.it> wrote:

> Just for completeness,
> 
> On 12/06/2013 20.33, Paul Eggert wrote:
>> On 06/12/13 00:00, Jan Djärv wrote:
>> 
>>> It crashes much less with this change
>> 
>> That's surprising -- I'd expect it to either crash not
>> at all, or to crash just as often as before.
>> 
>> Can you strace a failing instance?  The syscall pattern
>> may help explain things.
>> 
> 
> here it crashes (on different .el files) every time I try to build.
> 

Well, the crashes are kind of random, so maybe the randomness just shifted a bit on my system?

I do get segmentation violations sometimes instead of the pthread abort.  
But I haven't been able to get one while running the debugger yet 

     Jan D.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Thu, 13 Jun 2013 17:40:02 GMT) Full text and rfc822 format available.

Message #80 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jan Djärv <jan.h.d <at> swipnet.se>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 Ken Brown <kbrown <at> cornell.edu>, Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Thu, 13 Jun 2013 10:39:22 -0700

On 06/13/13 00:08, Jan Djärv wrote:
> I do get segmentation violations sometimes instead of the pthread abort.  
> But I haven't been able to get one while running the debugger yet 

Which version of glib are you using?  Older versions
require special handholding with initialization,
e.g., g_type_init, and perhaps we're running into
that problem -- or perhaps you're using a newer
version and its autoinitialization code isn't working.

Also, a bit of Googling found this bug:

http://cygwin.com/ml/cygwin/2012-05/msg00472.html

where signals may or may not reach the correct thread
with pthread_kill.  Emacs uses pthread_kill to redirect
SIGCHLD to the main thread; if this is sent to a random
thread instead, that could explain the random crashes
you're observing (maybe a recursive runaway in a
signal handler?).
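
The pthread_kill behaviour in question can be checked in isolation by
sending a signal to one specific thread and recording which thread
actually ran the handler.  What follows is only a sketch, not the test
case from the Cygwin report: SIGUSR1 stands in for SIGCHLD, and the
names record_receiver, worker and signal_reached_target are
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

static pthread_t receiver;                /* thread that ran the handler */
static volatile sig_atomic_t got_signal;  /* set once the handler has run */

/* Record which thread the signal was delivered to.  */
static void
record_receiver (int sig)
{
  (void) sig;
  receiver = pthread_self ();
  got_signal = 1;
}

/* Idle until the handler has run somewhere in the process.  */
static void *
worker (void *arg)
{
  struct timespec tick = { 0, 1000000 };  /* 1 ms */
  (void) arg;
  while (! got_signal)
    nanosleep (&tick, NULL);
  return NULL;
}

/* Return 1 if a signal sent with pthread_kill ran its handler on the
   targeted thread, 0 if it ran on another thread, -1 on setup failure.  */
int
signal_reached_target (void)
{
  struct sigaction action;
  pthread_t tid;
  int sent;

  memset (&action, 0, sizeof action);
  action.sa_handler = record_receiver;
  sigemptyset (&action.sa_mask);
  if (sigaction (SIGUSR1, &action, NULL) != 0)
    return -1;
  if (pthread_create (&tid, NULL, worker, NULL) != 0)
    return -1;

  sent = pthread_kill (tid, SIGUSR1);
  if (sent != 0)
    got_signal = 1;                       /* let the worker exit anyway */
  pthread_join (tid, NULL);
  if (sent != 0)
    return -1;
  return pthread_equal (receiver, tid) ? 1 : 0;
}
```

A result of 0 from signal_reached_target would indicate that the
handler ran on a thread other than the one passed to pthread_kill.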

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Fri, 14 Jun 2013 09:12:01 GMT) Full text and rfc822 format available.

Message #83 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jan Djärv <jan.h.d <at> swipnet.se>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Eli Zaretskii <eliz <at> gnu.org>,
 Ken Brown <kbrown <at> cornell.edu>, Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Fri, 14 Jun 2013 11:11:04 +0200

Hi.

On 13 Jun 2013, at 19:39, Paul Eggert <eggert <at> cs.ucla.edu> wrote:

> On 06/13/13 00:08, Jan Djärv wrote:
>> I do get segmentation violations sometimes instead of the pthread abort.  
>> But I haven't been able to get one while running the debugger yet 
> 
> Which version of glib are you using?  Older versions
> require special handholding with initialization,
> e.g., g_type_init, and perhaps we're running into
> that problem -- or perhaps you're using a newer
> version and its autoinitialization code isn't working.

Glib 2.32.3.  If g_type_init isn't run, an error message will be shown at once.  As it is now, Emacs runs for a bit before crashing.

> 
> Also, a bit of Googling found this bug:
> 
> http://cygwin.com/ml/cygwin/2012-05/msg00472.html
> 
> where signals may or may not reach the correct thread
> with pthread_kill.  Emacs uses pthread_kill to redirect
> SIGCHLD to the main thread; if this is sent to a random
> thread instead, that could explain the random crashes
> you're observing (maybe a recursive runaway in a
> signal handler?).

Could be.  Unfortunately, Emacs does not crash when running under strace.

	Jan D.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Fri, 14 Jun 2013 17:46:01 GMT) Full text and rfc822 format available.

Message #86 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: cygwin <at> cygwin.com
Cc: 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Fri, 14 Jun 2013 10:45:47 -0700

Cygwin developers, I'm worried about a Cygwin bug where
pthread_kill may not send a signal to the correct thread.
This bug may be causing Emacs to crash.  The Cygwin bug is
discussed in this thread:

http://cygwin.com/ml/cygwin/2012-05/msg00472.html

Emacs uses pthread_kill to redirect
SIGCHLD to the main thread; if this is sent to a random
thread instead, that could explain the random crashes.

My question is: does this bug still exist with Cygwin,
and if so is it likely to get fixed soon?

More details about the Emacs bug can be found here:

  http://bugs.gnu.org/14569

Briefly, Emacs is crashing randomly on Cygwin ever since it started
doing this:

  /* Tickle glib's child-handling code.  Ask glib to wait for Emacs itself;
     this should always fail, but is enough to initialize glib's            
     private SIGCHLD handler.  */
  g_source_unref (g_child_watch_source_new (getpid ()));

After this newly-inserted code, Emacs finds out what the
child signal handler was:

  /* Now, find out what glib's signal handler was, and store it
     into lib_child_handler.  */
  struct sigaction action, old_action;
  emacs_sigaction_init (&action, deliver_child_signal);
  sigaction (SIGCHLD, &action, &old_action);
  eassert (! (old_action.sa_flags & SA_SIGINFO));
  if (old_action.sa_handler != SIG_DFL && old_action.sa_handler != SIG_IGN
      && old_action.sa_handler != deliver_child_signal)
    lib_child_handler = old_action.sa_handler;

Emacs's SIGCHLD handler, deliver_child_signal, arranges the
signal handling to occur in the main thread (to avoid races
within Emacs), like this:

  int old_errno = errno;
  bool on_main_thread = true;
  if (! pthread_equal (pthread_self (), main_thread))
    {
      sigset_t blocked;
      sigemptyset (&blocked);
      sigaddset (&blocked, sig);
      pthread_sigmask (SIG_BLOCK, &blocked, 0);
      pthread_kill (main_thread, sig);
      on_main_thread = false;
    }
  if (on_main_thread)
    handle_child_signal (sig);
  errno = old_errno;

And handle_child_signal, which runs in the main thread, does
a bunch of Emacsish things and then invokes lib_child_handler (sig),
which is glib's SIGCHLD handler.

All this works just fine on Fedora and other platforms; but it
doesn't work on Cygwin.
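
The handler-discovery step described above can be tried without glib.
The sketch below is not the Emacs code: a stand-in handler plays the
role of glib's SIGCHLD handler, SIGUSR1 is used instead of SIGCHLD so
that no child process is needed, and the names library_handler,
own_handler, saved_handler and install_and_chain are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t lib_calls;   /* calls to the stand-in handler */
static volatile sig_atomic_t own_calls;   /* calls to our own handler */
static void (*saved_handler) (int);       /* plays the role of lib_child_handler */

/* Stand-in for the handler a library such as glib would install.  */
static void
library_handler (int sig)
{
  (void) sig;
  lib_calls++;
}

/* Our handler: do our own work, then chain to the saved handler.  */
static void
own_handler (int sig)
{
  own_calls++;
  if (saved_handler)
    saved_handler (sig);
}

/* Install the stand-in handler, replace it with ours while capturing
   the old one, raise the signal once, and return 0 if both ran.  */
int
install_and_chain (void)
{
  struct sigaction lib, own, old;

  memset (&lib, 0, sizeof lib);
  lib.sa_handler = library_handler;
  sigemptyset (&lib.sa_mask);
  if (sigaction (SIGUSR1, &lib, NULL) != 0)
    return -1;

  memset (&own, 0, sizeof own);
  own.sa_handler = own_handler;
  sigemptyset (&own.sa_mask);
  if (sigaction (SIGUSR1, &own, &old) != 0)
    return -1;

  /* Keep the old handler only if it is a real sa_handler-style function.  */
  if (! (old.sa_flags & SA_SIGINFO)
      && old.sa_handler != SIG_DFL && old.sa_handler != SIG_IGN
      && old.sa_handler != own_handler)
    saved_handler = old.sa_handler;

  raise (SIGUSR1);
  return (own_calls == 1 && lib_calls == 1) ? 0 : 1;
}
```

The third argument of the second sigaction call is what makes the
previously installed handler visible, which is the role that
old_action plays in the snippet quoted above.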

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Fri, 14 Jun 2013 18:13:01 GMT) Full text and rfc822 format available.

Message #89 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Christopher Faylor <cgf-use-the-mailinglist-please <at> cygwin.com>
To: cygwin <at> cygwin.com
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Fri, 14 Jun 2013 14:03:59 -0400

On Fri, Jun 14, 2013 at 10:45:47AM -0700, Paul Eggert wrote:
>Cygwin developers, I'm worried about a Cygwin bug where
>pthread_kill may not send a signal to the correct thread.
>This bug may be causing Emacs to crash.  The Cygwin bug is
>discussed in this thread:
>
>http://cygwin.com/ml/cygwin/2012-05/msg00472.html
>
>Emacs uses pthread_kill to redirect
>SIGCHLD to the main thread; if this is sent to a random
>thread instead, that could explain the random crashes.
>
>My question is: does this bug still exist with Cygwin,
>and if so is it likely to get fixed soon?

You pointed to an archived mail message which implies that was fixed
more than a year ago.  What makes you think it is still a problem?

I'd expect that if it was still a problem our emacs maintainer would
be on top of it.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Fri, 14 Jun 2013 18:17:01 GMT) Full text and rfc822 format available.

Message #92 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: cygwin <at> cygwin.com, 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Fri, 14 Jun 2013 21:16:44 +0300

> Date: Fri, 14 Jun 2013 10:45:47 -0700
> From: Paul Eggert <eggert <at> cs.ucla.edu>
> Cc: 14569 <at> debbugs.gnu.org
> 
> Cygwin developers, I'm worried about a Cygwin bug where
> pthread_kill may not send a signal to the correct thread.
> This bug may be causing Emacs to crash.  The Cygwin bug is
> discussed in this thread:
> 
> http://cygwin.com/ml/cygwin/2012-05/msg00472.html

Caveat: I'm not a Cygwin developer, and don't even use Cygwin.

> Emacs uses pthread_kill to redirect
> SIGCHLD to the main thread; if this is sent to a random
> thread instead, that could explain the random crashes.

It should be easy to instrument deliver_child_signal so that it prints
something when it redirects SIGCHLD, and then the Cygwin users could
see if there's such a report immediately before the crash, or at all.
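
A minimal sketch of such instrumentation, assuming a handler shaped
like the deliver_child_signal code quoted earlier in this report.  It
is not a patch against Emacs: SIGUSR1 stands in for SIGCHLD, the
message text is arbitrary, and deliver_signal_logged, raise_in_worker
and run_redirect_demo are hypothetical names.  The report is printed
with write, which is async-signal-safe, rather than with stdio.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static pthread_t main_thread;
static volatile sig_atomic_t redirect_count;  /* redirections reported */
static volatile sig_atomic_t main_count;      /* deliveries on main thread */

/* Handler that reports on stderr whenever it forwards the signal.  */
static void
deliver_signal_logged (int sig)
{
  int old_errno = errno;
  if (! pthread_equal (pthread_self (), main_thread))
    {
      static const char msg[] = "signal redirected to main thread\n";
      sigset_t blocked;
      ssize_t n;

      /* Block the signal in this thread, as the quoted code does.  */
      sigemptyset (&blocked);
      sigaddset (&blocked, sig);
      pthread_sigmask (SIG_BLOCK, &blocked, 0);

      n = write (STDERR_FILENO, msg, sizeof msg - 1);
      (void) n;                   /* nothing useful to do on failure */
      redirect_count++;
      pthread_kill (main_thread, sig);
    }
  else
    main_count++;
  errno = old_errno;
}

/* Raise the signal on a non-main thread to force a redirection.  */
static void *
raise_in_worker (void *arg)
{
  (void) arg;
  raise (SIGUSR1);
  return NULL;
}

/* Return 0 if exactly one redirection was reported and the signal
   then reached the main thread; nonzero otherwise.  */
int
run_redirect_demo (void)
{
  struct sigaction action;
  struct timespec tick = { 0, 1000000 };      /* 1 ms */
  pthread_t tid;
  int i;

  main_thread = pthread_self ();
  memset (&action, 0, sizeof action);
  action.sa_handler = deliver_signal_logged;
  sigemptyset (&action.sa_mask);
  if (sigaction (SIGUSR1, &action, NULL) != 0)
    return -1;
  if (pthread_create (&tid, NULL, raise_in_worker, NULL) != 0)
    return -1;
  pthread_join (tid, NULL);

  /* Allow up to about a second for the forwarded signal to arrive.  */
  for (i = 0; i < 1000 && ! main_count; i++)
    nanosleep (&tick, NULL);
  return (redirect_count == 1 && main_count == 1) ? 0 : 1;
}
```

In a real build the same write call could go into the branch of
deliver_child_signal that calls pthread_kill, so that a redirection
shows up in the build log just before any crash.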

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Fri, 14 Jun 2013 18:47:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Fri, 14 Jun 2013 20:31:01 GMT) Full text and rfc822 format available.

Message #98 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: cygwin <at> cygwin.com
Cc: 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Fri, 14 Jun 2013 16:22:26 -0400

On 6/14/2013 2:03 PM, Christopher Faylor wrote:
> On Fri, Jun 14, 2013 at 10:45:47AM -0700, Paul Eggert wrote:
>> Cygwin developers, I'm worried about a Cygwin bug where
>> pthread_kill may not send a signal to the correct thread.
>> This bug may be causing Emacs to crash.  The Cygwin bug is
>> discussed in this thread:
>>
>> http://cygwin.com/ml/cygwin/2012-05/msg00472.html
>>
>> Emacs uses pthread_kill to redirect
>> SIGCHLD to the main thread; if this is sent to a random
>> thread instead, that could explain the random crashes.
>>
>> My question is: does this bug still exist with Cygwin,
>> and if so is it likely to get fixed soon?
>
> You pointed to an archived mail message which implies that was fixed
> more than a year ago.  What makes you think it is still a problem?
>
> I'd expect that if it was still a problem our emacs maintainer would
> be on top of it.

Unfortunately, the emacs maintainer doesn't have any idea why the recent 
emacs changes are causing random crashes on Cygwin.  It's almost 
impossible to catch this under gdb; and the one time it was caught, the 
backtrace didn't make sense.  Also, the crash doesn't occur when emacs 
is run under strace.

I'm not going to speculate on whether the problem is caused by a bug in 
Cygwin's pthread_kill.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sat, 15 Jun 2013 06:27:02 GMT) Full text and rfc822 format available.

Message #101 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: 14569 <at> debbugs.gnu.org
Subject: Fwd: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Fri, 14 Jun 2013 23:25:50 -0700

[Forwarding this to 14569 <at> debbugs.gnu.org; I don't know how to
correlate Cygwin version 1.7.17 with the version numbers mentioned
in Bug#14569.]

-------- Original Message --------
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sat, 15 Jun 2013 02:21:47 -0400
From: Christopher Faylor <cgf-use-the-mailinglist-please <at> cygwin.com>
Reply-To: cygwin <at> cygwin.com
To: cygwin <at> cygwin.com
CC: Paul Eggert <eggert <at> cs.ucla.edu>

On Fri, Jun 14, 2013 at 11:01:54PM -0700, Paul Eggert wrote:
>On 06/14/2013 11:03 AM, Christopher Faylor wrote:
>>You pointed to an archived mail message which implies that was fixed
>>more than a year ago.  What makes you think it is still a problem?
>
>The message I pointed to
><http://cygwin.com/ml/cygwin/2012-05/msg00472.html> says this:
>
>>Testcase signal/kill: Signals may or may not reach the correct thread
>>with 1.7.12-1 and newer.
>
>Confirmed.  I think the reason is that we only have a single event to
>signal that a POSIX signal arrived instead of a per-thread event, but
>I'm not sure.  This is cgf's domain so I leave it at that for now.
>
>I interpreted this to mean "the existence of the bug is confirmed,
>here's why the bug occurs, and I'll let cgf deal with it".  I didn't
>see any followup message where cgf (is that you?) dealt with it.  My
>apologies if I misinterpreted the email.

Oops.  I didn't read Corinna's message as thoroughly as I should have.
Sorry.

That particular issue was supposed to have been fixed in Cygwin 1.7.17,
released in October 2012.

cgf

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sat, 15 Jun 2013 07:05:01 GMT) Full text and rfc822 format available.

Message #104 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sat, 15 Jun 2013 10:04:03 +0300

> Date: Fri, 14 Jun 2013 16:22:26 -0400
> From: Ken Brown <kbrown <at> cornell.edu>
> Cc: 14569 <at> debbugs.gnu.org
> 
> Unfortunately, the emacs maintainer doesn't have any idea why the recent 
> emacs changes are causing random crashes on Cygwin.  It's almost 
> impossible to catch this under gdb; and the one time it was caught, the 
> backtrace didn't make sense.  Also, the crash doesn't occur when emacs 
> is run under strace.

What are the difficulties of catching this when Emacs is run under
GDB?

P.S.  I removed the Cygwin list from the addressees, as I don't think
we have any evidence at this time that this is a Cygwin problem.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sat, 15 Jun 2013 09:55:02 GMT) Full text and rfc822 format available.

Message #107 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Jan Djärv <jan.h.d <at> swipnet.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 14569 <at> debbugs.gnu.org, Ken Brown <kbrown <at> cornell.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sat, 15 Jun 2013 11:54:16 +0200

Hello.

On 15 Jun 2013, at 09:04, Eli Zaretskii <eliz <at> gnu.org> wrote:

>> Date: Fri, 14 Jun 2013 16:22:26 -0400
>> From: Ken Brown <kbrown <at> cornell.edu>
>> Cc: 14569 <at> debbugs.gnu.org
>> 
>> Unfortunately, the emacs maintainer doesn't have any idea why the recent 
>> emacs changes are causing random crashes on Cygwin.  It's almost 
>> impossible to catch this under gdb; and the one time it was caught, the 
>> backtrace didn't make sense.  Also, the crash doesn't occur when emacs 
>> is run under strace.
> 
> What are the difficulties of catching this when Emacs is run under
> GDB?

It's not difficult, but it takes some time.  The error happens when make compiles one Lisp file at a time with emacs.  It appears to be random, so it does not crash in the same Lisp file each time.  Thus, you have to start emacs from gdb and repeat run, quit in gdb for each Lisp file until it crashes.  Running in gdb is slow with Cygwin, and running in gdb makes the bug appear less often.

The make rule is in lisp/Makefile, compile-onefile.

	Jan D.
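
The repeat-until-it-crashes procedure described above could be automated along these lines.  This is only a sketch: the function name run_until_failure and its arguments are invented for illustration and are not part of the Emacs build.

```shell
# Hypothetical helper (not part of the Emacs build): rerun a command until
# it fails, giving up after a maximum number of attempts.
# Usage: run_until_failure MAX_ATTEMPTS COMMAND [ARGS...]
# Prints the attempt number and exit status of the first failing run and
# returns that status; returns 0 if every attempt succeeded.
run_until_failure() {
  max_attempts=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Run the command inside "if" so a failure is recorded, not fatal,
    # even when the calling shell uses "set -e".
    if "$@"; then
      status=0
    else
      status=$?
    fi
    if [ "$status" -ne 0 ]; then
      echo "attempt $attempt failed with status $status"
      return "$status"
    fi
    attempt=$((attempt + 1))
  done
  echo "no failure in $max_attempts attempts"
  return 0
}
```

It would be called with whatever command reproduces the problem, for example a make invocation of the compile-onefile rule for one file; the exact command line is left to the reader, since it depends on the local build tree.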

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sat, 15 Jun 2013 10:43:02 GMT) Full text and rfc822 format available.

Message #110 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Jan Djärv <jan.h.d <at> swipnet.se>
Cc: 14569 <at> debbugs.gnu.org, kbrown <at> cornell.edu
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sat, 15 Jun 2013 13:42:37 +0300

> From: Jan Djärv <jan.h.d <at> swipnet.se>
> Date: Sat, 15 Jun 2013 11:54:16 +0200
> Cc: Ken Brown <kbrown <at> cornell.edu>,
>  14569 <at> debbugs.gnu.org
> 
> > What are the difficulties of catching this when Emacs is run under
> > GDB?
> 
> It's not difficult, but it takes some time.  The error happens when make compiles one Lisp file at a time with emacs.  It appears to be random, so it does not crash in the same Lisp file each time.  Thus, you have to start emacs from gdb and repeat run, quit in gdb for each Lisp file until it crashes.  Running in gdb is slow with Cygwin, and running in gdb makes the bug appear less often.
> 
> The make rule is in lisp/Makefile, compile-onefile.

Would it make things easier if compile-onefile is modified to invoke
Emacs under GDB to begin with, using the --args switch to GDB?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sat, 15 Jun 2013 12:48:01 GMT) Full text and rfc822 format available.

Message #113 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Jan Djärv <jan.h.d <at> swipnet.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 "kbrown <at> cornell.edu" <kbrown <at> cornell.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sat, 15 Jun 2013 14:47:07 +0200

Hello.

15 jun 2013 kl. 12:42 skrev Eli Zaretskii <eliz <at> gnu.org>:

>> From: Jan Djärv <jan.h.d <at> swipnet.se>
>> Date: Sat, 15 Jun 2013 11:54:16 +0200
>> Cc: Ken Brown <kbrown <at> cornell.edu>,
>> 14569 <at> debbugs.gnu.org
>> 
>>> What are the difficulties of catching this when Emacs is run under
>>> GDB?
>> 
>> It's not difficult, but it takes some time.  The error happens when make compiles one Lisp file at a time with emacs.  It appears to be random, so it does not crash in the same Lisp file each time.  Thus, you have to start emacs from gdb and repeat run, quit in gdb for each Lisp file until it crashes.  Running in gdb is slow with Cygwin, and running in gdb makes the bug appear less often.
>> 
>> The make rule is in lisp/Makefile, compile-onefile.
> 
> Would it make things easier if compile-onefile is modified to invoke
> Emacs under GDB to begin with, using the --args switch to GDB?

That is what I do. 

      Jan D.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sat, 15 Jun 2013 13:56:02 GMT) Full text and rfc822 format available.

Message #116 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Cygwin <cygwin <at> cygwin.com>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sat, 15 Jun 2013 15:54:21 +0200

Christopher Faylor wrote
>>On 06/14/2013 11:03 AM, Christopher Faylor wrote:
>>>You pointed to an archived mail message which implies that was fixed
>>>more than a year ago.  What makes you think it is still a problem?
>>
>>The message I pointed to
>><http://cygwin.com/ml/cygwin/2012-05/msg00472.html> says this:
>>
>>>Testcase signal/kill: Signals may or may not reach the correct thread
>>>with 1.7.12-1 and newer.
>>
>>Confirmed.  I think the reason is that we only have a single event to
>>signal that a POSIX signal arrived instead of a per-thread event, but
>>I'm not sure.  This is cgf's domain so I leave it at that for now.
>>
>>I interpreted this to mean "the existence of the bug is confirmed,
>>here's why the bug occurs, and I'll let cgf deal with it".  I didn't
>>see any followup message where cgf (is that you?) dealt with it.  My
>>apologies if I misinterpreted the email.
>
> Oops.  I didn't read Corinna's message as thoroughly as I should have.
> Sorry.
>
> That particular issue was supposed to have been fixed in Cygwin 1.7.17,
> released in October 2012.

Out of curiosity, I tried the test cases I found in that thread, more 
precisely here:

  http://cygwin.com/ml/cygwin/2012-05/msg00434.html


and the results are:

$ gcc otto_test1.c -o otto_test1
$ ./otto_test1
Testing deferred pthread_cancel()

Thread 0 starting (0x200102c0)
Thread 1 starting (0x20010360)
Thread 2 starting (0x20010400)

Cancelling thread 2 (0x20010400)
Thread 2 exiting (0x20010400)
Cancelling thread 1 (0x20010360)
Thread 1 exiting (0x20010360)
Cancelling thread 0 (0x200102c0)
Thread 0 exiting (0x200102c0)

Thread 0 is gone (0x200102c0)
Thread 1 is gone (0x20010360)
Thread 2 is gone (0x20010400)

$ gcc otto_test2.c -o otto_test2
$ ./otto_test2
Testing asynchronous pthread_cancel()

Thread 0 starting (0x200102c0)
Changing canceltype from 0 to 1
Thread 1 starting (0x20010360)
Changing canceltype from 0 to 1
Thread 2 starting (0x20010400)
Changing canceltype from 0 to 1

Cancelling thread 2 (0x20010400)
Thread 2 exiting (0x20010400)
Cancelling thread 1 (0x20010360)
Thread 1 exiting (0x20010360)
Cancelling thread 0 (0x200102c0)
Thread 0 exiting (0x200102c0)

Thread 0 is gone (0x200102c0)
Thread 1 is gone (0x20010360)
Thread 2 is gone (0x20010400)

$ gcc otto_test3.c -o otto_test3
$ ./otto_test3
Testing pthread_kill()

Thread 0 starting (0x200102c0)
Thread 1 starting (0x20010360)
Thread 2 starting (0x20010400)

Sending SIGUSR1 to thread 2 (0x20010400)
Thread 2 executes signal handler (0x20010400)
Thread 2 encountered an error: Interrupted system call (0x20010400)
Sending SIGUSR1 to thread 1 (0x20010360)
Thread 1 executes signal handler (0x20010360)
Thread 1 encountered an error: Interrupted system call (0x20010360)
Sending SIGUSR1 to thread 0 (0x200102c0)
Thread 0 executes signal handler (0x200102c0)
Thread 0 encountered an error: Interrupted system call (0x200102c0)

Are the errors in the last test case to be expected under the 20130612 
snapshot (CYGWIN_NT-5.1, 1.7.21s 20130612 21:06:59, i686 Cygwin)?


Ciao,
Angelo.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sun, 16 Jun 2013 13:13:02 GMT) Full text and rfc822 format available.

Message #119 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: cygwin <at> cygwin.com, Paul Eggert <eggert <at> cs.ucla.edu>, 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sun, 16 Jun 2013 09:11:21 -0400

[Adding the bug address back to the CC so that this gets archived.]

On 6/15/2013 9:54 AM, Angelo Graziosi wrote:
> $ gcc otto_test3.c -o otto_test3
> $ ./otto_test3
> Testing pthread_kill()
>
> Thread 0 starting (0x200102c0)
> Thread 1 starting (0x20010360)
> Thread 2 starting (0x20010400)
>
> Sending SIGUSR1 to thread 2 (0x20010400)
> Thread 2 executes signal handler (0x20010400)
> Thread 2 encountered an error: Interrupted system call (0x20010400)
> Sending SIGUSR1 to thread 1 (0x20010360)
> Thread 1 executes signal handler (0x20010360)
> Thread 1 encountered an error: Interrupted system call (0x20010360)
> Sending SIGUSR1 to thread 0 (0x200102c0)
> Thread 0 executes signal handler (0x200102c0)
> Thread 0 encountered an error: Interrupted system call (0x200102c0)
>
> Are the errors in the last test case to be expected under the 20130612
> snapshot (CYGWIN_NT-5.1, 1.7.21s 20130612 21:06:59, i686 Cygwin)?

I can replicate this on my system, consistently.  There's clearly a 
problem, but it's not the same as in the original Cygwin bug report.  In 
the present case, the signal is received by the right thread, but 
something goes wrong afterwards.

I'm attaching the test case for ease of reference.

Ken

[otto_test3.c (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sun, 16 Jun 2013 17:53:01 GMT) Full text and rfc822 format available.

Message #122 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: cygwin <at> cygwin.com
Cc: 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sun, 16 Jun 2013 13:51:33 -0400


On 6/16/2013 11:01 AM, Christopher Faylor wrote:
> On Sun, Jun 16, 2013 at 09:11:21AM -0400, Ken Brown wrote:
>> I can replicate this on my system, consistently.  There's clearly a
>> problem, but it's not the same as in the original Cygwin bug report.  In
>> the present case, the signal is received by the right thread, but
>> something goes wrong afterwards.
>
> Try it on Linux.  I don't see any difference.  "An error" in this case
> seems to be the script working as designed.
>
> % man sem_wait
>
>      SEM_WAIT(3)   Linux Programmer's Manual      SEM_WAIT(3)
>
>
>
>      NAME
> 	   sem_wait, sem_timedwait, sem_trywait - lock a semaphore
>
>      ...
>
>      ERRORS
> 	   EINTR  The call was interrupted by a signal handler; see signal(7).

Yeah, I missed that.  Sorry for the noise.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sun, 16 Jun 2013 18:22:01 GMT) Full text and rfc822 format available.

Message #125 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: 14569 <at> debbugs.gnu.org
Cc: Paul Eggert <eggert <at> cs.ucla.edu>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sun, 16 Jun 2013 14:20:28 -0400

On 6/16/2013 1:51 PM, Ken Brown wrote:
>>      ERRORS
>>        EINTR  The call was interrupted by a signal handler; see
>> signal(7).

I've revised the test case (attached) so that it checks for EINTR.  It 
now runs as expected on Cygwin:

Testing pthread_kill()

Thread 0 starting (0x800102a8)
Thread 1 starting (0x80010348)
Thread 2 starting (0x800103e8)

Sending SIGUSR1 to thread 2 (0x800103e8)
Thread 2 executes signal handler (0x800103e8)
Thread 2 woke up just fine
Sending SIGUSR1 to thread 1 (0x80010348)
Thread 1 executes signal handler (0x80010348)
Thread 1 woke up just fine
Sending SIGUSR1 to thread 0 (0x800102a8)
Thread 0 executes signal handler (0x800102a8)
Thread 0 woke up just fine

So there's no reason to think that a pthread_kill bug is causing the 
problem.  Back to the drawing board.

Ken

[otto_test4.c (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 17 Jun 2013 01:58:02 GMT) Full text and rfc822 format available.

Message #128 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Jan Djärv <jan.h.d <at> swipnet.se>
Cc: Eli Zaretskii <eliz <at> gnu.org>,
 "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sun, 16 Jun 2013 21:56:45 -0400

On 6/15/2013 8:47 AM, Jan Djärv wrote:
> Hello.
>
> 15 jun 2013 kl. 12:42 skrev Eli Zaretskii <eliz <at> gnu.org>:
>
>>> From: Jan Djärv <jan.h.d <at> swipnet.se>
>>> Date: Sat, 15 Jun 2013 11:54:16 +0200
>>> Cc: Ken Brown <kbrown <at> cornell.edu>,
>>> 14569 <at> debbugs.gnu.org
>>>
>>>> What are the difficulties of catching this when Emacs is run under
>>>> GDB?
>>>
>>> It's not difficult, but it takes some time.  The error happens when make compiles one Lisp file at a time with emacs.  It appears to be random, so it does not crash in the same Lisp file each time.  Thus, you have to start emacs from gdb and repeat run, quit in gdb for each Lisp file until it crashes.  Running in gdb is slow with Cygwin, and running in gdb makes the bug appear less often.
>>>
>>> The make rule is in lisp/Makefile, compile-onefile.
>>
>> Would it make things easier if compile-onefile is modified to invoke
>> Emacs under GDB to begin with, using the --args switch to GDB?
>
> That is what I do.

Can you tell me exactly how you modified the rule for compile-onefile? 
I tried but kept getting errors, so I'm obviously not doing it right.

Thanks.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 17 Jun 2013 06:24:02 GMT) Full text and rfc822 format available.

Message #131 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Jan Djärv <jan.h.d <at> swipnet.se>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>,
 "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 17 Jun 2013 08:22:54 +0200

Hello. 


17 jun 2013 kl. 03:56 skrev Ken Brown <kbrown <at> cornell.edu>:

> 
> Can you tell me exactly how you modified the rule for compile-onefile? I tried but kept getting errors, so I'm obviously not doing it right.

There is something fishy with quoting between Cygwin and gdb, so you have to comment out BIG_STACK_OPTS.

Then I added:

emacsdbg = EMACSLOADPATH=$(lisp) LC_ALL=C gdb -ex 'b abort' -ex run --args $(EMACS) $(EMACSOPT)

and used it:

compile-onefile:
	echo Compiling $(THEFILE)
	@# Use byte-compile-refresh-preloaded to try and work around some of
	@# the most common bootstrapping problems.
	$(emacsdbg) $(BYTE_COMPILE_FLAGS) \
		-l bytecomp -f byte-compile-refresh-preloaded \
		-f batch-byte-compile $(THEFILE)

The modifications were done in the generated Makefile.  I guess it will work in Makefile.in also.

      Jan D.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 17 Jun 2013 15:07:01 GMT) Full text and rfc822 format available.

Message #134 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Jan Djärv <jan.h.d <at> swipnet.se>
Cc: 14569 <at> debbugs.gnu.org, kbrown <at> cornell.edu
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 17 Jun 2013 18:06:44 +0300

> Cc: Eli Zaretskii <eliz <at> gnu.org>,
>  "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>
> From: Jan Djärv <jan.h.d <at> swipnet.se>
> Date: Mon, 17 Jun 2013 08:22:54 +0200
> 
> emacsdbg = EMACSLOADPATH=$(lisp) LC_ALL=C gdb -ex 'b abort' -ex run --args $(EMACS) $(EMACSOPT)

This should make your life easier, I think:

  emacsdbg = EMACSLOADPATH=$(lisp) LC_ALL=C gdb --batch-silent --return-child-result -ex 'b abort' -ex run --args $(EMACS) $(EMACSOPT)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 17 Jun 2013 20:16:02 GMT) Full text and rfc822 format available.

Message #137 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Jan Djärv <jan.h.d <at> swipnet.se>, 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 17 Jun 2013 16:15:33 -0400

On 6/17/2013 11:06 AM, Eli Zaretskii wrote:
>> Cc: Eli Zaretskii <eliz <at> gnu.org>,
>>   "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>
>> From: Jan Djärv <jan.h.d <at> swipnet.se>
>> Date: Mon, 17 Jun 2013 08:22:54 +0200
>>
>> emacsdbg = EMACSLOADPATH=$(lisp) LC_ALL=C gdb -ex 'b abort' -ex run --args $(EMACS) $(EMACSOPT)
>
> This should make your life easier, I think:
>
>    emacsdbg = EMACSLOADPATH=$(lisp) LC_ALL=C gdb --batch-silent --return-child-result -ex 'b abort' -ex run --args $(EMACS) $(EMACSOPT)

This causes gdb to exit, whether or not the breakpoint was hit, without 
giving the user a chance to get a backtrace.  I tried adding

  -x $(lisp)/commands.txt

where commands.txt contains

  commands
  bt
  end

but that didn't work.  gdb still exited (with an error) when 
compile-onefile failed, but it didn't print a backtrace first.  There 
has to be a way to get a backtrace when gdb runs in batch mode.  Do you 
know how, Eli?

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 18 Jun 2013 15:54:02 GMT) Full text and rfc822 format available.

Message #140 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: jan.h.d <at> swipnet.se, 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 18 Jun 2013 18:53:51 +0300

> Date: Mon, 17 Jun 2013 16:15:33 -0400
> From: Ken Brown <kbrown <at> cornell.edu>
> CC: Jan Djärv <jan.h.d <at> swipnet.se>, 14569 <at> debbugs.gnu.org
> 
> >    emacsdbg = EMACSLOADPATH=$(lisp) LC_ALL=C gdb --batch-silent --return-child-result -ex 'b abort' -ex run --args $(EMACS) $(EMACSOPT)
> 
> This causes gdb to exit, whether or not the breakpoint was hit, without 
> giving the user a chance to get a backtrace.  I tried adding
> 
>    -x $(lisp)/commands.txt
> 
> where commands.txt contains
> 
>    commands
>    bt
>    end
> 
> but that didn't work.  gdb still exited (with an error) when 
> compile-onefile failed, but it didn't print a backtrace first.  There 
> has to be a way to get a backtrace when gdb runs in batch mode.  Do you 
> know how, Eli?

Sorry, I went overboard with --batch-silent, please use --batch
instead.  (--batch-silent prevents the backtrace from showing up.)

As for displaying the backtrace, just add the "bt" command to the
chain, like this:

  emacsdbg = EMACSLOADPATH=$(lisp) LC_ALL=C gdb --batch --return-child-result -ex 'b abort' -ex run -ex bt -ex cont --args $(EMACS) $(EMACSOPT)

GDB executes the commands given via -ex in order, so think of this as
if you typed the commands whenever GDB shows its prompt.

Note that I also added "continue", to let Emacs exit abnormally after
hitting the breakpoint (or a segfault).  If neither the breakpoint
nor a fatal signal fires, GDB will say "No stack." when Emacs exits
normally, but that's harmless.
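
For use outside the Makefile, the same chain of -ex commands could be wrapped in a small shell function.  This is a sketch under assumptions: the function name run_under_gdb and the GDB_BIN variable are made up for this example; only the gdb options themselves come from the command above.

```shell
# Sketch only: wrap a command in the gdb batch invocation shown above.
# GDB_BIN and run_under_gdb are names invented for this example.
GDB_BIN=${GDB_BIN:-gdb}

# Usage: run_under_gdb PROGRAM [ARGS...]
run_under_gdb() {
  # Commands given with -ex run in order, as if typed at the gdb prompt:
  #   b abort -> stop if the program calls abort()
  #   run     -> start the program given after --args
  #   bt      -> print a backtrace once the program has stopped
  #   cont    -> let the program finish so its exit status is reported
  "$GDB_BIN" --batch --return-child-result \
    -ex 'b abort' -ex run -ex bt -ex cont \
    --args "$@"
}
```

Because of --return-child-result, the function's exit status is that of the program being debugged, so it can stand in for the plain command in a script or recipe.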

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Wed, 19 Jun 2013 20:25:01 GMT) Full text and rfc822 format available.

Message #143 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: jan.h.d <at> swipnet.se, 14569 <at> debbugs.gnu.org, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Wed, 19 Jun 2013 16:24:02 -0400

On 6/18/2013 11:53 AM, Eli Zaretskii wrote:
> As for displaying the backtrace, just add the "bt" command to the
> chain, like this:
>
>    emacsdbg = EMACSLOADPATH=$(lisp) LC_ALL=C gdb --batch --return-child-result -ex 'b abort' -ex run -ex bt -ex cont --args $(EMACS) $(EMACSOPT)
>
> GDB executes the commands given via -ex in order, so think of this as
> if you typed the commands whenever GDB shows its prompt.

Thanks, Eli.  I replaced "bt" by "thread apply all bt", which probably
didn't provide additional useful information.  I then ran "make
bootstrap" followed by repeated "make -k" until the bootstrap completed.
Early in the process there were two crashes with SIGSEGV, for which I
got backtraces (attached).  In both cases the crash occurred in
gmalloc.c, which probably explains why we're seeing problems only on
Cygwin.

After that there were many compile failures with errors like those that 
others have reported:

Compiling gnus/gnus-cache.el
GLib (gthread-posix.c): Unexpected error from C library during 'pthread_setspecific': Invalid argument.  Aborting.
Makefile:254: recipe for target `gnus/gnus-cache.elc' failed

But these compilations didn't invoke gdb, apparently because they 
involved Makefile targets other than compile-onefile.  So I didn't get 
any more backtraces.

I can modify the Makefile further if necessary, but the attached 
backtraces are a start.

Ken

[backtrace5.txt (text/plain, attachment)]

[backtrace6.txt (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Thu, 20 Jun 2013 02:46:02 GMT) Full text and rfc822 format available.

Message #146 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: jan.h.d <at> swipnet.se, 14569 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Thu, 20 Jun 2013 05:45:42 +0300

> Date: Wed, 19 Jun 2013 16:24:02 -0400
> From: Ken Brown <kbrown <at> cornell.edu>
> CC: jan.h.d <at> swipnet.se, 14569 <at> debbugs.gnu.org,
>         Paul Eggert <eggert <at> cs.ucla.edu>
> 
> After that there were many compile failures with errors like those that 
> others have reported:
> 
> Compiling gnus/gnus-cache.el
> GLib (gthread-posix.c): Unexpected error from C library during 
> 'pthread_setspecific': Invalid argument.  Aborting.
> Makefile:254: recipe for target `gnus/gnus-cache.elc' failed
> 
> But these compilations didn't invoke gdb, apparently because they 
> involved Makefile targets other than compile-onefile.

No, I think these failures didn't go through 'abort', that's why you
didn't get the backtrace.  You need to look at the pthread sources in
the file mentioned, and find out where to put the breakpoint to catch
that error.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Thu, 20 Jun 2013 03:02:02 GMT) Full text and rfc822 format available.

Message #149 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: jan.h.d <at> swipnet.se, 14569 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Wed, 19 Jun 2013 23:00:36 -0400

On 6/19/2013 10:45 PM, Eli Zaretskii wrote:
>> Date: Wed, 19 Jun 2013 16:24:02 -0400
>> From: Ken Brown <kbrown <at> cornell.edu>
>> CC: jan.h.d <at> swipnet.se, 14569 <at> debbugs.gnu.org,
>>          Paul Eggert <eggert <at> cs.ucla.edu>
>>
>> After that there were many compile failures with errors like those that
>> others have reported:
>>
>> Compiling gnus/gnus-cache.el
>> GLib (gthread-posix.c): Unexpected error from C library during
>> 'pthread_setspecific': Invalid argument.  Aborting.
>> Makefile:254: recipe for target `gnus/gnus-cache.elc' failed
>>
>> But these compilations didn't invoke gdb, apparently because they
>> involved Makefile targets other than compile-onefile.
>
> No, I think these failures didn't go through 'abort', that's why you
> didn't get the backtrace.  You need to look at the pthread sources in
> the file mentioned, and find out where to put the breakpoint to catch
> that error.

The error message comes from 'g_thread_abort', which calls 'abort'.  The 
reason there was no backtrace is exactly what I said.  I know that's the 
case because I removed the "@" at the beginning of the Makefile rule so 
that the command would get echoed, but it didn't get echoed in the 
compilation above (and others like it).  On the other hand, it did get 
echoed in the SIGSEGV examples that I mentioned in my previous mail.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Thu, 20 Jun 2013 15:55:01 GMT) Full text and rfc822 format available.

Message #152 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: jan.h.d <at> swipnet.se, 14569 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Thu, 20 Jun 2013 18:54:16 +0300

> Date: Wed, 19 Jun 2013 23:00:36 -0400
> From: Ken Brown <kbrown <at> cornell.edu>
> CC: jan.h.d <at> swipnet.se, 14569 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
> 
> On 6/19/2013 10:45 PM, Eli Zaretskii wrote:
> >> Date: Wed, 19 Jun 2013 16:24:02 -0400
> >> From: Ken Brown <kbrown <at> cornell.edu>
> >> CC: jan.h.d <at> swipnet.se, 14569 <at> debbugs.gnu.org,
> >>          Paul Eggert <eggert <at> cs.ucla.edu>
> >>
> >> After that there were many compile failures with errors like those that
> >> others have reported:
> >>
> >> Compiling gnus/gnus-cache.el
> >> GLib (gthread-posix.c): Unexpected error from C library during
> >> 'pthread_setspecific': Invalid argument.  Aborting.
> >> Makefile:254: recipe for target `gnus/gnus-cache.elc' failed
> >>
> >> But these compilations didn't invoke gdb, apparently because they
> >> involved Makefile targets other than compile-onefile.
> >
> > No, I think these failures didn't go through 'abort', that's why you
> > didn't get the backtrace.  You need to look at the pthread sources in
> > the file mentioned, and find out where to put the breakpoint to catch
> > that error.
> 
> The error message comes from 'g_thread_abort', which calls 'abort'.  The 
> reason there was no backtrace is exactly what I said.  I know that's the 
> case because I removed the "@" at the beginning of the Makefile rule so 
> that the command would get echoed, but it didn't get echoed in the 
> compilation above (and others like it).  On the other hand, it did get 
> echoed in the SIGSEGV examples that I mentioned in my previous mail.

Sorry, I forgot that there's one more rule:

  # An old-fashioned suffix rule, which, according to the GNU Make manual,
  # cannot have prerequisites.
  .el.elc:
	  @echo Compiling $<
	  @# The BIG_STACK_OPTS are only needed to byte-compile the byte-compiler
	  @# files, which is normally done in compile-first, but may also be
	  @# recompiled via this rule.
	  @$(emacs) $(BYTE_COMPILE_FLAGS) \
		  -f batch-byte-compile $<

Instrument it in the same way, and you should be able to catch the
other problems as well.

Thanks.
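
The instrumentation referred to above is not reproduced in this part of
the thread.  As a rough sketch only, assuming gdb is on PATH and that the
goal is a backtrace of all threads whenever the byte-compile command
reaches 'abort' or crashes, a wrapper along the following lines could be
used.  The names run_under_gdb and GDB are made up for illustration; they
are not part of the Emacs build system.

```shell
# Hypothetical helper, not part of the Emacs build system.
# GDB can be overridden to point at a specific gdb binary.
GDB=${GDB:-gdb}

# Run a command under gdb in batch mode.  A breakpoint is placed on
# abort (pending, in case the symbol lives in a shared library that is
# not loaded yet), the program is started, and when it stops or exits
# gdb is asked for a backtrace of every thread.
run_under_gdb () {
  "$GDB" -batch \
    -ex 'set breakpoint pending on' \
    -ex 'break abort' \
    -ex 'run' \
    -ex 'thread apply all bt' \
    --args "$@"
}
```

It would be called as, for example,
"run_under_gdb ./emacs -batch -f batch-byte-compile foo.el".  Since
gdb's --args expects the program name first, any environment variable
assignments that the Makefile puts in front of the Emacs command would
have to be exported separately.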

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sat, 22 Jun 2013 15:15:02 GMT) Full text and rfc822 format available.

Message #155 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: jan.h.d <at> swipnet.se, 14569 <at> debbugs.gnu.org, eggert <at> cs.ucla.edu
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sat, 22 Jun 2013 11:13:37 -0400

[Message part 1 (text/plain, inline)]

On 6/20/2013 11:54 AM, Eli Zaretskii wrote:
> Sorry, I forgot that there's one more rule:
>
>    # An old-fashioned suffix rule, which, according to the GNU Make manual,
>    # cannot have prerequisites.
>    .el.elc:
> 	  @echo Compiling $<
> 	  @# The BIG_STACK_OPTS are only needed to byte-compile the byte-compiler
> 	  @# files, which is normally done in compile-first, but may also be
> 	  @# recompiled via this rule.
> 	  @$(emacs) $(BYTE_COMPILE_FLAGS) \
> 		  -f batch-byte-compile $<
>
> Instrument it in the same way, and you should be able to catch the
> other problems as well.

Thanks.  I've now got some further backtraces, this time involving glib.
I'm using glib-2.34.3, which I compiled without optimization.  There
are two kinds of problems, in addition to the gmalloc.c crashes that I
mentioned earlier.

1. When the 'abort' breakpoint is hit, the backtrace (abbreviated) looks 
like this:

#0  0x6acdb0f8 in abort from /usr/bin/cygglib-2.0-0.dll
#1  0x6acd6eb0 in g_thread_abort at gthread-posix.c:76
#2  0x6acd7758 in g_private_set at gthread-posix.c:1026
#3  0x6acb9fc2 in g_thread_self at gthread.c:1003
#4  0x6ac93ee3 in g_main_context_iteration at gmain.c:3351
#5  0x6ac956ea in glib_worker_main at gmain.c:5028
#6  0x6acb9d29 in g_thread_proxy at gthread.c:797
#7  0x610ffe1a in pthread::thread_init_wrapper(void*)@4 at cygwin/thread.cc:1947
#8  0x6108974c in thread_wrapper at cygwin/miscfuncs.cc:600

A full backtrace of all threads is attached.

2. There is sometimes a SIGSEGV in g_slist_remove (in the main thread), 
with an abbreviated backtrace like this:

#0  0x6acb06dc in g_slist_remove at gslist.c:418
#1  0x6ac951f8 in g_child_watch_finalize at gmain.c:4580
#2  0x6ac928fa in g_source_unref_internal at gmain.c:1825
#3  0x6ac929fc in g_source_unref at gmain.c:1870
#4  0x005b371e in init_process_emacs at process.c:7100
#5  0x004ee1fc in main at emacs.c:1471

Again, a full backtrace of all threads is attached.

I don't know enough about glib to do much with this.  Jan or Paul, can 
you help?  Jan, if you want to install my unoptimized build of glib with 
debugging symbols, you can get it from my personal Cygwin repository:

  http://sanibeltranquility.com/cygwin/

There are instructions on that page.  You would need to install 
glib2.0-debuginfo, libglib2.0-devel, and libglib2.0_0.  You also need 
cygwin-debuginfo (from a regular Cygwin mirror).

Ken

[backtrace_abort.txt (text/plain, attachment)]

[backtrace_segv.txt (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sat, 22 Jun 2013 19:05:02 GMT) Full text and rfc822 format available.

Message #158 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, jan.h.d <at> swipnet.se, 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sat, 22 Jun 2013 12:04:13 -0700

On 06/22/13 08:13, Ken Brown wrote:
> Jan or Paul, can you help?

The second trace suggests there's a race condition bug in Cygwin glib,
which I've tried to work around in trunk bzr 113138.  Does that help?
(At least, does it make Emacs crash more reliably?....)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sat, 22 Jun 2013 20:51:02 GMT) Full text and rfc822 format available.

Message #161 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sat, 22 Jun 2013 22:49:47 +0200

Paul Eggert wrote:
> The second trace suggests there's a race condition bug in Cygwin glib,
> which I've tried to work around in trunk bzr 113138.  Does that help?

No.

> (At least, does it make Emacs crash more reliably?....)

No.


Ciao,
 Angelo.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sun, 23 Jun 2013 15:58:02 GMT) Full text and rfc822 format available.

Message #164 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Eli Zaretskii <eliz <at> gnu.org>, jan.h.d <at> swipnet.se, 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sun, 23 Jun 2013 11:56:13 -0400

On 6/22/2013 3:04 PM, Paul Eggert wrote:
> On 06/22/13 08:13, Ken Brown wrote:
>> Jan or Paul, can you help?
>
> The second trace suggests there's a race condition bug in Cygwin glib,
> which I've tried to work around in trunk bzr 113138.  Does that help?
> (At least, does it make Emacs crash more reliably?....)

As Angelo said, the answer to both questions is "No".  But there's
actually a big difference.  First, the 'abort' breakpoint is never hit.
Second, there are no more SEGVs reported in glib.  There are lots of
SEGVs in _malloc_internal_nolock, and some further SEGVs that gdb can't
pinpoint.  I think all of the crashes in _malloc_internal_nolock are
coming when it's called by special_realloc.  Maybe the latter needs to
be calling _malloc_internal instead of _malloc_internal_nolock.  I don't
have any more time right now, but I'll try that later.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sun, 23 Jun 2013 16:14:02 GMT) Full text and rfc822 format available.

Message #167 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: eggert <at> cs.ucla.edu, 14569 <at> debbugs.gnu.org, jan.h.d <at> swipnet.se
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sun, 23 Jun 2013 19:13:33 +0300

> Date: Sun, 23 Jun 2013 11:56:13 -0400
> From: Ken Brown <kbrown <at> cornell.edu>
> CC: Eli Zaretskii <eliz <at> gnu.org>, jan.h.d <at> swipnet.se, 14569 <at> debbugs.gnu.org
> 
> As Angelo said, the answer to both questions is "No".  But there's 
> actually a big difference.  First, the 'abort' breakpoint is never hit. 
>   Second, there are no more SEGVs reported in glib.  There are lots of 
> SEGVs in _malloc_internal_nolock, and some further SEGVs that gdb can't 
> pinpoint.  I think all of the crashes in _malloc_internal_nolock are 
> coming when it's called by special_realloc.  Maybe the latter needs to 
> be calling _malloc_internal instead of _malloc_internal_nolock.  I don't 
> have any more time right now, but I'll try that later.

If the crashes in gmalloc don't happen unless glib is linked in, then
it's possible that its memory management conflicts in some way with
gmalloc (and possibly ralloc, if Cygwin uses that as well).

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sun, 23 Jun 2013 18:23:02 GMT) Full text and rfc822 format available.

Message #170 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: jan.h.d <at> swipnet.se, 14569 <at> debbugs.gnu.org, Ken Brown <kbrown <at> cornell.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sun, 23 Jun 2013 11:22:13 -0700

On 06/23/2013 09:13 AM, Eli Zaretskii wrote:
> If the crashes in gmalloc don't happen unless glib is linked in, then
> it's possible that its memory management conflicts in some way with
> gmalloc (and possibly ralloc, if Cygwin uses that as well).

Thanks, I think that's the problem: the new code invokes glib
primitives before the memory allocator is set up on Cygwin, which is a
no-no.  I moved the glib SIGCHLD tickling to later, in trunk bzr 113142;
does that help?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sun, 23 Jun 2013 19:51:02 GMT) Full text and rfc822 format available.

Message #173 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sun, 23 Jun 2013 21:49:29 +0200

Paul Eggert wrote:
> On 06/23/2013 09:13 AM, Eli Zaretskii wrote:
>> If the crashes in gmalloc don't happen unless glib is linked in, then
>> it's possible that its memory management conflicts in some way with
>> gmalloc (and possibly ralloc, if Cygwin uses that as well).
>
> Thanks, I think that's the problem: the new code invokes glib
> primitives before the memory allocator is set up on Cygwin, which is a
> no-no.  I moved the glib SIGCHLD tickling to later, in trunk bzr 113142;
> does that help?

No. Rev. 113146 fails in similar manner:

GLib (gthread-posix.c): Unexpected error from C library during 
'pthread_setspecific': Invalid argument.  Aborting.
Makefile:232: recipe for target `compile-onefile' failed
make[3]: *** [compile-onefile] Aborted

Ciao,
 Angelo.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Sun, 23 Jun 2013 20:48:02 GMT) Full text and rfc822 format available.

Message #176 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sun, 23 Jun 2013 22:47:23 +0200

On 23/06/2013 21:49, Angelo Graziosi wrote:
> Paul Eggert wrote:
>> On 06/23/2013 09:13 AM, Eli Zaretskii wrote:
>>> If the crashes in gmalloc don't happen unless glib is linked in, then
>>> it's possible that its memory management conflicts in some way with
>>> gmalloc (and possibly ralloc, if Cygwin uses that as well).
>>
>> Thanks, I think that's the problem: the new code invokes glib
>> primitives before the memory allocator is set up on Cygwin, which is a
>> no-no.  I moved the glib SIGCHLD tickling to later, in trunk bzr 113142;
>> does that help?
>
> No. Rev. 113146 fails in similar manner:
>
> GLib (gthread-posix.c): Unexpected error from C library during
> 'pthread_setspecific': Invalid argument.  Aborting.
> Makefile:232: recipe for target `compile-onefile' failed
> make[3]: *** [compile-onefile] Aborted

Sometimes the bootstrap hangs:

[...]
make[3]: Entering directory "work/emacs/Work/lisp"
Compiling work/emacs/src/../lisp/emacs-lisp/map-ynp.el
make[3]: Leaving directory "work/emacs/Work/lisp"
make[3]: Entering directory "work/emacs/Work/lisp"
Compiling work/emacs/src/../lisp/cus-start.el
WAIT WAIT WAIT...
IT HANGS

and I cannot break it with CTRL-C. To kill the "bootstrap-emacs" process 
I need to kill it with Windows task manager. The Cygwin "kill -9..." 
command doesn't help... it helps only to kill some "make" processes...

   Angelo

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 24 Jun 2013 00:35:02 GMT) Full text and rfc822 format available.

Message #179 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Ken Brown <kbrown <at> cornell.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Sun, 23 Jun 2013 17:34:02 -0700

On 06/23/2013 12:49 PM, Angelo Graziosi wrote:
> No. Rev. 113146 fails in similar manner:

I tried something more-conservative, as trunk bzr 113148;
can you please give it a try?  If it doesn't work, please
try commenting out this line in src/process.c.

    g_source_unref (source);

And if that doesn't work, please try commenting out
the previous line as well:

    GSource *source = g_child_watch_source_new (getpid ());

Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 24 Jun 2013 02:44:02 GMT) Full text and rfc822 format available.

Message #182 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: eggert <at> cs.ucla.edu, 14569 <at> debbugs.gnu.org, kbrown <at> cornell.edu
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 24 Jun 2013 05:43:27 +0300

> Date: Sun, 23 Jun 2013 22:47:23 +0200
> From: Angelo Graziosi <angelo.graziosi <at> alice.it>
> Cc: eggert <at> cs.ucla.edu, 14569 <at> debbugs.gnu.org
> 
> make[3]: Entering directory "work/emacs/Work/lisp"
> Compiling work/emacs/src/../lisp/emacs-lisp/map-ynp.el
> make[3]: Leaving directory "work/emacs/Work/lisp"
> make[3]: Entering directory "work/emacs/Work/lisp"
> Compiling work/emacs/src/../lisp/cus-start.el
> WAIT WAIT WAIT...
> IT HANGS
> 
> and I cannot break it with CTRL-C. To kill the "bootstrap-emacs" process 
> I need to kill it with Windows task manager. The Cygwin "kill -9..." 
> command doesn't help... it helps only to kill some "make" processes...

When it hangs, try attaching GDB to Emacs, and if that succeeds, show
a backtrace from all the threads ("thread apply all bt").
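
A minimal sketch of that step, assuming gdb is installed and the process
ID of the hung bootstrap-emacs is already known (for instance from ps).
The helper name dump_all_threads and the GDB variable are hypothetical.

```shell
# Hypothetical helper.  GDB can be overridden to pick a specific binary.
GDB=${GDB:-gdb}

# Attach gdb to a running process, print a backtrace of every thread,
# and let gdb exit again (batch mode detaches when it is done).
dump_all_threads () {
  if [ $# -ne 1 ]; then
    echo "usage: dump_all_threads PID" >&2
    return 2
  fi
  "$GDB" -p "$1" -batch -ex 'thread apply all bt'
}
```

Redirecting the output to a file, as in
"dump_all_threads 1234 > backtrace_hang.txt 2>&1", gives something that
can be attached to a reply; the PID and file name here are placeholders.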

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 24 Jun 2013 11:05:01 GMT) Full text and rfc822 format available.

Message #185 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 24 Jun 2013 07:02:50 -0400

On 6/23/2013 8:34 PM, Paul Eggert wrote:
> I tried something more-conservative, as trunk bzr 113148;
> can you please give it a try?

That fixed it for me.  I just ran two consecutive bootstraps without a 
problem.  Angelo, can you confirm?

Thanks, Paul.

One question: Can you explain why you think there's a race condition bug 
in Cygwin glib?  I should probably report this to the Cygwin glib 
maintainer, but I need to understand it first.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 24 Jun 2013 14:35:02 GMT) Full text and rfc822 format available.

Message #188 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 24 Jun 2013 07:34:27 -0700

On 06/24/2013 04:02 AM, Ken Brown wrote:
> Can you explain why you think there's a race condition bug in Cygwin glib?

It clearly has that feel.  The problem is that if one does this:

  g_source_unref (g_child_watch_source_new (getpid ()));

at the obvious spot in Emacs startup, just as glib is
spinning worker threads that do stuff, Emacs goes
kaflooey.  But if one waits until the worker threads
have stabilized (which is what the latest patch does),
it's OK.

It could be a bug in Emacs too.  Emacs's memory allocator
isn't thread-safe, right?  And glib uses threads.  Which
memory allocator is the Cygwin port using, exactly?  If
it's using Emacs's allocator (i.e., compiling gmalloc.c),
that's a bug in Emacs.  If it's using Cygwin's, it's more
likely a bug either in the Cygwin allocator or in glib.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 24 Jun 2013 15:54:02 GMT) Full text and rfc822 format available.

Message #191 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 24 Jun 2013 11:51:54 -0400

On 6/24/2013 10:34 AM, Paul Eggert wrote:
> On 06/24/2013 04:02 AM, Ken Brown wrote:
>> Can you explain why you think there's a race condition bug in Cygwin glib?
>
> It clearly has that feel.  The problem is that if one does this:
>
>    g_source_unref (g_child_watch_source_new (getpid ()));
>
> at the obvious spot in Emacs startup, just as glib is
> spinning worker threads that do stuff, Emacs goes
> kaflooey.  But if one waits until the worker threads
> have stabilized (which is what the latest patch does),
> it's OK.
>
> It could be a bug in Emacs too.  Emacs's memory allocator
> isn't thread-safe, right?  And glib uses threads.  Which
> memory allocator is the Cygwin port using, exactly?  If
> it's using Emacs's allocator (i.e., compiling gmalloc.c),
> that's a bug in Emacs.  If it's using Cygwin's, it's more
> likely a bug either in the Cygwin allocator or in glib.

The Cygwin port uses Emacs's allocator.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 24 Jun 2013 16:56:02 GMT) Full text and rfc822 format available.

Message #194 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: "14569-done <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 24 Jun 2013 09:55:10 -0700

On 06/24/13 08:51, Ken Brown wrote:
> The Cygwin port uses Emacs's allocator.

I just looked at Emacs's allocator, and I was wrong:
it tries to be thread-safe, if HAVE_PTHREAD is defined.
Is it defined on Cygwin?  If not, that's a bug; if so,
possibly there is still an incompatibility between
Cygwin threading and Emacs's allocator, but it'll
require some Cygwin expertise to debug.

At any rate *this* bug is now fixed, so I'm marking it
as done.  If the Cygwin port has strange memory-related
problems in the future, you might try the following patch (not
that it's necessarily the right thing...).

=== modified file 'configure.ac'
--- configure.ac	2013-06-24 14:27:25 +0000
+++ configure.ac	2013-06-24 16:53:54 +0000
@@ -1805,7 +1805,7 @@ dnl See comments in aix4-2.h about maybe
 system_malloc=no
 case "$opsys" in
   ## darwin ld insists on the use of malloc routines in the System framework.
-  darwin|sol2-10) system_malloc=yes ;;
+  cygwin|darwin|sol2-10) system_malloc=yes ;;
 esac
 
 if test "${system_malloc}" = "yes"; then

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 24 Jun 2013 17:44:02 GMT) Full text and rfc822 format available.

Message #197 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "14569-done <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 24 Jun 2013 13:16:37 -0400

On 6/24/2013 12:55 PM, Paul Eggert wrote:
> On 06/24/13 08:51, Ken Brown wrote:
>> The Cygwin port uses Emacs's allocator.
>
> I just looked at Emacs's allocator, and I was wrong:
> it tries to be thread-safe, if HAVE_PTHREAD is defined.
> Is it defined on Cygwin?  If not, that's a bug; if so,
> possibly there is still an incompatibility between
> Cygwin threading and Emacs's allocator, but it'll
> require some Cygwin expertise to debug.

Yes, HAVE_PTHREAD is defined on Cygwin.

> At any rate *this* bug is now fixed, so I'm marking it
> as done.  If the Cygwin port has strange memory-related
> problems in the future, you might try the following patch (not
> that it's necessarily the right thing...).
>
> === modified file 'configure.ac'
> --- configure.ac	2013-06-24 14:27:25 +0000
> +++ configure.ac	2013-06-24 16:53:54 +0000
> @@ -1805,7 +1805,7 @@ dnl See comments in aix4-2.h about maybe
>   system_malloc=no
>   case "$opsys" in
>     ## darwin ld insists on the use of malloc routines in the System framework.
> -  darwin|sol2-10) system_malloc=yes ;;
> +  cygwin|darwin|sol2-10) system_malloc=yes ;;
>   esac
>
>   if test "${system_malloc}" = "yes"; then

I've tried this, but it doesn't work.  I would be glad to switch to 
Cygwin's malloc if I could, but I haven't been able to figure out how.  See

  http://debbugs.gnu.org/cgi/bugreport.cgi?bug=11519#71

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 24 Jun 2013 17:52:02 GMT) Full text and rfc822 format available.

Message #200 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 24 Jun 2013 19:50:13 +0200

On 24/06/2013 13:02, Ken Brown wrote:
> On 6/23/2013 8:34 PM, Paul Eggert wrote:
>> I tried something more-conservative, as trunk bzr 113148;
>> can you please give it a try?
>
> That fixed it for me.  I just ran two consecutive bootstraps without a
> problem.  Angelo, can you confirm?

I would say: No.

I ran four builds: the first three completed and the last one failed.

In each case, the build log shows errors.

A short summary.

1. Build with cygwin snapshot 20130619 and GCC 4.5.3 without parallel 
build (make bootstrap). The build is completed but the log shows:
================================================================

[...]
Compiling /work/emacs/lisp/org/ob-comint.el
      2 [main] emacs 2692 
C:\cygwin-2\home\angelo\work\emacs\Work\src\emacs.exe: *** fatal error 
in forked process - WFSO timed out after longjmp
    354 [main] emacs 2692 open_stackdumpfile: Dumping stack trace to 
emacs.exe.stackdump
Wrote /work/emacs/lisp/org/ob-comint.elc
Compiling /work/emacs/lisp/org/ob-css.el
Wrote /work/emacs/lisp/org/ob-css.elc
Compiling /work/emacs/lisp/org/ob-ditaa.el
Wrote /work/emacs/lisp/org/ob-ditaa.elc
Compiling /work/emacs/lisp/org/ob-dot.el
      2 [main] emacs 2408 
C:\cygwin-2\home\angelo\work\emacs\Work\src\emacs.exe: *** fatal error 
in forked process - WFSO timed out after longjmp
    318 [main] emacs 2408 open_stackdumpfile: Dumping stack trace to 
emacs.exe.stackdump
Wrote /work/emacs/lisp/org/ob-dot.elc
Compiling /work/emacs/lisp/org/ob-emacs-lisp.el
Wrote /work/emacs/lisp/org/ob-emacs-lisp.elc
[...]


2. Build with cygwin 1.7.20 and clang-3.1 with parallel build (make -j3 
bootstrap). The build is completed but the log shows:
================================================================

[...]
GLib (gthread-posix.c): Unexpected error from C library during 
'pthread_setspecific': Invalid argument.  Aborting.
Makefile:251: recipe for target `erc/erc-imenu.elc' failed
make[3]: *** [erc/erc-imenu.elc] Aborted
make[3]: *** Waiting for unfinished jobs....
Wrote /work/emacs/lisp/erc/erc-identd.elc
Wrote /work/emacs/lisp/erc/erc-join.elc

(Notice: the GLib error and the completed build)


3. Build with cygwin 1.7.20 and GCC 4.5.3 without parallel build (make 
bootstrap). The build is completed but the log shows:
================================================================

[...]
Compiling /work/emacs/lisp/cedet/semantic/decorate/mode.el
      3 [main] emacs 1136 
C:\cygwin-2\home\angelo\work\emacs\Work\src\emacs.exe: *** fatal error 
in forked process - WFSO timed out after longjmp
    402 [main] emacs 1136 open_stackdumpfile: Dumping stack trace to 
emacs.exe.stackdump

In end of data:
../../lisp/cedet/semantic/decorate/mode.el:560:1:Warning: the following
[...]
Compiling /work/emacs/lisp/net/soap-client.el
      3 [main] emacs 1136 
C:\cygwin-2\home\angelo\work\emacs\Work\src\emacs.exe: *** fatal error 
in forked process - pthread_mutex::_fixup_after_fork () doesn't 
understand PROCESS_SHARED mutex's
Wrote /work/emacs/lisp/net/soap-client.elc
[...]
Compiling /work/emacs/lisp/net/tramp-sh.el
      3 [main] emacs 2512 
C:\cygwin-2\home\angelo\work\emacs\Work\src\emacs.exe: *** fatal error 
in forked process - pthread_mutex::_fixup_after_fork () doesn't 
understand PROCESS_SHARED mutex's
      3 [main] emacs 1516 
C:\cygwin-2\home\angelo\work\emacs\Work\src\emacs.exe: *** fatal error 
in forked process - pthread_mutex::_fixup_after_fork () doesn't 
understand PROCESS_SHARED mutex's
      3 [main] emacs 3572 
C:\cygwin-2\home\angelo\work\emacs\Work\src\emacs.exe: *** fatal error 
in forked process - pthread_mutex::_fixup_after_fork () doesn't 
understand PROCESS_SHARED mutex's
      3 [main] emacs 2828 
C:\cygwin-2\home\angelo\work\emacs\Work\src\emacs.exe: *** fatal error 
in forked process - pthread_mutex::_fixup_after_fork () doesn't 
understand PROCESS_SHARED mutex's
      3 [main] emacs 1520 
C:\cygwin-2\home\angelo\work\emacs\Work\src\emacs.exe: *** fatal error 
in forked process - pthread_mutex::_fixup_after_fork () doesn't 
understand PROCESS_SHARED mutex's
Wrote /work/emacs/lisp/net/tramp-sh.elc
Compiling /work/emacs/lisp/net/tramp-smb.el
[...]


4. Build with cygwin 1.7.20 and GCC 4.5.3 with parallel build (make -j3 
bootstrap). The build failed:
================================================================

[...]
No MH variant found on the system
Wrote /work/emacs/lisp/gnus/gnus-logic.elc
      2 [main] emacs 2436 
C:\cygwin-2\home\angelo\work\emacs\Work\src\emacs.exe: *** fatal error 
in forked process - pthread_mutex::_fixup_after_fork () doesn't 
understand PROCESS_SHARED mutex's
Compiling /work/emacs/lisp/gnus/gnus-mlspl.el
      2 [main] emacs 1320 
C:\cygwin-2\home\angelo\work\emacs\Work\src\emacs.exe: *** fatal error 
in forked process - pthread_mutex::_fixup_after_fork () doesn't 
understand PROCESS_SHARED mutex's
      2 [main] emacs 2356 
C:\cygwin-2\home\angelo\work\emacs\Work\src\emacs.exe: *** fatal error 
in forked process - pthread_mutex::_fixup_after_fork () doesn't 
understand PROCESS_SHARED mutex's
Wrote /work/emacs/lisp/gnus/gnus-mh.elc
Compiling /work/emacs/lisp/gnus/gnus-msg.el
      2 [main] emacs 3032 
C:\cygwin-2\home\angelo\work\emacs\Work\src\emacs.exe: *** fatal error 
in forked process - pthread_mutex::_fixup_after_fork () doesn't 
understand PROCESS_SHARED mutex's
Wrote /work/emacs/lisp/gnus/gnus-mlspl.elc
[...]
Wrote /work/emacs/lisp/mail/uce.elc
Compiling /work/emacs/lisp/mail/uudecode.el
    590 [main] emacs 820 
C:\cygwin-2\home\angelo\work\emacs\Work\src\emacs.exe: *** fatal error 
in forked process - WFSO timed out after longjmp
    928 [main] emacs 820 open_stackdumpfile: Dumping stack trace to 
emacs.exe.stackdump
Wrote /work/emacs/lisp/mail/unrmail.elc
[...]
Wrote /work/emacs/lisp/org/org-indent.elc
Compiling /work/emacs/lisp/org/org-inlinetask.el
Compiling /work/emacs/lisp/org/org-irc.el
      2 [main] emacs 340 
C:\cygwin-2\home\angelo\work\emacs\Work\src\emacs.exe: *** fatal error 
in forked process - WFSO timed out after longjmp
    349 [main] emacs 340 open_stackdumpfile: Dumping stack trace to 
emacs.exe.stackdump
Wrote /work/emacs/lisp/org/org-info.elc
[...]
Compiling /work/emacs/lisp/progmodes/pascal.el
Wrote /work/emacs/lisp/progmodes/opascal.elc
GLib (gthread-posix.c): Unexpected error from C library during 
'pthread_setspecific': Invalid argument.  Aborting.
Makefile:251: recipe for target `progmodes/octave.elc' failed
make[3]: *** [progmodes/octave.elc] Aborted (core dumped)
make[3]: *** Waiting for unfinished jobs....
Wrote /work/emacs/lisp/progmodes/pascal.elc
[...]
Wrote /work/emacs/lisp/textmodes/flyspell.elc
Compiling /work/emacs/lisp/textmodes/page-ext.el
GLib (gthread-posix.c): Unexpected error from C library during 
'pthread_setspecific': Invalid argument.  Aborting.
Wrote /work/emacs/lisp/textmodes/nroff-mode.elc
Makefile:251: recipe for target `textmodes/makeinfo.elc' failed
make[3]: *** [textmodes/makeinfo.elc] Aborted (core dumped)
make[3]: *** Waiting for unfinished jobs....
Wrote /work/emacs/lisp/textmodes/page-ext.elc
make[3]: Leaving directory "/work/emacs/Work/lisp"
Makefile:279: recipe for target `compile-main' failed
make[2]: *** [compile-main] Error 2
make[2]: Leaving directory "/work/emacs/Work/lisp"
Makefile:367: recipe for target `lisp' failed
make[1]: *** [lisp] Error 2
make[1]: Leaving directory "/work/emacs/Work"
Makefile:1002: recipe for target `bootstrap' failed
make: *** [bootstrap] Error 2


I don't know if we can say that the bug is fixed...

(This is on Win XP 32 SP3, Athlon 64 X2)

Ciao,
 Angelo.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 24 Jun 2013 21:23:01 GMT) Full text and rfc822 format available.

Message #203 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Ken Brown <kbrown <at> cornell.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 24 Jun 2013 14:05:49 -0700

On 06/24/2013 10:50 AM, Angelo Graziosi wrote:
> I would say: No.

Ouch.  As it happens I didn't successfully mark the
bug as "done", as I thought, so it's still live.

Please try commenting out this line in src/process.c.

    g_source_unref (source);

And if that doesn't work, please try commenting out
the previous line as well:

    GSource *source = g_child_watch_source_new (getpid ());

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 24 Jun 2013 23:51:02 GMT) Full text and rfc822 format available.

Message #206 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Ken Brown <kbrown <at> cornell.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 25 Jun 2013 01:50:18 +0200

On 24/06/2013 23:05, Paul Eggert wrote:
> On 06/24/2013 10:50 AM, Angelo Graziosi wrote:
>> I would say: No.
>
> Ouch.  As it happens I didn't successfully mark the
> bug as "done", as I thought, so it's still live.
>
> Please try commenting out this line in src/process.c.
>
>      g_source_unref (source);
>
> And if that doesn't work, please try commenting out
> the previous line as well:
>
>      GSource *source = g_child_watch_source_new (getpid ());
>

Only after applying this 2nd solution, i.e. the patch

$ cat process.c.patch
--- emacs-trunk/src/process.c	2013-06-24 12:28:49.562500000 +0200
+++ emacs/src/process.c	2013-06-25 01:11:52.890625000 +0200
@@ -7085,8 +7085,8 @@
      Do this here, rather than early in Emacs initialization where it
      might make more sense, to try to avoid bugs in Cygwin glib (Bug#14569).  */
   {
-    GSource *source = g_child_watch_source_new (getpid ());
-    g_source_unref (source);
+    /*GSource *source = g_child_watch_source_new (getpid ());
+      g_source_unref (source);*/
   }
 #endif

the bootstrap completed *without* errors! (With just the first, the same 
errors show up in the build log...)

My suggestion is to wait a few days before declaring this bug FIXED, so 
that we can do more tests.

Anyway, as you prefer.


Ciao... oops, Good Night,
 Angelo.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 25 Jun 2013 13:35:02 GMT) Full text and rfc822 format available.

Message #209 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>,
 "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 25 Jun 2013 09:34:29 -0400

On 6/24/2013 7:50 PM, Angelo Graziosi wrote:
> Only after applying this 2nd solution, i.e. the patch
>
> $ cat process.c.patch
> --- emacs-trunk/src/process.c    2013-06-24 12:28:49.562500000 +0200
> +++ emacs/src/process.c    2013-06-25 01:11:52.890625000 +0200
> @@ -7085,8 +7085,8 @@
>        Do this here, rather than early in Emacs initialization where it
>        might make more sense, to try to avoid bugs in Cygwin glib
> (Bug#14569).  */
>     {
> -    GSource *source = g_child_watch_source_new (getpid ());
> -    g_source_unref (source);
> +    /*GSource *source = g_child_watch_source_new (getpid ());
> +      g_source_unref (source);*/
>     }
>   #endif
>
> the bootstrap completed *without* errors! (With just the first, the same
> errors show up in the build log...)

My experience is the same.  Thanks for the reminder that it's necessary 
to check the build log for error messages, even when the build appears 
to complete successfully.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 25 Jun 2013 13:57:02 GMT) Full text and rfc822 format available.

Message #212 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>,
 "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 25 Jun 2013 09:55:44 -0400

On 6/25/2013 9:34 AM, Ken Brown wrote:
> On 6/24/2013 7:50 PM, Angelo Graziosi wrote:
>> Only after applying this 2nd solution, i.e. the patch
>>
>> $ cat process.c.patch
>> --- emacs-trunk/src/process.c    2013-06-24 12:28:49.562500000 +0200
>> +++ emacs/src/process.c    2013-06-25 01:11:52.890625000 +0200
>> @@ -7085,8 +7085,8 @@
>>        Do this here, rather than early in Emacs initialization where it
>>        might make more sense, to try to avoid bugs in Cygwin glib
>> (Bug#14569).  */
>>     {
>> -    GSource *source = g_child_watch_source_new (getpid ());
>> -    g_source_unref (source);
>> +    /*GSource *source = g_child_watch_source_new (getpid ());
>> +      g_source_unref (source);*/
>>     }
>>   #endif
>>
>> the bootstrap completed *without* errors! (With just the first, the same
>> errors show up in the build log...)
>
> My experience is the same.  Thanks for the reminder that it's necessary
> to check the build log for error messages, even when the build appears
> to complete successfully.

Question for Paul: I'm trying to understand the code that led to this 
problem in the first place, and I'm puzzled by the asymmetry between 
block_child_signal and unblock_child_signal.  The first blocks SIGCHLD, 
while the second unblocks *all* signals.  Why is this the right thing to do?

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 25 Jun 2013 14:52:01 GMT) Full text and rfc822 format available.

Message #215 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 25 Jun 2013 07:51:23 -0700

On 06/25/2013 06:55 AM, Ken Brown wrote:
> I'm puzzled by the asymmetry between block_child_signal and unblock_child_signal.  The first blocks SIGCHLD, while the second unblocks *all* signals.  Why is this the right thing to do?
> 
> Ken

I didn't write that code, but here's my guess.
Emacs normally runs with all signals unblocked.
Unblocking everything is the right thing to do,
if there's a bug elsewhere in Emacs that inadvertently
leaves signals blocked.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 25 Jun 2013 15:52:02 GMT) Full text and rfc822 format available.

Message #218 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 25 Jun 2013 11:51:20 -0400

On 6/25/2013 10:51 AM, Paul Eggert wrote:
> On 06/25/2013 06:55 AM, Ken Brown wrote:
>> I'm puzzled by the asymmetry between block_child_signal and unblock_child_signal.  The first blocks SIGCHLD, while the second unblocks *all* signals.  Why is this the right thing to do?
>>
>> Ken
>
> I didn't write that code, but here's my guess.

According to 'bzr log', you introduced those two functions in rev 111081.

> Emacs normally runs with all signals unblocked.
> Unblocking everything is the right thing to do,
> if there's a bug elsewhere in Emacs that inadvertently
> leaves signals blocked.

I still don't get the asymmetry.  By your reasoning, it would seem that 
block_child_signal should set the mask so that only SIGCHLD is blocked. 
 And can unblock_child_signal really be sure that there's no good 
reason for a signal to be blocked?

But I don't want to belabor this.  You know much more about this than I 
do, so if you're sure the existing code is right, I'll drop it.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 25 Jun 2013 16:20:02 GMT) Full text and rfc822 format available.

Message #221 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 25 Jun 2013 09:18:56 -0700

On 06/25/2013 08:51 AM, Ken Brown wrote:
> you introduced those two functions in rev 111081.

Yes, but if I recall correctly they refactored existing
code.  Existing practice in Emacs is inconsistent: sometimes it
unblocks everything, sometimes it reestablishes
the old mask.  As far as I know the difference
never matters, so either is "right".
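
To make the two styles concrete, here is a minimal sketch. It is not
the Emacs source: the names block_signal, unblock_everything,
restore_mask and is_blocked are invented for illustration, and it uses
sigprocmask on the assumption of a single-threaded program. The two
ways of leaving the critical section end in the same state unless some
other signal was already blocked on entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Block SIG in addition to whatever is already blocked, saving the
   previous mask in *OLD.  (Single-threaded sketch; threaded code
   would call pthread_sigmask instead of sigprocmask.)  */
void
block_signal (int sig, sigset_t *old)
{
  sigset_t blocked;
  int rc;

  sigemptyset (&blocked);
  sigaddset (&blocked, sig);
  rc = sigprocmask (SIG_BLOCK, &blocked, old);
  assert (rc == 0);
  (void) rc;
}

/* Style 1: unblock every signal, whatever was blocked before.  */
void
unblock_everything (void)
{
  sigset_t empty;

  sigemptyset (&empty);
  sigprocmask (SIG_SETMASK, &empty, NULL);
}

/* Style 2: reestablish the mask that block_signal saved in *OLD.  */
void
restore_mask (const sigset_t *old)
{
  sigprocmask (SIG_SETMASK, old, NULL);
}

/* Return true if SIG is currently blocked.  With a null SET argument,
   sigprocmask only reports the current mask.  */
bool
is_blocked (int sig)
{
  sigset_t current;

  sigprocmask (SIG_BLOCK, NULL, &current);
  return sigismember (&current, sig) == 1;
}
```

If SIGUSR1 happens to be blocked before block_signal (SIGCHLD, &old)
is called, restore_mask (&old) leaves SIGUSR1 blocked, whereas
unblock_everything () clears it as well. When nothing else is blocked,
the two are indistinguishable.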

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Thu, 27 Jun 2013 14:57:01 GMT) Full text and rfc822 format available.

Message #224 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Ken Brown <kbrown <at> cornell.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Thu, 27 Jun 2013 07:56:27 -0700

On 06/24/2013 04:50 PM, Angelo Graziosi wrote:
> the bootstrap completed *without* errors!

OK, thanks, as trunk bzr 113206 I installed a change
to skip the glib tickling on Cygwin.

Although this should fix the bootstrap failure, I expect that
this reintroduces a bug into Cygwin Emacs, namely,
Emacs can sometimes lose track of subprocesses and/or kill off
unrelated processes; see Bug#12980 and Bug#8855.
Fixing this will require someone with access to Cygwin
and knowledge of how to debug threads under Cygwin,
neither of which I have.  Since the issue appears only
under Cygwin it could well be a Cygwin bug rather than an
Emacs or glib bug.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Thu, 27 Jun 2013 16:45:02 GMT) Full text and rfc822 format available.

Message #227 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Ken Brown <kbrown <at> cornell.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Thu, 27 Jun 2013 18:44:09 +0200

Il 27/06/2013 16.56, Paul Eggert ha scritto:
> On 06/24/2013 04:50 PM, Angelo Graziosi wrote:
>> the bootstrap completed *without* errors!
>
> OK, thanks, as trunk bzr 113206 I installed a change
> to skip the glib tickling on Cygwin.

Now Emacs trunk builds OK... :)

>
> Although this should fix the bootstrap failure, I expect that
> this reintroduces a bug into Cygwin Emacs, namely,
> Emacs can sometimes lose track of subprocesses and/or kill off
> unrelated processes; see Bug#12980 and Bug#8855.

In more than six months, Emacs from trunk has closed unexpectedly only
once or twice. Usually it is stable enough.

> Fixing this will require someone with access to Cygwin
> and knowledge of how to debug threads under Cygwin,
> neither of which I have.  Since the issue appears only
> under Cygwin it could well be a Cygwin bug rather than an
> Emacs or glib bug.
>

perhaps... Cygwin "Creators" can help... ;-)


Ciao,
 Angelo.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Thu, 27 Jun 2013 17:12:01 GMT) Full text and rfc822 format available.

Message #230 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Ken Brown <kbrown <at> cornell.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Thu, 27 Jun 2013 18:54:21 +0200

Il 27/06/2013 18.44, Angelo Graziosi ha scritto:
> Il 27/06/2013 16.56, Paul Eggert ha scritto:
>> On 06/24/2013 04:50 PM, Angelo Graziosi wrote:
>>> the bootstrap completed *without* errors!
>>
>> OK, thanks, as trunk bzr 113206 I installed a change
>> to skip the glib tickling on Cygwin.
>
> Now Emacs trunk builds OK... :)
>
>>
>> Although this should fix the bootstrap failure, I expect that
>> this reintroduces a bug into Cygwin Emacs, namely,
>> Emacs can sometimes lose track of subprocesses and/or kill off
>> unrelated processes; see Bug#12980 and Bug#8855.
>
> In more than six months, Emacs from trunk has closed unexpectedly only
> once or twice. Usually it is stable enough.
>
>> Fixing this will require someone with access to Cygwin
>> and knowledge of how to debug threads under Cygwin,
>> neither of which I have.  Since the issue appears only
>> under Cygwin it could well be a Cygwin bug rather than an
>> Emacs or glib bug.
>>
>
> perhaps... Cygwin "Creators" can help... ;-)
>

Just a note I forgot before...


I think the best way to start fixing or clarifying this issue is to
reproduce it with a simple test case in plain C.

With such a test case, the Cygwin developers could either fix the
problem or clarify what is wrong.

>
> Ciao,
>   Angelo.
>

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Thu, 27 Jun 2013 19:34:01 GMT) Full text and rfc822 format available.

Message #233 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Thu, 27 Jun 2013 15:32:16 -0400

On 6/27/2013 10:56 AM, Paul Eggert wrote:
> On 06/24/2013 04:50 PM, Angelo Graziosi wrote:
>> the bootstrap completed *without* errors!
>
> OK, thanks, as trunk bzr 113206 I installed a change
> to skip the glib tickling on Cygwin.
>
> Although this should fix the bootstrap failure, I expect that
> this reintroduces a bug into Cygwin Emacs, namely,
> Emacs can sometimes lose track of subprocesses and/or kill off
> unrelated processes; see Bug#12980 and Bug#8855.
> Fixing this will require someone with access to Cygwin
> and knowledge of how to debug threads under Cygwin,
> neither of which I have.  Since the issue appears only
> under Cygwin it could well be a Cygwin bug rather than an
> Emacs or glib bug.

Another alternative is to replace

    if (! noninteractive || initialized)

by

    if (! noninteractive)

at least on Cygwin.  That allows the bootstrap to complete without 
errors.  Assuming this doesn't cause other problems, we wouldn't have to 
worry about reintroducing bugs into (interactive) Cygwin Emacs.
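
Spelled out as a truth table, the two conditions differ in exactly one
case: when noninteractive and initialized are both nonzero. The sketch
below models the two flags as plain booleans; the function names are
invented for illustration. Reading that case as "a dumped Emacs running
with -batch", which is how bootstrap-emacs is invoked in the build log,
is an assumption rather than something established in this thread.

```c
#include <stdbool.h>

/* The condition currently used on every platform.  */
bool
tickle_current (bool noninteractive, bool initialized)
{
  return !noninteractive || initialized;
}

/* The condition proposed for Cygwin.  */
bool
tickle_proposed (bool noninteractive)
{
  return !noninteractive;
}

/* Return true if the two conditions disagree for these flag values.  */
bool
conditions_differ (bool noninteractive, bool initialized)
{
  return (tickle_current (noninteractive, initialized)
          != tickle_proposed (noninteractive));
}
```

conditions_differ returns true only for (true, true), so interactive
sessions would be tickled exactly as before and only the initialized
batch case would skip the tickling.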

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Fri, 28 Jun 2013 05:17:02 GMT) Full text and rfc822 format available.

Message #236 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Ken Brown <kbrown <at> cornell.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Thu, 27 Jun 2013 22:16:23 -0700

On 06/27/2013 09:44 AM, Angelo Graziosi wrote:
> In more than six months, Emacs from trunk has closed unexpectedly only
> once or twice. Usually it is stable enough.

The remaining bug won't cause Cygwin Emacs to close unexpectedly.
What it'll do is cause Cygwin Emacs to lose track of
its child processes -- it'll think they've finished,
when they haven't, or vice versa.  And it can cause
Cygwin Emacs to kill innocent-bystander processes.
These problems are rare, but I expect they can occur
under Cygwin, just as they used to occur under
POSIXish systems before we redid child process handling.
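
As a general illustration of how such problems can arise (a sketch
under assumptions, not the Emacs code and not a diagnosis of the
Cygwin failure): when two parts of one process both start children, a
waitpid (-1, ...) call in one part can reap a child that belongs to
the other. The owner then never sees that child exit, and a later
kill on the stale pid could reach an unrelated process if the pid has
been reused. Waiting only for pids you created avoids this. The names
spawn_child and reap_own_child are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits at once with EXIT_CODE.  Return its pid,
   or -1 if fork failed.  */
pid_t
spawn_child (int exit_code)
{
  pid_t pid = fork ();

  if (pid == 0)
    _exit (exit_code);
  return pid;
}

/* Wait for exactly PID and store its exit code in *EXIT_CODE.
   Because a specific pid is named, this cannot reap a child that
   some other part of the process is responsible for.  Return true
   if PID exited normally.  */
bool
reap_own_child (pid_t pid, int *exit_code)
{
  int status;
  pid_t r;

  assert (pid > 0);
  do
    r = waitpid (pid, &status, 0);
  while (r < 0 && errno == EINTR);

  if (r != pid || !WIFEXITED (status))
    return false;
  *exit_code = WEXITSTATUS (status);
  return true;
}
```

With two children started by spawn_child, reaping the first with
reap_own_child leaves the second available for whoever owns it.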

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Fri, 28 Jun 2013 12:22:01 GMT) Full text and rfc822 format available.

Message #239 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Fri, 28 Jun 2013 08:20:34 -0400

On 6/27/2013 3:32 PM, Ken Brown wrote:
> On 6/27/2013 10:56 AM, Paul Eggert wrote:
>> On 06/24/2013 04:50 PM, Angelo Graziosi wrote:
>>> the bootstrap completed *without* errors!
>>
>> OK, thanks, as trunk bzr 113206 I installed a change
>> to skip the glib tickling on Cygwin.
>>
>> Although this should fix the bootstrap failure, I expect that
>> this reintroduces a bug into Cygwin Emacs, namely,
>> Emacs can sometimes lose track of subprocesses and/or kill off
>> unrelated processes; see Bug#12980 and Bug#8855.
>> Fixing this will require someone with access to Cygwin
>> and knowledge of how to debug threads under Cygwin,
>> neither of which I have.  Since the issue appears only
>> under Cygwin it could well be a Cygwin bug rather than an
>> Emacs or glib bug.
> 
> Another alternative is to replace
> 
>      if (! noninteractive || initialized)
> 
> by
> 
>      if (! noninteractive)
> 
> at least on Cygwin.  That allows the bootstrap to complete without 
> errors.  Assuming this doesn't cause other problems, we wouldn't have to 
> worry about reintroducing bugs into (interactive) Cygwin Emacs.

Just to be clear, here's what I'm proposing:

=== modified file 'src/process.c'
--- src/process.c       2013-06-27 14:47:52 +0000
+++ src/process.c       2013-06-28 11:30:42 +0000
@@ -7092,18 +7092,23 @@
   inhibit_sentinels = 0;

 #ifndef CANNOT_DUMP
+#ifdef CYGWIN
+  if (! noninteractive)
+#else
   if (! noninteractive || initialized)
 #endif
+#endif
     {
-#if defined HAVE_GLIB && !defined WINDOWSNT && !defined CYGWIN
+#if defined HAVE_GLIB && !defined WINDOWSNT
       /* Tickle glib's child-handling code.  Ask glib to wait for Emacs itself;
         this should always fail, but is enough to initialize glib's
         private SIGCHLD handler, allowing the code below to copy it into
         LIB_CHILD_HANDLER.

-        For some reason tickling causes Cygwin bootstrap to fail, so it's
-        skipped under Cygwin.  FIXME: Skipping the tickling likely causes
-        bugs in subprocess handling under Cygwin (Bug#14569).  */
+        For some reason tickling causes Cygwin bootstrap to fail, so
+        it's done under Cygwin only in the interactive case.  FIXME:
+        Skipping the tickling may cause bugs in subprocess handling
+        under Cygwin in the noninteractive case (Bug#14569).  */
       g_source_unref (g_child_watch_source_new (getpid ()));
 #endif
       catch_child_signal ();


Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Fri, 28 Jun 2013 14:51:02 GMT) Full text and rfc822 format available.

Message #242 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Fri, 28 Jun 2013 07:50:08 -0700

On 06/28/2013 05:20 AM, Ken Brown wrote:

>  #ifndef CANNOT_DUMP
> +#ifdef CYGWIN
> +  if (! noninteractive)
> +#else
>    if (! noninteractive || initialized)
>  #endif
> +#endif

I'm dubious about this proposal.

If there's an obscure race-condition bug during bootstrapping
that makes Emacs crash, why isn't it plausible that a similar
bug could occur during normal operation?  Bootstrapping is
a more-intense activity that could well be more likely to
trigger races, but isn't it more plausible that the races
could occur at any time?

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Fri, 28 Jun 2013 15:30:03 GMT) Full text and rfc822 format available.

Message #245 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Fri, 28 Jun 2013 11:29:23 -0400

On 6/28/2013 10:50 AM, Paul Eggert wrote:
> On 06/28/2013 05:20 AM, Ken Brown wrote:
>
>>   #ifndef CANNOT_DUMP
>> +#ifdef CYGWIN
>> +  if (! noninteractive)
>> +#else
>>     if (! noninteractive || initialized)
>>   #endif
>> +#endif
>
> I'm dubious about this proposal.
>
> If there's an obscure race-condition bug during bootstrapping
> that makes Emacs crash, why isn't it plausible that a similar
> bug could occur during normal operation?  Bootstrapping is
> a more-intense activity that could well be more likely to
> trigger races, but isn't it more plausible that the races
> could occur at any time?

I don't know, because I don't know when the race during bootstrapping 
was happening.  If it was happening when emacs was doing the tickling 
(in init_process_emacs), then my suggested change could conceivably 
cause emacs to crash immediately after startup.  Assuming this doesn't 
happen often, I think it's better than having bugs in subprocess handling.

On the other hand, if the race happens when emacs *executes* the glib 
handler (stored in lib_child_handler), then I agree with you that my 
proposal is unacceptable.

I would suggest that we try my proposal but leave the bug open while we 
see how it works.  If people start seeing random crashes, then we'll 
know it was a bad idea and we can revert it.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Fri, 28 Jun 2013 16:24:02 GMT) Full text and rfc822 format available.

Message #248 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Fri, 28 Jun 2013 18:22:22 +0200

Il 28/06/2013 17.29, Ken Brown ha scritto:
> On 6/28/2013 10:50 AM, Paul Eggert wrote:
>> On 06/28/2013 05:20 AM, Ken Brown wrote:
>>
>>>   #ifndef CANNOT_DUMP
>>> +#ifdef CYGWIN
>>> +  if (! noninteractive)
>>> +#else
>>>     if (! noninteractive || initialized)
>>>   #endif
>>> +#endif
>>
>> I'm dubious about this proposal.
>>
>> If there's an obscure race-condition bug during bootstrapping
>> that makes Emacs crash, why isn't it plausible that a similar
>> bug could occur during normal operation?  Bootstrapping is
>> a more-intense activity that could well be more likely to
>> trigger races, but isn't it more plausible that the races
>> could occur at any time?
>
> I don't know, because I don't know when the race during bootstrapping
> was happening.  If it was happening when emacs was doing the tickling
> (in init_process_emacs), then my suggested change could conceivably
> cause emacs to crash immediately after startup.  Assuming this doesn't
> happen often, I think it's better than having bugs in subprocess handling.
>
> On the other hand, if the race happens when emacs *executes* the glib
> handler (stored in lib_child_handler), then I agree with you that my
> proposal is unacceptable.
>
> I would suggest that we try my proposal but leave the bug open while we
> see how it works.  If people start seeing random crashes, then we'll
> know it was a bad idea and we can revert it.

Just for completeness...

I have bootstrapped r. 113214 with Ken's patch. Emacs built fine, with
no errors. I have installed it and after 3 hours it is still
running...

I would adopt Ken's idea and ping people to use/bootstrap trunk... Let's 
see if we can catch Mobydick...


Ciao,
 Angelo.

(PS. Why does Thunderbird refuse the 14569 <at> debbugs.gnu.org address in my reply?
I always have to change it manually to bug-gnu-emacs <at> gnu.org...)

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Fri, 28 Jun 2013 21:42:01 GMT) Full text and rfc822 format available.

Message #251 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Fri, 28 Jun 2013 17:40:38 -0400

On 6/28/2013 11:29 AM, Ken Brown wrote:
> I don't know, because I don't know when the race during bootstrapping 
> was happening.  If it was happening when emacs was doing the tickling 
> (in init_process_emacs), then my suggested change could conceivably 
> cause emacs to crash immediately after startup.  Assuming this doesn't 
> happen often, I think it's better than having bugs in subprocess handling.
> 
> On the other hand, if the race happens when emacs *executes* the glib 
> handler (stored in lib_child_handler), then I agree with you that my 
> proposal is unacceptable.

I've done some further testing [*] and determined that the bootstrap failures always occur as a result of the tickling, as I had hoped.  This should mean that, if my patch is applied, the only problem will be a possible random crash right after emacs is started.  The only question is how often this will happen in practice.  I think we can only determine this by applying the patch and asking users to test it.

Ken

[*] I tested this by applying the following patch and then bootstrapping:

=== modified file 'src/process.c'
--- src/process.c       2013-06-27 14:47:52 +0000
+++ src/process.c       2013-06-28 21:30:27 +0000
@@ -7095,7 +7095,7 @@
   if (! noninteractive || initialized)
 #endif
     {
-#if defined HAVE_GLIB && !defined WINDOWSNT && !defined CYGWIN
+#if defined HAVE_GLIB && !defined WINDOWSNT
       /* Tickle glib's child-handling code.  Ask glib to wait for Emacs itself;
         this should always fail, but is enough to initialize glib's
         private SIGCHLD handler, allowing the code below to copy it into
@@ -7105,6 +7105,9 @@
         skipped under Cygwin.  FIXME: Skipping the tickling likely causes
         bugs in subprocess handling under Cygwin (Bug#14569).  */
       g_source_unref (g_child_watch_source_new (getpid ()));
+      fprintf (stderr, "Glib has been tickled.\n");
+      sleep (1);
+      fprintf (stderr, "Calling catch_child_signal.\n");
 #endif
       catch_child_signal ();
     }

Every error that occurred was like the following:

Compiling obsolete/pgg.el
Glib has been tickled.
GLib (gthread-posix.c): Unexpected error from C library during 'pthread_setspecific': Invalid argument.  Aborting.
Makefile:251: recipe for target `obsolete/pgg.elc' failed
make[2]: *** [obsolete/pgg.elc] Aborted
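
For anyone reading the log: the message has the shape of a checked
wrapper around a pthread call. pthread functions return an error number
directly instead of setting errno, and "Invalid argument" is the usual
text for EINVAL, which POSIX associates with an invalid key in the case
of pthread_setspecific. The sketch below shows that reporting pattern
only. It is not GLib's code, and format_library_error is a made-up
name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a diagnostic for the C library call FUNCTION, which returned
   the error number STATUS.  Write at most SIZE bytes to BUF and return
   what snprintf returns.  A caller would typically do something like

     int status = pthread_setspecific (key, value);
     if (status != 0)
       { report the formatted message, then abort (); }  */
int
format_library_error (char *buf, size_t size, int status,
                      const char *function)
{
  assert (buf != NULL && function != NULL);
  return snprintf (buf, size,
                   "Unexpected error from C library during '%s': %s.",
                   function, strerror (status));
}
```

The point for debugging is that the abort is a symptom: the thing to
find is why the thread-specific key was considered invalid at that
moment.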

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 01 Jul 2013 11:22:02 GMT) Full text and rfc822 format available.

Message #254 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 01 Jul 2013 07:21:25 -0400

On 6/28/2013 5:40 PM, Ken Brown wrote:
> I've done some further testing [*] and determined that the bootstrap failures always occur as a result of the tickling, as I had hoped.  This should mean that, if my patch is applied, the only problem will be a possible random crash right after emacs is started.  The only question is how often this will happen in practice.  I think we can only determine this by applying the patch and asking users to test it.

Last night I began running a loop in which emacs (patched as I proposed) 
repeatedly starts and then exits after 15 seconds [*].  So far there 
hasn't been a single failure after more than 1300 iterations.  I don't 
know what's different about bootstrapping, but it seems that tickling 
Glib doesn't cause problems on Cygwin in ordinary interactive use of 
Emacs.  (Keep in mind that my previous test, quoted above, showed that 
the failure during bootstrapping always occurred within 1 second after 
Glib got tickled.)

If no one objects, I'll go ahead and apply my patch later today.

Ken

[*] I'm running the following script:

#! /bin/bash
count=0
while true
do
    count=$((count + 1))
    echo "Try $count; starting Emacs."
    if emacs -l test_emacs.el
    then
        echo "Emacs exited normally."
    else
        echo "Emacs exited abnormally."
    fi
    sleep 1
done

test_emacs.el contains the following:

(sit-for 15)
(kill-emacs)
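
The shell "if" above only distinguishes a zero exit status from a
nonzero one. Since the bootstrap failure ends with an abort, a harness
could also record whether the process was killed by a signal. Here is
a rough C sketch of one iteration; run_once and the run_result values
are names invented for this example, not part of any existing tool.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum run_result
{
  RUN_OK,                 /* exited with status 0 */
  RUN_NONZERO_EXIT,       /* exited with a nonzero status */
  RUN_KILLED_BY_SIGNAL,   /* terminated by a signal, e.g. SIGABRT */
  RUN_ERROR               /* fork or waitpid failed */
};

/* Run the program ARGV[0] with the NULL-terminated argument vector
   ARGV, wait for it, and classify how it ended.  */
enum run_result
run_once (char *const argv[])
{
  int status;
  pid_t pid, r;

  assert (argv != NULL && argv[0] != NULL);
  pid = fork ();
  if (pid < 0)
    return RUN_ERROR;
  if (pid == 0)
    {
      execvp (argv[0], argv);
      _exit (127);        /* reached only if exec failed */
    }

  do
    r = waitpid (pid, &status, 0);
  while (r < 0 && errno == EINTR);

  if (r != pid)
    return RUN_ERROR;
  if (WIFSIGNALED (status))
    return RUN_KILLED_BY_SIGNAL;
  if (WIFEXITED (status) && WEXITSTATUS (status) == 0)
    return RUN_OK;
  return RUN_NONZERO_EXIT;
}
```

Calling run_once in a loop with the vector { "emacs", "-l",
"test_emacs.el", NULL } would mirror the script, with abnormal exits
split into "nonzero status" and "killed by signal".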

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 01 Jul 2013 12:47:02 GMT) Full text and rfc822 format available.

Message #257 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 01 Jul 2013 14:28:58 +0200

Il 01/07/2013 13.21, Ken Brown ha scritto:

> Last night I began running a loop in which emacs (patched as I proposed)
> repeatedly starts and then exits after 15 seconds [*].  So far there
> hasn't been a single failure after more than 1300 iterations.  I don't
> know what's different about bootstrapping, but it seems that tickling
> Glib doesn't cause problems on Cygwin in ordinary interactive use of
> Emacs.  (Keep in mind that my previous test, quoted above, showed that
> the failure during bootstrapping always occurred within 1 second after
> Glib got tickled.)
>
> If no one objects, I'll go ahead and apply my patch later today.

+1

Regarding this:

http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-06/msg00963.html


shouldn't it be flagged, in some manner, to the Cygwin ("Creators") list? For
example,

"Bootstrapping Emacs with this patch fails on Cygwin so and so... but 
not on GNU/Linux... Have you some idea?..."

After all, trying to bootstrap Emacs trunk is not so much work... The 
big work, perhaps, is in understanding the failure, I think..


Ciao
Angelo

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 01 Jul 2013 13:53:02 GMT) Full text and rfc822 format available.

Message #260 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 01 Jul 2013 09:51:23 -0400

On 7/1/2013 8:28 AM, Angelo Graziosi wrote:
> Il 01/07/2013 13.21, Ken Brown ha scritto:
>
>> Last night I began running a loop in which emacs (patched as I proposed)
>> repeatedly starts and then exits after 15 seconds [*].  So far there
>> hasn't been a single failure after more than 1300 iterations.  I don't
>> know what's different about bootstrapping, but it seems that tickling
>> Glib doesn't cause problems on Cygwin in ordinary interactive use of
>> Emacs.  (Keep in mind that my previous test, quoted above, showed that
>> the failure during bootstrapping always occurred within 1 second after
>> Glib got tickled.)
>>
>> If no one objects, I'll go ahead and apply my patch later today.
>
> +1
>
> Regarding this:
>
> http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-06/msg00963.html
>
>
> shouldn't it be flagged, in some manner, to the Cygwin ("Creators") list? For
> example,
>
> "Bootstrapping Emacs with this patch fails on Cygwin so and so... but
> not on GNU/Linux... Have you some idea?..."
>
> After all, trying to bootstrap Emacs trunk is not so much work... The
> big work, perhaps, is in understanding the failure, I think..

Yes, I agree in principle, but I'm not yet sure it's a Cygwin bug, and I 
haven't been able to come up with a simple test case that exhibits the 
problem.  My naive attempt didn't work:

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 01 Jul 2013 14:06:02 GMT) Full text and rfc822 format available.

Message #263 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 01 Jul 2013 10:04:00 -0400

On 7/1/2013 9:51 AM, Ken Brown wrote:
> On 7/1/2013 8:28 AM, Angelo Graziosi wrote:
>> Il 01/07/2013 13.21, Ken Brown ha scritto:
>>
>>> Last night I began running a loop in which emacs (patched as I proposed)
>>> repeatedly starts and then exits after 15 seconds [*].  So far there
>>> hasn't been a single failure after more than 1300 iterations.  I don't
>>> know what's different about bootstrapping, but it seems that tickling
>>> Glib doesn't cause problems on Cygwin in ordinary interactive use of
>>> Emacs.  (Keep in mind that my previous test, quoted above, showed that
>>> the failure during bootstrapping always occurred within 1 second after
>>> Glib got tickled.)
>>>
>>> If no one objects, I'll go ahead and apply my patch later today.
>>
>> +1
>>
>> Regarding this:
>>
>> http://lists.gnu.org/archive/html/bug-gnu-emacs/2013-06/msg00963.html
>>
>>
>> shouldn't it be flagged, in some manner, to the Cygwin ("Creators") list? For
>> example,
>>
>> "Bootstrapping Emacs with this patch fails on Cygwin so and so... but
>> not on GNU/Linux... Have you some idea?..."
>>
>> After all, trying to bootstrap Emacs trunk is not so much work... The
>> big work, perhaps, is in understanding the failure, I think..

[Sorry, I accidentally sent an unfinished reply.  I'll restart.]

Yes, I agree in principle, but I'm not yet sure it's a Cygwin bug, and I 
haven't been able to come up with a simple test case that exhibits the 
problem.  My naive attempt didn't work: I wrote a little C program that 
tickled Glib exactly as in process.c.  I ran it thousands of times 
without an error.  So I have to try to figure out what's different 
during bootstrapping.  And Emacs's gmalloc.c may be part or all of the 
problem too.  That's not compiled on GNU/Linux.

I'd like to keep trying to track this down for a while before asking on 
the Cygwin list.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 01 Jul 2013 14:20:02 GMT) Full text and rfc822 format available.

Message #266 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 01 Jul 2013 07:19:27 -0700

On 07/01/2013 04:21 AM, Ken Brown wrote:
> 
> Last night I began running a loop in which emacs (patched as I proposed) repeatedly starts and then
> exits after 15 seconds [*].  So far there hasn't been a single failure after more than 1300 iterations.

I wouldn't expect your test case to exercise the bug.
The bug occurs when Gtk or Glib activity is occurring
in some other thread at the same time that Emacs is
running.  To reproduce the bug, one must have a
race condition like that.  In your test case Emacs
is idle, so it's unlikely to exhibit the bug.

A couple more things.  Since the bug comes into play
only when glib is tickled, shouldn't the Cygwin case
suppress only the tickling, not the catching of child
signals?

Also, wouldn't it be better to give Cygwin maintainers
an easy way to reproduce the bug, say by compiling
with a special flag?

So, how about the following patch instead?

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2013-06-30 22:29:23 +0000
+++ src/ChangeLog	2013-07-01 14:17:45 +0000
@@ -1,3 +1,10 @@
+2013-07-01  Paul Eggert  <eggert <at> cs.ucla.edu>
+
+	Tickle glib when debugging under Cygwin (Bug#14569).
+	* process.c (init_process_emacs) [CYGWIN && TICKLE_GLIB_BUGFIX]:
+	Tickle glib in this case, too, so that Cygwin maintainers
+	can reproduce the bug more easily.
+
 2013-06-30  Michal Nazarewicz  <mina86 <at> mina86.com>
 
 	* buffer.c (FKill_buffer): Run `kill-buffer-query-functions'

=== modified file 'src/process.c'
--- src/process.c	2013-06-27 14:47:52 +0000
+++ src/process.c	2013-07-01 14:12:31 +0000
@@ -7095,16 +7095,24 @@
   if (! noninteractive || initialized)
 #endif
     {
-#if defined HAVE_GLIB && !defined WINDOWSNT && !defined CYGWIN
+#if defined HAVE_GLIB && !defined WINDOWSNT
       /* Tickle glib's child-handling code.  Ask glib to wait for Emacs itself;
 	 this should always fail, but is enough to initialize glib's
 	 private SIGCHLD handler, allowing the code below to copy it into
 	 LIB_CHILD_HANDLER.
 
-	 For some reason tickling causes Cygwin bootstrap to fail, so it's
-	 skipped under Cygwin.  FIXME: Skipping the tickling likely causes
-	 bugs in subprocess handling under Cygwin (Bug#14569).  */
-      g_source_unref (g_child_watch_source_new (getpid ()));
+	 Under Cygwin as of July 2013, tickling causes bootstrap to fail,
+	 so do it only when Emacs is compiled with -DTICKLE_GLIB_BUGFIX;
+	 this is to help Cygwin maintainers reproduce the bug.
+	 FIXME: Skipping the tickling likely causes bugs in subprocess
+	 handling under Cygwin (Bug#14569).  */
+# if defined CYGWIN && !defined TICKLE_GLIB_BUGFIX
+      bool tickle_glib = 0;
+# else
+      bool tickle_glib = 1;
+# endif
+      if (tickle_glib)
+	g_source_unref (g_child_watch_source_new (getpid ()));
 #endif
       catch_child_signal ();
     }

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 01 Jul 2013 16:19:01 GMT) Full text and rfc822 format available.

Message #269 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 01 Jul 2013 12:16:25 -0400

On 7/1/2013 10:19 AM, Paul Eggert wrote:
> On 07/01/2013 04:21 AM, Ken Brown wrote:
>>
>> Last night I began running a loop in which emacs (patched as I proposed) repeatedly starts and then
>> exits after 15 seconds [*].  So far there hasn't been a single failure after more than 1300 iterations.
>
> I wouldn't expect your test case to exercise the bug.
> The bug occurs when Gtk or Glib activity is occurring
> in some other thread at the same time that Emacs is
> running.  To reproduce the bug, one must have a
> race condition like that.  In your test case Emacs
> is idle, so it's unlikely to exhibit the bug.
>
> A couple more things.  Since the bug comes into play
> only when glib is tickled, shouldn't the Cygwin case
> suppress only the tickling, not the catching of child
> signals?
>
> Also, wouldn't it be better to give Cygwin maintainers
> an easy way to reproduce the bug, say by compiling
> with a special flag?
>
> So, how about the following patch instead?
>
> === modified file 'src/ChangeLog'
> --- src/ChangeLog	2013-06-30 22:29:23 +0000
> +++ src/ChangeLog	2013-07-01 14:17:45 +0000
> @@ -1,3 +1,10 @@
> +2013-07-01  Paul Eggert  <eggert <at> cs.ucla.edu>
> +
> +	Tickle glib when debugging under Cygwin (Bug#14569).
> +	* process.c (init_process_emacs) [CYGWIN && TICKLE_GLIB_BUGFIX]:
> +	Tickle glib in this case, too, so that Cygwin maintainers
> +	can reproduce the bug more easily.
> +
>   2013-06-30  Michal Nazarewicz  <mina86 <at> mina86.com>
>
>   	* buffer.c (FKill_buffer): Run `kill-buffer-query-functions'
>
> === modified file 'src/process.c'
> --- src/process.c	2013-06-27 14:47:52 +0000
> +++ src/process.c	2013-07-01 14:12:31 +0000
> @@ -7095,16 +7095,24 @@
>     if (! noninteractive || initialized)
>   #endif
>       {
> -#if defined HAVE_GLIB && !defined WINDOWSNT && !defined CYGWIN
> +#if defined HAVE_GLIB && !defined WINDOWSNT
>         /* Tickle glib's child-handling code.  Ask glib to wait for Emacs itself;
>   	 this should always fail, but is enough to initialize glib's
>   	 private SIGCHLD handler, allowing the code below to copy it into
>   	 LIB_CHILD_HANDLER.
>
> -	 For some reason tickling causes Cygwin bootstrap to fail, so it's
> -	 skipped under Cygwin.  FIXME: Skipping the tickling likely causes
> -	 bugs in subprocess handling under Cygwin (Bug#14569).  */
> -      g_source_unref (g_child_watch_source_new (getpid ()));
> +	 Under Cygwin as of July 2013, tickling causes bootstrap to fail,
> +	 so do it only when Emacs is compiled with -DTICKLE_GLIB_BUGFIX;
> +	 this is to help Cygwin maintainers reproduce the bug.
> +	 FIXME: Skipping the tickling likely causes bugs in subprocess
> +	 handling under Cygwin (Bug#14569).  */
> +# if defined CYGWIN && !defined TICKLE_GLIB_BUGFIX
> +      bool tickle_glib = 0;
> +# else
> +      bool tickle_glib = 1;
> +# endif
> +      if (tickle_glib)
> +	g_source_unref (g_child_watch_source_new (getpid ()));
>   #endif
>         catch_child_signal ();
>       }

Yes, this looks good.  Please go ahead and apply it.

If it turns out that this really is a Cygwin/Glib bug (and not, say, a 
bug in gmalloc.c), it will be much easier to find the problem if I can 
provide the Cygwin maintainers with a test case in C, independent of 
Emacs.  Is there a simple way to simulate the kind of race condition 
that you think is going on here?

Thanks.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 01 Jul 2013 17:54:01 GMT) Full text and rfc822 format available.

Message #272 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Ken Brown <kbrown <at> cornell.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 01 Jul 2013 19:31:37 +0200

On 01/07/2013 16.19, Paul Eggert wrote:
> On 07/01/2013 04:21 AM, Ken Brown wrote:
>>
>> Last night I began running a loop in which emacs (patched as I proposed) repeatedly starts and then
>> exits after 15 seconds [*].  So far there hasn't been a single failure after more than 1300 iterations.
>
> I wouldn't expect your test case to exercise the bug.
> The bug occurs when Gtk or Glib activity is occurring
> in some other thread at the same time that Emacs is
> running.  To reproduce the bug, one must have a
> race condition like that.  In your test case Emacs
> is idle, so it's unlikely to exhibit the bug.
>
> A couple more things.  Since the bug comes into play
> only when glib is tickled, shouldn't the Cygwin case
> suppress only the tickling, not the catching of child
> signals?
>
> Also, wouldn't it be better to give Cygwin maintainers
> an easy way to reproduce the bug, say by compiling
> with a special flag?
>
> So, how about the following patch instead?
>
> === modified file 'src/ChangeLog'
> --- src/ChangeLog	2013-06-30 22:29:23 +0000
> +++ src/ChangeLog	2013-07-01 14:17:45 +0000
> @@ -1,3 +1,10 @@
> +2013-07-01  Paul Eggert  <eggert <at> cs.ucla.edu>
> +
> +	Tickle glib when debugging under Cygwin (Bug#14569).
> +	* process.c (init_process_emacs) [CYGWIN && TICKLE_GLIB_BUGFIX]:
> +	Tickle glib in this case, too, so that Cygwin maintainers
> +	can reproduce the bug more easily.
> +
>   2013-06-30  Michal Nazarewicz  <mina86 <at> mina86.com>
>
>   	* buffer.c (FKill_buffer): Run `kill-buffer-query-functions'
>
> === modified file 'src/process.c'
> --- src/process.c	2013-06-27 14:47:52 +0000
> +++ src/process.c	2013-07-01 14:12:31 +0000
> @@ -7095,16 +7095,24 @@
>     if (! noninteractive || initialized)
>   #endif
>       {
> -#if defined HAVE_GLIB && !defined WINDOWSNT && !defined CYGWIN
> +#if defined HAVE_GLIB && !defined WINDOWSNT
>         /* Tickle glib's child-handling code.  Ask glib to wait for Emacs itself;
>   	 this should always fail, but is enough to initialize glib's
>   	 private SIGCHLD handler, allowing the code below to copy it into
>   	 LIB_CHILD_HANDLER.
>
> -	 For some reason tickling causes Cygwin bootstrap to fail, so it's
> -	 skipped under Cygwin.  FIXME: Skipping the tickling likely causes
> -	 bugs in subprocess handling under Cygwin (Bug#14569).  */
> -      g_source_unref (g_child_watch_source_new (getpid ()));
> +	 Under Cygwin as of July 2013, tickling causes bootstrap to fail,
> +	 so do it only when Emacs is compiled with -DTICKLE_GLIB_BUGFIX;
> +	 this is to help Cygwin maintainers reproduce the bug.
> +	 FIXME: Skipping the tickling likely causes bugs in subprocess
> +	 handling under Cygwin (Bug#14569).  */
> +# if defined CYGWIN && !defined TICKLE_GLIB_BUGFIX
> +      bool tickle_glib = 0;
> +# else
> +      bool tickle_glib = 1;
> +# endif
> +      if (tickle_glib)
> +	g_source_unref (g_child_watch_source_new (getpid ()));
>   #endif
>         catch_child_signal ();
>       }


It looks like a nice solution. I have applied the patch and bootstrapped with

  CFLAGS=-DTICKLE_GLIB_BUGFIX ./my_build.sh

and it fails as expected. By contrast, the bootstrap

  ./my_build.sh

completes just fine.

This way the Cygwin gurus have a chance to catch Moby Dick, if it exists...


Ciao,
 Angelo.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 01 Jul 2013 18:42:02 GMT) Full text and rfc822 format available.

Message #275 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 01 Jul 2013 14:40:55 -0400

I found the bug.  It's that malloc_enable_thread doesn't get called in 
batch mode, because of the following in emacs.c:

#if defined (HAVE_PTHREAD) && !defined (SYSTEM_MALLOC) && !defined (DOUG_LEA_MALLOC)
  if (! noninteractive)
    {
      extern void malloc_enable_thread (void);

      malloc_enable_thread ();
    }
#endif

Removing " if (! noninteractive)" solves the problem.  Will it break 
something else?  I have no idea why malloc_enable_thread was only being 
called in the interactive case.

Ken
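
The failure mode described in this message can be sketched in isolation.  What follows is a minimal toy, not code from gmalloc.c: it assumes the allocator takes its lock only after an explicit "enable" call, which is what the name malloc_enable_thread suggests.  Every identifier (toy_enable_thread, toy_alloc_chunk, run_workers) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy allocator; it only mimics the idea of an allocator
   whose locking is switched on by a separate call.  */
enum { POOL_SIZE = 1 << 20, CHUNK = 16 };

static unsigned char pool[POOL_SIZE];
static size_t pool_used;
static bool thread_enabled;	/* false until toy_enable_thread runs */
static pthread_mutex_t toy_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Analogue of an "enable thread safety" call: from now on, lock.  */
void
toy_enable_thread (void)
{
  thread_enabled = true;
}

/* Hand out one fixed-size chunk.  The update of pool_used is
   protected only if thread safety has been enabled.  */
void *
toy_alloc_chunk (void)
{
  void *p = NULL;
  if (thread_enabled)
    pthread_mutex_lock (&toy_mutex);
  if (pool_used + CHUNK <= POOL_SIZE)
    {
      p = pool + pool_used;
      pool_used += CHUNK;
    }
  if (thread_enabled)
    pthread_mutex_unlock (&toy_mutex);
  return p;
}

static void *
worker (void *arg)
{
  int n = *(int *) arg;
  for (int i = 0; i < n; i++)
    toy_alloc_chunk ();
  return NULL;
}

/* Reset the pool, run NTHREADS workers that each allocate PER_THREAD
   chunks, and return how many chunks the pool thinks it handed out.  */
size_t
run_workers (int nthreads, int per_thread)
{
  pthread_t tid[16];
  assert (0 < nthreads && nthreads <= 16);
  pool_used = 0;
  for (int i = 0; i < nthreads; i++)
    {
      int err = pthread_create (&tid[i], NULL, worker, &per_thread);
      assert (err == 0);
    }
  for (int i = 0; i < nthreads; i++)
    pthread_join (tid[i], NULL);
  return pool_used / CHUNK;
}
```

If toy_enable_thread () is called before run_workers (4, 10000), the bookkeeping stays consistent and the result is 40000.  If it is never called, concurrent updates of pool_used may be lost, which is the kind of corruption a second thread (for example one started by a library) could cause in an allocator whose locking was never switched on.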

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 01 Jul 2013 21:08:01 GMT) Full text and rfc822 format available.

Message #278 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>,
 Angelo Graziosi <angelo.graziosi <at> alice.it>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 01 Jul 2013 14:07:45 -0700

On 07/01/2013 11:40 AM, Ken Brown wrote:
> Removing " if (! noninteractive)" solves the problem.  Will it break something else?

I don't see why it would.  I installed that as part of trunk bzr 113247,
and thanks for finding the underlying fault.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Mon, 01 Jul 2013 21:48:01 GMT) Full text and rfc822 format available.

Message #281 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Angelo Graziosi <angelo.graziosi <at> alice.it>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: bug-emacs <bug-gnu-emacs <at> gnu.org>, Ken Brown <kbrown <at> cornell.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 01 Jul 2013 23:47:00 +0200

On 01/07/2013 23.07, Paul Eggert wrote:
> On 07/01/2013 11:40 AM, Ken Brown wrote:
>> Removing " if (! noninteractive)" solves the problem.  Will it break something else?
>
> I don't see why it would.  I installed that as part of trunk bzr 113247,
> and thanks for finding the underlying fault.
>

It seems that Moby Dick has been caught! Rev. 113247 bootstraps OK... :-)

Many many thanks!

Ciao
Angelo

Reply sent to Ken Brown <kbrown <at> cornell.edu>:
You have taken responsibility. (Mon, 01 Jul 2013 22:42:02 GMT) Full text and rfc822 format available.

Notification sent to Katsumi Yamaoka <yamaoka <at> jpl.org>:
bug acknowledged by developer. (Mon, 01 Jul 2013 22:42:03 GMT) Full text and rfc822 format available.

Message #286 received at 14569-done <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Angelo Graziosi <angelo.graziosi <at> alice.it>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 14569-done <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Mon, 01 Jul 2013 18:41:33 -0400

On 7/1/2013 5:47 PM, Angelo Graziosi wrote:
> On 01/07/2013 23.07, Paul Eggert wrote:
>> On 07/01/2013 11:40 AM, Ken Brown wrote:
>>> Removing " if (! noninteractive)" solves the problem.  Will it break
>>> something else?
>>
>> I don't see why it would.  I installed that as part of trunk bzr 113247,
>> and thanks for finding the underlying fault.
>>
>
> It seems that Moby Dick has been caught! Rev. 113247 bootstraps OK... :-)
>
> Many many thanks!

Thanks for confirming.  I'm closing the bug.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 02 Jul 2013 02:21:02 GMT) Full text and rfc822 format available.

Message #289 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Katsumi Yamaoka <yamaoka <at> jpl.org>
To: 14569 <at> debbugs.gnu.org
Cc: Ken Brown <kbrown <at> cornell.edu>
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 02 Jul 2013 11:19:54 +0900

Ken Brown wrote:
> On 7/1/2013 5:47 PM, Angelo Graziosi wrote:
>> On 01/07/2013 23.07, Paul Eggert wrote:
>>> On 07/01/2013 11:40 AM, Ken Brown wrote:
>>>> Removing " if (! noninteractive)" solves the problem.  Will it break
>>>> something else?
>>>
>>> I don't see why it would.  I installed that as part of trunk bzr 113247,
>>> and thanks for finding the underlying fault.
>>>
>>
>> It seems that Moby Dick has been caught! Rev. 113247 bootstraps OK... :-)
>>
>> Many many thanks!

> Thanks for confirming.  I'm closing the bug.

Many many thanks, too!

However, there is still a problem that has started to arise recently,
though I feel its frequency has decreased: Cygwin Emacs sometimes
freezes unexpectedly for a few tens of seconds.  It happens irregularly
while I'm writing something.  I'm not sure whether it is related, but
some asynchronous processes are running at that time, though I don't
believe that my keystrokes for self-insert-command trigger any
communication with them.  Those are display-time (a timer) and ispell
(and ndtp and Sj3).

In addition, how about Bug#14553 ?
<http://thread.gmane.org/gmane.emacs.bugs/74799/focus=74799>

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 02 Jul 2013 05:24:02 GMT) Full text and rfc822 format available.

Message #292 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Katsumi Yamaoka <yamaoka <at> jpl.org>
To: 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 02 Jul 2013 14:23:18 +0900

Katsumi Yamaoka wrote:
> However, there is still a problem that has started to arise recently,
> though I feel its frequency has decreased: Cygwin Emacs sometimes
> freezes unexpectedly for a few tens of seconds.  It happens irregularly
> while I'm writing something.

Also this sometimes happens:

Memory exhausted--use C-x s then exit and restart Emacs
Error running timer `display-time-event-handler': (error "Memory exhausted--use C-x s then exit and restart Emacs")
Memory exhausted--use C-x s then exit and restart Emacs [3 times]

When that happens I can neither do `C-x s' nor exit; all I can do
is kill the Emacs process (so I transcribed the above messages
by hand).  AFAICR, it didn't happen before June 2013.

> In addition, how about Bug#14553 ?
> <http://thread.gmane.org/gmane.emacs.bugs/74799/focus=74799>

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 02 Jul 2013 11:23:01 GMT) Full text and rfc822 format available.

Message #295 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Katsumi Yamaoka <yamaoka <at> jpl.org>
Cc: 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 02 Jul 2013 07:22:00 -0400

On 7/2/2013 1:23 AM, Katsumi Yamaoka wrote:
> Katsumi Yamaoka wrote:
>> However, there is still a problem that has started to arise recently,
>> though I feel its frequency has decreased: Cygwin Emacs sometimes
>> freezes unexpectedly for a few tens of seconds.  It happens irregularly
>> while I'm writing something.
>
> Also this sometimes happens:
>
> Memory exhausted--use C-x s then exit and restart Emacs
> Error running timer `display-time-event-handler': (error "Memory exhausted--use C-x s then exit and restart Emacs")
> Memory exhausted--use C-x s then exit and restart Emacs [3 times]
>
> When that happens I can neither do `C-x s' nor exit; all I can do
> is kill the Emacs process (so I transcribed the above messages
> by hand).  AFAICR, it didn't happen before June 2013.

Please make a new bug report about this problem.  AFAIK, it has nothing 
to do with the present bug.  And it would help greatly if you could find 
the first bzr revision that exhibits the problem.

>> In addition, how about Bug#14553 ?
>> <http://thread.gmane.org/gmane.emacs.bugs/74799/focus=74799>

I'll take a look when I get a chance.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Tue, 02 Jul 2013 13:58:03 GMT) Full text and rfc822 format available.

Message #298 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Katsumi Yamaoka <yamaoka <at> jpl.org>
To: Ken Brown <kbrown <at> cornell.edu>
Cc: 14569 <at> debbugs.gnu.org
Subject: Re: bug#14569: 24.3.50; bootstrap fails on Cygwin
Date: Tue, 02 Jul 2013 22:57:30 +0900

Ken Brown <kbrown <at> cornell.edu> wrote:
> On 7/2/2013 1:23 AM, Katsumi Yamaoka wrote:
[...]
>> Memory exhausted--use C-x s then exit and restart Emacs
>> Error running timer `display-time-event-handler': (error "Memory
>> exhausted--use C-x s then exit and restart Emacs")
>> Memory exhausted--use C-x s then exit and restart Emacs [3 times]

> Please make a new bug report about this problem.  AFAIK, it has
> nothing to do with the present bug.  And it would help greatly if you
> could find the first bzr revision that exhibits the problem.

I did so.  Thanks.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Thu, 04 Jul 2013 00:59:02 GMT) Full text and rfc822 format available.

Message #301 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Ashish SHUKLA <ashish <at> members.fsf.org>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>
Subject: Re: Emacs segfaulting on FreeBSD 9.1-RELEASE (amd64) during memory
 allocation.
Date: Wed, 03 Jul 2013 17:58:10 -0700

[CC'ing to 14569 <at> debbugs.gnu.org as I think it's related
to that bug....]

On 07/03/2013 09:00 AM, Ashish SHUKLA wrote:

> I tried bzr revision r113270 on my FreeBSD 9.1-RELEASE (amd64), and it
> segfaulted during bootstrap process...
> #v+
> #3367385 0x00000008080742c6 in pthread_mutex_lock () from /lib/libthr.so.3
> #3367386 0x000000000066d846 in _malloc_internal (size=1016) at gmalloc.c:901
> #3367387 0x000000000066d8c1 in malloc (size=1016) at gmalloc.c:925
> #3367388 0x000000000066ec30 in calloc (nmemb=1, size=1016) at gmalloc.c:1492
> #3367389 0x00000008080764bd in ?? () from /lib/libthr.so.3
> #3367390 0x0000000808076d5b in ?? () from /lib/libthr.so.3
> #3367391 0x00000008080742c6 in pthread_mutex_lock () from /lib/libthr.so.3
> #3367392 0x000000000066d846 in _malloc_internal (size=1016) at gmalloc.c:901
> #3367393 0x000000000066d8c1 in malloc (size=1016) at gmalloc.c:925
> #3367394 0x000000000066ec30 in calloc (nmemb=1, size=1016) at gmalloc.c:1492
...

This is no doubt fallout from trunk bzr 113247, needed for
Cygwin.  I installed what I hope fixes the problem for
FreeBSD, without reintroducing the Cygwin bug, as trunk
bzr 113275; please give it a try.

I don't use either Cygwin or FreeBSD, so I'm afraid
I have to rely on others to check these fixes.
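
The backtrace above suggests a cycle: locking the allocator's mutex appears to allocate memory on first use, the allocation goes back into the allocator, and the allocator tries to take the same lock again.  The following is a minimal sketch of that shape, under the assumption that this reading of the backtrace is right.  All names are hypothetical stand-ins, not libthr or gmalloc.c functions, and the re-entrancy guard is only one generic way to break such a cycle; the thread does not say what trunk bzr 113275 actually changed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins that only model the call pattern.  */
enum { DEPTH_CAP = 1000 };	/* stop the toy before it overflows the stack */

static bool lock_ready;		/* has the lock's internal state been set up? */
static bool in_lock_setup;	/* true while the lock is being set up */
static bool use_guard;		/* whether the re-entrancy guard is active */
static int depth, max_depth;	/* current and deepest allocator nesting */

static void toy_malloc (void);

/* On first use the lock needs memory for its internal state, so it
   calls back into the allocator.  */
static void
toy_mutex_lock (void)
{
  if (lock_ready)
    return;
  if (use_guard && in_lock_setup)
    return;			/* already setting up: do not recurse */
  in_lock_setup = true;
  toy_malloc ();		/* models calloc called from inside the lock */
  in_lock_setup = false;
  lock_ready = true;
}

/* The allocator takes its lock before doing any work.  */
static void
toy_malloc (void)
{
  if (++depth > max_depth)
    max_depth = depth;
  if (depth < DEPTH_CAP)
    toy_mutex_lock ();
  depth--;
}

/* Run one allocation from a clean state and report how deeply the
   allocator ended up nested inside itself.  */
int
deepest_nesting (bool guard)
{
  lock_ready = in_lock_setup = false;
  use_guard = guard;
  depth = max_depth = 0;
  toy_malloc ();
  return max_depth;
}
```

Without the guard, deepest_nesting (false) runs into DEPTH_CAP, the toy equivalent of the millions of frames in the backtrace.  With the guard, deepest_nesting (true) returns 2: the outer allocation plus the single allocation made while the lock sets itself up.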

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Thu, 04 Jul 2013 02:14:02 GMT) Full text and rfc822 format available.

Message #304 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Ashish SHUKLA <ashish <at> members.fsf.org>,
 "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>
Subject: Re: bug#14569: Emacs segfaulting on FreeBSD 9.1-RELEASE (amd64) during
 memory allocation.
Date: Wed, 03 Jul 2013 22:13:00 -0400

On 7/3/2013 8:58 PM, Paul Eggert wrote:
> This is no doubt fallout from trunk bzr 113247, needed for
> Cygwin.  I installed what I hope fixes the problem for
> FreeBSD, without reintroducing the Cygwin bug, as trunk
> bzr 113275; please give it a try.
>
> I don't use either Cygwin or FreeBSD, so I'm afraid
> I have to rely on others to check these fixes.

Cygwin still bootstraps OK after this change.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Thu, 04 Jul 2013 02:32:01 GMT) Full text and rfc822 format available.

Message #307 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: wahjava.ml <at> gmail.com
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>, emacs-devel <at> gnu.org
Subject: Re: Emacs segfaulting on FreeBSD 9.1-RELEASE (amd64) during memory
 allocation.
Date: Thu, 04 Jul 2013 07:57:03 +0530

On Wed, 03 Jul 2013 17:58:10 -0700, Paul Eggert <eggert <at> cs.ucla.edu> said:
> [CC'ing to 14569 <at> debbugs.gnu.org as I think it's related
> to that bug....]

> On 07/03/2013 09:00 AM, Ashish SHUKLA wrote:

>> I tried bzr revision r113270 on my FreeBSD 9.1-RELEASE (amd64), and it
>> segfaulted during bootstrap process...
>> #v+
>> #3367385 0x00000008080742c6 in pthread_mutex_lock () from /lib/libthr.so.3
>> #3367386 0x000000000066d846 in _malloc_internal (size=1016) at gmalloc.c:901
>> #3367387 0x000000000066d8c1 in malloc (size=1016) at gmalloc.c:925
>> #3367388 0x000000000066ec30 in calloc (nmemb=1, size=1016) at gmalloc.c:1492
>> #3367389 0x00000008080764bd in ?? () from /lib/libthr.so.3
>> #3367390 0x0000000808076d5b in ?? () from /lib/libthr.so.3
>> #3367391 0x00000008080742c6 in pthread_mutex_lock () from /lib/libthr.so.3
>> #3367392 0x000000000066d846 in _malloc_internal (size=1016) at gmalloc.c:901
>> #3367393 0x000000000066d8c1 in malloc (size=1016) at gmalloc.c:925
>> #3367394 0x000000000066ec30 in calloc (nmemb=1, size=1016) at gmalloc.c:1492
> ...

> This is no doubt fallout from trunk bzr 113247, needed for
> Cygwin.  I installed what I hope fixes the problem for
> FreeBSD, without reintroducing the Cygwin bug, as trunk
> bzr 113275; please give it a try.

> I don't use either Cygwin or FreeBSD, so I'm afraid
> I have to rely on others to check these fixes.

It didn't fix it for me.  This time I ran the compilation with the "memoryuse"
limit set to 128M and the "vmemorysize" limit set to 512M, and the backtrace is
not very helpful:

#v+
#1528 0x00000008080742c6 in pthread_mutex_lock () from /lib/libthr.so.3
#1529 0x000000000066d852 in _malloc_internal (size=1016) at gmalloc.c:901
#1530 0x000000000066d8cd in malloc (size=1016) at gmalloc.c:925
#1531 0x000000000066ec3c in calloc (nmemb=1, size=1016) at gmalloc.c:1492
#1532 0x00000008080764bd in ?? () from /lib/libthr.so.3
#1533 0x0000000808076d5b in ?? () from /lib/libthr.so.3
#1534 0x00000008080742c6 in pthread_mutex_lock () from /lib/libthr.so.3
#1535 0x000000000066d852 in _malloc_internal (size=1016) at gmalloc.c:901
#1536 0x000000000066d8cd in malloc (size=1016) at gmalloc.c:925
#1537 0x0000000000000000 in ?? ()
#v-

HTH
-- 
Ashish SHUKLA

“We are not an endangered species ourselves yet, but this is not for lack of
trying.” (Douglas Adams, "Last Chance to See", 1991)

Sent from my Emacs

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Thu, 04 Jul 2013 06:24:02 GMT) Full text and rfc822 format available.

Message #310 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: wahjava.ml <at> gmail.com
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>
Subject: Re: Emacs segfaulting on FreeBSD 9.1-RELEASE (amd64) during memory
 allocation.
Date: Wed, 03 Jul 2013 23:23:50 -0700

On 07/03/2013 07:27 PM, wahjava.ml <at> gmail.com wrote:
> It didn't fix for me.

OK, please try again, with trunk bzr 113278.
Again, I can't easily test this on either Cygwin or FreeBSD.

Also, please don't CC: to emacs-devel <at> gnu.org, as this
is just a bug and is not worth bothering all Emacs developers
about.  CC:ing to 14569 <at> debbugs.gnu.org should suffice.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Thu, 04 Jul 2013 11:01:01 GMT) Full text and rfc822 format available.

Message #313 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: Ken Brown <kbrown <at> cornell.edu>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: wahjava.ml <at> gmail.com, "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>
Subject: Re: bug#14569: Emacs segfaulting on FreeBSD 9.1-RELEASE (amd64) during
 memory allocation.
Date: Thu, 04 Jul 2013 06:59:45 -0400

On 7/4/2013 2:23 AM, Paul Eggert wrote:
> On 07/03/2013 07:27 PM, wahjava.ml <at> gmail.com wrote:
>> It didn't fix for me.
>
> OK, please try again, with trunk bzr 113278.
> Again, I can't easily test this on either Cygwin or FreeBSD.

Still OK on Cygwin.

Ken

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Thu, 04 Jul 2013 19:11:01 GMT) Full text and rfc822 format available.

Message #316 received at 14569 <at> debbugs.gnu.org (full text, mbox):

From: wahjava.ml <at> gmail.com (Ashish SHUKLA)
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: "14569 <at> debbugs.gnu.org" <14569 <at> debbugs.gnu.org>
Subject: Re: Emacs segfaulting on FreeBSD 9.1-RELEASE (amd64) during memory
 allocation.
Date: Fri, 05 Jul 2013 00:39:14 +0530

On Wed, 03 Jul 2013 23:23:50 -0700, Paul Eggert <eggert <at> cs.ucla.edu> said:
> On 07/03/2013 07:27 PM, wahjava.ml <at> gmail.com wrote:
>> It didn't fix for me.

> OK, please try again, with trunk bzr 113278.
> Again, I can't easily test this on either Cygwin or FreeBSD.

r113278 compiles fine on 9.1-RELEASE (amd64), and I'm sending this email from
r113284.

> Also, please don't CC: to emacs-devel <at> gnu.org, as this
> is just a bug and is not worth bothering all Emacs developers
> about.  CC:ing to 14569 <at> debbugs.gnu.org should suffice.

Sorry.

Thanks
-- 
Ashish SHUKLA

“It has been said that a careful reading of Anna Karenina, if it teaches you
nothing else, will teach you how to make strawberry jam.” (Julian Mitchell,
"Radio Times", 30 October 1976)

Sent from my Emacs

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#14569; Package emacs. (Wed, 17 Jul 2013 06:37:03 GMT) Full text and rfc822 format available.

Message #319 received at 14569-done <at> debbugs.gnu.org (full text, mbox):

From: Katsumi Yamaoka <yamaoka <at> jpl.org>
To: 14766-done <at> debbugs.gnu.org, 14569-done <at> debbugs.gnu.org
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, michael albinus <michael.albinus <at> gmx.de>,
 Ken Brown <kbrown <at> cornell.edu>
Subject: Re: bug#14766: 24.3.50; sometimes "Memory exhausted" on Cygwin
Date: Wed, 17 Jul 2013 15:36:01 +0900

I'm closing these two bugs:

bug#14569: 24.3.50; bootstrap fails on Cygwin
bug#14766: 24.3.50; sometimes "Memory exhausted" on Cygwin

Katsumi Yamaoka wrote:
> Ken Brown wrote:
>> On 7/3/2013 6:01 AM, Katsumi Yamaoka wrote:
>>> Paul Eggert wrote:
>>>> On 07/02/2013 08:24 PM, Ken Brown wrote:
>>>>> Is it possible that gfilenotify doesn't work well with the lucid toolkit?
>>>>> Or perhaps the tickling of glib causes problems when the lucid
>>>>> toolkit is used?
>>>
>>>> It's possible, but lucid isn't multithreaded.
>>>
>>>> What happens if you append --without-file-notification
>>>> to the 'configure' options?  I expect that's what's
>>>> dragging in glib.
>>>
>>> I tried --without-file-notification.  AFAICT it makes no difference;
>>> if anything, I feel like the freezing has become more frequent.
>>> Emacs links cygglib-2.0-0.dll.

>> What if you also add --without-rsvg?

> Oh, Emacs was built without glib.  It still sometimes freezes,
> though I haven't seen "Memory exhausted" yet so far.  I'll keep
> trying it.  Thanks.

I don't know why, sorry, but these days Emacs on Cygwin no longer
exhausts memory, doesn't freeze, and bootstraps smoothly.
The `system-configuration-options' I use now are:
	"--verbose --with-x-toolkit=lucid --without-imagemagick\
	--without-dbus --without-gconf --without-gsettings"
I.e., Emacs is built with glib.

Thanks.

As for the derived bug
"Emacs segfaulting on FreeBSD 9.1-RELEASE (amd64) during memory allocation."
that is labeled with bug#14569 (the same), I believe it's been solved:

http://thread.gmane.org/gmane.emacs.devel/161494/focus=75909

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 14 Aug 2013 11:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 12 years and 136 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.
