GNU bug report logs - #42140
26.3; sigsegv when using nss-docker


Package: emacs;

Reported by: Hans van den Bogert <hansbogert <at> gmail.com>

Date: Tue, 30 Jun 2020 15:11:07 UTC

Severity: normal

Found in version 26.3

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 42140 in the body.
You can then email your comments to 42140 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#42140; Package emacs. (Tue, 30 Jun 2020 15:11:07 GMT) Full text and rfc822 format available.

Acknowledgement sent to Hans van den Bogert <hansbogert <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Tue, 30 Jun 2020 15:11:07 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Hans van den Bogert <hansbogert <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 26.3; sigsegv when using nss-docker
Date: Tue, 30 Jun 2020 12:13:24 +0200
[Message part 1 (text/plain, inline)]
Dear Bug squashers,

To reproduce, have 'nss-docker'[1] installed. This library can be added to
nsswitch.conf to intercept .docker host requests.
I have not seen any other program misbehave in combination with nss-docker.

Since Emacs 26, and most likely due to its newly introduced use of
multi-threading, a simple `M-x list-packages`, with multiple repos
configured (e.g. gnu, melpa), will crash with SIGSEGV with high
probability.

I am not well-versed enough in debugging multithreaded Emacs to conclude
whether this is a problem in Emacs or nss-docker. But to reiterate, since I
have not encountered this at all with other programs, I'll start at
Emacs.

Thanks in advance for any effort,

Hans

[1] https://github.com/dex4er/nss-docker

Starting program: /usr/bin/emacs -u /tmp
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffdb75b700 (LWP 26156)]
[New Thread 0x7fffdaa77700 (LWP 26157)]
[New Thread 0x7fffd9fdc700 (LWP 26158)]
[New Thread 0x7fffd8f51b40 (LWP 26232)]
[New Thread 0x7fffd8cffb40 (LWP 26233)]
NSS DEBUG: Called _nss_debug_gethostbyname4_r with args (name: elpa.gnu.org)
NSS DEBUG: Called _nss_debug_gethostbyname4_r with args (name: stable.melpa.org)
[New Thread 0x7fffd8f39b40 (LWP 26234)]
_nss_docker_gethostbyname2_r(name="elpa.gnu.org", af=10)
_nss_docker_gethostbyname2_r(name="stable.melpa.org", af=10)
_nss_docker_gethostbyname3_r(name="elpa.gnu.org", af=10)
_nss_docker_gethostbyname2_r(name="elpa.gnu.org", af=2)
NSS DEBUG: Called _nss_debug_gethostbyname4_r with args (name: orgmode.org)
_nss_docker_gethostbyname3_r(name="elpa.gnu.org", af=2)

Thread 6 "emacs" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd8cffb40 (LWP 26233)]
0x00007fffd8aafcd5 in _nss_docker_gethostbyname3_r 
(name=0x2e6f732e312e302d <error: Cannot access memory at address 0x2e6f732e312e302d>,
af=2002936162, result=0x6e672d78756e696c,
buffer=0x2d34365f3638782f <error: Cannot access memory at address 0x2d34365f3638782f>,
buflen=7091318039310988591, errnop=0x312e6f732e312e, 
herrnop=0x302d77626162696c, ttlp=0x302e6f732e6563, 
canonp=0x697672657373746e)
at libnss_docker.c:72
72 ) {
(gdb) bt full
#0 0x00007fffd8aafcd5 in _nss_docker_gethostbyname3_r 
(name=0x2e6f732e312e302d <error: Cannot access memory at address 0x2e6f732e312e302d>,
af=2002936162, result=0x6e672d78756e696c, 
buffer=0x2d34365f3638782f <error: Cannot access memory at address 0x2d34365f3638782f>,
buflen=7091318039310988591, 
errnop=0x312e6f732e312e, herrnop=0x302d77626162696c, 
ttlp=0x302e6f732e6563, canonp=0x697672657373746e)
at libnss_docker.c:72
name_len = 3414407380873671541
hostname = 
"86_64-linux-gnu/libX11-xcb.so\000libXxf86vm.so.1\000/usr/lib/x86_64-linux-gnu/libXxf86vm.so.1\000libXxf86vm.so.1\000/usr/lib/i386-linux-gnu/libXxf86vm.so.1\000libXxf86vm.so\000/usr/lib/x86_64-linux-gnu/libXxf86vm.so\000li"...
hostname_suffix_ptr = 0x312e6f732e616162
docker_api_addr =
{sun_family = 12593, sun_path = 
".so.6\000libX11.so.6\000/usr/lib/i386-linux-gnu/libX11.so.6\000libX11.so\000/usr/lib/x86_64-linux-gnu/libX11.so\000libX11-x"}
docker_api_addr_len = 1869819507
buffer_size = 3346019690390575202
buffer_offset = 7795575320214437942
sockfd = 788541486
req_message_buffer = 
"86_64-linux-gnu/libX11-xcb.so.1\000libX11-xcb.so.1\000/usr/lib/i386-linux-gnu/libX11-xcb.so.1\000libX11-xcb.so\000/usr/lib"
req_message_len = 7596498840077020928
res_message_buffer = Python Exception value requires 102400 bytes, which 
is more than max-value-size:
#1 0x00007fffd8ab0518 in _nss_docker_gethostbyname2_r (name=0x3ba8368 
"stable.melpa.org", af=10, result=0x7fffd8cfe7d0, buffer=0x7fffd8cfea40 
"\377\002", buflen=1024, errnop=0x7fffd8cff948, herrnop=0x7fffd8cff9ac) 
at libnss_docker.c:340
#2 0x00007fffebf70f9f in gaih_inet (name=name@entry=0x3ba8368 
"stable.melpa.org", service=<optimized out>, req=req@entry=0x3ba8338, 
pai=pai@entry=0x7fffd8cfe9c8, naddrs=naddrs@entry=0x7fffd8cfe9c4, 
tmpbuf=tmpbuf@entry=0x7fffd8cfea30) at ../sysdeps/posix/getaddrinfo.c:873
th = {h_name = 0x0, h_aliases = 0x0, h_addrtype = 0, h_length = 0, 
h_addr_list = 0x0}
localcanon = 0x0
fct = 0x7fffd8ab04a4 <_nss_docker_gethostbyname2_r>
fct4 =
pat = 0x7fffd8cfe7b8
no_inet6_data = 0
nip = 0x2c5eb30
status =
no_more = 0
no_data = 0
inet6_status = NSS_STATUS_UNAVAIL
res_ctx = 0x7fffc8000b20
res_enable_inet6 =
tp =
st = 0x7fffd8cfe6f0
at = 0x7fffd8cfe6b0
got_ipv6 = false
canon = 0x0
orig_name = 0x3ba8368 "stable.melpa.org"
alloca_used =
port =
malloc_name = false
addrmem = 0x0
canonbuf = 0x0
result = 0
#3 0x00007fffebf72ce4 in __GI_getaddrinfo (name=<optimized out>, service=<optimized out>, 
hints=0x3ba8338, pai=pai@entry=0x3ba8318)
at ../sysdeps/posix/getaddrinfo.c:2300
tmpbuf =
{data = 0x7fffd8cfea40, length = 1024, __space = {__align = 
{__max_align_ll = 767, __max_align_ld = 5.1301383008835767187e-4937}, 
__c = "\377\002", '\000' , 
"\002@\352\317\330\377\177\000\000\000\000\000\000\000\000\000\000ff02::2\000ip6-allrouters", 
'\000' , 
"v\352\317\330\377\177\000\000\000\000\000\000\000\000\000\000ts\n", 
'\000' ...}}
i = 0
last_i = 0
nresults = 0
p = 0x0
gaih_service = {name = 0x3ba8379 "443", num = 443}
pservice =
local_hints =
{ai_flags = 0, ai_family = 0, ai_socktype = 0, ai_protocol = 0, 
ai_addrlen = 0, ai_addr = 0x0, ai_canonname = 0x0, ai_next = 0x0}
in6ai = 0x0
in6ailen = 0
seen_ipv4 = false
seen_ipv6 = false
check_pf_called = false
end = 0x7fffd8cfe9c8
naddrs = 0
__PRETTY_FUNCTION__ = "getaddrinfo"
#4 0x00007fffecb5a058 in handle_requests (arg=<optimized out>) at gai_misc.c:317
req = 0x3ba8300
srchp =
lastp =
runp = 0x3d84690
---Type <return> to continue, or q <return> to quit---xbackq
__PRETTY_FUNCTION__ = "handle_requests"
#5 0x00007fffecd646db in start_thread (arg=0x7fffd8cffb40) at 
pthread_create.c:463
pd = 0x7fffd8cffb40
now =
unwind_buf =
{cancel_jmp_buf = {{jmp_buf = {140736830896960, -2868501273485909582, 
140736830894080, 0, 64505488, 140737488329792, 2868433241719562674, 
2868459701649662386}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 
0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call =
#6 0x00007fffebf8c88f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb) xbacktrace
Undefined command: "xbacktrace". Try "help".

In GNU Emacs 26.3 (build 2, x86_64-pc-linux-gnu, GTK+ Version 3.22.30)
of 2019-09-16 built on lcy01-amd64-030
Windowing system distributor 'The X.Org Foundation
[Message part 2 (text/html, inline)]
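
Editorial note on the backtrace above: the argument values gdb prints for frame #0 do not look like pointers. Read as little-endian bytes (the byte order on x86_64), they are printable ASCII. The Python sketch below decodes the values copied from the trace; the helper name `decode_le` is made up for this illustration.

```python
# Argument values copied from frame #0 of the backtrace above.
FRAME0_ARGS = {
    "name": 0x2E6F732E312E302D,
    "result": 0x6E672D78756E696C,
    "buffer": 0x2D34365F3638782F,
    "errnop": 0x312E6F732E312E,
    "herrnop": 0x302D77626162696C,
    "ttlp": 0x302E6F732E6563,
    "canonp": 0x697672657373746E,
}


def decode_le(value: int, width: int = 8) -> bytes:
    """Return the bytes of `value` as laid out in memory on a little-endian CPU."""
    return value.to_bytes(width, "little")


if __name__ == "__main__":
    for arg, value in FRAME0_ARGS.items():
        print(f"{arg:8} {value:#018x} -> {decode_le(value)!r}")
```

For example, `buffer` followed by `result` decodes to `/x86_64-linux-gn`, the same kind of library-path text that gdb shows in the `hostname` and `req_message_buffer` locals. One possible reading, not confirmed anywhere in this thread, is that the frame of `_nss_docker_gethostbyname3_r` overlaps memory holding unrelated string data, since the caller in frame #1 shows sane arguments.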

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42140; Package emacs. (Tue, 30 Jun 2020 15:41:01 GMT) Full text and rfc822 format available.

Message #8 received at 42140 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Hans van den Bogert <hansbogert <at> gmail.com>
Cc: 42140 <at> debbugs.gnu.org
Subject: Re: bug#42140: 26.3; sigsegv when using nss-docker
Date: Tue, 30 Jun 2020 18:40:14 +0300
> From: Hans van den Bogert <hansbogert <at> gmail.com>
> Date: Tue, 30 Jun 2020 12:13:24 +0200
> 
> Since Emacs 26, and most likely due to its newly introduced use of
> multi-threading, a simple `M-x list-packages`, with multiple repos
> configured (e.g. gnu, melpa), will crash with SIGSEGV with high
> probability.
>
> I am not well-versed enough in debugging multithreaded Emacs to conclude
> whether this is a problem in Emacs or nss-docker. But to reiterate, since I
> have not encountered this at all with other programs, I'll start at
> Emacs.

Emacs is not multithreaded.  If you never start any additional Lisp
threads, only one thread ever runs (not counting GTK threads, but
those aren't new in Emacs 26).

The backtrace seems to suggest it's a problem in nss-docker, since the
crash is in its code.  Are you sure this is an Emacs problem?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42140; Package emacs. (Tue, 30 Jun 2020 21:19:02 GMT) Full text and rfc822 format available.

Message #11 received at 42140 <at> debbugs.gnu.org (full text, mbox):

From: Hans van den Bogert <hansbogert <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 42140 <at> debbugs.gnu.org
Subject: Re: bug#42140: 26.3; sigsegv when using nss-docker
Date: Tue, 30 Jun 2020 22:15:42 +0200

On 6/30/20 5:40 PM, Eli Zaretskii wrote:
> Emacs is not multithreaded.  If you never start any additional Lisp
> threads, only one thread ever runs (not counting GTK threads, but
> those aren't new in Emacs 26).
> 
> The backtrace seems to suggest it's a problem in nss-docker, since the
> crash is in its code.  Are you sure this is an Emacs problem?

> Emacs is not multithreaded.
You are right, poor choice of words; concurrent seems to be the proper 
word. The release notes of v26 do note the change to an async network layer:

Release note v26 snippet
--->8---
** The networking code has been reworked so that it's more
asynchronous than it was (when specifying :nowait t in
'make-network-process').  How asynchronous it is varies based on the
capabilities of the system, but on a typical GNU/Linux system the DNS
resolution, the connection, and (for TLS streams) the TLS negotiation
are all done without blocking the main Emacs thread.  To get
asynchronous TLS, the TLS boot parameters have to be passed in (see
the manual for details).
--->8---

> If you never start any additional Lisp
> threads, only one thread ever runs (not counting GTK threads, but
> those aren't new in Emacs 26).

I am an extreme novice with respect to Emacs development, but I have to
disagree. In contrast to v25, I can see this async change in the debug
prints which I added to the `_nss_docker_*_r` functions: the internal
calls made by different `_nss_docker_gethostbyname2_r` invocations can
interleave.

Further, I think I see 2 threads for 2 name resolutions (is this what you 
meant by 'additional Lisp threads'?):

```
Thread 7 (Thread 0x7fffd8ce7b40 (LWP 18899)):
#0  0x00007fffd8acecd5 in _nss_docker_gethostbyname3_r (name=Python 
Exception <class 'gdb.MemoryError'> Cannot access memory at address 
0x7fffd8ccd388:
#1  0x00007fffd8acf518 in _nss_docker_gethostbyname2_r (name=0x2d72768 
"orgmode.org", af=10, result=0x7fffd8ce67d0, buffer=0x7fffd8ce6a40 
"\377\002", buflen=1024, errnop=0x7fffd8ce7948, herrnop=0x7fffd8ce79ac)
    at libnss_docker.c:340
#2  0x00007fffebf70f9f in gaih_inet (name=name@entry=0x2d72768 
"orgmode.org", service=<optimized out>, req=req@entry=0x2d72738, 
pai=pai@entry=0x7fffd8ce69c8, naddrs=naddrs@entry=0x7fffd8ce69c4, 
tmpbuf=tmpbuf@entry=0x7fffd8ce6a30) at ../sysdeps/posix/getaddrinfo.c:873
#3  0x00007fffebf72ce4 in __GI_getaddrinfo (name=<optimized out>, 
service=<optimized out>, hints=0x2d72738, pai=pai@entry=0x2d72718) at 
../sysdeps/posix/getaddrinfo.c:2300
#4  0x00007fffecb5a058 in handle_requests (arg=<optimized out>) at 
gai_misc.c:317
#5  0x00007fffecd646db in start_thread (arg=0x7fffd8ce7b40) at 
pthread_create.c:463
#6  0x00007fffebf8c88f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95
...
Thread 5 (Thread 0x7fffd8f51b40 (LWP 18897)):
#0  0x00007fffd8acecd5 in _nss_docker_gethostbyname3_r 
(name=0x2e6f732e312e302d <error: Cannot access memory at address 
0x2e6f732e312e302d>, af=2002936162, result=0x6e672d78756e696c, 
buffer=0x2d34365f3638782f <error: Cannot access memory at address 
0x2d34365f3638782f>, buflen=7091318039310988591, 
errnop=0x312e6f732e312e, herrnop=0x302d77626162696c, 
ttlp=0x302e6f732e6563, canonp=0x697672657373746e) at libnss_docker.c:72
#1  0x00007fffd8acf518 in _nss_docker_gethostbyname2_r (name=0x338a068 
"elpa.gnu.org", af=10, result=0x7fffd8f507d0, buffer=0x7fffd8f50a40 
"\377\002", buflen=1024, errnop=0x7fffd8f51948, herrnop=0x7fffd8f519ac) 
at libnss_docker.c:340
#2  0x00007fffebf70f9f in gaih_inet (name=name@entry=0x338a068 
"elpa.gnu.org", service=<optimized out>, req=req@entry=0x338a038, 
pai=pai@entry=0x7fffd8f509c8, naddrs=naddrs@entry=0x7fffd8f509c4, 
tmpbuf=tmpbuf@entry=0x7fffd8f50a30) at ../sysdeps/posix/getaddrinfo.c:873
#3  0x00007fffebf72ce4 in __GI_getaddrinfo (name=<optimized out>, 
service=<optimized out>, hints=0x338a038, pai=pai@entry=0x338a018) at 
../sysdeps/posix/getaddrinfo.c:2300
#4  0x00007fffecb5a058 in handle_requests (arg=<optimized out>) at 
gai_misc.c:317
#5  0x00007fffecd646db in start_thread (arg=0x7fffd8f51b40) at 
pthread_create.c:463
#6  0x00007fffebf8c88f in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```
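
Editorial note: both backtraces above show a lookup running in its own worker thread (`handle_requests` under `start_thread`), so several calls into the resolver module can be in progress at once. The Python sketch below is only a toy model of that call pattern. `stub_resolver` is a hypothetical stand-in that does no network I/O, and the barrier is there just to force the overlap deterministically.

```python
import threading

HOSTS = ["elpa.gnu.org", "stable.melpa.org", "orgmode.org"]

# Every worker must be inside the resolver before any of them may return,
# which makes the overlap deterministic for this demonstration.
all_inside = threading.Barrier(len(HOSTS), timeout=5)
calls = {}                      # host -> ident of the thread that handled it
calls_lock = threading.Lock()


def stub_resolver(host: str) -> None:
    """Hypothetical stand-in for an NSS lookup function; no network access."""
    with calls_lock:
        calls[host] = threading.get_ident()
    all_inside.wait()           # returns only once all lookups are in progress


def resolve_all(hosts):
    """Start one worker thread per host and wait for all of them."""
    workers = [threading.Thread(target=stub_resolver, args=(h,)) for h in hosts]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return calls


if __name__ == "__main__":
    for host, ident in resolve_all(HOSTS).items():
        print(f"{host} handled on thread {ident}")
```

Each host is handled on a different thread while the others are still inside the resolver. The model says nothing about what nss-docker or glibc do internally; it only shows why a module that behaves well when called from one thread at a time is exercised differently here.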


If someone could point out where the libc/NSS code is called on the
Emacs side, I can debug this further. To be honest, I'm having
difficulty pinpointing that.






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42140; Package emacs. (Wed, 01 Jul 2020 12:40:01 GMT) Full text and rfc822 format available.

Message #14 received at 42140 <at> debbugs.gnu.org (full text, mbox):

From: Hans van den Bogert <hansbogert <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 42140 <at> debbugs.gnu.org
Subject: Re: bug#42140: 26.3; sigsegv when using nss-docker
Date: Wed, 1 Jul 2020 14:39:34 +0200
Just for information, I've bisected this to commit

    fdfb68690f Implement asynchronous name resolution


Hans





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42140; Package emacs. (Mon, 06 Jul 2020 06:51:02 GMT) Full text and rfc822 format available.

Message #17 received at 42140 <at> debbugs.gnu.org (full text, mbox):

From: Hans van den Bogert <hansbogert <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 42140 <at> debbugs.gnu.org
Subject: Re: bug#42140: 26.3; sigsegv when using nss-docker
Date: Mon, 6 Jul 2020 08:50:21 +0200
Dear Eli,

Please set this bug to 'invalid'. (Could I have done this myself?)

The example in the `getaddrinfo_a` man page is enough to trigger this 
locally.
I am at my wits' end, though, as to where the real problem lies.

Sorry for the lack of confidence in emacs and for the overhead of this 
unneeded bug report ;)

Regards,

Hans




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Mon, 06 Jul 2020 16:32:01 GMT) Full text and rfc822 format available.

Notification sent to Hans van den Bogert <hansbogert <at> gmail.com>:
bug acknowledged by developer. (Mon, 06 Jul 2020 16:32:01 GMT) Full text and rfc822 format available.

Message #22 received at 42140-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Hans van den Bogert <hansbogert <at> gmail.com>
Cc: 42140-done <at> debbugs.gnu.org
Subject: Re: bug#42140: 26.3; sigsegv when using nss-docker
Date: Mon, 06 Jul 2020 19:31:17 +0300
> Cc: 42140 <at> debbugs.gnu.org
> From: Hans van den Bogert <hansbogert <at> gmail.com>
> Date: Mon, 6 Jul 2020 08:50:21 +0200
> 
> Please set this bug to 'invalid'. (Could I have done this myself?)

You can always close a bug by sending email to
NNNN-done <at> debbugs.gnu.org, where NNNN is the bug number.  Like I did
now.

> The example in the `getaddrinfo_a` man page is enough to trigger this 
> locally.
> I am at my wits' end, though, as to where the real problem lies.

Thanks for telling us.  Could this be a bug with your kernel or the
standard C library?

> Sorry for the lack of confidence in emacs and for the overhead of this 
> unneeded bug report ;)

No need to apologize, it can happen to anyone.
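
Editorial note: the closing procedure described in this message (mail to NNNN-done at debbugs.gnu.org) can be sketched in Python as follows. The sender address, subject line and body text are placeholders invented for the example, and the message is only built, not sent.

```python
from email.message import EmailMessage


def closing_message(bug_number: int, sender: str, reason: str) -> EmailMessage:
    """Build (but do not send) a mail addressed to the NNNN-done alias."""
    msg = EmailMessage()
    msg["From"] = sender                                  # placeholder sender
    msg["To"] = f"{bug_number}-done@debbugs.gnu.org"      # NNNN-done address
    msg["Subject"] = f"Re: bug#{bug_number}: closing"     # hypothetical subject
    msg.set_content(reason)
    return msg


if __name__ == "__main__":
    print(closing_message(42140, "reporter@example.org",
                          "Not an Emacs bug; closing."))
```

Actually sending the message would be done through whatever mail setup is at hand; that part is left out on purpose.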




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#42140; Package emacs. (Tue, 07 Jul 2020 08:50:02 GMT) Full text and rfc822 format available.

Message #25 received at 42140-done <at> debbugs.gnu.org (full text, mbox):

From: Hans van den Bogert <hansbogert <at> gmail.com>
Cc: 42140-done <at> debbugs.gnu.org
Subject: Re: bug#42140: 26.3; sigsegv when using nss-docker
Date: Tue, 7 Jul 2020 10:49:03 +0200
On 7/6/20 6:31 PM, Eli Zaretskii wrote:
> Thanks for telling us.  Could this be a bug with your kernel or the
> standard C library?

The kernel seems unlikely. The only difference I can see is that 
nss_docker's `_nss_docker_gethostbynameX_r` functions seem 'off' at the 
assembly level compared to, for example, the equivalent functions of 
`nss_mdns_minimal` and libc's `nss_dns`.

The offsets used when referencing stack locations on function entry are 
large (0xNNNNN), compared to the straightforward function-entry assembly 
I see in nss_mdns and nss_dns, with 'normal' offsets of 0xNNN. I've 
compared compiler flags and all, but I can't explain it. The weird 
thing remains, of course: why does the shared library work fine when 
it's called through the non-async variant, `gethostbyname`?

I think this discussion is out of scope for this list/tracker, though 
any pointers are of course welcome.

Regards,
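
Editorial note: some arithmetic can be done with numbers that already appear in this thread. The sketch below compares the 102400-byte size gdb reported for `res_message_buffer` with two distances taken from the gdb output. It assumes that the thread values gdb prints are glibc thread-descriptor addresses near the top of each thread's stack, and that the two threads compared have neighbouring stack mappings. Neither assumption is confirmed by the traces.

```python
# Values copied from the gdb output in this thread.
THREAD_A = 0x7FFFD8F51B40          # "New Thread ... (LWP 26232)"
THREAD_B = 0x7FFFD8F39B40          # "New Thread ... (LWP 26234)"
RES_MESSAGE_BUFFER = 102400        # size gdb reported for res_message_buffer

FAULTING_THREAD = 0x7FFFD8CE7B40   # Thread 7 in the second set of backtraces
BAD_ADDRESS = 0x7FFFD8CCD388       # "Cannot access memory at address ..."


def distance(high: int, low: int) -> int:
    """Byte distance between two addresses."""
    return high - low


if __name__ == "__main__":
    spacing = distance(THREAD_A, THREAD_B)
    depth = distance(FAULTING_THREAD, BAD_ADDRESS)
    print(f"thread spacing: {spacing} bytes ({spacing:#x})")
    print(f"bad address is {depth} bytes ({depth:#x}) below its thread value")
    print(f"buffer alone:   {RES_MESSAGE_BUFFER} bytes ({RES_MESSAGE_BUFFER:#x})")
```

The spacing comes out at 98304 bytes and the inaccessible address lies 108472 bytes below its thread value, while the buffer alone needs 102400 bytes. Under the stated assumptions this would be consistent with the lookup function needing more stack than the resolver worker threads have, and with the unusually large stack offsets mentioned above. It would need checking against the real memory map, for example with gdb's `info proc mappings`.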




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 04 Aug 2020 11:24:05 GMT) Full text and rfc822 format available.

This bug report was last modified 3 years and 238 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.