Package: guix;
Reported by: Ludovic Courtès <ludo <at> gnu.org>
Date: Wed, 5 Oct 2022 21:02:01 UTC
Severity: normal
Tags: wontfix
Done: Ludovic Courtès <ludo <at> gnu.org>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 58320 in the body.
You can then email your comments to 58320 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
View this report as an mbox folder, status mbox, maintainer mbox
bug-hurd <at> gnu.org, bug-guix <at> gnu.org
:bug#58320
; Package guix
.
(Wed, 05 Oct 2022 21:02:02 GMT) Full text and rfc822 format available.Ludovic Courtès <ludo <at> gnu.org>
:bug-hurd <at> gnu.org, bug-guix <at> gnu.org
.
(Wed, 05 Oct 2022 21:02:02 GMT) Full text and rfc822 format available.Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
From: Ludovic Courtès <ludo <at> gnu.org> To: bug-guix <at> gnu.org Subject: Hurd VM fails to boot on AMD EPYC (kvm-amd) Date: Wed, 05 Oct 2022 23:01:39 +0200
Hello! On AMD EPYC processors, as found on the build nodes of ci.guix.gnu.org, childhurd VMs fail to boot when running with ‘qemu-system-i386 -enable-kvm’ (the kvm-amd Linux kernel module is used), with the Hurd startup process hanging before /hurd/exec has been started: --8<---------------cut here---------------start------------->8--- module 0: ext2fs --multiboot-command-line=${kernel-command-line} --host-priv-por t=${host-port} --device-master-port=${device-port} --exec-server-task=${exec-tas k} --store-type=typed --x-xattr-translator-records ${root} $(task-create) $(task -resume) module 1: exec /gnu/store/99sqiayswrxxb80331pl7jxin18wv28b-hurd-0.9-1.91a5167/hu rd/exec $(exec-task=task-create) 2 multiboot modules task loaded: ext2fs --multiboot-command-line=root=device:hd0s1 root=3367134b-cfb d-1e90-2f38-dfd13367134b gnu.system=/gnu/store/m66ccpdzdbcd3k2fdvyaj8cgmk23lybn- system gnu.load=/gnu/store/m66ccpdzdbcd3k2fdvyaj8cgmk23lybn-system/boot --host-p riv-port=1 --device-master-port=2 --exec-server-task=3 --store-type=typed --x-xa ttr-translator-records device:hd0s1 task loaded: exec /gnu/store/99sqiayswrxxb80331pl7jxin18wv28b-hurd-0.9-1.91a5167 /hurd/exec start ext2fs: Hurd server bootstrap: ext2fs[device:hd0s1] exec --8<---------------cut here---------------end--------------->8--- Kdb shows these two tasks: --8<---------------cut here---------------start------------->8--- Stopped at mach ine_idle+0x26: nop machine_idle(0,c102f380,f5f7bcd0,c102e3fa)+0x26 idle_thread_continue(f5f7ade0,36366000,f5f73868,f5f73890,0)+0x73 >>>>> user space <<<<< db> show all threads TASK THREADS 0 gnumach (f5f7cf00): 7 threads: 0 (f5f7be18) .W..N. 0xc11dac04 1 (f5f7bcd0) R..... 2 (f5f7bb88) .W.ON.(reaper_thread_continue) 0xc12015d4 3 (f5f7ba40) .W.ON.(swapin_thread_continue) 0xc11f8e2c 4 (f5f7b8f8) .W.ON.(sched_thread_continue) 0 5 (f5f7b7b0) .W.ON.(io_done_thread_continue) 0xc1201f74 6 (f5f7b668) .W.ON.(net_thread_continue) 0xc11db0a8 1 ext2fs (f5f7ce40): 6 threads: 0 (f5f7b520) .W.O.F(mach_msg_continue) 0 1 (f5f7b290) .W.O.F(mach_msg_receive_continue) 0 2 (f5f7b148) .W.O..(mach_msg_receive_continue) 0 3 (f5f7b000) .W.O..(mach_msg_continue) 0 4 (f67d3e20) .W.O..(mach_msg_receive_continue) 0 5 (f67d3cd8) .W.O..(mach_msg_continue) 0 db> trace/t 0xf5f7b520 Continuation mach_msg_continue >>>>> user space <<<<< 0x80ccaec() --8<---------------cut here---------------end--------------->8--- For ext2fs.static, that just means thread 0 is here: --8<---------------cut here---------------start------------->8--- $ addr2line -e /gnu/store/99sqiayswrxxb80331pl7jxin18wv28b-hurd-0.9-1.91a5167/hurd/ext2fs.static 0x80ccaec /tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/build/mach/mach_msg_trap.S:2 --8<---------------cut here---------------end--------------->8--- That doesn’t tell us much. The same image boots fine on the same CPU without ‘-enable-kvm’. However, keeping ‘-enable-kvm’ and adding ‘--cpu pentium’ and other variants of this option don’t make any difference, AFAICS. Ideas on how to debug this further, and/or ways to work around it without giving up on KVM? Thanks, Ludo’.
bug-guix <at> gnu.org
:bug#58320
; Package guix
.
(Thu, 06 Oct 2022 13:15:02 GMT) Full text and rfc822 format available.Message #8 received at 58320 <at> debbugs.gnu.org (full text, mbox):
From: Ludovic Courtès <ludo <at> gnu.org> To: 58320 <at> debbugs.gnu.org Cc: bug-hurd <at> gnu.org Subject: Re: bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd) Date: Thu, 06 Oct 2022 15:14:13 +0200
Hi! As suggested by Samuel on IRC, I did that early on in kdb: debug traps /on such that it would stop on each trap, hopefully allowing me to see why exec is not starting. --8<---------------cut here---------------start------------->8--- module 0: ext2fs --multiboot-command-line=${kernel-command-line} --host-priv-por t=${host-port} --device-master-port=${device-port} --exec-server-task=${exec-tas k} --store-type=typed --x-xattr-translator-records ${root} $(task-create) $(task -resume) module 1: exec /gnu/store/99sqiayswrxxb80331pl7jxin18wv28b-hurd-0.9-1.91a5167/hu rd/exec $(exec-task=task-create) 2 multiboot modules task loaded: ext2fs --multiboot-command-line=root=device:hd0s1 root=3367134b-cfb d-1e90-2f38-dfd13367134b gnu.system=/gnu/store/m66ccpdzdbcd3k2fdvyaj8cgmk23lybn- system gnu.load=/gnu/store/m66ccpdzdbcd3k2fdvyaj8cgmk23lybn-system/boot --host-p riv-port=1 --device-master-port=2 --exec-server-task=3 --store-type=typed --x-xa ttr-translator-records device:hd0s1 task loaded: exec /gnu/store/99sqiayswrxxb80331pl7jxin18wv28b-hurd-0.9-1.91a5167 /hurd/exec start ext2fs: Hurd server bootstrap: ext2fs[device:hd0s1] execkernel: Page fault (14), code=6 Stopped at 0x1000: pushl 0x4(%ebx) >>>>> user space <<<<< 0x1000(bfffff24,0,0,1160b,0) 0x11627(bfffff9c,0,0,0,2) 0x11bb() db> show all threads TASK THREADS 0 gnumach (f5f7cf00): 7 threads: 0 (f5f7be18) .W..N. 0xc11dac04 1 (f5f7bcd0) R..O..(idle_thread_continue) 2 (f5f7bb88) .W.ON.(reaper_thread_continue) 0xc12015d4 3 (f5f7ba40) .W.ON.(swapin_thread_continue) 0xc11f8e2c 4 (f5f7b8f8) .W.ON.(sched_thread_continue) 0 5 (f5f7b7b0) .W.ON.(io_done_thread_continue) 0xc1201f74 6 (f5f7b668) .W.ON.(net_thread_continue) 0xc11db0a8 1 ext2fs (f5f7ce40): 6 threads: 0 (f5f7b520) .W.O.F(mach_msg_continue) 0 1 (f5f7b290) .W.O..(mach_msg_receive_continue) 0 2 (f5f7b148) .W.O..(mach_msg_receive_continue) 0 3 (f5f7b000) .W.O..(mach_msg_continue) 0 4 (f67d4e20) .W.O..(mach_msg_receive_continue) 0 5 (f67d4cd8) .W.O..(mach_msg_continue) 0 2 exec (f5f7cd80): (f5f7b3d8) R..... --8<---------------cut here---------------end--------------->8--- Then lots of page faults with the same stack trace, seemingly endlessly: --8<---------------cut here---------------start------------->8--- db> c kernel: Page fault (14), code=6 Stopped at 0x1000: pushl 0x4(%ebx) >>>>> user space <<<<< 0x1000(bfffff24,0,0,1160b,0) 0x11627(bfffff9c,0,0,0,2) 0x11bb() --8<---------------cut here---------------end--------------->8--- When I “debug traps /off” and continue, the startup process hangs as normal, and at that point ‘show all threads’ no longer shows exec. On a “working” VM, with traps enabled early on in the same way, I don’t see any page fault until after exec, proc, auth, etc. have been started. Thoughts? Ludo’.
bug-guix <at> gnu.org
:bug#58320
; Package guix
.
(Thu, 06 Oct 2022 13:54:02 GMT) Full text and rfc822 format available.Message #11 received at 58320 <at> debbugs.gnu.org (full text, mbox):
From: Samuel Thibault <samuel.thibault <at> gnu.org> To: Ludovic Courtès <ludo <at> gnu.org> Cc: bug-hurd <at> gnu.org, 58320 <at> debbugs.gnu.org Subject: Re: bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd) Date: Thu, 6 Oct 2022 15:53:16 +0200
Ludovic Courtès, le jeu. 06 oct. 2022 15:14:13 +0200, a ecrit: > such that it would stop on each trap, hopefully allowing me to see why > exec is not starting. Also, better use exec.static to have static addresses. We use the dynamic version of exec "just because we can", but that makes debugging difficult. Samuel
bug-guix <at> gnu.org
:bug#58320
; Package guix
.
(Thu, 06 Oct 2022 22:11:02 GMT) Full text and rfc822 format available.Message #14 received at 58320 <at> debbugs.gnu.org (full text, mbox):
From: Ludovic Courtès <ludo <at> gnu.org> To: 58320 <at> debbugs.gnu.org Cc: bug-hurd <at> gnu.org Subject: Re: bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd) Date: Fri, 07 Oct 2022 00:10:15 +0200
Samuel Thibault <samuel.thibault <at> gnu.org> skribis: > Ludovic Courtès, le jeu. 06 oct. 2022 15:14:13 +0200, a ecrit: >> such that it would stop on each trap, hopefully allowing me to see why >> exec is not starting. > > Also, better use exec.static to have static addresses. Thanks for the hint. Of course, the thing boots just fine on that machine when using ‘exec.static’. So the issue might be somewhere in ld.so, or triggered by ld.so. Any debugging tricks here? Ludo’.
bug-guix <at> gnu.org
:bug#58320
; Package guix
.
(Thu, 06 Oct 2022 22:43:01 GMT) Full text and rfc822 format available.Message #17 received at 58320 <at> debbugs.gnu.org (full text, mbox):
From: Samuel Thibault <samuel.thibault <at> gnu.org> To: Ludovic Courtès <ludo <at> gnu.org> Cc: bug-hurd <at> gnu.org, 58320 <at> debbugs.gnu.org Subject: Re: bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd) Date: Fri, 7 Oct 2022 00:42:19 +0200
Ludovic Courtès, le ven. 07 oct. 2022 00:10:15 +0200, a ecrit: > Samuel Thibault <samuel.thibault <at> gnu.org> skribis: > > > Ludovic Courtès, le jeu. 06 oct. 2022 15:14:13 +0200, a ecrit: > >> such that it would stop on each trap, hopefully allowing me to see why > >> exec is not starting. > > > > Also, better use exec.static to have static addresses. > > Thanks for the hint. > > Of course, the thing boots just fine on that machine when using > ‘exec.static’. Uh. At least you have a workaround :) > So the issue might be somewhere in ld.so, or triggered by ld.so. > Any debugging tricks here? maybe for a start check with show map $map2 what is actually mapped, whether it's just ld.so, or also exec, etc. Perhaps you can also try to run other programs than exec, like small dumb programs. Samuel
bug-guix <at> gnu.org
:bug#58320
; Package guix
.
(Fri, 07 Oct 2022 08:25:01 GMT) Full text and rfc822 format available.Message #20 received at 58320 <at> debbugs.gnu.org (full text, mbox):
From: Ludovic Courtès <ludo <at> gnu.org> To: 58320 <at> debbugs.gnu.org Cc: bug-hurd <at> gnu.org Subject: Re: bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd) Date: Fri, 07 Oct 2022 10:24:22 +0200
Hi! Samuel Thibault <samuel.thibault <at> gnu.org> skribis: > Ludovic Courtès, le ven. 07 oct. 2022 00:10:15 +0200, a ecrit: [...] >> Of course, the thing boots just fine on that machine when using >> ‘exec.static’. > > Uh. At least you have a workaround :) Yup. :-) >> So the issue might be somewhere in ld.so, or triggered by ld.so. >> Any debugging tricks here? > > maybe for a start check with > > show map $map2 > > what is actually mapped, whether it's just ld.so, or also exec, etc. On the first trap (page fault) I see: --8<---------------cut here---------------start------------->8--- db> show all tasks ID TASK NAME [THREADS] 0 f5f7cf00 gnumach [7] 1 f5f7ce40 ext2fs [6] 2 f5f7cd80 exec [1] db> show map $map2 Map 0xf5f6ff30: name="exec", pmap=0xf5f71fa8,ref=1,nentries=5 size=290816,resident:290816,wired=0 version=14 map entry 0xf625ec08: start=0x0, end=0x1000 prot=1/7/copy, object=0xf5f6a7d0, offset=0x0 Object 0xf5f6a7d0: size=0x1000, 1 references 1 resident pages, 0 absent pages, 0 paging ops memory object=0x0 (offset=0x0),control=0x0, name=0xf5938968 uninitialized,temporary internal,copy_strategy=0 shadow=0x0 (offset=0x0),copy=0x0 map entry 0xf625ebb0: start=0x1000, end=0x26000 prot=5/7/copy, object=0xf5f6ad70, offset=0x0 Object 0xf5f6ad70: size=0x25000, 1 references 37 resident pages, 0 absent pages, 0 paging ops memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82780 uninitialized,temporary internal,copy_strategy=0 shadow=0x0 (offset=0x0),copy=0x0 map entry 0xf625eb58: start=0x26000, end=0x34000 prot=1/7/copy, object=0xf5f6ad20, offset=0x0 Object 0xf5f6ad20: size=0xe000, 1 references 14 resident pages, 0 absent pages, 0 paging ops memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82730 uninitialized,temporary internal,copy_strategy=0 shadow=0x0 (offset=0x0),copy=0x0 map entry 0xf625eb00: start=0x34000, end=0x37000 prot=3/7/copy, object=0xf5f6acd0, offset=0x0 Object 0xf5f6acd0: size=0x3000, 1 references 3 resident pages, 0 absent pages, 0 paging ops memory object=0x0 (offset=0x0),control=0x0, name=0xf5f826e0 uninitialized,temporary internal,copy_strategy=0 shadow=0x0 (offset=0x0),copy=0x0 map entry 0xf625eaa8: start=0xbfff0000, end=0xc0000000 prot=3/7/copy, object=0xf5f6ac80, offset=0x0 Object 0xf5f6ac80: size=0x10000, 1 references 16 resident pages, 0 absent pages, 0 paging ops memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82690 uninitialized,temporary internal,copy_strategy=0 shadow=0x0 (offset=0x0),copy=0x0 --8<---------------cut here---------------end--------------->8--- The mappings appear to match the PT_LOAD sections of ld.so: --8<---------------cut here---------------start------------->8--- $ objdump -x /gnu/store/m8afvcgwmrfhvjpd7b0xllk8vv5isd6j-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1|head -16 /gnu/store/m8afvcgwmrfhvjpd7b0xllk8vv5isd6j-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1: file format elf32-i386 /gnu/store/m8afvcgwmrfhvjpd7b0xllk8vv5isd6j-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1 architecture: i386, flags 0x00000150: HAS_SYMS, DYNAMIC, D_PAGED start address 0x000011b0 Program Header: LOAD off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**12 filesz 0x00000dd8 memsz 0x00000dd8 flags r-- LOAD off 0x00001000 vaddr 0x00001000 paddr 0x00001000 align 2**12 filesz 0x000244a1 memsz 0x000244a1 flags r-x LOAD off 0x00026000 vaddr 0x00026000 paddr 0x00026000 align 2**12 filesz 0x0000d5e8 memsz 0x0000d5e8 flags r-- LOAD off 0x00033f60 vaddr 0x00034f60 paddr 0x00034f60 align 2**12 filesz 0x00001910 memsz 0x00001a6c flags rw- --8<---------------cut here---------------end--------------->8--- … so ‘exec_load’ is doing its job, it seems. I also tried to set up a breakpoint on ‘task_terminate’ to see what’s going on when the exec task vanishes: --8<---------------cut here---------------start------------->8--- module 0: ext2fs --multiboot-command-line=${kernel-command-line} --host-priv-por t=${host-port} --device-master-port=${device-port} --exec-server-task=${exec-tas k} --store-type=typed --x-xattr-translator-records ${root} $(task-create) $(task -resume) module 1: exec /gnu/store/0na7ic689glydngxyb7pazjixz9b6629-hurd-0.9-1.91a5167/hu rd/exec $(exec-task=task-create) 2 multiboot modules task loaded: ext2fs --multiboot-command-line=root=device:hd0s1 root=3367134b-cf6 6-e06b-ee4b-4b933367134b gnu.system=/gnu/store/g7kssjfrxqjpbr6r31idiasgglfph2y5- system gnu.load=/gnu/store/g7kssjfrxqjpbr6r31idiasgglfph2y5-system/boot --host-p riv-port=1 --device-master-port=2 --exec-server-task=3 --store-type=typed --x-xa ttr-translator-records device:hd0s1 task loaded: exec /gnu/store/0na7ic689glydngxyb7pazjixz9b6629-hurd-0.9-1.91a5167 /hurd/exec start ext2fs: Hurd server bootstrap: ext2fs[device:hd0s1] execKernel Breakpoint trap, eip 0xc10305c1 Breakpoint at task_terminate: pushl %ebp db> show all threads TASK THREADS 0 gnumach (f5f7cf00): 7 threads: 0 (f5f7be18) .W..N. 0xc11dac04 1 (f5f7bcd0) R..O..(idle_thread_continue) 2 (f5f7bb88) .W.ON.(reaper_thread_continue) 0xc12015d4 3 (f5f7ba40) .W.ON.(swapin_thread_continue) 0xc11f8e2c 4 (f5f7b8f8) .W.ON.(sched_thread_continue) 0 5 (f5f7b7b0) .W.ON.(io_done_thread_continue) 0xc1201f74 6 (f5f7b668) .W.ON.(net_thread_continue) 0xc11db0a8 1 ext2fs (f5f7ce40): 6 threads: 0 (f5f7b520) .W.O.F(mach_msg_continue) 0 1 (f5f7b290) .W.O..(mach_msg_receive_continue) 0 2 (f5f7b148) .W.O..(mach_msg_receive_continue) 0 3 (f5f7b000) .W.O..(mach_msg_continue) 0 4 (f67d4e20) .W.O..(mach_msg_receive_continue) 0 5 (f67d4cd8) .W.O..(mach_msg_continue) 0 2 exec (f5f7cd80): (f5f7b3d8) R..... db> trace/t 0xf5f7b3d8 task_terminate(f625eb10,0,f5f7cd80,f5f7b3d8,c11da940) exception_try_task(1,1,bffefffc,ffffffff,c1202b4c)+0x58 exception(1,1,bffefffc,c10096da,f5957fbc)+0x7a interrupted_pc(1,1,bffefffc,c102ce99,c1202b40) trap_name(1,f5957f80,f5f73f4c,f5f73f58) vm_fault(f5f6ff30,bffef000,3,0,0,c1008ee4,f5f82550,fb7d9000)+0x74a user_trap(f5f7a718)+0x2df >>>>> Page fault (14) at 0x1000 <<<<< >>>>> user space <<<<< db> show map $map2 Map 0xf5f6ff30: name="exec", pmap=0xf5f71fa8,ref=1,nentries=5 size=290816,resident:290816,wired=0 version=14 map entry 0xf625ec08: start=0x0, end=0x1000 prot=1/7/copy, object=0xf5f6a7d0, offset=0x0 Object 0xf5f6a7d0: size=0x1000, 1 references 1 resident pages, 0 absent pages, 0 paging ops memory object=0x0 (offset=0x0),control=0x0, name=0xf5938968 uninitialized,temporary internal,copy_strategy=0 shadow=0x0 (offset=0x0),copy=0x0 map entry 0xf625ebb0: start=0x1000, end=0x26000 prot=5/7/copy, object=0xf5f6ad70, offset=0x0 Object 0xf5f6ad70: size=0x25000, 1 references 37 resident pages, 0 absent pages, 0 paging ops memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82780 uninitialized,temporary internal,copy_strategy=0 shadow=0x0 (offset=0x0),copy=0x0 --8<---------------cut here---------------end--------------->8--- It says “page fault at 0x1000” but there is apparently a valid mapping at that address. Funny thing: if I set a breakpoint on ‘read_exec’ and continue each time it’s hit, the ‘exec’ process starts just fine. Could it be a synchronization issue somewhere? Thanks, Ludo’.
bug-guix <at> gnu.org
:bug#58320
; Package guix
.
(Fri, 07 Oct 2022 21:17:03 GMT) Full text and rfc822 format available.Message #23 received at 58320 <at> debbugs.gnu.org (full text, mbox):
From: Samuel Thibault <samuel.thibault <at> gnu.org> To: Ludovic Courtès <ludo <at> gnu.org> Cc: bug-hurd <at> gnu.org, 58320 <at> debbugs.gnu.org Subject: Re: bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd) Date: Fri, 7 Oct 2022 23:16:43 +0200
Ludovic Courtès, le ven. 07 oct. 2022 10:24:22 +0200, a ecrit: > trap, eip 0xc10305c1 > Breakpoint at task_terminate: pushl %ebp > db> show all threads > TASK THREADS > 0 gnumach (f5f7cf00): 7 threads: > 0 (f5f7be18) .W..N. 0xc11dac04 > 1 (f5f7bcd0) R..O..(idle_thread_continue) > 2 (f5f7bb88) .W.ON.(reaper_thread_continue) 0xc12015d4 > 3 (f5f7ba40) .W.ON.(swapin_thread_continue) 0xc11f8e2c > 4 (f5f7b8f8) .W.ON.(sched_thread_continue) 0 > 5 (f5f7b7b0) .W.ON.(io_done_thread_continue) 0xc1201f74 > 6 (f5f7b668) .W.ON.(net_thread_continue) 0xc11db0a8 > 1 ext2fs (f5f7ce40): 6 threads: > 0 (f5f7b520) .W.O.F(mach_msg_continue) 0 > 1 (f5f7b290) .W.O..(mach_msg_receive_continue) 0 > 2 (f5f7b148) .W.O..(mach_msg_receive_continue) 0 > 3 (f5f7b000) .W.O..(mach_msg_continue) 0 > 4 (f67d4e20) .W.O..(mach_msg_receive_continue) 0 > 5 (f67d4cd8) .W.O..(mach_msg_continue) 0 > 2 exec (f5f7cd80): (f5f7b3d8) R..... > db> trace/t 0xf5f7b3d8 > task_terminate(f625eb10,0,f5f7cd80,f5f7b3d8,c11da940) > exception_try_task(1,1,bffefffc,ffffffff,c1202b4c)+0x58 > exception(1,1,bffefffc,c10096da,f5957fbc)+0x7a > interrupted_pc(1,1,bffefffc,c102ce99,c1202b40) > trap_name(1,f5957f80,f5f73f4c,f5f73f58) > vm_fault(f5f6ff30,bffef000,3,0,0,c1008ee4,f5f82550,fb7d9000)+0x74a > user_trap(f5f7a718)+0x2df > >>>>> Page fault (14) at 0x1000 <<<<< > >>>>> user space <<<<< > db> show map $map2 > Map 0xf5f6ff30: name="exec", pmap=0xf5f71fa8,ref=1,nentries=5 > size=290816,resident:290816,wired=0 > version=14 > map entry 0xf625ec08: start=0x0, end=0x1000 > prot=1/7/copy, object=0xf5f6a7d0, offset=0x0 > Object 0xf5f6a7d0: size=0x1000, 1 references > 1 resident pages, 0 absent pages, 0 paging ops > memory object=0x0 (offset=0x0),control=0x0, name=0xf5938968 > uninitialized,temporary internal,copy_strategy=0 > shadow=0x0 (offset=0x0),copy=0x0 > map entry 0xf625ebb0: start=0x1000, end=0x26000 > prot=5/7/copy, object=0xf5f6ad70, offset=0x0 > Object 0xf5f6ad70: size=0x25000, 1 references > 37 resident pages, 0 absent pages, 0 paging ops > memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82780 > uninitialized,temporary internal,copy_strategy=0 > shadow=0x0 (offset=0x0),copy=0x0 > --8<---------------cut here---------------end--------------->8--- > > It says “page fault at 0x1000” but there is apparently a valid mapping > at that address. > > Funny thing: if I set a breakpoint on ‘read_exec’ and continue each time > it’s hit, the ‘exec’ process starts just fine. > > Could it be a synchronization issue somewhere? It'd be surprising that you never gets the issue later on with the system fully booted. About the backtrace: >>>>> user space <<<<< 0x1000(bfffff24,0,0,1160b,0) 0x11627(bfffff9c,0,0,0,2) 0x11bb() That is quite surprising actually: in my ld.so there is nothing useful at 0x1000. Perhaps you can check what 0x11627 is all about? Also, > Program Header: > LOAD off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**12 > filesz 0x00000dd8 memsz 0x00000dd8 flags r-- We don't have this section in the Debian glibc. It'd probably be useful to know what this is about. > LOAD off 0x00001000 vaddr 0x00001000 paddr 0x00001000 align 2**12 > filesz 0x000244a1 memsz 0x000244a1 flags r-x > LOAD off 0x00026000 vaddr 0x00026000 paddr 0x00026000 align 2**12 > filesz 0x0000d5e8 memsz 0x0000d5e8 flags r-- > LOAD off 0x00033f60 vaddr 0x00034f60 paddr 0x00034f60 align 2**12 > filesz 0x00001910 memsz 0x00001a6c flags rw- Samuel
bug-guix <at> gnu.org
:bug#58320
; Package guix
.
(Sat, 08 Oct 2022 15:53:02 GMT) Full text and rfc822 format available.Message #26 received at 58320 <at> debbugs.gnu.org (full text, mbox):
From: Ludovic Courtès <ludo <at> gnu.org> To: 58320 <at> debbugs.gnu.org Cc: bug-hurd <at> gnu.org Subject: Re: bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd) Date: Sat, 08 Oct 2022 17:52:11 +0200
Hi Samuel, Samuel Thibault <samuel.thibault <at> gnu.org> skribis: > About the backtrace: > >>>>>> user space <<<<< > 0x1000(bfffff24,0,0,1160b,0) > 0x11627(bfffff9c,0,0,0,2) > 0x11bb() > > That is quite surprising actually: in my ld.so there is nothing useful > at 0x1000. Perhaps you can check what 0x11627 is all about? Sure: --8<---------------cut here---------------start------------->8--- $ addr2line -e /gnu/store/m8afvcgwmrfhvjpd7b0xllk8vv5isd6j-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1 0x1000 0x11627 0x11bb ??:0 /tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/glibc-2.33/elf/dl-misc.c:333 :? --8<---------------cut here---------------end--------------->8--- That’s ‘_dl_fatal_printf’ calling ‘_exit’; it’s trying to tell us something. I’ll try and rebuild the system with the debugging patches at <https://lists.gnu.org/archive/html/bug-hurd/2011-11/msg00038.html>, to get early ld.so output, for lack of a better solution… >> Program Header: >> LOAD off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**12 >> filesz 0x00000dd8 memsz 0x00000dd8 flags r-- > > We don't have this section in the Debian glibc. It'd probably be useful > to know what this is about. Address 0 is for the ‘_begin’ symbol, passed by -Wl,-defsym: --8<---------------cut here---------------start------------->8--- i586-pc-gnu-gcc -nostdlib -nostartfiles -r -o /tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/build/elf/librtld.os '-Wl,-(' /tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/build/elf/dl-allobjs.os /tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/build/elf/rtld-libc.a -lgcc '-Wl,-)' \ -Wl,-Map,/tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/build/elf/librtld.os.map i586-pc-gnu-gcc -nostdlib -nostartfiles -shared -o /tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/build/elf/ld.so.new \ -Wl,-z,combreloc -Wl,-z,relro -Wl,--hash-style=both -Wl,-z,defs \ /tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/build/elf/librtld.os -Wl,--version-script=/tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/build/ld.map \ -Wl,-soname=ld.so.1 \ -Wl,-defsym=_begin=0 i586-pc-gnu-readelf -s /tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/build/elf/ld.so.new \ | gawk '($7 ~ /^UND(|EF)$/ && $1 != "0:" && $4 != "REGISTER") { print; p=1 } END { exit p != 0 }' mv -f /tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/build/elf/ld.so.new /tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/build/elf/ld.so --8<---------------cut here---------------end--------------->8--- And indeed: --8<---------------cut here---------------start------------->8--- $ objdump -t /gnu/store/m8afvcgwmrfhvjpd7b0xllk8vv5isd6j-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1|grep _begin 00000000 l *ABS* 00000000 _begin --8<---------------cut here---------------end--------------->8--- That ‘-Wl,-defsym=_begin=0’ flag was removed in glibc commit 6f043e0ee7e477f50a44024ed0cb579d5e3f511d (April 2022). On darnassus it’s different but then it’s Debian’s glibc 2.35, natively built, so I don’t what conclusions can be drawn: --8<---------------cut here---------------start------------->8--- ludo <at> darnassus:~$ /lib/ld.so.1 --version ld.so (Debian GLIBC 2.35-1) stable release version 2.35. Copyright (C) 2022 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ludo <at> darnassus:~$ objdump -x /lib/ld.so.1 |head -40 /lib/ld.so.1: file format elf32-i386 /lib/ld.so.1 architecture: i386, flags 0x00000150: HAS_SYMS, DYNAMIC, D_PAGED start address 0x0001cc40 Program Header: LOAD off 0x00000000 vaddr 0x00000000 paddr 0x00000000 align 2**12 filesz 0x00038494 memsz 0x00038494 flags r-x LOAD off 0x00038c00 vaddr 0x00039c00 paddr 0x00039c00 align 2**12 filesz 0x00001ca8 memsz 0x00001e34 flags rw- DYNAMIC off 0x00039f24 vaddr 0x0003af24 paddr 0x0003af24 align 2**2 filesz 0x000000b8 memsz 0x000000b8 flags rw- NOTE off 0x00000114 vaddr 0x00000114 paddr 0x00000114 align 2**2 filesz 0x00000024 memsz 0x00000024 flags r-- --8<---------------cut here---------------end--------------->8--- Thanks for your feedback! Ludo’.
bug-guix <at> gnu.org
:bug#58320
; Package guix
.
(Sun, 09 Oct 2022 16:10:02 GMT) Full text and rfc822 format available.Message #29 received at 58320 <at> debbugs.gnu.org (full text, mbox):
From: Ludovic Courtès <ludo <at> gnu.org> To: 58320 <at> debbugs.gnu.org Cc: bug-hurd <at> gnu.org Subject: Re: bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd) Date: Sun, 09 Oct 2022 18:09:07 +0200
Hi! Ludovic Courtès <ludo <at> gnu.org> skribis: > $ addr2line -e /gnu/store/m8afvcgwmrfhvjpd7b0xllk8vv5isd6j-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1 0x1000 0x11627 0x11bb > ??:0 > /tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/glibc-2.33/elf/dl-misc.c:333 > :? > > > That’s ‘_dl_fatal_printf’ calling ‘_exit’; it’s trying to tell us > something. > > I’ll try and rebuild the system with the debugging patches at > <https://lists.gnu.org/archive/html/bug-hurd/2011-11/msg00038.html>, to > get early ld.so output, for lack of a better solution… I tried adapted the patches above and tried them, but it seems that ‘_dl_sysdep_start’ isn’t even reached. For example, I set a breakpoint on ‘mach_task_self’ (called from ‘__mach_init’, called from ‘_dl_sysdep_start’), but that’s never reached (I’m assuming ‘break/tu’ is reliable, is it?). The user-space backtrace upon trap remains unhelpful: --8<---------------cut here---------------start------------->8--- start ext2fs: Hurd server bootstrap: ext2fs[device:hd0s1]Kernel Breakpoint trap, eip 0xc1030d5b Breakpoint at task_resume: pushl %ebp db> debug traps /on db> b task_terminate set breakpoint #2 db> c Kernel Debug trap trap, eip 0xc1030d5b execkernel: Page fault (14), code=6 Stopped at 0x1000: pushl 0x4(%ebx) >>>>> user space <<<<< 0x1000(bfffff24,0,0,1160b,0) 0x11627(bfffff9c,0,0,0,2) 0x11bb() --8<---------------cut here---------------end--------------->8--- … where: --8<---------------cut here---------------start------------->8--- $ addr2line -e /gnu/store/4p1kab1c4h7h3kvgcm1hbjja4y5k9x4p-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1 0x11627 0x11bb /tmp/guix-build-glibc-cross-i586-pc-gnu-2.33.drv-0/glibc-2.33/elf/dl-misc.c:333 :? $ objdump -S /gnu/store/4p1kab1c4h7h3kvgcm1hbjja4y5k9x4p-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1 --start-address=0x000011b0 |head -40 /gnu/store/4p1kab1c4h7h3kvgcm1hbjja4y5k9x4p-glibc-cross-i586-pc-gnu-2.33/lib/ld.so.1: file format elf32-i386 Disassembly of section .text: 000011b0 <_start>: 11b0: 89 e0 mov %esp,%eax 11b2: 83 ec 0c sub $0xc,%esp 11b5: 50 push %eax 11b6: e8 b5 0a 00 00 call 1c70 <_dl_start> 11bb: 83 c4 10 add $0x10,%esp --8<---------------cut here---------------end--------------->8--- So it would seem that ‘_dl_start’ is called and somehow then a tail-call to ‘_dl_fatal_printf’ is made. Through a dichotomy I tried to see how far it goes. The info I have so far is that ld.so errors out from elf/rtld.c:563 (line 565 is not reached): --8<---------------cut here---------------start------------->8--- 558: if (bootstrap_map.l_addr || ! bootstrap_map.l_info[VALIDX(DT_GNU_PRELINKED)]) 559: { 560: /* Relocate ourselves so we can do normal function calls and 561: data access using the global offset table. */ 562: 563: ELF_DYNAMIC_RELOCATE (&bootstrap_map, 0, 0, 0); 564: } 565: bootstrap_map.l_relocated = 1; ... 578: __rtld_malloc_init_stubs (); --8<---------------cut here---------------end--------------->8--- It’s hard to be more precise because ELF_DYNAMIC_RELOCATE is a macro that expands to quite a lot of code. I don’t see the code path that would lead to a ‘_dl_fatal_printf’ call though. Ideas? :-) Ludo’.
bug-guix <at> gnu.org
:bug#58320
; Package guix
.
(Sun, 09 Oct 2022 19:10:01 GMT) Full text and rfc822 format available.Message #32 received at 58320 <at> debbugs.gnu.org (full text, mbox):
From: Samuel Thibault <samuel.thibault <at> gnu.org> To: Ludovic Courtès <ludo <at> gnu.org> Cc: bug-hurd <at> gnu.org, 58320 <at> debbugs.gnu.org Subject: Re: bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd) Date: Sun, 9 Oct 2022 21:09:33 +0200
Ludovic Courtès, le dim. 09 oct. 2022 18:09:07 +0200, a ecrit: > So it would seem that ‘_dl_start’ is called and somehow then a tail-call > to ‘_dl_fatal_printf’ is made. Perhaps you can build glibc without tail-call optimization? (-fno-optimize-sibling-calls) Samuel
bug-guix <at> gnu.org
:bug#58320
; Package guix
.
(Mon, 10 Oct 2022 21:15:02 GMT) Full text and rfc822 format available.Message #35 received at 58320 <at> debbugs.gnu.org (full text, mbox):
From: Ludovic Courtès <ludo <at> gnu.org> To: 58320 <at> debbugs.gnu.org Cc: bug-hurd <at> gnu.org Subject: Re: bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd) Date: Mon, 10 Oct 2022 23:14:15 +0200
Ludovic Courtès <ludo <at> gnu.org> skribis: > Through a dichotomy I tried to see how far it goes. The info I have so > far is that ld.so errors out from elf/rtld.c:563 (line 565 is not > reached): > > 558: if (bootstrap_map.l_addr || ! bootstrap_map.l_info[VALIDX(DT_GNU_PRELINKED)]) > 559: { > 560: /* Relocate ourselves so we can do normal function calls and > 561: data access using the global offset table. */ > 562: > 563: ELF_DYNAMIC_RELOCATE (&bootstrap_map, 0, 0, 0); > 564: } > 565: bootstrap_map.l_relocated = 1; > ... > 578: __rtld_malloc_init_stubs (); Via brute force¹, I found that ‘__assert_fail’ is hit, with its first argument in $eax being: --8<---------------cut here---------------start------------->8--- db> x/c 0x28604,80 ELF32_R_TYPE (reloc->r_info) == R_386_RELATIVE\000\000map->l_in fo[VERSYMIDX (DT_VERSYM)] != NULL\000\000Fatal glibc error: Too many audit mo --8<---------------cut here---------------end--------------->8--- This comes from i386/dl-machine.h: --8<---------------cut here---------------start------------->8--- auto inline void __attribute ((always_inline)) elf_machine_rel_relative (Elf32_Addr l_addr, const Elf32_Rel *reloc, void *const reloc_addr_arg) { Elf32_Addr *const reloc_addr = reloc_addr_arg; assert (ELF32_R_TYPE (reloc->r_info) == R_386_RELATIVE); *reloc_addr += l_addr; } --8<---------------cut here---------------end--------------->8--- How can we get there? Looking at ‘_dl_start’, it could be that ‘elf_machine_load_address’ returns a bogus value and we end up reading wrong ELF data? Or it could be memory corruption somewhere. Or…? Thing is, it’s not fully deterministic (happens 9 times out of 10 with KVM, never happens without KVM). Ideas? :-) Ludo’. ¹ Building with ‘-fno-optimize-sibling-calls’ didn’t help get nicer backtraces, but that’s prolly because all that early relocation code is inlined.
bug-guix <at> gnu.org
:bug#58320
; Package guix
.
(Mon, 17 Oct 2022 12:52:02 GMT) Full text and rfc822 format available.Message #38 received at 58320 <at> debbugs.gnu.org (full text, mbox):
From: Ludovic Courtès <ludo <at> gnu.org> To: 58320 <at> debbugs.gnu.org Cc: bug-hurd <at> gnu.org Subject: Re: bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd) Date: Mon, 17 Oct 2022 14:51:01 +0200
Hi, Ludovic Courtès <ludo <at> gnu.org> skribis: > … so ‘exec_load’ is doing its job, it seems. Turns out that may not be the case. Here’s a *bad* mapping on the second ‘task_resume’ breakpoint (when ‘exec’ is about to start): --8<---------------cut here---------------start------------->8--- db> show all threads TASK THREADS 0 gnumach (f5f7cf00): 7 threads: 0 (f5f7be18) .W..N. 0xc11dac04 1 (f5f7bcd0) R..O..(idle_thread_continue) 2 (f5f7bb88) .W.ON.(reaper_thread_continue) 0xc12015d4 3 (f5f7ba40) .W.ON.(swapin_thread_continue) 0xc11f8e2c 4 (f5f7b8f8) .W.ON.(sched_thread_continue) 0 5 (f5f7b7b0) .W.ON.(io_done_thread_continue) 0xc1201f74 6 (f5f7b668) .W.ON.(net_thread_continue) 0xc11db0a8 1 ext2fs (f5f7ce40): 6 threads: 0 (f5f7b520) R....F 1 (f5f7b290) .W.O..(mach_msg_receive_continue) 0 2 (f5f7b148) .W.O..(mach_msg_receive_continue) 0 3 (f5f7b000) .W.O..(mach_msg_continue) 0 4 (f67d3e20) .W.O..(mach_msg_receive_continue) 0 5 (f67d3cd8) .W.O..(mach_msg_continue) 0 2 exec (f5f7cd80): (f5f7b3d8) ..SO..(thread_bootstrap_return) db> trace task_resume(f593e010,fb7d9010,f5f73e80,c106972a) ipc_kobject_server(f593e000,3,18,0)+0x1eb mach_msg_trap(bffff4c0,3,18,20,8)+0x1703 >>>>> user space <<<<< db> x/tbx 0xcbc 0xf5f7b3d8 no memory is assigned to address 00000cbc 0 db> show map $map2 Map 0xf5f6ff30: name="exec", pmap=0xf5f71fa8,ref=1,nentries=5 size=290816,resident:225280,wired=0 version=13 map entry 0xf625ec08: start=0x0, end=0x1000 prot=1/7/copy, object=0x0, offset=0x0 map entry 0xf625ebb0: start=0x1000, end=0x26000 prot=5/7/copy, object=0xf5f6ad70, offset=0x0 Object 0xf5f6ad70: size=0x25000, 1 references 37 resident pages, 0 absent pages, 0 paging ops memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82780 uninitialized,temporary internal,copy_strategy=0 shadow=0x0 (offset=0x0),copy=0x0 map entry 0xf625eb58: start=0x26000, end=0x34000 prot=1/7/copy, object=0xf5f6ad20, offset=0x0 Object 0xf5f6ad20: size=0xe000, 1 references 14 resident pages, 0 absent pages, 0 paging ops memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82730 uninitialized,temporary internal,copy_strategy=0 shadow=0x0 (offset=0x0),copy=0x0 map entry 0xf625eb00: start=0x34000, end=0x37000 prot=3/7/copy, object=0xf5f6acd0, offset=0x0 Object 0xf5f6acd0: size=0x3000, 1 references 3 resident pages,--db_more-- --8<---------------cut here---------------end--------------->8--- Compare with what a “good” mapping looks like at that same moment: --8<---------------cut here---------------start------------->8--- start ext2fs: Hurd server bootstrap: ext2fs[device:hd0s1]Kernel Breakpoint trap, eip 0xc1030d5b Breakpoint at task_resume: pushl %ebp db> show all threads TASK THREADS 0 gnumach (f5f7cf00): 7 threads: 0 (f5f7be18) .W..N. 0xc11dac04 1 (f5f7bcd0) R..O..(idle_thread_continue) 2 (f5f7bb88) .W.ON.(reaper_thread_continue) 0xc12015d4 3 (f5f7ba40) .W.ON.(swapin_thread_continue) 0xc11f8e2c 4 (f5f7b8f8) .W.ON.(sched_thread_continue) 0 5 (f5f7b7b0) .W.ON.(io_done_thread_continue) 0xc1201f74 6 (f5f7b668) .W.ON.(net_thread_continue) 0xc11db0a8 1 ext2fs (f5f7ce40): 6 threads: 0 (f5f7b520) R....F 1 (f5f7b290) .W.O..(mach_msg_receive_continue) 0 2 (f5f7b148) .W.O..(mach_msg_receive_continue) 0 3 (f5f7b000) .W.O..(mach_msg_continue) 0 4 (f67d2e20) .W.O..(mach_msg_receive_continue) 0 5 (f67d2cd8) .W.O..(mach_msg_continue) 0 2 exec (f5f7cd80): (f5f7b3d8) ..SO..(thread_bootstrap_return) db> x/tbx 0xcbc 0xf5f7b3d8 8 db> show map $map2 Map 0xf5f6ff30: name="exec", pmap=0xf5f71fa8,ref=1,nentries=5 size=290816,resident:229376,wired=0 version=14 map entry 0xf625ec08: start=0x0, end=0x1000 prot=1/7/copy, object=0xf5f6ad70, offset=0x0 Object 0xf5f6ad70: size=0x1000, 1 references 1 resident pages, 0 absent pages, 0 paging ops memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82780 uninitialized,temporary internal,copy_strategy=0 shadow=0x0 (offset=0x0),copy=0x0 map entry 0xf625ebb0: start=0x1000, end=0x26000 prot=5/7/copy, object=0xf5f6ad20, offset=0x0 Object 0xf5f6ad20: size=0x25000, 1 references 37 resident pages, 0 absent pages, 0 paging ops memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82730 uninitialized,temporary internal,copy_strategy=0 shadow=0x0 (offset=0x0),copy=0x0 map entry 0xf625eb58: start=0x26000, end=0x34000 prot=1/7/copy, object=0xf5f6acd0, offset=0x0 Object 0xf5f6acd0: size=0xe000, 1 references 14 resident pages, 0 absent pages, 0 paging ops memory object=0x0 (offset=0x0),control=0x0, name=0xf5f826e0 uninitialized,temporary internal,copy_strategy=0 shadow=0x0 (offset=0x0),copy=0x0 map entry 0xf625eb00: start=0x34000, end=0x37000 prot=3/7/copy, object=0xf5f6ac80, offset=0x0 Object 0xf5f6ac80: size=0x3000, 1 references 3 resident pages, 0 absent pages, 0 paging ops memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82690 uninitialized,temporary internal,copy_strategy=0 shadow=0x0 (offset=0x0),copy=0x0 map entry 0xf625eaa8: start=0xbfff0000, end=0xc0000000 prot=3/7/copy, object=0xf5f6ac30, offset=0x0 Object 0xf5f6ac30: size=0x10000, 1 references 1 resident pages, 0 absent pages, 0 paging ops memory object=0x0 (offset=0x0),control=0x0, name=0xf5f82640 uninitialized,temporary internal,copy_strategy=0 shadow=0x0 (offset=0x0),copy=0x0 --8<---------------cut here---------------end--------------->8--- Notice that 0xcbc reads a valid relocation, where 8 = R_386_RELATIVE. In the “bad” case, the first map entry is empty, with no associated memory object and zero resident pages. My reading of ‘read_exec’ is that the page is supposed to be populated eagerly by the ‘copyout’ call here: --8<---------------cut here---------------start------------->8--- static int read_exec(void *handle, vm_offset_t file_ofs, vm_size_t file_size, vm_offset_t mem_addr, vm_size_t mem_size, exec_sectype_t sec_type) { struct multiboot_module *mod = handle; [...] err = vm_allocate(user_map, &start_page, end_page - start_page, FALSE); assert(err == 0); assert(start_page == trunc_page(mem_addr)); if (file_size > 0) { err = copyout((char *)phystokv (mod->mod_start) + file_ofs, (void *)mem_addr, file_size); assert(err == 0); } [...] return 0; } --8<---------------cut here---------------end--------------->8--- There are interesting tricks in ‘copyout_retry’ to fake a page fault so the copy can actually be made, IIUC. Could it be that this bit isn’t quite working? Ideas? Problem with debugging this is that setting a breakpoint on ‘exec_load’ causes the system to boot fine (breaking on ‘task_resume’ is fine tough, go figure…). Ludo’.
bug-guix <at> gnu.org
:bug#58320
; Package guix
.
(Sun, 23 Oct 2022 13:59:01 GMT) Full text and rfc822 format available.Message #41 received at 58320 <at> debbugs.gnu.org (full text, mbox):
From: Ludovic Courtès <ludo <at> gnu.org> To: 58320 <at> debbugs.gnu.org Cc: bug-hurd <at> gnu.org Subject: Re: bug#58320: Hurd VM fails to boot on AMD EPYC (kvm-amd) Date: Sun, 23 Oct 2022 15:58:47 +0200
Hi, Ludovic Courtès <ludo <at> gnu.org> skribis: > Of course, the thing boots just fine on that machine when using > ‘exec.static’. It’s frustrating I did not get to the bottom of it, but time passes, so I pushed this workaround in Guix commit 3fb3bd3da530a5f82a169b1fa451474f9d90c3b6. Ludo’.
Ludovic Courtès <ludo <at> gnu.org>
to control <at> debbugs.gnu.org
.
(Thu, 29 Aug 2024 09:42:01 GMT) Full text and rfc822 format available.Ludovic Courtès <ludo <at> gnu.org>
to control <at> debbugs.gnu.org
.
(Thu, 29 Aug 2024 09:42:02 GMT) Full text and rfc822 format available.Debbugs Internal Request <help-debbugs <at> gnu.org>
to internal_control <at> debbugs.gnu.org
.
(Thu, 26 Sep 2024 11:24:07 GMT) Full text and rfc822 format available.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.