GNU bug report logs - #55907
VFIO kernel module fails to capture PCI device


Package: guix;

Reported by: "Nick Zalutskiy" <nick <at> const.fun>

Date: Sat, 11 Jun 2022 13:41:02 UTC

Severity: normal

To reply to this bug, email your comments to 55907 AT debbugs.gnu.org.



Report forwarded to bug-guix <at> gnu.org:
bug#55907; Package guix. (Sat, 11 Jun 2022 13:41:02 GMT)

Acknowledgement sent to "Nick Zalutskiy" <nick <at> const.fun>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Sat, 11 Jun 2022 13:41:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: "Nick Zalutskiy" <nick <at> const.fun>
To: bug-guix <at> gnu.org
Subject: VFIO kernel module fails to capture PCI device
Date: Sat, 11 Jun 2022 09:40:18 -0400
Hello all,

I am trying to capture my graphics card in the initrd with vfio so that I can later pass it through to a virtual machine. Judging by dmesg, the VFIO module does load early; however, the card is not captured at that point, and the amdgpu driver is loaded for it later instead.

This is what I have in my `operating-system` config:

> (kernel-arguments '("iommu=pt" "vfio-pci.ids=1002:73bf"))
> (initrd-modules (cons* "vfio_pci" "vfio" "vfio_iommu_type1" "vfio_virqfd" %base-initrd-modules))

There are two video cards in the system, both AMD but different models. The card of interest is in its own IOMMU group, and the <vendor id>:<device id> combination is correct for my machine.
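Both claims (the card sits in its own IOMMU group, and its vendor:device ID is 1002:73bf) can be double-checked from sysfs by walking /sys/kernel/iommu_groups. The Guile sketch below is only an illustration, not a Guix tool: `iommu-groups`, `directory-entries` and `read-attribute` are hypothetical helper names, and it assumes every group member is a PCI device exposing `vendor` and `device` attributes (a missing attribute is returned as #f).

```scheme
(use-modules (ice-9 ftw)      ; scandir
             (ice-9 rdelim))  ; read-line

(define %iommu-root "/sys/kernel/iommu_groups")

;; Entries of DIR other than "." and "..", or '() if DIR cannot be read
;; (scandir returns #f in that case, e.g. when the IOMMU is disabled).
(define (directory-entries dir)
  (or (scandir dir (lambda (name) (not (member name '("." "..")))))
      '()))

;; First line of a sysfs attribute such as "vendor" (e.g. "0x1002"),
;; or #f when the attribute does not exist.
(define (read-attribute file)
  (and (file-exists? file)
       (call-with-input-file file read-line)))

;; One entry per IOMMU group: (GROUP (ADDRESS VENDOR DEVICE) ...).
(define (iommu-groups)
  (map (lambda (group)
         (let ((devices (string-append %iommu-root "/" group "/devices")))
           (cons group
                 (map (lambda (address)
                        (let ((dev (string-append devices "/" address)))
                          (list address
                                (read-attribute (string-append dev "/vendor"))
                                (read-attribute (string-append dev "/device")))))
                      (directory-entries devices)))))
       (directory-entries %iommu-root)))
```

Running `(iommu-groups)` on the affected machine should show the card with vendor "0x1002" and device "0x73bf" in a group of its own (or together only with its own functions, such as HDMI audio).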

As best I can tell, the initramfs does not propagate the vfio-pci.ids argument to the module.

While searching online, I came across a GitHub issue for a different initramfs generator (booster) that showed the same symptoms: the VFIO module was loaded and the kernel arguments were correct, yet the card was not captured by the vfio driver. The maintainer there did a great job of tracking down and fixing the issue, and came up with this insight: https://github.com/anatol/booster/issues/20#issuecomment-808956316

> After reading kmod code I found that kernel does not use cmdline params for loadable modules. It was surprising for me. Instead it is expected that userspace handles cmdline parsing and provides required module params explicitly.
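To make the quoted point concrete: if the kernel leaves `module.param=value` arguments for loadable modules to userspace, whatever loads the module has to pick them out of the command line itself. The Guile sketch below shows only that parsing step; `module-parameters-from-cmdline` and `normalize-module-name` are hypothetical helpers (not part of Guix or kmod), and quoted values containing spaces are not handled.

```scheme
(use-modules (srfi srfi-1))  ; filter-map

;; Module names treat "-" and "_" as equivalent ("vfio-pci" = "vfio_pci"),
;; so compare them in one canonical form.
(define (normalize-module-name name)
  (string-map (lambda (c) (if (char=? c #\-) #\_ c)) name))

;; Return an alist of (PARAM . VALUE) for every "MODULE.PARAM=VALUE"
;; token in CMDLINE that belongs to MODULE.
(define (module-parameters-from-cmdline cmdline module)
  (let ((wanted (normalize-module-name module)))
    (filter-map
     (lambda (token)
       (let ((dot (string-index token #\.))
             (equals (string-index token #\=)))
         ;; Only tokens of the form MODULE.PARAM=VALUE qualify; plain
         ;; "iommu=pt" or "quiet" have no dot before the "=".
         (and dot equals (< dot equals)
              (string=? (normalize-module-name (substring token 0 dot))
                        wanted)
              (cons (substring token (+ dot 1) equals)
                    (substring token (+ equals 1))))))
     (string-tokenize cmdline))))

;; Example:
;; (module-parameters-from-cmdline "iommu=pt vfio-pci.ids=1002:73bf quiet"
;;                                 "vfio_pci")
;; ⇒ (("ids" . "1002:73bf"))
```

The resulting pairs would still have to be passed as module options (here `ids=1002:73bf`) by whatever code loads vfio_pci; how that would be wired into the Guix initrd is not shown.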

Another way to attach the correct driver to the GPU is to run a script in the initrd, which I don't know how to accomplish with Guix. This approach has the advantage of working with two identical video cards (or disks, etc.). See https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF#Using_identical_guest_and_host_GPUs

I tried following the kernel docs to bind a different driver after boot, but I believe this doesn't work for video cards; it certainly hasn't worked for me.

Any help is greatly appreciated!

Links:
Kernel docs for vfio
https://www.kernel.org/doc/html/latest/driver-api/vfio.html

Arch guide for GPU passthrough
https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF

Thank you!

-Nick

Information forwarded to bug-guix <at> gnu.org:
bug#55907; Package guix. (Fri, 08 Sep 2023 12:44:02 GMT)

Message #8 received at 55907 <at> debbugs.gnu.org:

From: Lars Rustand <rustand.lars <at> gmail.com>
To: 55907 <at> debbugs.gnu.org
Subject: VFIO kernel module fails to capture PCI device
Date: Fri, 8 Sep 2023 10:43:57 +0200
Hello Nick,

Did you ever figure this out? I am struggling with the same problem.


Thank you,

- Lars




Information forwarded to bug-guix <at> gnu.org:
bug#55907; Package guix. (Fri, 05 Jul 2024 04:29:02 GMT)

Message #11 received at 55907 <at> debbugs.gnu.org:

From: Nikola Brković <nikolabrk <at> protonmail.com>
To: "55907 <at> debbugs.gnu.org" <55907 <at> debbugs.gnu.org>
Subject: VFIO kernel module fails to capture PCI device
Date: Thu, 04 Jul 2024 21:37:24 +0000
I have managed to get VFIO working by creating a service of boot-service-type which overrides the GPU driver with vfio-pci and binds the GPU to VFIO:

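> (simple-service 'vfio-override boot-service-type
>   #~(begin
>       ;; Allow only vfio-pci to bind the GPU at PCI address 0000:04:00.0.
>       (call-with-output-file "/sys/bus/pci/devices/0000:04:00.0/driver_override"
>         (lambda (port)
>           (display "vfio-pci" port)))
>       ;; Register the GPU's vendor:device ID (1002:665f) with vfio-pci;
>       ;; the driver then probes matching devices and binds to them.
>       (call-with-output-file "/sys/bus/pci/drivers/vfio-pci/new_id"
>         (lambda (port)
>           (display "1002 665f" port)))))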

Sorry for the hard-coded IDs; replace them with your own. Once fully booted, you might also need to unbind the GPU's audio device from its driver: QEMU will refuse to pass through the GPU if the audio device is in the same IOMMU group and not using vfio-pci.

In my case, the service runs early enough in the boot process that amdgpu has not yet initialized the GPU. There might be a better way to accomplish this; I'm still new to Guix and Scheme.
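Nikola's service could, as an untested sketch, be generalized to a list of PCI addresses so that the GPU's audio function is handled in the same boot step. The addresses in `%vfio-devices` are hypothetical placeholders, and the sketch assumes vfio_pci is already loaded (for example through `initrd-modules`, as in Nick's config). Instead of `new_id`, it re-probes each device through /sys/bus/pci/drivers_probe after setting `driver_override`, the combination the kernel's sysfs PCI ABI documentation describes. Unbinding a card that amdgpu has already initialized may still fail, as Nick observed. The definition goes at the top of the config file and the service in the `services` field.

```scheme
;; Hypothetical addresses: the GPU and every other function in its
;; IOMMU group (e.g. its HDMI audio device).
(define %vfio-devices '("0000:04:00.0" "0000:04:00.1"))

(simple-service 'vfio-override boot-service-type
  #~(for-each
     (lambda (address)
       (let ((device (string-append "/sys/bus/pci/devices/" address)))
         ;; From now on only vfio-pci may bind this device.
         (call-with-output-file (string-append device "/driver_override")
           (lambda (port) (display "vfio-pci" port)))
         ;; If another driver (e.g. snd_hda_intel) already claimed it,
         ;; detach it first.
         (when (file-exists? (string-append device "/driver"))
           (call-with-output-file (string-append device "/driver/unbind")
             (lambda (port) (display address port))))
         ;; Ask the PCI bus to probe the device again; with the override
         ;; in place, vfio-pci is the driver that binds it.
         (call-with-output-file "/sys/bus/pci/drivers_probe"
           (lambda (port) (display address port)))))
     '#$%vfio-devices))
```

Because `driver_override` is per device rather than per vendor:device ID, this variant should also cope with two identical cards, where only one is meant for the guest.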

Thanks,
Nikola






GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.