GNU bug report logs - #63197
video acceleration/libva segfaults caused by stale mesa shader cache


Package: guix

Reported by: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>

Date: Mon, 1 May 2023 02:43:01 UTC

Severity: normal

To reply to this bug, email your comments to 63197 AT debbugs.gnu.org.



Report forwarded to bug-guix <at> gnu.org:
bug#63197; Package guix. (Mon, 01 May 2023 02:43:01 GMT)

Acknowledgement sent to Maxim Cournoyer <maxim.cournoyer <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Mon, 01 May 2023 02:43:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: bug-guix <bug-guix <at> gnu.org>
Subject: video acceleration/libva segfaults caused by stale mesa shader cache
Date: Sun, 30 Apr 2023 22:42:41 -0400
Hi,

After reinstalling someone's desktop which has support for VA-API,
'vainfo' from 'libva-utils' would consume all the memory then crash.
Other applications relying on libva would crash as well, e.g. ffmpeg (or
its users, such as vlc/jami).  Here's a sample output from VLC:

--8<---------------cut here---------------start------------->8---
vlc received_605209834855384.mp4 
VLC media player 3.0.18 Vetinari (revision 3.0.13-8-g41878ff4f2)
[000000000109d770] main libvlc: Running vlc with the default interface. Use 'cvlc' to use vlc without interface.
libva info: VA-API version 1.17.0
libva info: Trying to open /gnu/store/9pypr3c3y379shbwm9ilb4pik9mkfd83-mesa-22.2.4/lib/dri/radeonsi_drv_video.so
libva info: Found init function __vaDriverInit_1_17
Segmentation fault
--8<---------------cut here---------------end--------------->8---

After tracing the process, I noticed that the last thing it did was
loading its mesa shader cache, stored under:

--8<---------------cut here---------------start------------->8---
~/.cache/mesa_shader_cache
--8<---------------cut here---------------end--------------->8---

Deleting that directory resolved the issue.

It seems that'd be a bug in Mesa (for failing to determine that it
should have invalidated its cache going from version 21 to 22 post
core-updates merge).
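
As a minimal sketch of the workaround above (deleting the cache directory), the following Python removes it recursively. The helper names `mesa_shader_cache_dir` and `clear_shader_cache` are made up for illustration, and the fallback to `XDG_CACHE_HOME` is an assumption about where Mesa looks when that variable is set; only the `~/.cache/mesa_shader_cache` path comes from this report.

```python
import os
import shutil
from pathlib import Path


def mesa_shader_cache_dir(env=None):
    """Return the presumed default location of the Mesa shader cache.

    Assumption: when XDG_CACHE_HOME is set it replaces ~/.cache;
    otherwise the path from this report is used.
    """
    env = os.environ if env is None else env
    base = env.get("XDG_CACHE_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache")
    return Path(base) / "mesa_shader_cache"


def clear_shader_cache(cache_dir):
    """Delete cache_dir recursively; return True if something was removed."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True
```

Calling `clear_shader_cache(mesa_shader_cache_dir())` while no Mesa-using program is running should have the same effect as deleting the directory by hand.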

-- 
Thanks,
Maxim




Information forwarded to bug-guix <at> gnu.org:
bug#63197; Package guix. (Mon, 01 May 2023 02:59:01 GMT)

Message #8 received at 63197 <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: 63197 <at> debbugs.gnu.org
Subject: Re: bug#63197: video acceleration/libva segfaults caused by stale
 mesa shader cache
Date: Sun, 30 Apr 2023 22:58:01 -0400
Hi,

Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:

> Hi,
>
> After reinstalling someone's desktop which has support for VA-API,
> 'vainfo' from 'libva-utils' would consume all the memory then crash.
> Other applications relying on libva would crash as well, e.g. ffmpeg (or
> its users, such as vlc/jami).  Here's a sample output from VLC:
>
> vlc received_605209834855384.mp4 
> VLC media player 3.0.18 Vetinari (revision 3.0.13-8-g41878ff4f2)
> [000000000109d770] main libvlc: Running vlc with the default interface. Use 'cvlc' to use vlc without interface.
> libva info: VA-API version 1.17.0
> libva info: Trying to open /gnu/store/9pypr3c3y379shbwm9ilb4pik9mkfd83-mesa-22.2.4/lib/dri/radeonsi_drv_video.so
> libva info: Found init function __vaDriverInit_1_17
> Segmentation fault
>
>
> After tracing the process, I noticed that the last thing it did was
> loading its mesa shader cache, stored under:
>
> ~/.cache/mesa_shader_cache
>
> Deleting that directory resolved the issue.
>
> It seems that'd be a bug in Mesa (for failing to determine that it
> should have invalidated its cache going from version 21 to 22 post
> core-updates merge).

I've forwarded this report upstream here:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/8937

-- 
Thanks,
Maxim




Information forwarded to bug-guix <at> gnu.org:
bug#63197; Package guix. (Thu, 15 Jun 2023 13:51:02 GMT)

Message #11 received at 63197 <at> debbugs.gnu.org:

From: Giovanni Biscuolo <g <at> xelera.eu>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>, 63197 <at> debbugs.gnu.org
Subject: Re: bug#63197: video acceleration/libva segfaults caused by stale
 mesa shader cache
Date: Thu, 15 Jun 2023 15:49:50 +0200
Hi Maxim,

I learned about this issue today.

Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:

[...]

>> After tracing the process, I noticed that the last thing it did was
>> loading its mesa shader cache, stored under:
>>
>> ~/.cache/mesa_shader_cache
>>
>> Deleting that directory resolved the issue.
>>
>> It seems that'd be a bug in Mesa (for failing to determine that it
>> should have invalidated its cache going from version 21 to 22 post
>> core-updates merge).

AFAIU this issue is still present with Mesa 23, since Guillaume Le
Vaillant had to use this workaround yesterday [1] and reported his
backtrace upstream [2].

If I'm not wrong (i.e. vlc et al are now using mesa 23) this should also
be reported upstream (I can do it if needed).

AFAIU the only thing we can do to fix this bug is to disable the shader
cache (MESA_SHADER_CACHE_DISABLE=true) until a proper fix is found
upstream.

...or apply a patch to rename "~/.cache/mesa_shader_cache" to
"~/.cache/mesa<version>_shader_cache"

Alternatively, we should find a way to make Guix users aware of this
kind of problem and possible workarounds they can apply (it's not
related to this specific bug)
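
To make the first option concrete, here is a minimal sketch that disables the shader cache for a single program rather than system-wide. The variable name and value (`MESA_SHADER_CACHE_DISABLE=true`) are the ones mentioned above; the helper names `env_without_shader_cache` and `run_without_shader_cache` are hypothetical, and this does not change anything in the Guix package itself.

```python
import os
import subprocess


def env_without_shader_cache(base_env=None):
    """Return a copy of the environment with the Mesa shader cache disabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["MESA_SHADER_CACHE_DISABLE"] = "true"
    return env


def run_without_shader_cache(command):
    """Run command (a list of arguments) with the cache disabled.

    Returns the exit status of the child process.
    """
    return subprocess.run(command, env=env_without_shader_cache()).returncode
```

For instance, `run_without_shader_cache(["vainfo"])` would start 'vainfo' with the variable set; only that child process is affected, and the existing cache directory is left untouched.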


WDYT?

Thanks! Gio'


[1] id:871qify1i8.fsf <at> kitej

[2] https://gitlab.freedesktop.org/mesa/mesa/-/issues/8937#note_1960628

-- 
Giovanni Biscuolo

Xelera IT Infrastructures

Information forwarded to bug-guix <at> gnu.org:
bug#63197; Package guix. (Sat, 17 Jun 2023 00:37:02 GMT)

Message #14 received at 63197 <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Giovanni Biscuolo <g <at> xelera.eu>
Cc: 63197 <at> debbugs.gnu.org
Subject: Re: bug#63197: video acceleration/libva segfaults caused by stale
 mesa shader cache
Date: Fri, 16 Jun 2023 20:36:49 -0400
Hello,

Giovanni Biscuolo <g <at> xelera.eu> writes:

> Hi Maxim,
>
> I learned about this issue today.
>
> Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:
>
> [...]
>
>>> After tracing the process, I noticed that the last thing it did was
>>> loading its mesa shader cache, stored under:
>>>
>>> ~/.cache/mesa_shader_cache
>>>
>>> Deleting that directory resolved the issue.
>>>
>>> It seems that'd be a bug in Mesa (for failing to determine that it
>>> should have invalidated its cache going from version 21 to 22 post
>>> core-updates merge).
>
> AFAIU this issue is still present using mesa 23 since Guillaume Le
> Vaillant had to use this workaround yesterday [1] and reported his
> backtrace upstream [2]
>
> If I'm not wrong (i.e. vlc et al are now using mesa 23) this should also
> be reported upstream (I can do it if needed).

Which upstream are you thinking about?  My understanding is that this
problem is a Mesa problem, and it's already reported there (the issue
linked in [2]).

> AFAIU the only thing we can do to fix this bug is to disable the shader
> cache (MESA_SHADER_CACHE_DISABLE=true) until a proper fix is found
> upstream.

Disabling the shader cache sounds like a decent workaround or even
definitive solution.  One less stale cache to worry about...  If it's
like the Qt shader cache, the performance hit is probably too small to
be noticeable (maybe just slower startup times of complicated opengl
applications such as games?).

> ...or apply a patch to rename "~/.cache/mesa_shader_cache" to
> "~/.cache/mesa<version>_shader_cache"

That's another good idea.

> Alternatively, we should find a way to make Guix users aware of this
> kind of problem and possible workarounds they can apply (it's not
> related to this specific bug)

I would rather pursue the other above options you suggest, so that it
doesn't happen in the first place!

Thank you for sharing these ideas.

-- 
Maxim




Information forwarded to bug-guix <at> gnu.org:
bug#63197; Package guix. (Sat, 17 Jun 2023 10:15:01 GMT)

Message #17 received at 63197 <at> debbugs.gnu.org:

From: Giovanni Biscuolo <g <at> xelera.eu>
To: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
Cc: 63197 <at> debbugs.gnu.org
Subject: Re: bug#63197: video acceleration/libva segfaults caused by stale
 mesa shader cache
Date: Sat, 17 Jun 2023 12:14:01 +0200
Hi Maxim

Maxim Cournoyer <maxim.cournoyer <at> gmail.com> writes:

[...]

>> AFAIU this issue is still present using mesa 23 since Guillaume Le
>> Vaillant had to use this workaround yesterday [1] and reported his
>> backtrace upstream [2]
>>
>> If I'm not wrong (i.e. vlc et al are now using mesa 23) this should also
>> be reported upstream (I can do it if needed).
>
> Which upstream are you thinking about?

Mesa.

> My understanding is that this problem is a Mesa problem, and it's
> already reported there (the issue linked in [2]).

Yes, but the original bug report mentions Mesa 22.2.4, and M. Briar asked:

--8<---------------cut here---------------start------------->8---

Mesa 22.2.x is already end-of-life and won't receive any fixes
anymore. Does this also happen on newer versions?

--8<---------------cut here---------------end--------------->8---
(https://gitlab.freedesktop.org/mesa/mesa/-/issues/8937#note_1891435)

IMHO there is no clear answer to that question in the bug thread; maybe
the Mesa developers still think it's only 22.2.x related.

Now we have Mesa 23.0.3 in Guix, probably the one used by vlc when
Guillaume reported his issue upstream (Mesa) on June 15.

>> AFAIU the only thing we can do to fix this bug is to disable the shader
>> cache (MESA_SHADER_CACHE_DISABLE=true) until a proper fix is found
>> upstream.
>
> Disabling the shader cache sounds like a decent workaround or even
> definitive solution.  One less stale cache to worry about...

oh yes!  Unfortunately cache management is not so robust... sometimes :-(

> If it's like the Qt shader cache, the performance hit is probably too
> small to be noticeable (maybe just slower startup times of complicated
> opengl applications such as games?).

I don't know the cost in terms of performance; I'm not a 3D expert at
all.  From what I read on the web about shader caches, I guess it's a
real problem almost only for games, and not a problem at all for media
players like vlc et al.  I'm just brainstorming, but what about having
a mesa-with-cache-enabled variant just for games, if it is really
needed?

I should be able to propose a patch to disable the Mesa shader cache,
but since I'm not an expert in this field I prefer to leave this
decision (to disable the cache, I mean) to someone else.

>> ...or apply a patch to rename "~/.cache/mesa_shader_cache" to
>> "~/.cache/mesa<version>_shader_cache"
>
> That's another good idea.

I was just doing guesswork but the bug caused by this mesa upgrade
smells like a binary incompatibility between two versions (or just major
versions)... so a versioned shader cache makes sense to me

I'm not able to propose (I mean to code) such a patch, anyway
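
For illustration only, the naming scheme suggested above ("~/.cache/mesa<version>_shader_cache") could be sketched as follows. This is not a Mesa patch: the function names `versioned_cache_name` and `versioned_cache_dir` are hypothetical, and whether the full version or only the major version should be used is left open, as in the discussion. How Mesa would be made to use such a directory is not shown here.

```python
import os
import re


def versioned_cache_name(mesa_version):
    """Return the directory name for a given Mesa version string.

    Only dotted numeric versions such as '23.0.3' are accepted in this
    sketch; suffixes like '-rc1' are rejected to keep the name a plain
    path component.
    """
    if not re.fullmatch(r"\d+(\.\d+)*", mesa_version):
        raise ValueError("unexpected Mesa version: %r" % (mesa_version,))
    return "mesa%s_shader_cache" % mesa_version


def versioned_cache_dir(mesa_version, cache_home=None):
    """Join the versioned name onto cache_home (default: ~/.cache)."""
    if cache_home is None:
        cache_home = os.path.join(os.path.expanduser("~"), ".cache")
    return os.path.join(cache_home, versioned_cache_name(mesa_version))
```

With this scheme, Mesa 22.2.4 and 23.0.3 would each get a separate directory, so a cache written by one version would never be read by the other. Passing only the major version (for example "23") gives the coarser variant.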

Anyway, users should know that they have to periodically clean unused
shader caches, since from what I read on the net the shader cache tends
to really /explode/ in terms of size, in some cases
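
As a rough sketch of how a user might check this, the following Python reports the on-disk size of each shader cache directory found in a cache home. The function names `directory_size` and `shader_cache_sizes` are hypothetical, and the "mesa*_shader_cache" pattern is an assumption chosen to match both the current directory name and the versioned names suggested above.

```python
import os
from pathlib import Path


def directory_size(path):
    """Total size in bytes of regular files below path (0 if it is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                if not os.path.islink(file_path):
                    total += os.path.getsize(file_path)
            except OSError:
                pass  # the file vanished while we were walking the tree
    return total


def shader_cache_sizes(cache_home):
    """Map each 'mesa*_shader_cache' directory in cache_home to its size."""
    return {
        entry.name: directory_size(entry)
        for entry in sorted(Path(cache_home).glob("mesa*_shader_cache"))
        if entry.is_dir()
    }
```

Running `shader_cache_sizes(os.path.expanduser("~/.cache"))` gives a dictionary of directory names and byte counts, which is enough to spot caches that are no longer used or have grown large.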

>> Alternatively, we should find a way to make Guix users aware of this
>> kind of problem and possible workarounds they can apply (it's not
>> related to this specific bug)
>
> I would rather pursue the other above options you suggest, so that it
> doesn't happen in the first place!

I agree

> Thank you for sharing these ideas.

Thank you for your attention!

Happy hacking, Gio'

-- 
Giovanni Biscuolo

Xelera IT Infrastructures

Information forwarded to bug-guix <at> gnu.org:
bug#63197; Package guix. (Mon, 26 Jun 2023 16:22:02 GMT)

Message #20 received at 63197 <at> debbugs.gnu.org:

From: Maxim Cournoyer <maxim.cournoyer <at> gmail.com>
To: Giovanni Biscuolo <g <at> xelera.eu>
Cc: 63197 <at> debbugs.gnu.org
Subject: Re: bug#63197: video acceleration/libva segfaults caused by stale
 mesa shader cache
Date: Mon, 26 Jun 2023 12:21:20 -0400
Hi Giovanni,

Giovanni Biscuolo <g <at> xelera.eu> writes:

[...]

>>> ...or apply a patch to rename "~/.cache/mesa_shader_cache" to
>>> "~/.cache/mesa<version>_shader_cache"
>>
>> That's another good idea.
>
> I was just doing guesswork but the bug caused by this mesa upgrade
> smells like a binary incompatibility between two versions (or just major
> versions)... so a versioned shader cache makes sense to me
>
> I'm not able to propose (I mean to code) such a patch, anyway
>
> Anyway, users should know that they have to periodically clean unused
> shader caches, since from what I read on the net the shader cache tends
> to really /explode/ in terms of size, in some cases
>
>>> Alternatively, we should find a way to make Guix users aware of this
>>> kind of problem and possible workarounds they can apply (it's not
>>> related to this specific bug)
>>
>> I would rather pursue the other above options you suggest, so that it
>> doesn't happen in the first place!

I've pinged upstream at
https://gitlab.freedesktop.org/mesa/mesa/-/issues/8937#note_1975560.
Let's see what they say!

-- 
Thanks,
Maxim




This bug report was last modified 297 days ago.
