GNU bug report logs - #75893
texlive: kpathsea doesn't use ls-R database


Package: guix;

Reported by: vicvbcun <guix <at> ikherbers.com>

Date: Mon, 27 Jan 2025 10:29:01 UTC

Severity: normal

To reply to this bug, email your comments to 75893 AT debbugs.gnu.org.



Report forwarded to bug-guix <at> gnu.org:
bug#75893; Package guix. (Mon, 27 Jan 2025 10:29:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to vicvbcun <guix <at> ikherbers.com>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Mon, 27 Jan 2025 10:29:01 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: vicvbcun <guix <at> ikherbers.com>
To: bug-guix <at> gnu.org
Cc: andreas <at> enge.fr, guix <at> nicolasgoaziou.fr
Subject: texlive: kpathsea doesn't use ls-R database
Date: Mon, 27 Jan 2025 11:27:30 +0100
Hello Guix!

Consider the following example latex document:
--8<---------------cut here---------------start------------->8---
\documentclass{article}
	\usepackage{mathtools}

\begin{document}
	hello world
\end{document}
--8<---------------cut here---------------end--------------->8---
Compiling it with LuaLaTeX under strace in a shell with 
texlive-scheme-basic, texlive-collection-luatex and 
texlive-collection-latexextra, it seems like most of the time is spent 
recursively searching for input files:
--8<---------------cut here---------------start------------->8---
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 27.70    0.080138           2     30174           getdents64
 21.99    0.063605           4     15455       259 openat
 17.44    0.050460           3     16179        32 newfstatat
 14.37    0.041583           3     10440     10296 access
  8.42    0.024348           1     15196           close
  7.76    0.022456           1     15201           fstat
  0.79    0.002278           1      1868           write
--8<---------------cut here---------------end--------------->8---
and similarly for pdflatex.

As an extreme example, consider
--8<---------------cut here---------------start------------->8---
\documentclass{tudapub}

\begin{document}
	hello world
\end{document}
--8<---------------cut here---------------end--------------->8---
compiled with
--8<---------------cut here---------------start------------->8---
texlive-scheme-basic
texlive-collection-luatex
texlive-collection-latexextra
texlive-roboto texlive-urcls
texlive-xcharter
texlive-tuda-ci
--8<---------------cut here---------------end--------------->8---

This takes over 14 seconds (compared to about 2.7 seconds for lualatex 
from Arch Linux) and from strace:
--8<---------------cut here---------------start------------->8---
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 32.60    5.926537           3   1801518           getdents64
 26.46    4.809462           5    900841       284 openat
 20.90    3.799744           4    896057    895349 access
 10.19    1.851520           2    900557           close
  9.49    1.724891           1    900575           fstat
  0.28    0.050743           2     17680       229 newfstatat
  0.04    0.007077           1      6073           read
--8<---------------cut here---------------end--------------->8---
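
For illustration, the summary above can be reduced to two figures with a short Python sketch. The table rows are copied from this report; the helper name `parse_strace_summary` and the choice of which syscalls count as "file lookup" are assumptions made for this sketch, and the totals only cover the rows listed here.

```python
# Sketch: summarise an `strace -c` table such as the one above.
# Rows are copied from this report; the helper name is hypothetical.

SUMMARY = """\
 32.60    5.926537           3   1801518           getdents64
 26.46    4.809462           5    900841       284 openat
 20.90    3.799744           4    896057    895349 access
 10.19    1.851520           2    900557           close
  9.49    1.724891           1    900575           fstat
  0.28    0.050743           2     17680       229 newfstatat
  0.04    0.007077           1      6073           read
"""


def parse_strace_summary(text):
    """Return {syscall: (seconds, calls, errors)} from `strace -c` rows."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields (empty errors column) or 6 fields.
        if len(fields) not in (5, 6):
            continue
        try:
            seconds = float(fields[1])
            calls = int(fields[3])
        except ValueError:
            continue  # header or separator line
        errors = int(fields[4]) if len(fields) == 6 else 0
        rows[fields[-1]] = (seconds, calls, errors)
    return rows


rows = parse_strace_summary(SUMMARY)

# Assumption: these are the syscalls driven by the recursive file search.
lookup_calls = ("getdents64", "openat", "access", "close", "fstat")
lookup_seconds = sum(rows[name][0] for name in lookup_calls)
total_seconds = sum(seconds for seconds, _, _ in rows.values())
failed_access = rows["access"][2] / rows["access"][1]

print(f"file lookup syscalls: {lookup_seconds:.2f}s of {total_seconds:.2f}s listed")
print(f"failed access() calls: {failed_access:.1%}")
```

Nearly every access() call in the table fails, which is what one would expect if the same missing files are being probed in every directory of the tree.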

The cause for this seems to be kpathsea doesn't treat the ls-R database 
as authoritative.  It is opened but kpathsea falls back to recursive 
searching. 

In the package definition for texlive-libkpathsea, texmf.cnf is modified 
such that the TEXMF variable is set without !! in front of 
$TEXMFSYSCONFIG, $TEXMFSYSVAR and $TEXMFDIST. 
If I override $TEXMF via --cnf-line like
--8<---------------cut here---------------start------------->8---
lualatex \
	--cnf-line='TEXMF = {$TEXMFCONFIG,$TEXMFVAR,$TEXMFHOME,!!$TEXMFSYSCONFIG,!!$TEXMFSYSVAR,!!$TEXMFDIST}' \
	example.ltx
--8<---------------cut here---------------end--------------->8---
compilation time for the extreme example above falls to about 2.5 
seconds, without excessive searching. 
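
As a sketch only, the same override can be assembled in Python so that the setting reaches lualatex as a single argument, with no shell quoting or mail line-wrapping involved. The variable names and the `--cnf-line` option are taken from the command above; the helper `texmf_cnf_line` is hypothetical.

```python
# Sketch: build the lualatex command line with "!!" on selected trees.
# Variable names come from the report; the helper itself is hypothetical.


def texmf_cnf_line(elements, authoritative):
    """Return a 'TEXMF = {...}' setting, prefixing chosen elements with '!!'."""
    parts = [("!!" if name in authoritative else "") + "$" + name
             for name in elements]
    return "TEXMF = {" + ",".join(parts) + "}"


ELEMENTS = ["TEXMFCONFIG", "TEXMFVAR", "TEXMFHOME",
            "TEXMFSYSCONFIG", "TEXMFSYSVAR", "TEXMFDIST"]
DB_ONLY = {"TEXMFSYSCONFIG", "TEXMFSYSVAR", "TEXMFDIST"}

cnf_line = texmf_cnf_line(ELEMENTS, DB_ONLY)

# An argument list (rather than a shell string) keeps the value in one piece.
argv = ["lualatex", "--cnf-line=" + cnf_line, "example.ltx"]
print(argv)

# To actually run it (requires lualatex on PATH):
#   import subprocess
#   subprocess.run(argv, check=True)
```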

The comment above the substitution says that the !! construct wouldn't 
work for texlive-build-system or when building profiles.  I don't know 
if it would be possible to work around this but perhaps it could be 
possible to work around this if installed in profile (or environment)?

vicvbcun




Information forwarded to bug-guix <at> gnu.org:
bug#75893; Package guix. (Wed, 29 Jan 2025 18:13:02 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Nicolas Goaziou <mail <at> nicolasgoaziou.fr>
To: bug-guix <at> gnu.org
Cc: andreas <at> enge.fr, vicvbcun <guix <at> ikherbers.com>, guix <at> nicolasgoaziou.fr
Subject: Re: texlive: kpathsea doesn't use ls-R database
Date: Wed, 29 Jan 2025 19:11:20 +0100
Hello,

vicvbcun <guix <at> ikherbers.com> writes:

> Consider the following example latex document:
>
> --8<---------------cut here---------------start------------->8---
> \documentclass{article}
> 	\usepackage{mathtools}
>
> \begin{document}
> 	hello world
> \end{document}
> --8<---------------cut here---------------end--------------->8---
>
> Compiling it with LuaLaTeX under strace in a shell with 
> texlive-scheme-basic, texlive-collection-luatex and 
> texlive-collection-latexextra, it seems like most of the time is spent 
> recursively searching for input files:
>
> --8<---------------cut here---------------start------------->8---
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>   27.70    0.080138           2     30174           getdents64
>   21.99    0.063605           4     15455       259 openat
>   17.44    0.050460           3     16179        32 newfstatat
>   14.37    0.041583           3     10440     10296 access
>    8.42    0.024348           1     15196           close
>    7.76    0.022456           1     15201           fstat
>    0.79    0.002278           1      1868           write
> --8<---------------cut here---------------end--------------->8---
>
> and similarly for pdflatex.
>
> As an extreme example, consider
>
> --8<---------------cut here---------------start------------->8---
> \documentclass{tudapub}
>
> \begin{document}
> 	hello world
> \end{document}
> --8<---------------cut here---------------end--------------->8---
>
> compiled with
>
> --8<---------------cut here---------------start------------->8---
> texlive-scheme-basic
> texlive-collection-luatex
> texlive-collection-latexextra
> texlive-roboto texlive-urcls
> texlive-xcharter
> texlive-tuda-ci
> --8<---------------cut here---------------end--------------->8---
>
>
> This takes over 14 seconds (compared to about 2.7 seconds for lualatex 
> from Arch Linux) and from strace:
>
> --8<---------------cut here---------------start------------->8---
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>   32.60    5.926537           3   1801518           getdents64
>   26.46    4.809462           5    900841       284 openat
>   20.90    3.799744           4    896057    895349 access
>   10.19    1.851520           2    900557           close
>    9.49    1.724891           1    900575           fstat
>    0.28    0.050743           2     17680       229 newfstatat
>    0.04    0.007077           1      6073           read
> --8<---------------cut here---------------end--------------->8---

Thank you for the report. I confirm the issue, unfortunately.

> The cause for this seems to be kpathsea doesn't treat the ls-R database 
> as authoritative.  It is opened but kpathsea falls back to recursive 
> searching.

AFAIU, this should not happen. According to "The TeX Live Guide 2024":

  If a file is not found in the database, by default Kpathsea goes ahead
  and searches the disk. If a particular path element begins with ‘!!’,
  however, only the database will be searched for that element, never
  the disk.

IOW, even if the "!!" prefix is not there, Kpathsea should first look
for files in ls-R, and then on the disk. As you point out, it doesn’t
happen like this, and I don’t know why.

> In the package definition for texlive-libkpathsea, texmf.cnf is modified 
> such that the TEXMF variable is set without !! in front of 
> $TEXMFSYSCONFIG, $TEXMFSYSVAR and $TEXMFDIST. 
> If I override $TEXMF via --cnf-line like
>
> --8<---------------cut here---------------start------------->8---
> lualatex \
> 	--cnf-line='TEXMF =
> 	{$TEXMFCONFIG,$TEXMFVAR,$TEXMFHOME,!!$TEXMFSYSCONFIG,!!$TEXMFSYSVAR,!!$TEXMFDIST}' \
> 	example.ltx
> --8<---------------cut here---------------end--------------->8---
>
> compilation time for the extreme example above falls to about 2.5 
> seconds, without excessive searching.

At least it proves our ls-R file is valid, at the expected location.

> The comment above the substitution says that the !! construct wouldn't 
> work for texlive-build-system or when building profiles.  I don't know 
> if it would be possible to work around this but perhaps it could be 
> possible to work around this if installed in profile (or environment)?

I don’t understand what you want to install in a profile. The ls-R file
is already built during profile generation. See "guix/profiles.scm".

Maybe we could keep "!!" prefix and create a ls-R file each time
`texlive-build-system' builds a package and every time
`texlive-updmap.cfg' is an input used to build documentation. In this
case I'm not sure about what should be done for packages propagating TeX
Live libraries without actually using them.

In any case, this would require some experimentation. And it still is
a workaround for a problem we don’t understand yet.

Regards,
-- 
Nicolas Goaziou






Information forwarded to bug-guix <at> gnu.org:
bug#75893; Package guix. (Thu, 30 Jan 2025 22:28:01 GMT) Full text and rfc822 format available.

Message #11 received at 75893 <at> debbugs.gnu.org (full text, mbox):

From: vicvbcun <guix <at> ikherbers.com>
To: Nicolas Goaziou <mail <at> nicolasgoaziou.fr>
Cc: andreas <at> enge.fr, guix <at> nicolasgoaziou.fr, 75893 <at> debbugs.gnu.org
Subject: Re: bug#75893: texlive: kpathsea doesn't use ls-R database
Date: Thu, 30 Jan 2025 23:27:29 +0100
[Message part 1 (text/plain, inline)]
Hello,

I have done some more experiments, looking at the `access' syscalls (the 
others are just the result of searching, I think).  I have attached 
everything in a tarball.

On 2025-01-29T19:11:20+0100, Nicolas Goaziou via Bug reports for GNU Guix wrote:
>Hello,
>
>vicvbcun <guix <at> ikherbers.com> writes:
>
>> Consider the following example latex document:
>>
>> --8<---------------cut here---------------start------------->8---
>> \documentclass{article}
>> 	\usepackage{mathtools}
>>
>> \begin{document}
>> 	hello world
>> \end{document}
>> --8<---------------cut here---------------end--------------->8---
>>
>> Compiling it with LuaLaTeX under strace in a shell with 
>> texlive-scheme-basic, texlive-collection-luatex and
>> texlive-collection-latexextra, it seems like most of the time is spent 
>> recursively searching for input files:
>>
>> --8<---------------cut here---------------start------------->8---
>> % time     seconds  usecs/call     calls    errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>>   27.70    0.080138           2     30174           getdents64
>>   21.99    0.063605           4     15455       259 openat
>>   17.44    0.050460           3     16179        32 newfstatat
>>   14.37    0.041583           3     10440     10296 access
>>    8.42    0.024348           1     15196           close
>>    7.76    0.022456           1     15201           fstat
>>    0.79    0.002278           1      1868           write
>> --8<---------------cut here---------------end--------------->8---
>>
>> and similarly for pdflatex.
Side note: While retrying the experiments, I found that these numbers
must have been from a recompilation; with a clean directory they are
higher because it recursively searches for test.aux.  I have tried being
extra careful this time :).

>>
>> As an extreme example, consider
>>
>> --8<---------------cut here---------------start------------->8---
>> \documentclass{tudapub}
>>
>> \begin{document}
>> 	hello world
>> \end{document}
>> --8<---------------cut here---------------end--------------->8---
>>
>> compiled with
>>
>> --8<---------------cut here---------------start------------->8---
>> texlive-scheme-basic
>> texlive-collection-luatex
>> texlive-collection-latexextra
>> texlive-roboto texlive-urcls
>> texlive-xcharter
>> texlive-tuda-ci
>> --8<---------------cut here---------------end--------------->8---
>>
>>
>> This takes over 14 seconds (compared to about 2.7 seconds for lualatex
>> from Arch Linux) and from strace:
>>
>> --8<---------------cut here---------------start------------->8---
>> % time     seconds  usecs/call     calls    errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>>   32.60    5.926537           3   1801518           getdents64
>>   26.46    4.809462           5    900841       284 openat
>>   20.90    3.799744           4    896057    895349 access
>>   10.19    1.851520           2    900557           close
>>    9.49    1.724891           1    900575           fstat
>>    0.28    0.050743           2     17680       229 newfstatat
>>    0.04    0.007077           1      6073           read
>> --8<---------------cut here---------------end--------------->8---
>
>Thank you for the report. I confirm the issue, unfortunately.
>
>> The cause for this seems to be kpathsea doesn't treat the ls-R database
>> as authoritative.  It is opened but kpathsea falls back to recursive
>> searching.
>
>AFAIU, this should not happen. According to "The TeX Live Guide 2024":
>
>  If a file is not found in the database, by default Kpathsea goes ahead
>  and searches the disk. If a particular path element begins with ‘!!’,
>  however, only the database will be searched for that element, never
>  the disk.
>
>IOW, even if the "!!" prefix is not there, Kpathsea should first look
>for files in ls-R, and then on the disk. As you point out, it doesn’t
>happen like this, and I don’t know why.
>
I think it actually does work as advertised.  I looked at the basename
of every file that is access'ed in the minimal example I sent, for both
LuaLaTeX from Guix and from Arch Linux.  Comparing the logs
(logs/minimal_vanilla.txt and logs/minimal_arch_vanilla.txt in the
tarball):
--8<---------------cut here---------------start------------->8---
--- logs/minimal_vanilla.txt
+++ logs/minimal_arch_vanilla.txt
@@ -4 +3,0 @@
-      1 aliases                               -1
@@ -27,2 +25,0 @@
-      1 ls-R                                  0
-      1 ls-r                                  -1
@@ -284,0 +282 @@
+      3 texmf.cnf                             -1
@@ -286,0 +285 @@
+      4 aliases                               -1
@@ -290,0 +290,2 @@
+      4 ls-R                                  0
+      4 ls-r                                  -1
@@ -298,0 +300,2 @@
+     14 epstopdf.cfg                          -1
+     14 test.aux                              -1
@@ -306,2 +308,0 @@
-   9866 epstopdf.cfg                          -1
-   9866 test.aux                              -1
--8<---------------cut here---------------end--------------->8---
Here the first number is how many times access was attempted on the
file, and the number at the end is -1 if the call failed and 0 if it
succeeded.  The only meaningful difference is for epstopdf.cfg and
test.aux, both files that exist neither on Guix nor on Arch Linux (at
least on first compilation for test.aux).  The difference is that on
Arch Linux LuaLaTeX only recursively searches the current directory and
$TEXMFLOCAL while on Guix it recursively searches the entirety of
$GUIX_TEXMF (i.e. $TEXMFDIST).
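
A tally in this format could be produced along the following lines. This is a minimal Python sketch, not the script used for the attached logs: the function name `tally_access`, the regular expression, and the sample strace lines (including their paths) are made up for illustration.

```python
import os
import re
from collections import Counter

# Matches strace lines of the form: access("/some/path", R_OK) = -1 ENOENT (...)
ACCESS_RE = re.compile(r'access\("([^"]*)", [^)]*\)\s*=\s*(-?\d+)')


def tally_access(lines):
    """Count access() calls per (basename, return value)."""
    counts = Counter()
    for line in lines:
        match = ACCESS_RE.search(line)
        if match:
            path, result = match.group(1), int(match.group(2))
            counts[(os.path.basename(path), result)] += 1
    return counts


# Made-up sample input; the paths are placeholders, not real store paths.
SAMPLE = [
    'access("/texmf-dist/ls-R", R_OK) = 0',
    'access("/texmf-dist/tex/latex/a/test.aux", R_OK) = -1 ENOENT (No such file or directory)',
    'access("/texmf-dist/tex/latex/b/test.aux", R_OK) = -1 ENOENT (No such file or directory)',
    'openat(AT_FDCWD, "/texmf-dist/tex", O_RDONLY) = 3',
]

counts = tally_access(SAMPLE)
# Print "count basename result", least frequent first.
for (name, result), count in sorted(counts.items(), key=lambda item: item[1]):
    print(f"{count:7d} {name:<37} {result}")
```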

I also tried the opposite, stripping the !! from $TEXMF for LuaLaTeX on
Arch Linux, and the same problem appears (see
logs/minimal_arch_texmf-override.txt; the actual numbers for the two
files are of course higher as I have more packages installed).

So (un)fortunately, texlive-libkpathsea and !! seem to work as
intended: Without !!, a positive entry in ls-R is used but the lack of
an entry doesn't cut the search short, falling back to recursive
searching.
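
The rule can be written down as a toy model in Python. This is only a sketch of the behaviour described above, not kpathsea's actual code: the function `find`, the data layout, and the tree size of 10000 directories are all assumptions chosen for illustration.

```python
# Toy model of the lookup rule described above; not kpathsea's real code.


def find(name, elements, db, tree_dirs):
    """Look `name` up in each path element in order.

    elements  -- list of (tree, db_only) pairs; db_only models a "!!" prefix
    db        -- {tree: {filename: path}}, standing in for each tree's ls-R
    tree_dirs -- {tree: number of directories a recursive disk search visits}
    Returns (path or None, directories scanned on disk).
    """
    scanned = 0
    for tree, db_only in elements:
        hit = db.get(tree, {}).get(name)
        if hit is not None:
            return hit, scanned         # positive ls-R entry: no disk search
        if not db_only:
            scanned += tree_dirs[tree]  # miss without "!!": walk the tree
            # (the toy trees hold nothing beyond what their ls-R lists)
    return None, scanned


db = {"TEXMFDIST": {"article.cls": "TEXMFDIST/tex/latex/base/article.cls"}}
tree_dirs = {"TEXMFDIST": 10000}  # arbitrary size, chosen for illustration

without_bang = [("TEXMFDIST", False)]
with_bang = [("TEXMFDIST", True)]

found = find("article.cls", without_bang, db, tree_dirs)
missing_slow = find("epstopdf.cfg", without_bang, db, tree_dirs)
missing_fast = find("epstopdf.cfg", with_bang, db, tree_dirs)
print(found, missing_slow, missing_fast)
```

In this model a file listed in ls-R costs no disk search either way; only a file that exists nowhere, such as epstopdf.cfg in the logs above, triggers the full walk when !! is absent.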

Looking at the extreme example (logs/extreme_vanilla.txt), the main 
culprits for the recursive searches seem to be various .fontspec files 
and configuration files that don't exist.

>> In the package definition for texlive-libkpathsea, texmf.cnf is modified
>> such that the TEXMF variable is set without !! in front of
>> $TEXMFSYSCONFIG, $TEXMFSYSVAR and $TEXMFDIST.
>> If I override $TEXMF via --cnf-line like
>>
>> --8<---------------cut here---------------start------------->8---
>> lualatex \
>> 	--cnf-line='TEXMF =
>> 	{$TEXMFCONFIG,$TEXMFVAR,$TEXMFHOME,!!$TEXMFSYSCONFIG,!!$TEXMFSYSVAR,!!$TEXMFDIST}' \
>> 	example.ltx
>> --8<---------------cut here---------------end--------------->8---
>>
>> compilation time for the extreme example above falls to about 2.5
>> seconds, without excessive searching.
>
>At least it proves our ls-R file is valid, at the expected location.
Just for the fun of it, I tried setting $TEXMFDBS to "{}" and the
compilation time for the minimal example went from 0.9 to 9 seconds.  I
think there would have been more complaints if the ls-R didn't work at
all :D.

>> The comment above the substitution says that the !! construct wouldn't
>> work for texlive-build-system or when building profiles.  I don't know
>> if it would be possible to work around this but perhaps it could be
>> possible to work around this if installed in profile (or environment)?
>
>I don’t understand what you want to install in a profile. The ls-R file
>is already built during profile generation. See "guix/profiles.scm".
What I meant was that we could maybe use a horrible hack like somehow 
overwriting texmf.cnf or wrapping the engines — anything to avoid 
rebuilding the world.  But on a second thought, LaTeX should mostly be a 
build time dependency so that grafting with a version capable of 
handling both the build environment and being installed should work 
well, right?  At least until the next TeX Live release.

>Maybe we could keep "!!" prefix and create a ls-R file each time
>`texlive-build-system' builds a package and every time
>`texlive-updmap.cfg' is an input used to build documentation. In this
>case I'm not sure about what should be done for packages propagating TeX
>Live libraries without actually using them.
I think that the best solution would be to somehow try to make !! work
in the build environment but I'm unsure how.  Perhaps the Nix folks have
a solution for the problem?

>In any case, this would require some experimentation. And it still is
>a workaround for a problem we don’t understand yet.
>
>Regards,
>-- 
>Nicolas Goaziou

vicvbcun
[texlive-kpathsea-debugging.tar.zst (application/octet-stream, attachment)]

Information forwarded to bug-guix <at> gnu.org:
bug#75893; Package guix. (Thu, 30 Jan 2025 22:55:02 GMT) Full text and rfc822 format available.

Message #14 received at 75893 <at> debbugs.gnu.org (full text, mbox):

From: vicvbcun <guix <at> ikherbers.com>
To: Nicolas Goaziou <mail <at> nicolasgoaziou.fr>, 75893 <at> debbugs.gnu.org,
 andreas <at> enge.fr, guix <at> nicolasgoaziou.fr
Subject: Re: bug#75893: texlive: kpathsea doesn't use ls-R database
Date: Thu, 30 Jan 2025 23:54:52 +0100
On 2025-01-30T23:27:29+0100, vicvbcun wrote
> [...]
>>>The comment above the substitution says that the !! construct wouldn't 
>>>work for texlive-build-system or when building profiles.  I don't know 
>>>if it would be possible to work around this but perhaps it could be 
>>>possible to work around this if installed in profile (or environment)?
>>
>>I don’t understand what you want to install in a profile. The ls-R file
>>is already built during profile generation. See "guix/profiles.scm".
>What I meant was that we could maybe use a horrible hack like somehow 
>overwriting texmf.cnf or wrapping the engines — anything to avoid 
>rebuilding the world.  But on a second thought, LaTeX should mostly be 
>a build time dependency so that grafting with a version capable of 
>handling both the build environment and being installed should work 
>well, right?  At least until the next TeX Live release.
Actually, on a third thought, the following cursed approach might work: 
Create a variant `texlive-libkpathsea/ls-R-authoritative' of 
`texlive-libkpathsea' with the only difference being !! in front of 
$TEXMFDIST in texmf.cnf and register it as a replacement for 
`texlive-libkpathsea'.  That way packages are built with the original, 
ungrafted version but when a user installs TeX Live packages they get 
the version for which the ls-R database is authoritative.

An issue with this would be that ungexp'ing a texlive-* package 
referencing `texlive-libkpathsea' should yield the grafted version so 
the profile hook would probably need to be changed.

vicvbcun






Information forwarded to bug-guix <at> gnu.org:
bug#75893; Package guix. (Mon, 10 Feb 2025 11:23:02 GMT) Full text and rfc822 format available.

Message #17 received at 75893 <at> debbugs.gnu.org (full text, mbox):

From: Nicolas Goaziou <mail <at> nicolasgoaziou.fr>
To: vicvbcun <guix <at> ikherbers.com>
Cc: andreas <at> enge.fr, guix <at> nicolasgoaziou.fr, 75893 <at> debbugs.gnu.org
Subject: Re: bug#75893: texlive: kpathsea doesn't use ls-R database
Date: Mon, 10 Feb 2025 12:21:41 +0100
Hello,

vicvbcun <guix <at> ikherbers.com> writes:

> On 2025-01-30T23:27:29+0100, vicvbcun wrote
>> [...]
>>>> The comment above the substitution says that the !! construct
>>>> wouldn't work for texlive-build-system or when building profiles.
>>>> I don't know if it would be possible to work around this but
>>>> perhaps it could be possible to work around this if installed in
>>>> profile (or environment)?
>>>
>>>I don’t understand what you want to install in a profile. The ls-R file
>>>is already built during profile generation. See "guix/profiles.scm".
>> What I meant was that we could maybe use a horrible hack like
>> somehow overwriting texmf.cnf or wrapping the engines — anything to
>> avoid rebuilding the world.  But on a second thought, LaTeX should
>> mostly be a build time dependency so that grafting with a version
>> capable of handling both the build environment and being installed
>> should work well, right?  At least until the next TeX Live release.
> Actually, on a third thought, the following cursed approach might
> work: Create a variant `texlive-libkpathsea/ls-R-authoritative' of
> `texlive-libkpathsea' with the only difference being !! in front of
> $TEXMFDIST in texmf.cnf and register it as a replacement for
> `texlive-libkpathsea'.  That way packages are built with the original,
> ungrafted version but when a user installs TeX Live packages they get
> the version for which the ls-R database is authoritative.
>
> An issue with this would be that ungexp'ing a texlive-* package
> referencing `texlive-libkpathsea' should yield the grafted version so
> the profile hook would probably need to be changed.

I pushed a tentative patch in "tex-team" branch. I’m in the process of
testing it but it could take a while as texlive-collection-latexextra
contains more than 1k packages.

Feedback welcome.

Regards,
-- 
Nicolas Goaziou






Information forwarded to bug-guix <at> gnu.org:
bug#75893; Package guix. (Mon, 10 Feb 2025 21:22:02 GMT) Full text and rfc822 format available.

Message #20 received at 75893 <at> debbugs.gnu.org (full text, mbox):

From: Nicolas Goaziou <mail <at> nicolasgoaziou.fr>
To: vicvbcun <guix <at> ikherbers.com>
Cc: andreas <at> enge.fr, guix <at> nicolasgoaziou.fr, 75893 <at> debbugs.gnu.org
Subject: Re: bug#75893: texlive: kpathsea doesn't use ls-R database
Date: Mon, 10 Feb 2025 22:20:51 +0100
Nicolas Goaziou via Bug reports for GNU Guix <bug-guix <at> gnu.org> writes:

> I pushed a tentative patch in "tex-team" branch. I’m in the process of
> testing it but it could take a while as texlive-collection-latexextra
> contains more than 1k packages.

It seems to be better. The "extreme" example in your original post takes
around 6.5 seconds on my machine (that’s still 2.5 times more than your
results but my laptop is old) on the second run. The first run takes
slightly longer because it needs to populate the font cache.

I’m going to ask for inclusion in the master branch, but it will not
happen quickly considering the pending queue for merge requests.






This bug report was last modified 21 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.