GNU bug report logs - #22608: Module system thread unsafety and .go compilation
Report forwarded to bug-guix <at> gnu.org: bug#22608; Package guix. (Tue, 09 Feb 2016 20:03:01 GMT)
Acknowledgement sent to taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer): New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Tue, 09 Feb 2016 20:03:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
To speed up the compilation of the many Scheme files in Guix, we use a
script that first loads all modules to be compiled into the Guile
process (by calling 'resolve-interface' on the module names), and then
the corresponding Scheme files are compiled in a par-for-each.
While Guile's module system is known to be thread unsafe, the idea was
that all mutation should happen in the serial loading phase, and the
parallel compile-file calls should then be thread safe.
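For illustration, the two-phase approach described above could be sketched roughly as follows. This is not the actual Guix build script: `file->module-name` and `compile-all` are hypothetical names, and the sketch assumes file names are relative to the load path and end in `.scm`.

```scheme
(use-modules (ice-9 threads)          ; par-for-each
             (system base compile))   ; compile-file

;; Hypothetical helper: "gnu/build/activation.scm" -> (gnu build activation)
(define (file->module-name file)
  (map string->symbol
       (string-split (string-drop-right file (string-length ".scm"))
                     #\/)))

;; Hypothetical driver, shown only to make the two phases explicit.
(define (compile-all files)
  ;; Phase 1 (serial): load every module up front, so that in principle
  ;; no module-table mutation is left for the parallel phase.
  (for-each (lambda (file)
              (resolve-interface (file->module-name file)))
            files)
  ;; Phase 2 (parallel): compile each file, one thread per task.
  (par-for-each (lambda (file)
                  (compile-file file
                                #:output-file
                                (string-append
                                 (string-drop-right file (string-length ".scm"))
                                 ".go")))
                files))
```

The sketch only defines the procedures; calling `(compile-all files)` would perform the actual loading and compilation.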
Sadly that assumption isn't met when autoloads are involved.
Minimal-ish test-case:
- Check out 0889321.
- Build it.
- Edit gnu/build/activation.scm and gnu/build/linux-boot.scm to contain
  merely the following expressions, respectively:
    (define-module (gnu build activation)
      #:use-module (gnu build linux-boot))
    (define-module (gnu build linux-boot)
      #:autoload (system base compile) (compile-file))
- Run make again.
If you're on a multi-core system, you will probably get an error saying
something weird like "no such language scheme".
Note: when you then run make *again* it succeeds.
Solution proposals:
1. s/par-for-each/for-each/. Will make compilation slower on multi-core
   machines. We would do the same for guix pull, which is a bit sad
   because it's so fast right now. Very simple solution though.
2. We find out some partitioning of the Scheme modules such that there
   is minimal overlap in total loaded modules when the modules in one
   subset are each loaded by one Guile process. Then each Guile process
   loads & compiles the modules in its given subset serially, but these
   Guile processes run in parallel. This could speed things up even
   more than now because the module-loading phases of the processes
   would be parallel too. It also has the side-effect that less memory
   is consumed the fewer cores you have (because fewer Scheme modules
   are loaded into memory at once). If someone (Ludo?) has a good general
   overview of Guix's module graph then maybe they can come up with a
   sensible partitioning of the modules, say into 4 subsets (maxing out
   benefits at quad-core), such that loading all modules in one subset
   loads a minimal number of modules that are outside that subset. That
   should be the only challenging part of this solution.
3. We do nothing for now since this bug triggers rarely, and can be
   worked around by simply re-running make. (We just have to hope that
   it doesn't trigger on guix pull or on clean builds after some commit;
   there's no "just rerun make" in guix pull or an automated build of
   Guix.) AFAIU Wingo expressed motivation to make Guile's module
   system thread safe, so this problem would then truly disappear.
I think #2 is a pretty good solution. The only thing worrying me is
that we might not be able to sensibly partition the Scheme modules
according to any simple logic that can be automated (like guix/ is one
subset, gnu/packages/ is another, etc.). Maintaining the subsets
manually in the Makefile would be pretty ugly. But maybe some simple
logic, possibly combined with a few special cases in the code, would be
good enough.
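As a rough illustration of proposal #2 with the kind of "simple logic" mentioned above, files could be grouped by directory prefix and each group handed to its own Guile process. Everything named here is hypothetical (`subset-key`, `group-files`, `subset->command`, and the script name `compile-subset.scm`), and whether such a grouping actually minimizes the overlap in loaded modules is untested.

```scheme
(use-modules (ice-9 match))

;; Hypothetical grouping key: for files at least two directories deep under
;; "gnu/", use the first two path components; otherwise use the first one.
(define (subset-key file)
  (match (string-split file #\/)
    (("gnu" dir _ . _) (string-append "gnu/" dir))
    ((top . _) top)))

;; Group FILES into an alist of (KEY FILE ...), keeping first-seen order.
(define (group-files files)
  (let loop ((files files) (groups '()))
    (if (null? files)
        (reverse groups)
        (let ((key (subset-key (car files))))
          (loop (cdr files)
                (if (assoc key groups)
                    ;; Append the file to the existing group for KEY.
                    (map (lambda (group)
                           (if (equal? (car group) key)
                               (append group (list (car files)))
                               group))
                         groups)
                    ;; Start a new group.
                    (cons (list key (car files)) groups)))))))

;; Build the command line for one subset.  "compile-subset.scm" is an
;; assumed script that would load and compile its arguments serially.
(define (subset->command group)
  (cons* "guile" "--no-auto-compile" "compile-subset.scm" (cdr group)))
```

The resulting commands could then be run concurrently, for example as separate Makefile targets under `make -j`, so that each process keeps its own module tables.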
Thoughts?
Taylan
Information forwarded to bug-guix <at> gnu.org: bug#22608; Package guix. (Wed, 10 Feb 2016 13:51:01 GMT)
Message #8 received at 22608 <at> debbugs.gnu.org (full text, mbox):
taylanbayirli <at> gmail.com (Taylan Ulrich "Bayırlı/Kammer") skribis:
> Sadly that assumption isn't met when autoloads are involved.
> Minimal-ish test-case:
>
> - Check out 0889321.
>
> - Build it.
>
> - Edit gnu/build/activation.scm and gnu/build/linux-boot.scm to contain
>   merely the following expressions, respectively:
>
>     (define-module (gnu build activation)
>       #:use-module (gnu build linux-boot))
>
>     (define-module (gnu build linux-boot)
>       #:autoload (system base compile) (compile-file))
>
> - Run make again.
>
> If you're on a multi-core system, you will probably get an error saying
> something weird like "no such language scheme".
Do you have a clear explanation of why this happens? I would expect
(system base compile) to already be loaded for instance, so it’s not
clear to me what’s going on. Or is it just the mutation of (gnu build
linux-boot) that’s causing problems?
> Solution proposals:
>
> 1. s/par-for-each/for-each/. Will make compilation slower on multi-core
> machines. We would do the same for guix pull, which is a bit sad
> because it's so fast right now. Very simple solution though.
>
> 2. We find out some partitioning of the Scheme modules such that there
> is minimal overlap in total loaded modules when the modules in one
> subset are each loaded by one Guile process. Then each Guile process
> loads & compiles the modules in its given subset serially, but these
> Guile processes run in parallel. This could speed things up even
> more than now because the module-loading phases of the processes
> would be parallel too. It also has the side-effect that less memory
> is consumed the fewer cores you have (because less Scheme modules
> loaded into memory at once). If someone (Ludo?) has a good general
> overview of Guix's module graph then maybe they can come up with a
> sensible partitioning of the modules, say into 4 subsets (maxing out
> benefits at quad-core), such that loading all modules in one subset
> loads a minimal amount of modules that are outside that subset. That
> should be the only challenging part of this solution.
>
> 3. We do nothing for now since this bug triggers rarely, and can be
> worked around by simply re-running make. (We just have to hope that
> it doesn't trigger on guix pull or on clean builds after some commit;
> there's no "just rerun make" in guix pull or an automated build of
> Guix.) AFAIU Wingo expressed motivation to make Guile's module
> system thread safe, so this problem would then truly disappear.
Short-term, I’d do #1 or #3; probably #1 though, because random failures
are no fun, and we know they can happen.
Longer-term, I’m not convinced by #2. I think I would instead build
packages in reverse topological order, probably serially at first, which
would address <http://bugs.gnu.org/15602> (with the caveat that the (gnu
packages …) modules cannot be topologically-sorted, but OTOH they
typically don’t use macros, so we’re fine.)
That would require a tool to extract the ‘define-module’ forms and
build a graph from there.
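A minimal sketch of such a tool might look like the following. It is an illustration only, not existing Guix code: all procedure names are hypothetical, it assumes the ‘define-module’ form is the first form in each file, it only looks at #:use-module clauses, and it does not report cycles.

```scheme
;; Read the first form of FILE, assumed to be its 'define-module' form.
(define (read-define-module file)
  (call-with-input-file file read))

;; A #:use-module spec is either (a b c) or ((a b c) #:select ...).
(define (spec->module-name spec)
  (if (pair? (car spec)) (car spec) spec))

;; Return the module names that follow #:use-module in FORM.
(define (define-module-imports form)
  (let loop ((args (cddr form)) (imports '()))
    (cond ((null? args) (reverse imports))
          ((and (eq? (car args) #:use-module) (pair? (cdr args)))
           (loop (cddr args)
                 (cons (spec->module-name (cadr args)) imports)))
          (else (loop (cdr args) imports)))))

;; Build an alist of (MODULE IMPORT ...) from a list of forms.
(define (module-graph forms)
  (map (lambda (form)
         (cons (cadr form) (define-module-imports form)))
       forms))

;; Depth-first traversal: each module is listed after those of its imports
;; that also appear in GRAPH.  Modules outside GRAPH are ignored.
(define (dependencies-first graph)
  (define visited '())
  (define result '())
  (define (visit module)
    (when (and (assoc module graph)
               (not (member module visited)))
      (set! visited (cons module visited))
      (for-each visit (assoc-ref graph module))
      (set! result (cons module result))))
  (for-each (lambda (entry) (visit (car entry))) graph)
  (reverse result))
```

For the two modules of the test case quoted above, `dependencies-first` would list (gnu build linux-boot) before (gnu build activation).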
But really, we must fix <http://bugs.gnu.org/15602>, and in particular,
‘compile-file’ should not mutate the global module name space. I think
we could do something like:
  (define (compile-file* …)
    (let ((root the-root-module)
          (compile-root (copy-module the-root-module)))
      (dynamic-wind
        (lambda ()
          (set! the-root-module compile-root)
          ;; ditto with the-scm-module
          )
        (lambda ()
          (compile-file …))
        (lambda ()
          (set! the-root-module root)
          ;; …
          ))))
It’s unclear how costly ‘copy-module’ would be, and the whole strategy
depends on it.
Eventually it seems clear that Guile proper needs to address this use
case, and needs to provide thread-safe modules.
Ludo’.
Severity set to 'important' from 'normal'. Request was from ludo <at> gnu.org (Ludovic Courtès) to control <at> debbugs.gnu.org. (Tue, 23 Feb 2016 13:27:01 GMT)
Information forwarded to bug-guix <at> gnu.org: bug#22608; Package guix. (Tue, 03 Jul 2018 22:11:01 GMT)
Message #16 received at 22608 <at> debbugs.gnu.org (full text, mbox):
Hello,
taylanbayirli <at> gmail.com (Taylan Ulrich "Bayırlı/Kammer") skribis:
> To speed up the compilation of the many Scheme files in Guix, we use a
> script that first loads all modules to be compiled into the Guile
> process (by calling 'resolve-interface' on the module names), and then
> the corresponding Scheme files are compiled in a par-for-each.
>
> While Guile's module system is known to be thread unsafe, the idea was
> that all mutation should happen in the serial loading phase, and the
> parallel compile-file calls should then be thread safe.
>
> Sadly that assumption isn't met when autoloads are involved.
For the record, these issues should be fixed in Guile 2.2.4:
533e3ff17 * Serialize accesses to submodule hash tables.
46bcbfa56 * Module import obarrays are accessed in a critical section.
761cf0fb8 * Make module autoloading thread-safe.
‘guix pull’ now defaults to 2.2.4, so we’ll see if indeed those crashes
disappear.
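As an aside, a build script that wanted to stay conservative on older Guile versions could choose between ‘par-for-each’ and ‘for-each’ at run time. The following is only a sketch: `version->list`, `version>=?` and `for-each/maybe-parallel` are hypothetical names, and using 2.2.4 as the threshold is an assumption based solely on the message above.

```scheme
(use-modules (ice-9 threads)   ; par-for-each
             (srfi srfi-1))    ; every

;; "2.2.4" -> (2 2 4); non-numeric components (e.g. in development
;; snapshots) are treated as 0.
(define (version->list str)
  (map (lambda (component) (or (string->number component) 0))
       (string-split str #\.)))

;; Compare two dotted version strings component by component.
(define (version>=? a b)
  (let loop ((a (version->list a)) (b (version->list b)))
    (cond ((null? b) #t)
          ((null? a) (every zero? b))
          ((> (car a) (car b)) #t)
          ((< (car a) (car b)) #f)
          (else (loop (cdr a) (cdr b))))))

;; Assumption: module autoloading is thread-safe from 2.2.4 onwards.
(define for-each/maybe-parallel
  (if (version>=? (version) "2.2.4")
      par-for-each
      for-each))
```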
Ludo’.
Information forwarded to bug-guix <at> gnu.org: bug#22608; Package guix. (Sat, 08 Oct 2022 00:22:02 GMT)
Message #19 received at 22608 <at> debbugs.gnu.org (full text, mbox):
Hi,
ludo <at> gnu.org (Ludovic Courtès) writes:
> Hello,
>
> taylanbayirli <at> gmail.com (Taylan Ulrich "Bayırlı/Kammer") skribis:
>
>> To speed up the compilation of the many Scheme files in Guix, we use a
>> script that first loads all modules to be compiled into the Guile
>> process (by calling 'resolve-interface' on the module names), and then
>> the corresponding Scheme files are compiled in a par-for-each.
>>
>> While Guile's module system is known to be thread unsafe, the idea was
>> that all mutation should happen in the serial loading phase, and the
>> parallel compile-file calls should then be thread safe.
>>
>> Sadly that assumption isn't met when autoloads are involved.
>
> For the record, these issues should be fixed in Guile 2.2.4:
>
> 533e3ff17 * Serialize accesses to submodule hash tables.
> 46bcbfa56 * Module import obarrays are accessed in a critical section.
> 761cf0fb8 * Make module autoloading thread-safe.
>
> ‘guix pull’ now defaults to 2.2.4, so we’ll see if indeed those crashes
> disappear.
I think we haven't seen these in the last 4 years! We still have
references to https://bugs.gnu.org/15602 in our code base though;
although the upstream issue appears to have been fixed. Could we remove
the workarounds now?
--
Thanks,
Maxim
Information forwarded to bug-guix <at> gnu.org: bug#22608; Package guix. (Mon, 10 Oct 2022 08:09:01 GMT)
Message #22 received at 22608 <at> debbugs.gnu.org (full text, mbox):
Hi!
Maxim Cournoyer <maxim.cournoyer <at> gmail.com> skribis:
> ludo <at> gnu.org (Ludovic Courtès) writes:
[...]
>> For the record, these issues should be fixed in Guile 2.2.4:
>>
>> 533e3ff17 * Serialize accesses to submodule hash tables.
>> 46bcbfa56 * Module import obarrays are accessed in a critical section.
>> 761cf0fb8 * Make module autoloading thread-safe.
>>
>> ‘guix pull’ now defaults to 2.2.4, so we’ll see if indeed those crashes
>> disappear.
>
> I think we haven't seen these in the last 4 years! We still have
> references to https://bugs.gnu.org/15602 in our code base though;
> although the upstream issue appears to have been fixed. Could we remove
> the workarounds now?
The module thread-safety issue discussed here appears to be done.
However the workarounds for <https://bugs.gnu.org/15602> must remain:
that specific issue is still there.
Thanks,
Ludo’.
Reply sent to Maxim Cournoyer <maxim.cournoyer <at> gmail.com>: You have taken responsibility. (Wed, 12 Oct 2022 14:25:02 GMT)
Notification sent to taylanbayirli <at> gmail.com (Taylan Ulrich Bayırlı/Kammer): bug acknowledged by developer. (Wed, 12 Oct 2022 14:25:03 GMT)
Message #27 received at 22608-done <at> debbugs.gnu.org (full text, mbox):
Hi,
Ludovic Courtès <ludo <at> gnu.org> writes:
> Hi!
>
> Maxim Cournoyer <maxim.cournoyer <at> gmail.com> skribis:
>
>> ludo <at> gnu.org (Ludovic Courtès) writes:
>
> [...]
>
>>> For the record, these issues should be fixed in Guile 2.2.4:
>>>
>>> 533e3ff17 * Serialize accesses to submodule hash tables.
>>> 46bcbfa56 * Module import obarrays are accessed in a critical section.
>>> 761cf0fb8 * Make module autoloading thread-safe.
>>>
>>> ‘guix pull’ now defaults to 2.2.4, so we’ll see if indeed those crashes
>>> disappear.
>>
>> I think we haven't seen these in the last 4 years! We still have
>> references to https://bugs.gnu.org/15602 in our code base though;
>> although the upstream issue appears to have been fixed. Could we remove
>> the workarounds now?
>
> The module thread-safety issue discussed here appears to be done.
Alright, I'm closing this one then.
> However the workarounds for <https://bugs.gnu.org/15602> must remain:
> that specific issue is still there.
Thanks for the heads-up!
--
Maxim
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 10 Nov 2022 12:24:04 GMT)