GNU bug report logs - #79492
Shepherd catastrophic memory leak


Package: guix;

Reported by: "Zack Weinberg" <zack <at> owlfolio.org>

Date: Mon, 22 Sep 2025 19:24:02 UTC

Severity: normal

To reply to this bug, email your comments to 79492 AT debbugs.gnu.org.



Report forwarded to bug-guix <at> gnu.org:
bug#79492; Package guix. (Mon, 22 Sep 2025 19:24:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to "Zack Weinberg" <zack <at> owlfolio.org>:
New bug report received and forwarded. Copy sent to bug-guix <at> gnu.org. (Mon, 22 Sep 2025 19:24:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "Zack Weinberg" <zack <at> owlfolio.org>
To: bug-guix <at> gnu.org
Subject: Shepherd catastrophic memory leak
Date: Mon, 22 Sep 2025 15:23:05 -0400
I left my Guix System-based web server running for 26 days and PID 1 has
ballooned to consume 75% of all available RAM.  Because of this, it can
no longer fork. Which, in turn, means the system is almost but not quite
dead in the water.  Daemons that are already running, such as the actual
web server, are fine, but any transient service -- like ssh -- won't
start.  I could log in on the console, because getty was already
running, but `reboot` just hangs, and if I log out I expect it won't be
able to start another getty process.

Here is some relevant troubleshooting info:

# uptime
19:08:57  up 26 days 20:17,  1 user,  load average: 0.01, 0.02, 0.00

# free
               total        used        free      shared  buff/cache   available
Mem:         2020468     1768960      103008        6472      307064      251508
Swap:        2094056      168268     1925788

# ps -p 1 lc
F   UID     PID    PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
4     0       1       0  20   0 1988980 1528612 do_epo Sl ?       175:14 shepher
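
[Editor's note] The "75% of all available RAM" figure can be reproduced from the numbers above. The following is a minimal sketch that uses only the values shown by `free` and `ps` (both in KiB). The average growth rate assumes the growth was roughly linear since boot, which the report does not establish.

```python
# Figures copied from the `free` and `ps -p 1 lc` output above; both report KiB.
MEM_TOTAL_KIB = 2020468
SHEPHERD_RSS_KIB = 1528612

# Uptime was reported as "26 days 20:17".
UPTIME_DAYS = 26 + (20 * 60 + 17) / (24 * 60)


def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


rss_percent = percent(SHEPHERD_RSS_KIB, MEM_TOTAL_KIB)

# Assumption: growth was roughly linear from a negligible starting size.
avg_growth_mib_per_day = SHEPHERD_RSS_KIB / 1024 / UPTIME_DAYS

print(f"PID 1 RSS is {rss_percent:.1f}% of total RAM")
print(f"average growth: about {avg_growth_mib_per_day:.0f} MiB/day (linear assumption)")
```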

# grep -v MARK messages
2025-09-14 22:00:48 localhost shepherd[1]: Rotating '/var/log/messages' to '/var/log/messages.1'.
2025-09-14 22:00:48 localhost linux: [1638517.256304] __vm_enough_memory: pid: 1, comm: shepherd, bytes: 8388608 not enough memory for the allocation
2025-09-14 22:00:48 localhost shepherd[1]: Exception caught while calling action of timer 'log-rotation': (system-error "primitive-fork" "~A" ("Cannot allocate memory") (12))
2025-09-22 19:06:33 localhost shepherd[1]: Stopping service root...
2025-09-22 19:06:33 localhost shepherd[1]: Exiting shepherd...
2025-09-22 19:06:33 localhost shepherd[1]: Service guix-ownership is not running.
2025-09-22 19:06:33 localhost shepherd[1]: Service user-homes is not running.
2025-09-22 19:06:33 localhost shepherd[1]: Stopping service swap-7cb6821e-5fbb-48b1-85f8-74b4c41e9b7f...
2025-09-22 19:06:33 localhost linux: [2319321.058327] __vm_enough_memory: pid: 1, comm: shepherd, bytes: 2144313344 not enough memory for the allocation
2025-09-22 19:06:33 localhost shepherd[1]: Ignoring error while stopping swap-7cb6821e-5fbb-48b1-85f8-74b4c41e9b7f: (system-error "swapoff" "~S: ~A" ("/dev/vda2" "Cannot allocate memory") (12))
2025-09-22 19:06:33 localhost shepherd[1]: Service swap-7cb6821e-5fbb-48b1-85f8-74b4c41e9b7f might have failed to stop.
2025-09-22 19:06:33 localhost shepherd[1]: Service swap-7cb6821e-5fbb-48b1-85f8-74b4c41e9b7f is now stopped.
2025-09-22 19:06:34 localhost shepherd[1]: Stopping service ntpd...
2025-09-22 19:06:34 localhost ntpd[134]: ntpd exiting on signal 15 (Terminated)
2025-09-22 19:06:34 localhost shepherd[1]: Service ntpd stopped.
2025-09-22 19:06:34 localhost shepherd[1]: Service ntpd is now stopped.
2025-09-22 19:06:34 localhost shepherd[1]: Stopping service ssh-daemon...
2025-09-22 19:06:34 localhost shepherd[1]: Service ssh-daemon stopped.
2025-09-22 19:06:34 localhost shepherd[1]: Service ssh-daemon is now stopped.
2025-09-22 19:06:34 localhost shepherd[1]: Stopping service certbot-certificate-renewal...
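
[Editor's note] Both failures in the log carry errno 12 in the trailing `(12)` of the `system-error` form. A small sketch of how such entries could be decoded follows; the regular expression and the `decode` helper are illustrative assumptions, not part of the Shepherd.

```python
import errno
import os
import re

# The two `system-error` payloads from the log above, copied verbatim.
LOG = r'''
(system-error "primitive-fork" "~A" ("Cannot allocate memory") (12))
(system-error "swapoff" "~S: ~A" ("/dev/vda2" "Cannot allocate memory") (12))
'''

# Hypothetical pattern: procedure name first, numeric errno in the last list.
SYSTEM_ERROR = re.compile(r'\(system-error "(?P<proc>[^"]+)" .*\((?P<errno>\d+)\)\)')


def decode(line):
    """Return (procedure, errno name, message), or None if the line does not match."""
    match = SYSTEM_ERROR.search(line)
    if match is None:
        return None
    code = int(match.group("errno"))
    return match.group("proc"), errno.errorcode.get(code, "?"), os.strerror(code)


for entry in LOG.strip().splitlines():
    print(decode(entry))
```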

--

Closely related issue: For situations just such as this, reboot(8) is
supposed to have an option (conventionally `-f/--force`) which causes it
to issue the reboot system call itself, bypassing init.  But the
Shepherd's version of reboot is missing this option.
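
[Editor's note] For illustration only, this is roughly what "issue the reboot system call itself" amounts to on Linux. It is a sketch under stated assumptions, not the Shepherd's code and not an existing option of its `reboot` command. The `force_reboot` helper and its `dry_run` parameter are hypothetical. The real call needs CAP_SYS_BOOT and restarts the machine immediately, without stopping services or unmounting file systems.

```python
import ctypes
import os

# LINUX_REBOOT_CMD_RESTART from <linux/reboot.h>; glibc exposes it as RB_AUTOBOOT.
LINUX_REBOOT_CMD_RESTART = 0x01234567


def force_reboot(dry_run=True):
    """Call reboot(2) directly, bypassing PID 1 (hypothetical helper).

    With dry_run=True (the default) nothing is executed; the function only
    reports what it would do.
    """
    if dry_run:
        return ("reboot", hex(LINUX_REBOOT_CMD_RESTART))

    os.sync()  # flush dirty buffers first; nothing else gets a chance to clean up
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.reboot(LINUX_REBOOT_CMD_RESTART) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


print(force_reboot())  # dry run only
```

Such a process is started by the already-running login shell rather than by PID 1, so it does not depend on init being able to fork.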

--

I was already pretty frustrated with Guix System and this memory leak is
the last straw.  This server is shortly going to be reformatted with
another distribution.  However, I will preserve a disk image in case it
is useful to anyone.

zw




Information forwarded to bug-guix <at> gnu.org:
bug#79492; Package guix. (Mon, 22 Sep 2025 23:49:02 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "Dr. Arne Babenhauserheide" <arne_bab <at> web.de>
To: "Zack Weinberg" via Bug reports for GNU Guix <bug-guix <at> gnu.org>
Cc: Zack Weinberg <zack <at> owlfolio.org>, 79492 <at> debbugs.gnu.org
Subject: Re: bug#79492: Shepherd catastrophic memory leak
Date: Tue, 23 Sep 2025 01:48:32 +0200
"Zack Weinberg" via Bug reports for GNU Guix <bug-guix <at> gnu.org> writes:

> I left my Guix System-based web server running for 26 days and PID 1 has
> ballooned to consume 75% of all available RAM.  Because of this, it can
> no longer fork. Which, in turn, means the system is almost but not
> quite
…
> # free
>                total        used        free      shared  buff/cache   available
> Mem:         2020468     1768960      103008        6472      307064      251508
> Swap:        2094056      168268     1925788


You still have almost all swap free, so you should be able to start
programs (though slowly).

What I found, though, is that SSH can get into trouble when cgroups run
out (which happens quickly if you make heavy use of docker).

I regularly delete the unused cgroups then:

find /sys/fs/cgroup/ -depth -type d -name 'c*' | xargs -I {} sudo bash -c 'if test "$(cat {}/pids.current)" -eq 0; then echo {}; cat {}/pids.current; rmdir {}; fi'
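
[Editor's note] The one-liner above substitutes each path into a `bash -c` script. A sketch of the same selection logic follows, assuming a cgroup v2 hierarchy where each group exposes `pids.current`. The `empty_cgroups` helper is hypothetical; it only lists candidates and leaves the `rmdir` to the caller.

```python
import os


def empty_cgroups(root="/sys/fs/cgroup", prefix="c"):
    """List cgroup directories under `root` whose pids.current is 0.

    Directories are visited children-first (like `find -depth`), so a nested
    group appears before its parent. Only names starting with `prefix` are
    considered, mirroring `-name 'c*'`.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue
        if not os.path.basename(dirpath).startswith(prefix):
            continue
        if "pids.current" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "pids.current")) as handle:
                if int(handle.read()) == 0:
                    found.append(dirpath)
        except (OSError, ValueError):
            continue  # unreadable or malformed entry: skip it
    return found


if __name__ == "__main__":
    for path in empty_cgroups():
        print(path)  # a caller with sufficient privileges could os.rmdir(path)
```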

Best wishes,
Arne
-- 
Unpolitisch sein
heißt politisch sein,
ohne es zu merken.
draketo.de


Message #14 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: "Zack Weinberg" via Bug reports for GNU Guix <bug-guix <at> gnu.org>
Cc: Zack Weinberg <zack <at> owlfolio.org>, 79492 <at> debbugs.gnu.org
Subject: Re: bug#79492: Shepherd catastrophic memory leak
Date: Tue, 23 Sep 2025 08:41:46 +0200
Hi Zack,

"Zack Weinberg" via Bug reports for GNU Guix <bug-guix <at> gnu.org> writes:

> I left my Guix System-based web server running for 26 days and PID 1 has
> ballooned to consume 75% of all available RAM.  Because of this, it can
> no longer fork.

This is being tracked at
<https://codeberg.org/shepherd/shepherd/issues/1>.

It would seem a workaround is to use Inetutils syslogd instead of the
built-in ‘system-log’:

--8<---------------cut here---------------start------------->8---
(operating-system
  ;; …
  (services (append (list …
                          (service syslog-service-type))
                    (modify-services %base-services
                      (delete shepherd-system-log-service-type)))))
--8<---------------cut here---------------end--------------->8---
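
[Editor's note] To check whether a workaround like the one above stops the growth, one could sample the resident set size of PID 1 over time. This is a minimal sketch that reads the `VmRSS` field of `/proc/<pid>/status`; the `rss_kib` and `read_rss_kib` helper names are hypothetical.

```python
def rss_kib(status_text):
    """Extract the VmRSS value (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # kernel threads, for instance, have no VmRSS line


def read_rss_kib(pid=1):
    """Read the current resident set size of `pid` in KiB (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return rss_kib(handle.read())


if __name__ == "__main__":
    print(f"PID 1 RSS: {read_rss_kib(1)} KiB")
```

Logging this value periodically should show whether the resident set size levels off or keeps growing.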

Ludo’.





Message #20 received at submit <at> debbugs.gnu.org (full text, mbox):

From: "Dr. Arne Babenhauserheide" <arne_bab <at> web.de>
To: "Zack Weinberg" via Bug reports for GNU Guix <bug-guix <at> gnu.org>
Cc: Zack Weinberg <zack <at> owlfolio.org>, 79492 <at> debbugs.gnu.org
Subject: Re: bug#79492: Shepherd catastrophic memory leak
Date: Tue, 23 Sep 2025 09:54:30 +0200
"Dr. Arne Babenhauserheide" <arne_bab <at> web.de> writes:

> "Zack Weinberg" via Bug reports for GNU Guix <bug-guix <at> gnu.org> writes:
>> I left my Guix System-based web server running for 26 days and PID 1 has
>> ballooned to consume 75% of all available RAM.  Because of this, it can
>> no longer fork. Which, in turn, means the system is almost but not

> You still have almost all swap free, so you should be able to start

I have to take back this comment: I didn’t read closely enough.
(fork has to account for a copy of the parent’s allocated memory, so with
PID 1 at 75% memory usage the fork fails)

I’m sorry for the noise.

Best wishes,
Arne
-- 
Unpolitisch sein
heißt politisch sein,
ohne es zu merken.
draketo.de


Message #26 received at 79492 <at> debbugs.gnu.org (full text, mbox):

From: Tomas Volf <~@wolfsden.cz>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: zack <at> owlfolio.org, 79492 <at> debbugs.gnu.org
Subject: Re: bug#79492: Shepherd catastrophic memory leak
Date: Sat, 27 Sep 2025 00:19:26 +0200
Hi,

Ludovic Courtès <ludo <at> gnu.org> writes:

> It would seem a workaround is to use Inetutils syslogd instead of the
> built-in ‘system-log’:
>
> (operating-system
>   ;; …
>   (services (append (list …
>                           (service syslog-service-type))
>                     (modify-services %base-services
>                       (delete shepherd-system-log-service-type)))))

Thank you for the suggestion.  I will give that a try.  It would probably
be a good idea to mention it on the Codeberg issue as well.

Have a nice day,
Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.




This bug report was last modified 39 days ago.
