GNU bug report logs - #24937
"deleting unused links" GC phase is too slow

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: guix; Severity: important; Reported by: ludo@HIDDEN (Ludovic Courtès); dated Sun, 13 Nov 2016 17:42:02 UTC; Maintainer for guix is bug-guix@HIDDEN.
Severity set to 'important' from 'normal'. Request was from ludo@HIDDEN (Ludovic Courtès) to control <at> debbugs.gnu.org.

Message received at 24937 <at> debbugs.gnu.org:


From: ludo@HIDDEN (Ludovic Courtès)
To: 24937 <at> debbugs.gnu.org
Subject: Re: bug#24937: "deleting unused links" GC phase is too slow
References: <87wpg7ffbm.fsf@HIDDEN>
Date: Fri, 09 Dec 2016 23:43:57 +0100
In-Reply-To: <87wpg7ffbm.fsf@HIDDEN> ("Ludovic Courtès"'s message of "Sun, 13 Nov 2016 18:41:01 +0100")
Message-ID: <87wpf867v6.fsf@HIDDEN>
Cc: Mark H Weaver <mhw@HIDDEN>

ludo@HIDDEN (Ludovic Courtès) wrote:

> ‘LocalStore::removeUnusedLinks’ traverses all the entries in
> /gnu/store/.links and calls lstat(2) on each one of them and checks
> ‘st_nlink’ to determine whether they can be deleted.
>
> There are two problems: lstat(2) can be slow on spinning disks such as
> those on hydra.gnu.org, and the algorithm's running time is proportional
> to the number of entries in /gnu/store/.links, which is very large on
> hydra.gnu.org.

On Dec. 2, on guix-sysadmin@HIDDEN, Mark described a change that
noticeably improved performance:

  The idea is to read the entire /gnu/store/.links directory, sort the
  entries by inode number, and then iterate over the entries by inode
  number, calling 'lstat' on each one and deleting the ones with a link
  count of 1.

  The reason this is so much faster is that the inodes are stored on
  disk in order of inode number, so this leads to a sequential access
  pattern on disk instead of a random access pattern.

  The difficulty is that the directory is too large to comfortably store
  all of the entries in virtual memory.  Instead, the entries should be
  written to temporary files on disk, and then sorted using merge sort to
  ensure sequential access patterns during sorting.  Fortunately, this is
  exactly what 'sort' does from GNU coreutils.

  So, for now, I've implemented this as a pair of small C programs that
  are used in a pipeline with GNU sort.  The first program simply reads a
  directory and writes lines of the form "<inode> <name>" to stdout.
  (Unfortunately, "ls -i" calls stat on each entry, so it can't be used.)
  This is piped through 'sort -n' and then into another small C program
  that reads these lines, calls 'lstat' on each one, and deletes the
  non-directories with link count 1.
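Mark's programs are not included in the thread. What follows is a hypothetical C sketch of the two pipeline stages he describes. Both are written as functions rather than programs, and their names (`list_inodes` and `delete_unlinked`) are invented for illustration. `fstatat(..., AT_SYMLINK_NOFOLLOW)` is used as the directory-relative equivalent of lstat(2). The parser also assumes that entry names contain no newlines.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Stage 1 (hypothetical "list" program): print "<inode> <name>" for every
   entry of DIR_PATH.  Only readdir() is used, so unlike "ls -i" no entry
   is stat'ed.  Returns 0 on success, -1 on error. */
int list_inodes(const char *dir_path, FILE *out)
{
    DIR *dir = opendir(dir_path);
    if (dir == NULL) {
        perror(dir_path);
        return -1;
    }
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        fprintf(out, "%llu %s\n", (unsigned long long) ent->d_ino, ent->d_name);
    }
    closedir(dir);
    return 0;
}

/* Stage 2 (hypothetical "delete" program): read "<inode> <name>" lines,
   normally after "sort -n" has put them in inode order, lstat each name
   relative to DIR_PATH, and delete the non-directories whose link count
   is 1.  Returns the number of entries deleted, or -1 on error. */
long delete_unlinked(const char *dir_path, FILE *in)
{
    int dfd = open(dir_path, O_RDONLY | O_DIRECTORY);
    if (dfd == -1) {
        perror(dir_path);
        return -1;
    }
    long removed = 0;
    char *line = NULL;
    size_t cap = 0;
    ssize_t len;
    while ((len = getline(&line, &cap, in)) != -1) {
        if (len > 0 && line[len - 1] == '\n')
            line[len - 1] = '\0';
        /* The name starts after the first space; the inode number was
           only needed for sorting. */
        char *name = strchr(line, ' ');
        if (name == NULL)
            continue;               /* malformed line: skip it */
        name++;

        struct stat st;
        if (fstatat(dfd, name, &st, AT_SYMLINK_NOFOLLOW) == 0
            && !S_ISDIR(st.st_mode) && st.st_nlink == 1
            && unlinkat(dfd, name, 0) == 0)
            removed++;
    }
    free(line);
    close(dfd);
    return removed;
}
```

Each function would be wrapped in its own tiny main(). That gives a pipeline along the lines of `list-links /gnu/store/.links | sort -n | delete-links /gnu/store/.links`, where the program names are hypothetical. Because GNU sort spills to temporary files and merges them, neither C stage has to hold the whole directory in memory.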

Regarding memory usage, I replied:

  Really?

  For each entry, we have to store roughly 70 bytes for the file name (or
  52 if we consider only the basename), plus 8 bytes for the inode number;
  let’s say 64 bytes.

  If we have 10 M entries, that’s 700 MB (or 520 MB), which is a lot, but
  maybe acceptable?

  At worst, we may still see an improvement if we proceed by batches: we
  read 10000 directory entries (about 700 KB), sort them, and stat them,
  then read the next 10000 entries.  WDYT?
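Ludo's batching idea could look roughly like the following C sketch. The function name `remove_unused_links_batched` and its `batch_size` parameter are hypothetical. The per-batch steps (read entries, sort by inode, lstat, delete) follow the description above. Sorting within a batch only makes inode lookups sequential inside that batch. Successive batches may still jump around the inode table, unlike the whole-directory sort in Mark's pipeline.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* One directory entry remembered for the current batch. */
struct link_entry {
    ino_t ino;
    char *name;
};

/* qsort comparator: ascending inode number. */
static int by_inode(const void *x, const void *y)
{
    ino_t a = ((const struct link_entry *) x)->ino;
    ino_t b = ((const struct link_entry *) y)->ino;
    return (a > b) - (a < b);
}

/* Hypothetical batched variant: read up to BATCH_SIZE names from LINKS_DIR,
   sort them by inode number, then lstat and possibly delete them in that
   order.  Memory use is bounded by BATCH_SIZE.  Returns the number of
   entries deleted, or -1 on error. */
long remove_unused_links_batched(const char *links_dir, size_t batch_size)
{
    if (batch_size == 0)
        return -1;
    struct link_entry *batch = malloc(batch_size * sizeof *batch);
    if (batch == NULL)
        return -1;
    DIR *dir = opendir(links_dir);
    if (dir == NULL) {
        perror(links_dir);
        free(batch);
        return -1;
    }

    int dfd = dirfd(dir);
    long removed = 0;
    int done = 0;
    while (!done) {
        /* Fill one batch using readdir() alone; nothing is stat'ed yet. */
        size_t n = 0;
        while (n < batch_size) {
            struct dirent *ent = readdir(dir);
            if (ent == NULL) {
                done = 1;
                break;
            }
            if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
                continue;
            char *name = strdup(ent->d_name);
            if (name == NULL) {     /* out of memory: finish this batch, then stop */
                done = 1;
                break;
            }
            batch[n].ino = ent->d_ino;
            batch[n].name = name;
            n++;
        }

        /* Visit the batch in inode order, then release its names. */
        qsort(batch, n, sizeof *batch, by_inode);
        for (size_t i = 0; i < n; i++) {
            struct stat st;
            if (fstatat(dfd, batch[i].name, &st, AT_SYMLINK_NOFOLLOW) == 0
                && !S_ISDIR(st.st_mode) && st.st_nlink == 1
                && unlinkat(dfd, batch[i].name, 0) == 0)
                removed++;
            free(batch[i].name);
        }
    }

    closedir(dir);
    free(batch);
    return removed;
}
```

For example, `remove_unused_links_batched("/gnu/store/.links", 10000)` would hold at most 10000 names in memory at any time.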

Ludo’.




Information forwarded to bug-guix@HIDDEN:
bug#24937; Package guix.

Message received at submit <at> debbugs.gnu.org:


From: ludo@HIDDEN (Ludovic Courtès)
To: bug-guix@HIDDEN
Subject: "deleting unused links" GC phase is too slow
Date: Sun, 13 Nov 2016 18:41:01 +0100
Message-ID: <87wpg7ffbm.fsf@HIDDEN>

‘LocalStore::removeUnusedLinks’ traverses all the entries in
/gnu/store/.links and calls lstat(2) on each one of them and checks
‘st_nlink’ to determine whether they can be deleted.

There are two problems: lstat(2) can be slow on spinning disks such as
those on hydra.gnu.org, and the algorithm's running time is proportional
to the number of entries in /gnu/store/.links, which is very large on
hydra.gnu.org.
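For reference, here is a rough C sketch of the per-entry traversal described above. It is a hypothetical, simplified stand-in for the C++ `LocalStore::removeUnusedLinks`, not its actual code. The name `remove_unused_links_naive` is invented, and `fstatat(..., AT_SYMLINK_NOFOLLOW)` stands in for lstat(2) relative to the directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical sketch of the current approach: visit the entries of
   LINKS_DIR in readdir() order, lstat each one, and delete it when its
   link count is 1, i.e. when no store file shares that inode any more.
   Returns the number of entries deleted, or -1 on error. */
long remove_unused_links_naive(const char *links_dir)
{
    DIR *dir = opendir(links_dir);
    if (dir == NULL) {
        perror(links_dir);
        return -1;
    }

    int dfd = dirfd(dir);
    long removed = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;

        /* One inode lookup per entry, in directory order, which is
           generally unrelated to inode order: on a spinning disk this
           turns into random seeks. */
        struct stat st;
        if (fstatat(dfd, ent->d_name, &st, AT_SYMLINK_NOFOLLOW) == -1)
            continue;

        if (!S_ISDIR(st.st_mode) && st.st_nlink == 1
            && unlinkat(dfd, ent->d_name, 0) == 0)
            removed++;
    }
    closedir(dir);
    return removed;
}
```

The work is one lstat per entry, so the cost grows with the size of /gnu/store/.links, whether or not most entries turn out to be deletable.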

Ludo’.




Acknowledgement sent to ludo@HIDDEN (Ludovic Courtès):
New bug report received and forwarded. Copy sent to bug-guix@HIDDEN.
Report forwarded to bug-guix@HIDDEN:
bug#24937; Package guix.
Last modified: Fri, 9 Dec 2016 23:45:01 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.