GNU bug report logs - #7399
du: add "--hash-all-files" option

Please note: This is a static page, with minimal formatting, updated once a day.

Package: coreutils; Severity: wishlist; Reported by: Evgeny Kapun <abacabadabacaba@HIDDEN>; dated Sun, 14 Nov 2010 14:00:04 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.
Changed bug title to 'du: add "--hash-all-files" option' from 'du may count one file multiple times if it visible through multiple mounts'. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.
Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon@HIDDEN> to control <at> debbugs.gnu.org.

Message received at 7399 <at> debbugs.gnu.org:


From: Jim Meyering <jim@HIDDEN>
To: Paul Eggert <eggert@HIDDEN>
Subject: Re: bug#7399: du may count one file multiple times if it visible
	through multiple mounts
In-Reply-To: <4CE0DE5A.8090409@HIDDEN> (Paul Eggert's message of "Sun, 14
	Nov 2010 23:16:42 -0800")
References: <4CDFE134.3000208@HIDDEN> <4CE0DE5A.8090409@HIDDEN>
Date: Mon, 15 Nov 2010 09:06:05 +0100
Message-ID: <87zktbypsi.fsf@HIDDEN>
Cc: Evgeny Kapun <abacabadabacaba@HIDDEN>, 7399 <at> debbugs.gnu.org

Paul Eggert wrote:

> On 11/14/2010 05:16 AM, Evgeny Kapun wrote:
>> Some kernels, such as Linux, permit mounting one filesystem multiple
>> times. This can make multiple paths refer to the same file, although
>> neither hard nor symbolic links are involved.
>
> GNU du (as well as a lot of other programs I expect) doesn't work well
> in such environments, which do not conform to POSIX requirements for
> file system link counts.  GNU du could easily be fixed to handle these
> environments, but at a substantial runtime cost in the normal case,
> because it'd have to hash every file it runs across, not just files
> with link counts > 1 or that result from multiple arguments.
>
> One possible workaround is to add an option, --hash-all-files say, which causes
> du to hash every file it runs across, and thus not double-count files
> in such cases.

du.c already has an internal hash_all variable, and it so happens you
can set it by using du's --files0-from= option.  This should do the trick:

  find dir -print0 | du --files0-from=-

Obviously that's a bit of a kludge.
We shouldn't require a separate find process (and disabling
du's internal traversal code) just to turn this on, so adding
that option might make sense.
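
To get a single figure out of the pipeline above, one option is to add
du's existing -c (--total) option and keep only the last line. A minimal
sketch, assuming GNU find and GNU du; the function name du_total_hashed
is made up for illustration and is not part of coreutils:

```shell
# Hypothetical helper: total disk usage of a tree, computed through
# --files0-from so that du hashes every file it encounters.
du_total_hashed() {
    # -c appends a "total" line; tail and cut keep just its number.
    find "$1" -print0 | du -c --files0-from=- | tail -n 1 | cut -f1
}
```

It would be called as "du_total_hashed dir". The separate find process is
still there, so this does not remove the kludge; it only tidies the output.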




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils@HIDDEN:
bug#7399; Package coreutils.

Message received at 7399 <at> debbugs.gnu.org:


Message-ID: <4CE0DE5A.8090409@HIDDEN>
Date: Sun, 14 Nov 2010 23:16:42 -0800
From: Paul Eggert <eggert@HIDDEN>
Organization: UCLA Computer Science Department
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;
	rv:1.9.2.12) Gecko/20101027 Thunderbird/3.1.6
MIME-Version: 1.0
To: Evgeny Kapun <abacabadabacaba@HIDDEN>
Subject: Re: bug#7399: du may count one file multiple times if it visible
	through multiple mounts
References: <4CDFE134.3000208@HIDDEN>
In-Reply-To: <4CDFE134.3000208@HIDDEN>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: 7399 <at> debbugs.gnu.org

On 11/14/2010 05:16 AM, Evgeny Kapun wrote:
> Some kernels, such as Linux, permit mounting one filesystem multiple
> times. This can make multiple paths refer to the same file, although
> neither hard nor symbolic links are involved.

GNU du (as well as a lot of other programs I expect) doesn't work well
in such environments, which do not conform to POSIX requirements for
file system link counts.  GNU du could easily be fixed to handle these
environments, but at a substantial runtime cost in the normal case,
because it'd have to hash every file it runs across, not just files
with link counts > 1 or that result from multiple arguments.

One possible workaround is to add an option, --hash-all-files say, which causes
du to hash every file it runs across, and thus not double-count files
in such cases.

(Another possible workaround is to tell users "don't do that".  :-)
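
The effect of hashing every file can be imitated outside du. The sketch
below assumes GNU find (for -printf) and assumes that a file's identity
is its device and inode number pair; the thread itself only says that du
"hashes" files. The names sum_unique_blocks and du_hash_all are invented
for this illustration:

```shell
# Read "DEV:INO BLOCKS" lines on stdin and add up BLOCKS,
# counting each DEV:INO pair only the first time it appears.
sum_unique_blocks() {
    awk '!seen[$1]++ { total += $2 } END { print total + 0 }'
}

# Hypothetical helper: disk usage of a tree in 512-byte blocks,
# with every file (not only those with link count > 1) deduplicated.
du_hash_all() {
    # %D = device number, %i = inode number, %b = 512-byte blocks
    find "$1" -printf '%D:%i %b\n' | sum_unique_blocks
}
```

The awk array plays the role of the hash table, which also shows where
the cost comes from: one entry is stored for every file visited, rather
than only for the few files that have more than one link.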




Information forwarded to owner <at> debbugs.gnu.org, bug-coreutils@HIDDEN:
bug#7399; Package coreutils.

Message received at submit <at> debbugs.gnu.org:


Message-ID: <4CDFE134.3000208@HIDDEN>
Date: Sun, 14 Nov 2010 16:16:36 +0300
From: Evgeny Kapun <abacabadabacaba@HIDDEN>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US;
	rv:1.9.2.12) Gecko/20101027 Thunderbird/3.1.6
MIME-Version: 1.0
To: bug-coreutils@HIDDEN
Subject: du may count one file multiple times if it visible through multiple
	mounts

Some kernels, such as Linux, permit mounting one filesystem multiple
times. This can make multiple paths refer to the same file, although
neither hard nor symbolic links are involved. In this case, du sometimes
incorrectly counts that file many times:

	$ mkdir dir1 dir2
	# mount --bind dir1 dir2
	$ dd if=/dev/urandom of=dir1/file bs=1k count=1000
	1000+0 records in
	1000+0 records out
	1024000 bytes (1.0 MB) copied, 0.179289 s, 5.7 MB/s
	$ du
	1008	./dir1
	1008	./dir2
	2020	.

As you can see, the file is counted twice, once as dir1/file and again as
dir2/file. However, if du is run with a repeated argument, its behavior
is different:

	$ du . .
	1008	./dir1
	1012	.
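
One way to check that two paths really name the same file is to compare
their device and inode numbers. A small sketch, assuming GNU stat; the
helper names file_id and same_file are made up here:

```shell
# Print "DEVICE:INODE" for a path (GNU stat format directives %d and %i).
file_id() {
    stat -c '%d:%i' "$1"
}

# Succeed if both paths resolve to the same device and inode.
same_file() {
    [ "$(file_id "$1")" = "$(file_id "$2")" ]
}
```

With the bind mount above in place, "same_file dir1/file dir2/file" would
be expected to succeed even though the file's link count is 1. The
shell's built-in "[ dir1/file -ef dir2/file ]" test performs the same
comparison.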




Acknowledgement sent to Evgeny Kapun <abacabadabacaba@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN.
Report forwarded to owner <at> debbugs.gnu.org, bug-coreutils@HIDDEN:
bug#7399; Package coreutils.
Last modified: Fri, 11 Jan 2019 09:45:01 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.