GNU bug report logs -
#61884
add an option to du that allows to control which file types are counted
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 61884 in the body.
You can then email your comments to 61884 AT debbugs.gnu.org in the normal way.
Report forwarded to bug-coreutils <at> gnu.org: bug#61884; Package coreutils. (Wed, 01 Mar 2023 03:20:02 GMT)
Acknowledgement sent to Christoph Anton Mitterer <calestyo <at> scientia.org>: New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Wed, 01 Mar 2023 03:20:02 GMT)
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
Hey.
When I want to count the nominal sizes of the (usually regular) files
in a directory I do something like:
du --apparent-size --block-size=1
This however also counts in the sizes of the directories themselves
(and I guess also of symlinks, etc.).
The "problem" with that is in particular, that for the exact same
dir/file structure, the results differ e.g. between ext4 and btrfs,
because of different sizes for the directories (themselves).
It would be nice if there was an option that allowed selecting which
file types are counted.
Yes I know that one can do something like:
find . -type f -print0 | du --apparent-size -l -c -s --block-size=1 --files0-from=- | tail -n 1
But that's rather cumbersome... also I cannot do something like
du path1 path2 path3
and get totals for each and a grand summary.
And even if I make a shell alias out of this, I cannot do bash completion on it.
Thanks,
Chris.
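To see how much of a du --apparent-size total comes from the directories themselves, the sizes can be broken down by file type. The following is a minimal sketch, assuming GNU find (for -printf with %y and %s); the function name size_by_type is hypothetical and not part of coreutils.

```shell
# Hypothetical helper: print the summed st_size per file type below the
# given paths, one "type<TAB>bytes" line per type (f = regular file,
# d = directory, l = symlink, ...). Assumes GNU find's -printf.
size_by_type() {
    find "$@" -printf '%y\t%s\n' |
        awk -F '\t' '
            { bytes[$1] += $2 }
            END { for (t in bytes) printf "%s\t%.0f\n", t, bytes[t] }
        ' | sort
}
```

Running `size_by_type .` on the same tree on two filesystems should show the same f line; the d line is the part that can differ, e.g. between ext4 and btrfs.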
Information forwarded to bug-coreutils <at> gnu.org: bug#61884; Package coreutils. (Thu, 02 Mar 2023 16:02:01 GMT)
Message #8 received at 61884 <at> debbugs.gnu.org (full text, mbox):
On 01/03/2023 03:18, Christoph Anton Mitterer wrote:
> Hey.
>
> When I want to count the nominal sizes of the (usually regular) files
> in a directory I do something like:
>
> du --apparent-size --block-size=1
>
> This however also counts in the sizes of the directories themselves
> (and I guess also of symlinks, etc.).
>
>
> The "problem" with that is in particular, that for the exact same
> dir/file structure, the results differ e.g. between ext4 and btrfs,
> because of different sizes for the directories (themselves).
>
> It would be nice if there was an option that allowed selecting which
> file types are counted.
>
>
> Yes I know that one can do something like:
> find . -type f -print0 | du --apparent-size -l -c -s --block-size=1 --files0-from=- | tail -n 1
>
> But that's rather cumbersome... also I cannot do something like
> du path1 path2 path3
> and get totals for each and a grand summary.
>
> And even if I make a shell alias out of this, I cannot do bash completion on it.
There are many possible filtering options,
which are probably best left to `find` (as per your example).
This was also mentioned previously at:
https://lists.gnu.org/archive/html/coreutils/2013-04/msg00043.html
cheers,
Pádraig
Information forwarded to bug-coreutils <at> gnu.org: bug#61884; Package coreutils. (Thu, 02 Mar 2023 16:56:02 GMT)
Message #11 received at 61884 <at> debbugs.gnu.org (full text, mbox):
On Thu, 2023-03-02 at 16:01 +0000, Pádraig Brady wrote:
> There are many possible filtering options,
> which are probably best left to `find` (as per your example).
> This was also mentioned previously at:
> https://lists.gnu.org/archive/html/coreutils/2013-04/msg00043.html
Sure, but the problem with all these is that one doesn't get usable
per-operand totals - only one big overall total.
If you take e.g.:
find dir1 dir2 fileA -not -type d -print0 | du -hsc --files0-from=-
(without the tail), one gets one line per (non-directory) file below
dir1 and dir2 as well as one for fileA .. plus the grand overall total,
whereas it would be nice to have totals for:
- dir1
- dir2
- fileA
- overall
Cheers,
Chris.
Information forwarded to bug-coreutils <at> gnu.org: bug#61884; Package coreutils. (Thu, 02 Mar 2023 17:21:02 GMT)
Message #14 received at 61884 <at> debbugs.gnu.org (full text, mbox):
Hi Christoph,
Christoph Anton Mitterer <calestyo <at> scientia.org> [2023-03-02 17:54:09 +0100]:
> On Thu, 2023-03-02 at 16:01 +0000, Pádraig Brady wrote:
> > There are many possible filtering options,
> > which are probably best left to `find` (as per your example).
> > This was also mentioned previously at:
> > https://lists.gnu.org/archive/html/coreutils/2013-04/msg00043.html
>
>
> Sure, but the problem with all these is that one doesn't get usable
> per-operand totals - only one big overall total.
>
> If you take e.g.:
>
>
> find dir1 dir2 fileA -not -type d -print0 | du -hsc --files0-from=-
>
> (without the tail), one gets one line per (non-directory) file below
> dir1 and dir2 as well as one for fileA .. plus the grand overall total,
> whereas it would be nice to have totals for:
> - dir1
> - dir2
> - fileA
> - overall
>
Would something like this work for you?
----------------------------------------------------------------
$ echo dir1_file1 > dir1/file1
$ echo dir1_file2 > dir1/file2
$ echo dir2_file1 > dir2/file1
$ echo dir2_file2 > dir2/file2
$ echo somefile > fileA
$ find dir1 dir2 fileA -not -type d -print0 | xargs --null du -hsc
4.0K dir1/file2
4.0K dir1/file1
4.0K dir2/file2
4.0K dir2/file1
4.0K fileA
20K total
----------------------------------------------------------------
- Glenn
Information forwarded to bug-coreutils <at> gnu.org: bug#61884; Package coreutils. (Fri, 03 Mar 2023 00:22:01 GMT)
Message #17 received at 61884 <at> debbugs.gnu.org (full text, mbox):
Hey Glenn
On Thu, 2023-03-02 at 10:20 -0700, Glenn Golden wrote:
> Would something like this work for you?
>
> ----------------------------------------------------------------
> $ echo dir1_file1 > dir1/file1
> $ echo dir1_file2 > dir1/file2
> $ echo dir2_file1 > dir2/file1
> $ echo dir2_file2 > dir2/file2
> $ echo somefile > fileA
>
> $ find dir1 dir2 fileA -not -type d -print0 | xargs --null du -hsc
> 4.0K dir1/file2
> 4.0K dir1/file1
> 4.0K dir2/file2
> 4.0K dir2/file1
> 4.0K fileA
> 20K total
> ----------------------------------------------------------------
TBH, I don't even understand how this should solve the "problem" I've
described above.
Your find would still return any non-directory files beneath dir1 and
dir2.
Because of xargs, du would see each of them as an argument (and likely
produce undesired results if there are too many files), and
subsequently still print each of them as a -s "total".
But apart from that,... it's clear that one can get the desired results
*somehow*, e.g. I simply use a script like that right now:
total_size=0
for pathname in "$@"; do
    size="$( find "${pathname}" \! -type d -print0 | du --apparent-size -l -c --block-size=1 --files0-from=- | tail -n 1 | cut -f 1 )"
    total_size="$(( ${size} + ${total_size} ))"
    printf '%s\t%s\n' "${size}" "${pathname}"
done
printf '%s\ttotal\n' "${total_size}"
# (cut's default field delimiter is the tabulator that du prints before the name, so no -d is needed)
That gets of course ugly if one really had a lot of arguments (many
forked processes).
And it's not something that one can expect to be there per default.
Anyway,... feel free to close the issue.
Cheers,
Chris.
Information forwarded to bug-coreutils <at> gnu.org: bug#61884; Package coreutils. (Fri, 03 Mar 2023 00:25:02 GMT)
Message #20 received at 61884 <at> debbugs.gnu.org (full text, mbox):
Oh, and I forgot to mention another main drawback of such a script.
It cannot (easily) be used with du's other options, because that would
require some options parser to be added to the script.
While this is of course rather easily possible (getopt), the main
problem there is IMO keeping it up to date with any option changes to du.
Cheers,
Chris.
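The per-operand loop above forks one find and one du per argument. A single pass is possible if GNU find is available, since its -printf directive %H prints the starting point under which each file was found. The following is a sketch under that assumption; the function name per_operand_totals is hypothetical, operands containing tabs or newlines are not handled, and an operand without any non-directory file produces no line.

```shell
# Hypothetical helper: per-operand and grand totals of the apparent sizes of
# non-directory files, using one find and one awk process in total.
# Assumes GNU find (-printf with %H = starting point, %s = size in bytes).
per_operand_totals() {
    find "$@" \! -type d -printf '%H\t%s\n' |
        awk -F '\t' '
            !($1 in sum) { order[++n] = $1 }   # remember first-seen order
            { sum[$1] += $2; total += $2 }
            END {
                for (i = 1; i <= n; i++)
                    printf "%.0f\t%s\n", sum[order[i]], order[i]
                printf "%.0f\ttotal\n", total
            }
        '
}
```

Called as `per_operand_totals dir1 dir2 fileA`, it prints one line per operand followed by a total line. Like the script's use of -l, it counts every hard link separately; it does not accept any of du's other options.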
Information forwarded to bug-coreutils <at> gnu.org: bug#61884; Package coreutils. (Sat, 04 Mar 2023 22:59:02 GMT)
Message #23 received at 61884 <at> debbugs.gnu.org (full text, mbox):
What's the motivation here? Does this have something to do with
reproducible builds?
One possibility is for --apparent-size to always count 0 for
directories, since 'read' never returns a positive number on
directories. That is, we reinterpret --apparent-size to mean "bytes that
could be read" rather than "what st_size says".
Information forwarded to bug-coreutils <at> gnu.org: bug#61884; Package coreutils. (Sat, 04 Mar 2023 23:34:02 GMT)
Message #26 received at 61884 <at> debbugs.gnu.org (full text, mbox):
On Sat, 2023-03-04 at 14:58 -0800, Paul Eggert wrote:
> What's the motivation here? Does this have something to do with
> reproducible builds?
No, nothing with reproducibility - at least not from my side. It's
really just to get a number for the "actual" data. And yes it's clear
that one can argue what that actually is ;-) ... but at least I think
it should give the same totals for the same files (of any type) on any
filesystem.
> One possibility is for --apparent-size to always count 0 for
> directories, since 'read' never returns a positive number on
> directories. That is, we reinterpret --apparent-size to mean "bytes that
> could be read" rather than "what st_size says".
Sounds like having a good potential for breaking existing stuff.
And in a way it solves the fundamental problem only partially:
As said above, it's not even clear what "actual" or "pristine" data
should actually be.
I would say that it's at least independent of any underlying structures
(like meta data of a filesystem or e.g. header data in a tar archive).
But would symlinks (i.e. their length) count for it?
What about hardlinked files, would they count once or n times?
du already allows selecting what it should do for hard links (-l), so I
figured it would fit conceptually if it allowed the same for file
types.
E.g. with a --type option that takes a string of (1-n) letters, like
find:
b block (buffered) special
c character (unbuffered) special
d directory
p named pipe (FIFO)
f regular file
l symbolic link
s socket
D door (Solaris)
If --type is given, only files whose type letter is listed are counted (but it has
no effect on whether such files are followed or recursed into, in the
case of d or l).
But anyway... as said previously... I already have my script that does
more or less what I want.
So if you think the whole idea is overkill for du, then don't hesitate
to close as wontfix.
Cheers,
Chris.
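The proposed --type semantics (count only the listed types, without changing traversal) can be approximated outside du. The following is a minimal sketch, assuming GNU find's -printf (%y for the type letter, %s for st_size); the function name sum_types is hypothetical, and --type itself is only a proposal in this thread, not an existing du option.

```shell
# Hypothetical helper: sum st_size of all files below the given paths whose
# find-style type letter appears in the first argument (e.g. "f" or "fl").
# Traversal is unaffected: directories are still descended into even when
# "d" is not listed. Assumes GNU find's -printf.
sum_types() {
    types=$1
    shift
    find "$@" -printf '%y\t%s\n' |
        awk -F '\t' -v types="$types" '
            index(types, $1) { total += $2 }
            END { printf "%.0f\n", total }
        '
}
```

For example, `sum_types fl dir1` would print the combined byte count of regular files and symbolic links below dir1, leaving out the directories' own sizes.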
Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>: You have taken responsibility. (Sun, 05 Mar 2023 01:01:02 GMT)
Notification sent to Christoph Anton Mitterer <calestyo <at> scientia.org>: bug acknowledged by developer. (Sun, 05 Mar 2023 01:01:02 GMT)
Message #31 received at 61884-done <at> debbugs.gnu.org (full text, mbox):
On 2023-03-04 15:33, Christoph Anton Mitterer wrote:
> But would symlinks (i.e. their length) count for it?
Sure, because you can read symlinks by using readlink, and that gives
you their lengths.
Come to think of it, POSIX specifies st_size only for regular files and
symlinks among the files you'll find in a directory. So du --apparent
should count st_size only for these file types; it should ignore st_size
for other file types unless we know somehow that those sizes make sense
(which for directories is problematic for the reasons you mention).
> What about hardlinked files, would they count once or n times?
That's an independent axis and is handled by -l. Hard links are not a
file type.
> b block (buffered) special
> c character (unbuffered) special
> d directory
> p named pipe (FIFO)
> f regular file
> l symbolic link
> s socket
> D door (Solaris)
I expect Coreutils's already-existing usable_st_size function should tell us
which types have usable st_size. This will exclude directories, which
should be the right thing for your use case.
So I installed the attached patch to fix du --apparent to count sizes
only when st_size is well-defined. This should address your use case so
I'm boldly closing the bug report.
[0001-du-apparent-counts-only-symlinks-and-regular.patch (text/x-patch, attachment)]
Information forwarded to bug-coreutils <at> gnu.org: bug#61884; Package coreutils. (Sun, 05 Mar 2023 01:22:02 GMT)
Message #34 received at 61884 <at> debbugs.gnu.org (full text, mbox):
Hey Paul.
On Sat, 2023-03-04 at 17:00 -0800, Paul Eggert wrote:
>
> So I installed the attached patch
AFAICS this is now only documented in the info page?
Would you mind adding a shorter notice to the manpage as well?
Thanks,
Chris.
Information forwarded to bug-coreutils <at> gnu.org: bug#61884; Package coreutils. (Sun, 05 Mar 2023 02:14:02 GMT)
Message #37 received at 61884 <at> debbugs.gnu.org (full text, mbox):
On 2023-03-04 17:20, Christoph Anton Mitterer wrote:
> Would you mind adding a shorter notice to the manpage as well?
The manpage is terse by design, and I doubt whether this minor detail
makes the cut.
Information forwarded to bug-coreutils <at> gnu.org: bug#61884; Package coreutils. (Mon, 13 Mar 2023 15:27:02 GMT)
Message #40 received at 61884 <at> debbugs.gnu.org (full text, mbox):
On 05/03/2023 01:00, Paul Eggert wrote:
> On 2023-03-04 15:33, Christoph Anton Mitterer wrote:
>
>> But would symlinks (i.e. their length) count for it?
>
> Sure, because you can read symlinks by using readlink, and that gives
> you their lengths.
>
> Come to think of it, POSIX specifies st_size only for regular files and
> symlinks among the files you'll find in a directory. So du --apparent
> should count st_size only for these file types; it should ignore st_size
> for other file types unless we know somehow that those sizes make sense
> (which for directories is problematic for the reasons you mention).
>
>
>> What about hardlinked files, would they count once or n times?
>
> That's an independent axis and is handled by -l. Hard links are not a
> file type.
>
>
>> b block (buffered) special
>> c character (unbuffered) special
>> d directory
>> p named pipe (FIFO)
>> f regular file
>> l symbolic link
>> s socket
>> D door (Solaris)
>
> I expect Coreutils's already-existing usable_st_size function should tell us
> which types have usable st_size. This will exclude directories, which
> should be the right thing for your use case.
>
>
> So I installed the attached patch to fix du --apparent to count sizes
> only when st_size is well-defined. This should address your use case so
> I'm boldly closing the bug report.
The attached patch adjusts the du/threshold test to pass
by avoiding testing --apparent with dirs.
cheers,
Pádraig
[du--app-dir-test.patch (text/x-patch, attachment)]
bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 11 Apr 2023 11:24:12 GMT)
This bug report was last modified 2 years and 32 days ago.
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.