GNU bug report logs - #28152
Human readable units (-h/--human-readable vs --si) - Wrong prefix and missing unit

Package: coreutils;

Reported by: Michael Weiss <dev.primeos <at> gmail.com>

Date: Sat, 19 Aug 2017 20:25:02 UTC

Severity: wishlist

Tags: wontfix

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 28152 in the body.
You can then email your comments to 28152 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#28152; Package coreutils. (Sat, 19 Aug 2017 20:25:02 GMT)

Acknowledgement sent to Michael Weiss <dev.primeos <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sat, 19 Aug 2017 20:25:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Michael Weiss <dev.primeos <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: Human readable units (-h/--human-readable vs --si) - Wrong prefix
 and missing unit
Date: Sat, 19 Aug 2017 21:27:02 +0200
Imho the units used in the output of df, du, ls, etc. with the
-h/--human-readable option can be very misleading/ambiguous and in the
case of -h/--human-readable even wrong according to standards.

I don't want to flame about this but I'd love it if we could discuss
this objectively by considering the official standards and change the
output appropriately.

First of all I hope we can agree that the current output is ambiguous
and therefore not really useful unless the exact command that generated
that output is known (or at least if --si or -h was used). Imho this is
not desirable and already causes some problems when sharing that output
without providing the command.

If we look at the standards, Wikipedia [0] provides the following table
(I've removed the JEDEC units as they shouldn't be relevant here ("Unit
prefixes for semiconductor storage capacity")):

Prefixes for multiples of bits (bit) or bytes (B)
Decimal             | Binary
Value   SI          | Value   IEC
1000    k  kilo     | 1024    Ki  kibi
1000^2  M  mega     | 1024^2  Mi  mebi
1000^3  G  giga     | 1024^3  Gi  gibi
1000^4  T  tera     | 1024^4  Ti  tebi
1000^5  P  peta     | 1024^5  Pi  pebi
1000^6  E  exa      | 1024^6  Ei  exbi
1000^7  Z  zetta    | 1024^7  Zi  zebi
1000^8  Y  yotta    | 1024^8  Yi  yobi

These are the unit prefixes that I'm used to and they have the advantage
that they're unambiguous and standardized.
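
To make the gap between the two columns concrete, here is a small shell sketch (not part of the original report) that prints each decimal and binary multiple side by side, with the percentage by which the binary one is larger. The function name prefix_gap is made up for illustration, and the loop stops at exa/exbi because larger values would overflow 64-bit shell arithmetic.

```shell
# Hypothetical helper: compare 1000^n with 1024^n for n = 1..6.
prefix_gap() {
    dec=1
    bin=1
    for pair in k:Ki M:Mi G:Gi T:Ti P:Pi E:Ei; do
        si=${pair%%:*}     # decimal (SI) symbol, e.g. "M"
        iec=${pair##*:}    # binary (IEC) symbol, e.g. "Mi"
        dec=$((dec * 1000))
        bin=$((bin * 1024))
        # Integer percentage by which the binary multiple exceeds the
        # decimal one; dividing by dec/100 avoids overflow at the exa step.
        pct=$(( (bin - dec) / (dec / 100) ))
        printf '%s %d %s %d +%d%%\n' "$si" "$dec" "$iec" "$bin" "$pct"
    done
}

prefix_gap
```

The difference grows with every step, which is why an unlabelled "M" or "G" becomes harder to interpret as the sizes get larger.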

"With the aim of avoiding ambiguity the International Electrotechnical
Commission (IEC) adopted new binary prefixes in 1998 (IEC 80000-13:2008
formerly subclauses 3.8 and 3.9 of IEC 60027-2:2005). Each binary prefix
is formed from the first syllable of the decimal prefix with the similar
value, and the syllable "bi". The symbols are the decimal symbol, always
capitalised, followed by the letter "i". According to these standards,
kilo, mega, giga et seq. would only be used in the decimal sense, even
when referring to data storage capacities: kilobyte and megabyte would
denote one thousand and one million bytes respectively (consistent with
the metric system), while new terms such as kibibyte, mebibyte and
gibibyte, with symbols KiB, MiB and GiB, would denote 2^10, 2^20 and 2^30
bytes respectively." [1]

And last but not least we should provide the actual unit as well. In
this case all units are in bytes which we can abbreviate with B (not
with a lowercase b as that would mean bits). This should make the output
completely unambiguous, follow the standards and avoid the possibility
of misinterpretation.

I can understand that changing such historic things might always cause
some minor problems but delaying them doesn't make them magically go
away. And since this change would only affect the human readable output
it shouldn't really break any scripts.

An example:

Old:
114M	fileA
120M	fileA
New:
114MiB	fileA
120MB	fileA
Or alternatively:
114 MiB	fileA
120 MB	fileA

Links/References:
- https://en.wikipedia.org/wiki/Unit_prefix#Binary_prefixes
- https://en.wikipedia.org/wiki/Data_rate_units
- http://man7.org/linux/man-pages/man7/units.7.html
- http://man7.org/linux/man-pages/man1/numfmt.1.html
- https://debbugs.gnu.org/cgi/bugreport.cgi?bug=7176
- https://debbugs.gnu.org/cgi/bugreport.cgi?bug=18119

GNU coreutils version: 8.27
OS: GNU/Linux

Kind regards,

Michael

[0]: https://en.wikipedia.org/wiki/Unit_prefix
[1]: https://en.wikipedia.org/wiki/Unit_prefix#Binary_prefixes




Information forwarded to bug-coreutils <at> gnu.org:
bug#28152; Package coreutils. (Sat, 19 Aug 2017 21:10:02 GMT)

Message #8 received at 28152 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Michael Weiss <dev.primeos <at> gmail.com>
Cc: 28152 <at> debbugs.gnu.org, Mihai Capotă <mihai <at> mihaic.ro>
Subject: Re: bug#28152: Human readable units (-h/--human-readable vs --si) -
 Wrong prefix and missing unit
Date: Sat, 19 Aug 2017 14:09:00 -0700
Michael Weiss wrote:

> I can understand that changing such historic things might always cause
> some minor problems

I'm afraid the problems would be more than minor, as other programs parse the 
output (there's an option in GNU 'sort' to do that, for example). That being 
said, I could be talked into a patch like the one that Mihai Capotă suggested in:

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=7176#11

as this would be upward-compatible. It would need documentation though.
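
The sort option referred to above is presumably -h (--human-numeric-sort), which orders values such as 512K, 120M and 2G by their unit suffix. A minimal sketch with made-up sizes and file names; the wrapper name sort_sizes is hypothetical and only exists to make the example easy to reuse:

```shell
# Hypothetical wrapper around GNU sort's human-numeric comparison.
sort_sizes() {
    sort -h
}

# Made-up input in the style of `du -h` output.
printf '%s\n' '120M fileB' '2G fileC' '512K fileA' | sort_sizes
```

Any consumer that relies on this kind of parsing expects the current single-letter suffixes, which is the compatibility concern raised here.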




Information forwarded to bug-coreutils <at> gnu.org:
bug#28152; Package coreutils. (Mon, 21 Aug 2017 23:21:01 GMT)

Message #11 received at 28152 <at> debbugs.gnu.org:

From: Michael Weiss <dev.primeos <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 28152 <at> debbugs.gnu.org, Mihai Capotă <mihai <at> mihaic.ro>
Subject: Re: bug#28152: Human readable units (-h/--human-readable vs --si) -
 Wrong prefix and missing unit
Date: Tue, 22 Aug 2017 00:56:09 +0200
On Sat, 19 Aug, 2017 at 14:09:00 -0700, Paul Eggert wrote:
> I'm afraid the problems would be more than minor, as other programs parse
> the output (there's an option in GNU 'sort' to do that, for example).

You're right, I was way too optimistic about this. But still, it could
be way worse imho.

> That being said, I could be talked into a patch like the one that
> Mihai Capotă suggested in:
> 
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=7176#11
> 
> as this would be upward-compatible. It would need documentation though.

Imho that patch would already be a great improvement but probably still
not enough.

If I didn't miss anything, it would override the default behaviour, i.e.
to get the normal output one would have to run something like this:
"env -u BLOCK_SIZE ls -l". The other problem is that the behaviour of
-h and --si wouldn't change at all.

If one would like to change the default unit/format of the output (e.g.
via .bashrc) this would be great but unfortunately it wouldn't cover the
use case where one would like to use all binaries normally but get the
"human_B" output.

Do you think it would be possible to add another variable that wouldn't
overwrite the default but use the "human_B" output with -h or --si?

In that case one could set something like "HUMAN_B=true" and get the
following output:

$ du -s
116244	.

$ du -sh
114MiB	.

$ du -s --si
120MB	.
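
The figures 114 and 120 in this example are consistent with one and the same disk usage being rounded up in two different units. A rough sketch of that arithmetic, assuming the plain "du -s" figure counts 1024-byte blocks (GNU du's default when no block-size option or environment variable is set) and that the human-readable modes round up; ceil_div is a hypothetical helper:

```shell
# Hypothetical helper: integer division rounding up.
ceil_div() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

blocks=116244                 # figure printed by plain `du -s` above
bytes=$((blocks * 1024))      # assumes 1024-byte blocks

echo "bytes: $bytes"
echo "MiB:   $(ceil_div "$bytes" $((1024 * 1024)))"   # binary, as with -h
echo "MB:    $(ceil_div "$bytes" 1000000)"            # decimal, as with --si
```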

PS: Thanks for your fast reply (and sorry for my delay...).




Information forwarded to bug-coreutils <at> gnu.org:
bug#28152; Package coreutils. (Mon, 21 Aug 2017 23:22:02 GMT)

Message #14 received at 28152 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Michael Weiss <dev.primeos <at> gmail.com>
Cc: 28152 <at> debbugs.gnu.org, Mihai Capotă <mihai <at> mihaic.ro>
Subject: Re: bug#28152: Human readable units (-h/--human-readable vs --si) -
 Wrong prefix and missing unit
Date: Mon, 21 Aug 2017 16:21:27 -0700
On 08/21/2017 03:56 PM, Michael Weiss wrote:
> Do you think it would be possible to add another variable that wouldn't
> overwrite the default but use the "human_B" output with -h or --si?

Probably not. We've been heading more in the opposite direction, in that 
we'd rather not have environment variables affect the behavior of 
standard utilities, due to the possibility of confusion and even attacks 
on unwary users. For interactive use you can define your own du command 
or alias that behaves the way you prefer.
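
One possible shape for such a personal command, offered as a sketch under assumptions rather than as anything the maintainers proposed: a shell function (the name du_iec is made up) that asks du for byte counts and lets numfmt add an IEC prefix and a B suffix. It assumes GNU du and numfmt are installed.

```shell
# Hypothetical wrapper: du output with IEC prefixes and an explicit unit.
du_iec() {
    # --block-size=1 makes du print bytes, so numfmt scales the real size.
    du --block-size=1 "$@" | numfmt --field=1 --to=iec-i --suffix=B
}

# Example use: summarise the current directory.
du_iec -s .
```

Because this lives in a shell startup file rather than in an environment variable read by du itself, scripts that call du directly keep seeing the unmodified output.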





Information forwarded to bug-coreutils <at> gnu.org:
bug#28152; Package coreutils. (Mon, 21 Aug 2017 23:59:01 GMT)

Message #17 received at 28152 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Michael Weiss <dev.primeos <at> gmail.com>, 28152 <at> debbugs.gnu.org
Subject: Re: bug#28152: Human readable units (-h/--human-readable vs --si) -
 Wrong prefix and missing unit
Date: Mon, 21 Aug 2017 17:58:29 -0600
Hello Michael,

On 19/08/17 01:27 PM, Michael Weiss wrote:
> Imho the units used in the output of df, du, ls, etc. with the
> -h/--human-readable option can be very misleading/ambiguous and in the
> case of -h/--human-readable even wrong according to standards.

[...]

> Old:
> 114M	fileA
> 120M	fileA
> New:
> 114MiB	fileA
> 120MB	fileA
[...]
> - http://man7.org/linux/man-pages/man1/numfmt.1.html

You've mentioned numfmt(1), it's worth noting that your
request is exactly what numfmt was designed to do.

The following commands will display df/du/ls output in SI and IEC-I
units, giving the output you wanted:

  ls -l | numfmt --suffix B --field=5 --to=si
  ls -l | numfmt --suffix B --field=5 --to=iec-i

  du | numfmt --format "%-10f" --suffix B --field 1 --to=si
  du | numfmt --format "%-10f" --suffix B --field 1 --to=iec-i

  df | numfmt --suffix B --header --field=2-4 --to=si
  df | numfmt --suffix B --header --field=2-4 --to=iec-i


And these can be rather easily put into a shell function so it'll be
easy to use:

 df_si() { df "$@" | numfmt --suffix B --header --field=2-4 --to=si ; }


Note that numfmt with multiple fields requires coreutils 8.24 or later
(but since you're using 8.27 it should not be a problem).
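
One caveat when adapting the pipelines above, noted here as an observation and not something raised in the thread: numfmt scales whatever number it is given, and GNU du and df print counts of 1024-byte blocks by default, not bytes. A sketch of one way to account for that with numfmt's --from-unit option, using the 116244-block figure from earlier in this report; the function names are made up for illustration:

```shell
# Hypothetical helpers: treat the argument as a count of 1024-byte blocks
# and print it as bytes with an IEC or SI prefix.
blocks_to_iec() {
    numfmt --from-unit=1024 --to=iec-i --suffix=B "$1"
}

blocks_to_si() {
    numfmt --from-unit=1024 --to=si --suffix=B "$1"
}

blocks_to_iec 116244
blocks_to_si 116244
```

Passing --block-size=1 to du or df is an alternative that leaves the numfmt invocation unchanged.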


Hope this helps,
- assaf








Information forwarded to bug-coreutils <at> gnu.org:
bug#28152; Package coreutils. (Tue, 30 Oct 2018 01:10:01 GMT)

Message #20 received at 28152 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: 28152 <at> debbugs.gnu.org
Subject: Re: bug#28152: Human readable units (-h/--human-readable vs --si) -
 Wrong prefix and missing unit
Date: Mon, 29 Oct 2018 19:08:53 -0600
severity 28152 wishlist
tags 28152 wontfix
close 28152
stop

(triaging old bugs)

On 2017-08-21 5:21 p.m., Paul Eggert wrote:
> On 08/21/2017 03:56 PM, Michael Weiss wrote:
>> Do you think it would be possible to add another variable that wouldn't
>> overwrite the default but use the "human_B" output with -h or --si?
> 
> Probably not. We've been heading more in the opposite direction, in that 
> we'd rather not have environment variables affect the behavior of 
> standard utilities, due to the possibility of confusion and even attacks 
> on unwary users. For interactive use you can define your own du command 
> or alias that behaves the way you prefer.

On 2017-08-21 5:58 p.m., Assaf Gordon wrote:
> You've mentioned numfmt(1), it's worth noting that your request is
> exactly what numfmt was designed to do.
With no further comments, I'm closing this bug.
Discussion can continue by replying to this thread.

-assaf




Severity set to 'wishlist' from 'normal' Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 01:10:02 GMT)

Added tag(s) wontfix. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 01:10:02 GMT)

bug closed, send any further explanations to 28152 <at> debbugs.gnu.org and Michael Weiss <dev.primeos <at> gmail.com> Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 01:10:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 27 Nov 2018 12:24:08 GMT)

This bug report was last modified 5 years and 144 days ago.

