GNU bug report logs - #47702
wc man page: first you are talking about bytes, then you are talking about characters


Package: coreutils;

Reported by: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>

Date: Sun, 11 Apr 2021 05:43:03 UTC

Severity: normal

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 47702 in the body.
You can then email your comments to 47702 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-coreutils <at> gnu.org:
bug#47702; Package coreutils. (Sun, 11 Apr 2021 05:43:03 GMT)

Acknowledgement sent to 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Sun, 11 Apr 2021 05:43:04 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>
To: bug-coreutils <at> gnu.org
Subject: wc man page: first you are talking about bytes, then you are
 talking about characters
Date: Sun, 11 Apr 2021 09:42:57 +0800
Man wc says

       Print newline, word, and byte counts for each FILE, and a total line if
       more than one FILE is specified.  A word is a non-zero-length  sequence
       of characters delimited by white space.

First you are talking about bytes, then you are talking about
characters.

So for the latter, please say
characters (not bytes)
or
characters (same as bytes)
or just
bytes
Yes, even if explained in the INFO file.
Thanks.




Reply sent to Pádraig Brady <P <at> draigBrady.com>:
You have taken responsibility. (Sun, 11 Apr 2021 15:51:02 GMT)

Notification sent to 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>:
bug acknowledged by developer. (Sun, 11 Apr 2021 15:51:02 GMT)

Message #10 received at 47702-done <at> debbugs.gnu.org:

From: Pádraig Brady <P <at> draigBrady.com>
To: 積丹尼 Dan Jacobson <jidanni <at> jidanni.org>,
 47702-done <at> debbugs.gnu.org
Subject: Re: bug#47702: wc man page: first you are talking about bytes, then
 you are talking about characters
Date: Sun, 11 Apr 2021 16:50:35 +0100
On 11/04/2021 02:42, 積丹尼 Dan Jacobson wrote:
> Man wc says
> 
>         Print newline, word, and byte counts for each FILE, and a total line if
>         more than one FILE is specified.  A word is a non-zero-length  sequence
>         of characters delimited by white space.
> 
> First you are talking about bytes, then you are talking about
> characters.
> 
> So for the latter, please say
> characters (not bytes)
> or
> characters (same as bytes)
> or just
> bytes
> Yes, even if explained in the INFO file.

You're right that this is under-specified,
in both the man page and the info file.
The above is really characters (not bytes).
In fact, as a GNU extension, a word is a sequence of printable characters.
POSIX does not specify this, but one can confirm it like so:


$ printf '\xc3 \xc3' | LC_ALL=C wc --words --chars --bytes
      0       3       3
$ printf '\xc3 \xc3' | LC_ALL=C.utf8 wc --words --chars --bytes
      0       1       3
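The example above feeds wc a lone \xc3 byte, which is not valid UTF-8. The sketch below contrasts the character and byte counts on valid UTF-8 input instead. It assumes bash (for the $'\xHH' quoting) and an installed UTF-8 locale named C.UTF-8; that locale name varies between systems. Word counts are deliberately left out, since their handling of non-printable bytes is the GNU extension mentioned above rather than POSIX behavior.

```shell
#!/usr/bin/env bash
# Sketch: compare character (-m/--chars) and byte (-c/--bytes) counts
# for the same valid UTF-8 input under two locales.
# Assumption: a UTF-8 locale called C.UTF-8 exists on this system.

# "é" (U+00E9) is encoded in UTF-8 as the two bytes \xc3\xa9,
# so this input is 5 bytes long: é, space, é.
input=$'\xc3\xa9 \xc3\xa9'

for loc in C C.UTF-8; do
  printf '%-8s' "$loc"
  # wc prints its counts in a fixed order (lines, words, chars, bytes)
  # regardless of the order the options are given in.
  printf '%s' "$input" | LC_ALL="$loc" wc --chars --bytes
done
```

Under those assumptions one would expect 5 characters and 5 bytes in the C locale, where every byte is treated as one character, and 3 characters but still 5 bytes in the UTF-8 locale, because each é is a single character encoded as two bytes.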

The info file was really quite under-specified in this regard.
I'll apply the attached to clarify things.
Marking this as done.

thanks!
Pádraig
[wc-clarify-counts.patch (text/x-patch, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 10 May 2021 11:24:04 GMT)

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.