GNU bug report logs - #18621
[BUG] wc -c incorrectly counts bytes in /sys

Previous Next

Package: coreutils;

Reported by: George Shuklin <george.shuklin <at> gmail.com>

Date: Fri, 3 Oct 2014 15:13:01 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 18621 in the body.
You can then email your comments to 18621 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-coreutils <at> gnu.org:
bug#18621; Package coreutils. (Fri, 03 Oct 2014 15:13:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to George Shuklin <george.shuklin <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Fri, 03 Oct 2014 15:13:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: George Shuklin <george.shuklin <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: [BUG] wc -c incorrectly counts bytes in /sys
Date: Fri, 03 Oct 2014 17:47:19 +0300
There is many sysfs (linux) attributes which reported as '4k files' but 
contains just a few bytes.

wc file and wc -c shows different sizes.

Example:

$cat /sys/kernel/vmcoreinfo
1b74c00 1024

$hexdump -Cv /sys/kernel/vmcoreinfo
00000000  31 62 37 34 63 30 30 20  31 30 32 34 0a           |1b74c00 1024.|
0000000d

$ls -la /sys/kernel/vmcoreinfo
-r--r--r-- 1 root root 4096 Oct  3 17:40 /sys/kernel/vmcoreinfo


Here wc output:

$ wc /sys/kernel/vmcoreinfo
   1    2   13 /sys/kernel/vmcoreinfo

and wc -c:

$ wc -c /sys/kernel/vmcoreinfo
4096 /sys/kernel/vmcoreinfo

4096 is not 13, and manual page for wc says that third number is byte count.

I think problem is in  cnt(const char *file)  function:

	if (dochar || domulti) {
		if (fstat(fd, &sb)) {
			warn("%s: fstat", file);
			(void)close(fd);
			return (1);
		}
		if (S_ISREG(sb.st_mode)) {
			(void)printf(" %7lld", (long long)sb.st_size);
			tcharct += sb.st_size;
			(void)close(fd);
			return (0);
		}
	}






Information forwarded to bug-coreutils <at> gnu.org:
bug#18621; Package coreutils. (Fri, 03 Oct 2014 16:49:01 GMT) Full text and rfc822 format available.

Message #8 received at 18621 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: George Shuklin <george.shuklin <at> gmail.com>
Cc: 18621 <at> debbugs.gnu.org
Subject: Re: bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
Date: Fri, 03 Oct 2014 17:48:20 +0100
On 10/03/2014 03:47 PM, George Shuklin wrote:
> There is many sysfs (linux) attributes which reported as '4k files' but contains just a few bytes.
> 
> wc file and wc -c shows different sizes.
> 
> Example:
> 
> $cat /sys/kernel/vmcoreinfo
> 1b74c00 1024
> 
> $hexdump -Cv /sys/kernel/vmcoreinfo
> 00000000  31 62 37 34 63 30 30 20  31 30 32 34 0a           |1b74c00 1024.|
> 0000000d
> 
> $ls -la /sys/kernel/vmcoreinfo
> -r--r--r-- 1 root root 4096 Oct  3 17:40 /sys/kernel/vmcoreinfo
> 
> 
> Here wc output:
> 
> $ wc /sys/kernel/vmcoreinfo
>    1    2   13 /sys/kernel/vmcoreinfo
> 
> and wc -c:
> 
> $ wc -c /sys/kernel/vmcoreinfo
> 4096 /sys/kernel/vmcoreinfo
> 
> 4096 is not 13, and manual page for wc says that third number is byte count.
> 
> I think problem is in  cnt(const char *file)  function:
> 
>     if (dochar || domulti) {
>         if (fstat(fd, &sb)) {
>             warn("%s: fstat", file);
>             (void)close(fd);
>             return (1);
>         }
>         if (S_ISREG(sb.st_mode)) {
>             (void)printf(" %7lld", (long long)sb.st_size);
>             tcharct += sb.st_size;
>             (void)close(fd);
>             return (0);
>         }
>     }

I'm not sure where the above code comes from,
by coreutils trunk has the same behavior with these files.
We could avoid it with the following patch.
Note in the case where "real" small files don't
take up space in the file system, this will involve a redundant read,
however that will only be the case for small files so shouldn't
be problematic.

thanks,
Pádraig.

diff --git a/src/wc.c b/src/wc.c
index 1ff007d..bf1ce76 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -235,6 +235,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
         fstatus->failed = fstat (fd, &fstatus->st);

       if (! fstatus->failed && S_ISREG (fstatus->st.st_mode)
+          && fstatus->st.st_blocks
           && (current_pos = lseek (fd, 0, SEEK_CUR)) != -1
           && (end_pos = lseek (fd, 0, SEEK_END)) != -1)
         {





Information forwarded to bug-coreutils <at> gnu.org:
bug#18621; Package coreutils. (Fri, 03 Oct 2014 18:27:02 GMT) Full text and rfc822 format available.

Message #11 received at 18621 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigbrady.com>
Cc: 18621 <at> debbugs.gnu.org, George Shuklin <george.shuklin <at> gmail.com>
Subject: Re: bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
Date: Fri, 3 Oct 2014 11:26:34 -0700
On Fri, Oct 3, 2014 at 9:48 AM, Pádraig Brady <P <at> draigbrady.com> wrote:
> On 10/03/2014 03:47 PM, George Shuklin wrote:
...
> I'm not sure where the above code comes from,
> by coreutils trunk has the same behavior with these files.
> We could avoid it with the following patch.
> Note in the case where "real" small files don't
> take up space in the file system, this will involve a redundant read,
> however that will only be the case for small files so shouldn't
> be problematic.
>
> thanks,
> Pádraig.
>
> diff --git a/src/wc.c b/src/wc.c
> index 1ff007d..bf1ce76 100644
> --- a/src/wc.c
> +++ b/src/wc.c
> @@ -235,6 +235,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
>          fstatus->failed = fstat (fd, &fstatus->st);
>
>        if (! fstatus->failed && S_ISREG (fstatus->st.st_mode)
> +          && fstatus->st.st_blocks
>            && (current_pos = lseek (fd, 0, SEEK_CUR)) != -1
>            && (end_pos = lseek (fd, 0, SEEK_END)) != -1)
>          {

That looks like a fine fix.
However, a similar issue affects tac, when its lseek-SEEK_END fails:

  $ tac /sys/kernel/vmcoreinfo
  tac: /sys/kernel/vmcoreinfo: read error: Inappropriate ioctl for device




Information forwarded to bug-coreutils <at> gnu.org:
bug#18621; Package coreutils. (Fri, 03 Oct 2014 18:49:01 GMT) Full text and rfc822 format available.

Message #14 received at 18621 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, Pádraig Brady
 <P <at> draigbrady.com>
Cc: 18621 <at> debbugs.gnu.org, George Shuklin <george.shuklin <at> gmail.com>
Subject: Re: bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
Date: Fri, 03 Oct 2014 11:47:49 -0700
On 10/03/2014 11:26 AM, Jim Meyering wrote:
> That looks like a fine fix.

Unfortunately that fix would make 'wc -c' waaaaay slower for a file that 
consists entirely of a big hole.

How about if we change usable_st_size to return false for these proc 
files, with a heuristic as tight as we can make it, and to have 
coreutils check usable_st_size in more places.  Something like this, 
perhaps:

  /* Return a boolean indicating whether SB->st_size is correct. */
  static inline bool
  usable_st_size (struct stat const *sb)
  {
    if (S_ISREG (sb->st_mode))
      {
        /* proc files like /sys/kernel/vmcoreinfo are weird: their
           st_size values do not reflect what's actually in them.
           The following heuristic attempts to catch proc files without
           catching many regular files that just happen to have the same
           signature.  */
        return ! (sb->st_uid == 0 && sb->st_gid == 0 && sb->st_blocks == 0
                  && sb->st_size == ST_BLKSIZE (*sb));
      }
    return (S_ISLNK (sb->st_mode) || S_TYPEISSHM (sb) || S_TYPEISTMO (sb));
  }


and then review every place where coreutils currently uses st_size and 
prepend a check for usable_st_size if needed.




Information forwarded to bug-coreutils <at> gnu.org:
bug#18621; Package coreutils. (Fri, 03 Oct 2014 19:18:01 GMT) Full text and rfc822 format available.

Message #17 received at 18621 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 18621 <at> debbugs.gnu.org, George Shuklin <george.shuklin <at> gmail.com>,
 Jim Meyering <jim <at> meyering.net>
Subject: Re: bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
Date: Fri, 03 Oct 2014 20:17:15 +0100
On 10/03/2014 07:47 PM, Paul Eggert wrote:
> On 10/03/2014 11:26 AM, Jim Meyering wrote:
>> That looks like a fine fix.
> 
> Unfortunately that fix would make 'wc -c' waaaaay slower for a file that consists entirely of a big hole.

True, which you could avoid by deferring to read() for empty files:

diff --git a/src/wc.c b/src/wc.c
index 1ff007d..f8176cc 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -235,6 +235,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
         fstatus->failed = fstat (fd, &fstatus->st);

       if (! fstatus->failed && S_ISREG (fstatus->st.st_mode)
+          && fstatus->st.st_blocks && fstatus->st.st_size
           && (current_pos = lseek (fd, 0, SEEK_CUR)) != -1
           && (end_pos = lseek (fd, 0, SEEK_END)) != -1)
         {

> How about if we change usable_st_size to return false for these proc files, with a heuristic as tight as we can make it, and to have coreutils check usable_st_size in more places.  Something like this, perhaps:
> 
>   /* Return a boolean indicating whether SB->st_size is correct. */
>   static inline bool
>   usable_st_size (struct stat const *sb)
>   {
>     if (S_ISREG (sb->st_mode))
>       {
>         /* proc files like /sys/kernel/vmcoreinfo are weird: their
>            st_size values do not reflect what's actually in them.
>            The following heuristic attempts to catch proc files without
>            catching many regular files that just happen to have the same
>            signature.  */
>         return ! (sb->st_uid == 0 && sb->st_gid == 0 && sb->st_blocks == 0
>                   && sb->st_size == ST_BLKSIZE (*sb));
>       }
>     return (S_ISLNK (sb->st_mode) || S_TYPEISSHM (sb) || S_TYPEISTMO (sb));
>   }
>
> and then review every place where coreutils currently uses st_size and prepend a check for usable_st_size if needed.

That would be usefult, and that check matches this case, however many are not
distinguishable. Consider: `stat /proc/$$/io` which has st_size and st_blocks = 0
Note my adjusted patch above handles both cases.

thanks,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#18621; Package coreutils. (Tue, 07 Oct 2014 07:10:02 GMT) Full text and rfc822 format available.

Message #20 received at 18621 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, Pádraig Brady
 <P <at> draigbrady.com>
Cc: 18621 <at> debbugs.gnu.org, George Shuklin <george.shuklin <at> gmail.com>
Subject: Re: bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
Date: Tue, 07 Oct 2014 00:09:24 -0700
[Message part 1 (text/plain, inline)]
Paul Eggert wrote:
> How about if we change usable_st_size to return false for these proc files

Attached is a better idea, I hope.  I audited the coreutils code to look for 
problematic uses of SEEK_END or st_size when reading files (I didn't look at 
writing; one can of worms at a time).

The attached patch still needs a changelog entry and test cases.  The basic idea 
is to not trust st_size when it's <= ST_BLKSIZE.  This fixes bugs in 'head', 
'od', 'split', 'tac', 'tail', and 'wc' when applied to input files in proc or 
sysfs file systems.

Here's an example bug fixed by this patch:

$ cat /sys/kernel/profiling
0
$ tail -2c /sys/kernel/profiling
$

This should be:

$ cat /sys/kernel/profiling
0
$ tail -2c /sys/kernel/profiling
0
$

[procsys.diff (text/plain, attachment)]

Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Tue, 07 Oct 2014 23:53:02 GMT) Full text and rfc822 format available.

Notification sent to George Shuklin <george.shuklin <at> gmail.com>:
bug acknowledged by developer. (Tue, 07 Oct 2014 23:53:03 GMT) Full text and rfc822 format available.

Message #25 received at 18621-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Jim Meyering <jim <at> meyering.net>, Pádraig Brady
 <P <at> draigbrady.com>
Cc: George Shuklin <george.shuklin <at> gmail.com>, 18621-done <at> debbugs.gnu.org
Subject: Re: bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
Date: Tue, 07 Oct 2014 16:51:47 -0700
[Message part 1 (text/plain, inline)]
Paul Eggert wrote:

> The attached patch still needs a changelog entry and test cases.

I wrote those up and pushed the attached patch; this should fix the bug so I'm 
closing the bug report.

[0001-wc-don-t-miscount-sys-and-similar-file-systems.patch (text/plain, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#18621; Package coreutils. (Wed, 08 Oct 2014 00:37:02 GMT) Full text and rfc822 format available.

Message #28 received at 18621 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 18621 <at> debbugs.gnu.org, George Shuklin <george.shuklin <at> gmail.com>,
 Jim Meyering <jim <at> meyering.net>
Subject: Re: bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
Date: Wed, 08 Oct 2014 01:36:30 +0100
On 10/08/2014 12:51 AM, Paul Eggert wrote:
> Paul Eggert wrote:
> 
>> The attached patch still needs a changelog entry and test cases.
> 
> I wrote those up and pushed the attached patch; this should fix the bug so I'm closing the bug report.

I was just going through the patch as it happens, and I found no issues.
I see the tac changes should also avoid erroneous errors if a file was truncated while reading for example.
I pushed a follow up to avoid some new syntax check errors.

thanks!
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#18621; Package coreutils. (Wed, 08 Oct 2014 01:13:02 GMT) Full text and rfc822 format available.

Message #31 received at 18621 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigbrady.com>
Cc: George Shuklin <george.shuklin <at> gmail.com>, Paul Eggert <eggert <at> cs.ucla.edu>,
 18621 <18621 <at> debbugs.gnu.org>
Subject: Re: bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
Date: Tue, 7 Oct 2014 18:12:02 -0700
On Tue, Oct 7, 2014 at 5:36 PM, Pádraig Brady <P <at> draigbrady.com> wrote:
> On 10/08/2014 12:51 AM, Paul Eggert wrote:
>> Paul Eggert wrote:
>>
>>> The attached patch still needs a changelog entry and test cases.
>>
>> I wrote those up and pushed the attached patch; this should fix the bug so I'm closing the bug report.
>
> I was just going through the patch as it happens, and I found no issues.
> I see the tac changes should also avoid erroneous errors if a file was truncated while reading for example.
> I pushed a follow up to avoid some new syntax check errors.
>
> thanks!
> Pádraig.

I pushed a patch to avoid a new failiure of the
split/b-chunk.sh test on non-Linux.




Information forwarded to bug-coreutils <at> gnu.org:
bug#18621; Package coreutils. (Wed, 08 Oct 2014 11:52:01 GMT) Full text and rfc822 format available.

Message #34 received at 18621 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 18621 <18621 <at> debbugs.gnu.org>, George Shuklin <george.shuklin <at> gmail.com>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Subject: Re: bug#18621: [BUG] wc -c incorrectly counts bytes in /sys
Date: Wed, 08 Oct 2014 12:51:28 +0100
[Message part 1 (text/plain, inline)]
On 10/08/2014 02:12 AM, Jim Meyering wrote:
> On Tue, Oct 7, 2014 at 5:36 PM, Pádraig Brady <P <at> draigbrady.com> wrote:
>> On 10/08/2014 12:51 AM, Paul Eggert wrote:
>>> Paul Eggert wrote:
>>>
>>>> The attached patch still needs a changelog entry and test cases.
>>>
>>> I wrote those up and pushed the attached patch; this should fix the bug so I'm closing the bug report.
>>
>> I was just going through the patch as it happens, and I found no issues.
>> I see the tac changes should also avoid erroneous errors if a file was truncated while reading for example.
>> I pushed a follow up to avoid some new syntax check errors.
>>
>> thanks!
>> Pádraig.
> 
> I pushed a patch to avoid a new failiure of the
> split/b-chunk.sh test on non-Linux.

The attached is needed to avoid a new warning on 32 bit.

thanks,
Pádraig.

[tac-i686-warning.patch (text/x-patch, attachment)]

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 05 Nov 2014 12:24:03 GMT) Full text and rfc822 format available.

This bug report was last modified 9 years and 174 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.