GNU bug report logs -
#18621
[BUG] wc -c incorrectly counts bytes in /sys
Previous Next
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 18621 in the body.
You can then email your comments to 18621 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-coreutils <at> gnu.org
:
bug#18621
; Package
coreutils
.
(Fri, 03 Oct 2014 15:13:01 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
George Shuklin <george.shuklin <at> gmail.com>
:
New bug report received and forwarded. Copy sent to
bug-coreutils <at> gnu.org
.
(Fri, 03 Oct 2014 15:13:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
There is many sysfs (linux) attributes which reported as '4k files' but
contains just a few bytes.
wc file and wc -c shows different sizes.
Example:
$cat /sys/kernel/vmcoreinfo
1b74c00 1024
$hexdump -Cv /sys/kernel/vmcoreinfo
00000000 31 62 37 34 63 30 30 20 31 30 32 34 0a |1b74c00 1024.|
0000000d
$ls -la /sys/kernel/vmcoreinfo
-r--r--r-- 1 root root 4096 Oct 3 17:40 /sys/kernel/vmcoreinfo
Here wc output:
$ wc /sys/kernel/vmcoreinfo
1 2 13 /sys/kernel/vmcoreinfo
and wc -c:
$ wc -c /sys/kernel/vmcoreinfo
4096 /sys/kernel/vmcoreinfo
4096 is not 13, and manual page for wc says that third number is byte count.
I think problem is in cnt(const char *file) function:
if (dochar || domulti) {
if (fstat(fd, &sb)) {
warn("%s: fstat", file);
(void)close(fd);
return (1);
}
if (S_ISREG(sb.st_mode)) {
(void)printf(" %7lld", (long long)sb.st_size);
tcharct += sb.st_size;
(void)close(fd);
return (0);
}
}
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#18621
; Package
coreutils
.
(Fri, 03 Oct 2014 16:49:01 GMT)
Full text and
rfc822 format available.
Message #8 received at 18621 <at> debbugs.gnu.org (full text, mbox):
On 10/03/2014 03:47 PM, George Shuklin wrote:
> There is many sysfs (linux) attributes which reported as '4k files' but contains just a few bytes.
>
> wc file and wc -c shows different sizes.
>
> Example:
>
> $cat /sys/kernel/vmcoreinfo
> 1b74c00 1024
>
> $hexdump -Cv /sys/kernel/vmcoreinfo
> 00000000 31 62 37 34 63 30 30 20 31 30 32 34 0a |1b74c00 1024.|
> 0000000d
>
> $ls -la /sys/kernel/vmcoreinfo
> -r--r--r-- 1 root root 4096 Oct 3 17:40 /sys/kernel/vmcoreinfo
>
>
> Here wc output:
>
> $ wc /sys/kernel/vmcoreinfo
> 1 2 13 /sys/kernel/vmcoreinfo
>
> and wc -c:
>
> $ wc -c /sys/kernel/vmcoreinfo
> 4096 /sys/kernel/vmcoreinfo
>
> 4096 is not 13, and manual page for wc says that third number is byte count.
>
> I think problem is in cnt(const char *file) function:
>
> if (dochar || domulti) {
> if (fstat(fd, &sb)) {
> warn("%s: fstat", file);
> (void)close(fd);
> return (1);
> }
> if (S_ISREG(sb.st_mode)) {
> (void)printf(" %7lld", (long long)sb.st_size);
> tcharct += sb.st_size;
> (void)close(fd);
> return (0);
> }
> }
I'm not sure where the above code comes from,
by coreutils trunk has the same behavior with these files.
We could avoid it with the following patch.
Note in the case where "real" small files don't
take up space in the file system, this will involve a redundant read,
however that will only be the case for small files so shouldn't
be problematic.
thanks,
Pádraig.
diff --git a/src/wc.c b/src/wc.c
index 1ff007d..bf1ce76 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -235,6 +235,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
fstatus->failed = fstat (fd, &fstatus->st);
if (! fstatus->failed && S_ISREG (fstatus->st.st_mode)
+ && fstatus->st.st_blocks
&& (current_pos = lseek (fd, 0, SEEK_CUR)) != -1
&& (end_pos = lseek (fd, 0, SEEK_END)) != -1)
{
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#18621
; Package
coreutils
.
(Fri, 03 Oct 2014 18:27:02 GMT)
Full text and
rfc822 format available.
Message #11 received at 18621 <at> debbugs.gnu.org (full text, mbox):
On Fri, Oct 3, 2014 at 9:48 AM, Pádraig Brady <P <at> draigbrady.com> wrote:
> On 10/03/2014 03:47 PM, George Shuklin wrote:
...
> I'm not sure where the above code comes from,
> by coreutils trunk has the same behavior with these files.
> We could avoid it with the following patch.
> Note in the case where "real" small files don't
> take up space in the file system, this will involve a redundant read,
> however that will only be the case for small files so shouldn't
> be problematic.
>
> thanks,
> Pádraig.
>
> diff --git a/src/wc.c b/src/wc.c
> index 1ff007d..bf1ce76 100644
> --- a/src/wc.c
> +++ b/src/wc.c
> @@ -235,6 +235,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
> fstatus->failed = fstat (fd, &fstatus->st);
>
> if (! fstatus->failed && S_ISREG (fstatus->st.st_mode)
> + && fstatus->st.st_blocks
> && (current_pos = lseek (fd, 0, SEEK_CUR)) != -1
> && (end_pos = lseek (fd, 0, SEEK_END)) != -1)
> {
That looks like a fine fix.
However, a similar issue affects tac, when its lseek-SEEK_END fails:
$ tac /sys/kernel/vmcoreinfo
tac: /sys/kernel/vmcoreinfo: read error: Inappropriate ioctl for device
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#18621
; Package
coreutils
.
(Fri, 03 Oct 2014 18:49:01 GMT)
Full text and
rfc822 format available.
Message #14 received at 18621 <at> debbugs.gnu.org (full text, mbox):
On 10/03/2014 11:26 AM, Jim Meyering wrote:
> That looks like a fine fix.
Unfortunately that fix would make 'wc -c' waaaaay slower for a file that
consists entirely of a big hole.
How about if we change usable_st_size to return false for these proc
files, with a heuristic as tight as we can make it, and to have
coreutils check usable_st_size in more places. Something like this,
perhaps:
/* Return a boolean indicating whether SB->st_size is correct. */
static inline bool
usable_st_size (struct stat const *sb)
{
if (S_ISREG (sb->st_mode))
{
/* proc files like /sys/kernel/vmcoreinfo are weird: their
st_size values do not reflect what's actually in them.
The following heuristic attempts to catch proc files without
catching many regular files that just happen to have the same
signature. */
return ! (sb->st_uid == 0 && sb->st_gid == 0 && sb->st_blocks == 0
&& sb->st_size == ST_BLKSIZE (*sb));
}
return (S_ISLNK (sb->st_mode) || S_TYPEISSHM (sb) || S_TYPEISTMO (sb));
}
and then review every place where coreutils currently uses st_size and
prepend a check for usable_st_size if needed.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#18621
; Package
coreutils
.
(Fri, 03 Oct 2014 19:18:01 GMT)
Full text and
rfc822 format available.
Message #17 received at 18621 <at> debbugs.gnu.org (full text, mbox):
On 10/03/2014 07:47 PM, Paul Eggert wrote:
> On 10/03/2014 11:26 AM, Jim Meyering wrote:
>> That looks like a fine fix.
>
> Unfortunately that fix would make 'wc -c' waaaaay slower for a file that consists entirely of a big hole.
True, which you could avoid by deferring to read() for empty files:
diff --git a/src/wc.c b/src/wc.c
index 1ff007d..f8176cc 100644
--- a/src/wc.c
+++ b/src/wc.c
@@ -235,6 +235,7 @@ wc (int fd, char const *file_x, struct fstatus *fstatus)
fstatus->failed = fstat (fd, &fstatus->st);
if (! fstatus->failed && S_ISREG (fstatus->st.st_mode)
+ && fstatus->st.st_blocks && fstatus->st.st_size
&& (current_pos = lseek (fd, 0, SEEK_CUR)) != -1
&& (end_pos = lseek (fd, 0, SEEK_END)) != -1)
{
> How about if we change usable_st_size to return false for these proc files, with a heuristic as tight as we can make it, and to have coreutils check usable_st_size in more places. Something like this, perhaps:
>
> /* Return a boolean indicating whether SB->st_size is correct. */
> static inline bool
> usable_st_size (struct stat const *sb)
> {
> if (S_ISREG (sb->st_mode))
> {
> /* proc files like /sys/kernel/vmcoreinfo are weird: their
> st_size values do not reflect what's actually in them.
> The following heuristic attempts to catch proc files without
> catching many regular files that just happen to have the same
> signature. */
> return ! (sb->st_uid == 0 && sb->st_gid == 0 && sb->st_blocks == 0
> && sb->st_size == ST_BLKSIZE (*sb));
> }
> return (S_ISLNK (sb->st_mode) || S_TYPEISSHM (sb) || S_TYPEISTMO (sb));
> }
>
> and then review every place where coreutils currently uses st_size and prepend a check for usable_st_size if needed.
That would be usefult, and that check matches this case, however many are not
distinguishable. Consider: `stat /proc/$$/io` which has st_size and st_blocks = 0
Note my adjusted patch above handles both cases.
thanks,
Pádraig
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#18621
; Package
coreutils
.
(Tue, 07 Oct 2014 07:10:02 GMT)
Full text and
rfc822 format available.
Message #20 received at 18621 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Paul Eggert wrote:
> How about if we change usable_st_size to return false for these proc files
Attached is a better idea, I hope. I audited the coreutils code to look for
problematic uses of SEEK_END or st_size when reading files (I didn't look at
writing; one can of worms at a time).
The attached patch still needs a changelog entry and test cases. The basic idea
is to not trust st_size when it's <= ST_BLKSIZE. This fixes bugs in 'head',
'od', 'split', 'tac', 'tail', and 'wc' when applied to input files in proc or
sysfs file systems.
Here's an example bug fixed by this patch:
$ cat /sys/kernel/profiling
0
$ tail -2c /sys/kernel/profiling
$
This should be:
$ cat /sys/kernel/profiling
0
$ tail -2c /sys/kernel/profiling
0
$
[procsys.diff (text/plain, attachment)]
Reply sent
to
Paul Eggert <eggert <at> cs.ucla.edu>
:
You have taken responsibility.
(Tue, 07 Oct 2014 23:53:02 GMT)
Full text and
rfc822 format available.
Notification sent
to
George Shuklin <george.shuklin <at> gmail.com>
:
bug acknowledged by developer.
(Tue, 07 Oct 2014 23:53:03 GMT)
Full text and
rfc822 format available.
Message #25 received at 18621-done <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Paul Eggert wrote:
> The attached patch still needs a changelog entry and test cases.
I wrote those up and pushed the attached patch; this should fix the bug so I'm
closing the bug report.
[0001-wc-don-t-miscount-sys-and-similar-file-systems.patch (text/plain, attachment)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#18621
; Package
coreutils
.
(Wed, 08 Oct 2014 00:37:02 GMT)
Full text and
rfc822 format available.
Message #28 received at 18621 <at> debbugs.gnu.org (full text, mbox):
On 10/08/2014 12:51 AM, Paul Eggert wrote:
> Paul Eggert wrote:
>
>> The attached patch still needs a changelog entry and test cases.
>
> I wrote those up and pushed the attached patch; this should fix the bug so I'm closing the bug report.
I was just going through the patch as it happens, and I found no issues.
I see the tac changes should also avoid erroneous errors if a file was truncated while reading for example.
I pushed a follow up to avoid some new syntax check errors.
thanks!
Pádraig.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#18621
; Package
coreutils
.
(Wed, 08 Oct 2014 01:13:02 GMT)
Full text and
rfc822 format available.
Message #31 received at 18621 <at> debbugs.gnu.org (full text, mbox):
On Tue, Oct 7, 2014 at 5:36 PM, Pádraig Brady <P <at> draigbrady.com> wrote:
> On 10/08/2014 12:51 AM, Paul Eggert wrote:
>> Paul Eggert wrote:
>>
>>> The attached patch still needs a changelog entry and test cases.
>>
>> I wrote those up and pushed the attached patch; this should fix the bug so I'm closing the bug report.
>
> I was just going through the patch as it happens, and I found no issues.
> I see the tac changes should also avoid erroneous errors if a file was truncated while reading for example.
> I pushed a follow up to avoid some new syntax check errors.
>
> thanks!
> Pádraig.
I pushed a patch to avoid a new failiure of the
split/b-chunk.sh test on non-Linux.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#18621
; Package
coreutils
.
(Wed, 08 Oct 2014 11:52:01 GMT)
Full text and
rfc822 format available.
Message #34 received at 18621 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
On 10/08/2014 02:12 AM, Jim Meyering wrote:
> On Tue, Oct 7, 2014 at 5:36 PM, Pádraig Brady <P <at> draigbrady.com> wrote:
>> On 10/08/2014 12:51 AM, Paul Eggert wrote:
>>> Paul Eggert wrote:
>>>
>>>> The attached patch still needs a changelog entry and test cases.
>>>
>>> I wrote those up and pushed the attached patch; this should fix the bug so I'm closing the bug report.
>>
>> I was just going through the patch as it happens, and I found no issues.
>> I see the tac changes should also avoid erroneous errors if a file was truncated while reading for example.
>> I pushed a follow up to avoid some new syntax check errors.
>>
>> thanks!
>> Pádraig.
>
> I pushed a patch to avoid a new failiure of the
> split/b-chunk.sh test on non-Linux.
The attached is needed to avoid a new warning on 32 bit.
thanks,
Pádraig.
[tac-i686-warning.patch (text/x-patch, attachment)]
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Wed, 05 Nov 2014 12:24:03 GMT)
Full text and
rfc822 format available.
This bug report was last modified 10 years and 153 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.