GNU bug report logs - #33281
head does not consume input after '-c' is satisfied

Package: coreutils;

Reported by: Luiz Angelo Daros de Luca <luizluca <at> gmail.com>

Date: Mon, 5 Nov 2018 20:34:01 UTC

Severity: wishlist

Tags: wontfix

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 33281 in the body.
You can then email your comments to 33281 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#33281; Package coreutils. (Mon, 05 Nov 2018 20:34:01 GMT)

Acknowledgement sent to Luiz Angelo Daros de Luca <luizluca <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 05 Nov 2018 20:34:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Luiz Angelo Daros de Luca <luizluca <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: head does not consume input after '-c' is satisfied
Date: Mon, 5 Nov 2018 18:30:17 -0200
Hello,

Once head has read enough bytes to satisfy the -c option, it stops reading
input and quits.
This differs from what -n does, and also from both the FreeBSD and busybox
head implementations.

With GNU Coreutils head:

$ echo -e "123\n456\n789" | { head -n 1; while read a; do echo "-$a-";
done; }
123
$ echo -e "123\n456\n789" | { head -c 4; while read a; do echo "-$a-";
done; }
123
-456-
-789-
$

With all other head implementations I tested:

$ echo -e "123\n456\n789" | { head -c 4 ; while read a ; do echo "-$a-" ;
done ; }
123
$

It would make sense for -n and -c to have the same semantics, differing only
in whether they count bytes or lines.

Regards,
-- 

Luiz Angelo Daros de Luca
luizluca <at> gmail.com

Information forwarded to bug-coreutils <at> gnu.org:
bug#33281; Package coreutils. (Mon, 05 Nov 2018 21:19:01 GMT)

Message #8 received at submit <at> debbugs.gnu.org:

From: Philip Rowlands <phr+coreutils <at> dimebar.com>
To: bug-coreutils <at> gnu.org
Subject: Re: bug#33281: head does not consume input after '-c' is satisfied
Date: Mon, 05 Nov 2018 21:17:49 +0000
On Mon, 5 Nov 2018, at 20:30, Luiz Angelo Daros de Luca wrote:
> 
> Once head read enough bytes to satisfy -c option, it stops reading input
> and quit.
> This is different from what -n does and it is also different from both
> FreeBSD and busybox head implementation.
> 
> With GNU Coreutils head:
> 
> $ echo -e "123\n456\n789" | { head -n 1; while read a; do echo "-$a-";
> done; }
> 123

This is incomplete; head doesn't read everything, but it does read more than one line. On my (rather aged Linux) system:
$ head --version
head (GNU coreutils) 8.25

$ seq 1864 | { head -n 1; while read a; do echo "-$a-"; done; }
1
--
-1861-
-1862-
-1863-
-1864-

What's special about 1860 lines of output? It's just over the amount of data which head reads from the pipe, 8192 bytes.

$ seq 1860 | wc -c
8193
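
The 8193 figure can be checked by hand: each line of seq output is its digits plus one newline. The sketch below (variable names are illustrative) adds up the line lengths and shows that an 8192-byte read ends right after the digits of "1860", so the stray newline left in the pipe is what the loop prints as "--".

```shell
# Bytes emitted by `seq 1860`: digits plus one newline per line.
one_digit=$((9 * 2))        # 1..9
two_digit=$((90 * 3))       # 10..99
three_digit=$((900 * 4))    # 100..999
four_digit=$((861 * 5))     # 1000..1860
total=$((one_digit + two_digit + three_digit + four_digit))
echo "$total"               # 8193

# The last four bytes of the first 8192-byte block are the digits of 1860;
# its newline is byte 8193 and stays behind in the pipe.
block_tail=$(seq 1860 | head -c 8192 | tail -c 4)
echo "$block_tail"          # 1860
```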

> $ echo -e "123\n456\n789" | { head -c 4; while read a; do echo "-$a-";
> done; }
> 123
> -456-
> -789-

In this case head knows it only needs 4 bytes, so only reads 4 bytes.

> With all other head implementations I tested:
> 
> $ echo -e "123\n456\n789" | { head -c 4 ; while read a ; do echo "-$a-" ;
> done ; }
> 123
> $
> 
> It would make sense to both -n and -c have the same meaning, differing only
> whether to read bytes or lines.

Consistency would be good, but consider that in the case of lines, head doesn't know up front how much data to read. The only way to read exactly the right amount, and not a byte more, would be to read one byte at a time, which is something of a performance killer. It's not possible to "un-read" data you've collected via the read syscall.
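
For comparison, the shell's own read builtin is a reader that makes this trade-off: on a pipe it typically reads a byte at a time so that it never consumes past the newline. A minimal sketch, with illustrative variable names (first, rest, result), of taking exactly one line and leaving the remainder for the next reader:

```shell
# Consume exactly one line from a pipe, then show what is left in it.
result=$(printf '123\n456\n789\n' | {
    IFS= read -r first              # takes "123" and its newline, nothing more
    printf 'first=%s\n' "$first"
    while IFS= read -r rest; do     # the remaining lines are still in the pipe
        printf '%s\n' "-$rest-"
    done
})
printf '%s\n' "$result"
# first=123
# -456-
# -789-
```

This is slower per byte than a buffered read, which is the cost described above.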

To achieve consistency in the other direction, head could ignore the optimization to reduce the number of bytes read, and always read 8192 bytes, knowing that some would be discarded. This seems to be more in line with the other implementations you've tried.

For consistency's sake, what would these do? For widely differing values, the only way to produce the same residual output would be to consume all input data.
$ cat file.txt | { head -n 100; wc -c; }
$ cat file.txt | { head -c 100KB; wc -c; }


Cheers,
Phil




Information forwarded to bug-coreutils <at> gnu.org:
bug#33281; Package coreutils. (Tue, 06 Nov 2018 07:07:02 GMT)

Message #11 received at 33281 <at> debbugs.gnu.org:

From: Bernhard Voelker <mail <at> bernhard-voelker.de>
To: Philip Rowlands <phr+coreutils <at> dimebar.com>, 33281 <at> debbugs.gnu.org
Subject: Re: bug#33281: head does not consume input after '-c' is satisfied
Date: Tue, 6 Nov 2018 08:06:38 +0100
On 11/5/18 10:17 PM, Philip Rowlands wrote:
> On Mon, 5 Nov 2018, at 20:30, Luiz Angelo Daros de Luca wrote:
>>
>> Once head read enough bytes to satisfy -c option, it stops reading input
>> and quit.
>> This is different from what -n does and it is also different from both
>> FreeBSD and busybox head implementation.
>>
>> With GNU Coreutils head:
>>
>> $ echo -e "123\n456\n789" | { head -n 1; while read a; do echo "-$a-";
>> done; }
>> 123
> 
> This is incomplete; head doesn't read everything, but more than one line. On my (rather aged Linux) system:
> $ head --version
> head (GNU coreutils) 8.25
> 
> $ seq 1864 | { head -n 1; while read a; do echo "-$a-"; done; }
> 1
> --
> -1861-
> -1862-
> -1863-
> -1864-
> 
> What's special about 1860 lines of output? It's just over the amount of data which head reads from the pipe, 8192 bytes.

Indeed, running 'head' under 'strace' confirms that:

  read(0, "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n12\n13\n14"..., 8192) = 8192

... and 'head' tries to "undo" the read by calling lseek(),
but that typically fails as stdin is a pipe:

  lseek(0, -8190, SEEK_CUR)               = -1 ESPIPE (Illegal seek)

That said, if your input were a regular file, then seeking back to just
after the first newline "\n" would succeed:

  $ file=$(mktemp) \
      && seq 4 > "$file" \
      && { strace -ve read,lseek head -n 1; while read a; do echo "-$a-"; done; } < "$file" \
      ; rm -f "$file"
  ...
  read(0, "1\n2\n3\n4\n", 8192)           = 8
  lseek(0, -6, SEEK_CUR)                  = 2
  1
  +++ exited with 0 +++
  -2-
  -3-
  -4-
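
The same effect can be seen without strace. This is a sketch that assumes GNU head, as in the trace above; the variable names are illustrative. Because head and cat share one open file description, cat resumes at the offset head seeked back to.

```shell
# Assumes GNU head, which seeks back on a regular file as traced above.
file=$(mktemp)
seq 4 > "$file"
# head prints line 1 and repositions the offset; cat reads from there.
out=$({ head -n 1; cat; } < "$file")
rm -f "$file"
printf '%s\n' "$out"
# 1
# 2
# 3
# 4
```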

Have a nice day,
Berny




Information forwarded to bug-coreutils <at> gnu.org:
bug#33281; Package coreutils. (Tue, 06 Nov 2018 19:53:02 GMT)

Message #14 received at 33281 <at> debbugs.gnu.org:

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Philip Rowlands <phr+coreutils <at> dimebar.com>, 33281 <at> debbugs.gnu.org
Subject: Re: bug#33281: head does not consume input after '-c' is satisfied
Date: Tue, 6 Nov 2018 11:52:25 -0800
On 11/5/18 1:17 PM, Philip Rowlands wrote:
> To achieve consistency in the other direction, head could ignore the optimization to reduce the number of bytes read, and always read 8192 bytes, knowing that some would be discarded.

Let's not do that. It's less efficient and less useful than what GNU 
'head -c4' is doing now.

> For widely differing values, the only way to produce the same residual output would be to consume all input data.

Eeuuww. Let's *especially* not do that.





Information forwarded to bug-coreutils <at> gnu.org:
bug#33281; Package coreutils. (Sun, 23 Dec 2018 05:44:01 GMT)

Message #17 received at 33281 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>,
 Philip Rowlands <phr+coreutils <at> dimebar.com>, 33281 <at> debbugs.gnu.org
Subject: Re: bug#33281: head does not consume input after '-c' is satisfied
Date: Sat, 22 Dec 2018 22:43:48 -0700
tags 33281 wontfix
severity 33281 wishlist
close 33281
stop

Hello,

On 2018-11-06 12:52 p.m., Paul Eggert wrote:
> On 11/5/18 1:17 PM, Philip Rowlands wrote:
>> To achieve consistency in the other direction, head could ignore the 
>> optimization to reduce the number of bytes read, and always read 8192 
>> bytes, knowing that some would be discarded.
> 
> Let's not do that. It's less efficient and less useful than what GNU 
> 'head -c4' is doing now.
> 
>> For widely differing values, the only way to produce the same residual 
>> output would be to consume all input data.
> 
> Eeuuww. Let's *especially* not do that.
> 

Given the above, I'm closing this as "wontfix".
Discussion can continue by replying to this thread.

-assaf






Added tag(s) wontfix. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Sun, 23 Dec 2018 05:44:03 GMT)

Severity set to 'wishlist' from 'normal'. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Sun, 23 Dec 2018 05:44:03 GMT)

Bug closed; send any further explanations to 33281 <at> debbugs.gnu.org and Luiz Angelo Daros de Luca <luizluca <at> gmail.com>. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Sun, 23 Dec 2018 05:44:03 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 20 Jan 2019 12:24:05 GMT)

This bug report was last modified 5 years and 70 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.