GNU bug report logs - #61300
wc -c doesn't advance stdin position when it's a regular file

Please note: This is a static page, with minimal formatting, updated once a day.

Package: coreutils; Reported by: Stephane Chazelas <stephane@HIDDEN>; dated Sun, 5 Feb 2023 18:28:02 UTC; Maintainer for coreutils is bug-coreutils@HIDDEN.

Message received at submit <at> debbugs.gnu.org:


Date: Sun, 5 Feb 2023 18:27:28 +0000
From: Stephane Chazelas <stephane@HIDDEN>
To: bug-coreutils@HIDDEN
Subject: wc -c doesn't advance stdin position when it's a regular file
Message-ID: <20230205182728.5i2oi23purlzp6jj@HIDDEN>

"wc -c" without filename arguments is meant to read stdin until
EOF and report the number of bytes it has read.
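
For reference, the read-until-EOF behaviour described above can be
sketched as follows. This is only an illustration, not wc's actual
source; the function name count_by_reading is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: count the bytes remaining on fd by reading
   until EOF.  Returns the count, or -1 on a read error.
   On success the file offset is left at end of file, so a second
   reader sharing the same open file description sees nothing more. */
long long count_by_reading(int fd)
{
    char buf[65536];
    long long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == 0)
            return total;       /* EOF reached */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;
        }
        total += n;
    }
}
```

Calling it twice on the same descriptor gives the remaining size first
and 0 the second time, because each read advances the shared offset.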

When stdin is a regular file, GNU wc has an optimisation whereby
it skips the reading: it does a pos = lseek(0,0,SEEK_CUR) to find
out its current position within the file, then an fstat(0), and
reports st_size - pos (assuming st_size > pos).
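
A minimal sketch of that shortcut, as described in this report rather
than copied from the wc source. The function name count_by_size_only
and the -1 "fall back to reading" convention are assumptions made for
illustration.

```c
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: compute the byte count as st_size - pos,
   without reading and without changing the file offset.
   Returns -1 when the shortcut does not apply and the caller
   should fall back to reading. */
long long count_by_size_only(int fd)
{
    struct stat st;
    off_t pos;

    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;              /* not a regular file */
    pos = lseek(fd, 0, SEEK_CUR);
    if (pos < 0 || st.st_size <= pos)
        return -1;              /* unseekable, or already at/past the end */
    return (long long)(st.st_size - pos);
}
```

Nothing here moves the offset, so two consecutive calls on the same
descriptor return the same value.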

However, it does not move the position to the end of the file.
That means for instance that:

$ echo test > file
$ { wc -c; wc -c; } < file
5
5

instead of 5, then 0. Likewise, cat still sees the whole file after wc:

$ { wc -c; cat; } < file
5
test

So the optimisation is incomplete.

It also reports the size of the file even if it could not possibly read it
because it's not open in read mode:

$ { wc -c; } 0>> file
5

IMO, it should only do the optimisation if all of the following hold:
- fcntl(F_GETFL) shows that the file is opened in O_RDONLY or O_RDWR
- the current checks for /proc, /sys-like filesystems pass
- pos < st_size
- lseek(0,st_size,SEEK_SET) is successful.
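
The conditions listed above could be combined roughly as follows. This
is a sketch under stated assumptions, not a patch: the function name
count_by_seeking is hypothetical, -1 means "fall back to reading", and
wc's existing checks for /proc and /sys-like filesystems are not
reproduced.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: take the st_size - pos shortcut only when the
   descriptor is readable, and leave the offset at st_size afterwards.
   Returns the byte count, or -1 if the caller should read instead. */
long long count_by_seeking(int fd)
{
    struct stat st;
    off_t pos;
    int mode;
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    mode = flags & O_ACCMODE;
    if (mode != O_RDONLY && mode != O_RDWR)
        return -1;              /* not open for reading */

    /* Assumption: wc's /proc and /sys-like filesystem checks would
       go here; only the regular-file test is shown. */
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode))
        return -1;

    pos = lseek(fd, 0, SEEK_CUR);
    if (pos < 0 || pos >= st.st_size)
        return -1;              /* nothing provably left to skip */

    /* Advance the offset as a real read to EOF would have done. */
    if (lseek(fd, st.st_size, SEEK_SET) < 0)
        return -1;
    return (long long)(st.st_size - pos);
}
```

With this version a second call on the same descriptor returns -1, and
the fallback read then reports 0, matching what reading would give.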

(That leaves a race window between the fstat and the lseek in which
it could move the cursor backward, but I would think that can be
ignored: if something else reads at the same time, there's not much
we can expect anyway.)

-- 
Stephane




Acknowledgement sent to Stephane Chazelas <stephane@HIDDEN>:
New bug report received and forwarded. Copy sent to bug-coreutils@HIDDEN. Full text available.
Report forwarded to bug-coreutils@HIDDEN:
bug#61300; Package coreutils. Full text available.
Last modified: Sun, 5 Feb 2023 18:30:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.