GNU bug report logs - #31332
touch unnecessarily calls dup2

Previous Next

Package: coreutils;

Reported by: John Steele Scott <toojays <at> toojays.net>

Date: Tue, 1 May 2018 15:28:02 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 31332 in the body.
You can then email your comments to 31332 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-coreutils <at> gnu.org:
bug#31332; Package coreutils. (Tue, 01 May 2018 15:28:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to John Steele Scott <toojays <at> toojays.net>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Tue, 01 May 2018 15:28:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: John Steele Scott <toojays <at> toojays.net>
To: bug-coreutils <at> gnu.org
Subject: touch unnecessarily calls dup2
Date: Tue, 1 May 2018 20:08:50 +0930
From https://stackoverflow.com/questions/40446555/why-does-touch-call-the-dup2-syscall

jscott <at> citra:/tmp$ touch --version | head -1
touch (GNU coreutils) 8.25
jscott <at> citra:/tmp$ strace -ttt touch foo 2>&1 | tail -9
1525170579.952032 open("foo", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3
1525170579.952080 dup2(3, 0)            = 0
1525170579.952119 close(3)              = 0
1525170579.952156 utimensat(0, NULL, NULL, 0) = 0
1525170579.952209 close(0)              = 0
1525170579.952257 close(1)              = 0
1525170579.952294 close(2)              = 0
1525170579.952333 exit_group(0)         = ?
1525170579.952450 +++ exited with 0 +++

My analysis from that discussion:

    It's a historical artifact.

    The open()+dup2() pattern comes from the fd_reopen() function, which is used
    by several of the programs in the coreutils code base.

    Prior to coreutils commit e373bb1, fd_reopen() did not do open()+dup2(), but
    closed the desired file descriptor before opening a new one. This was the
    case when touch started using this function at coreutils commit 478bd89. Per
    that commit message, the intent was to reduce the number of file descriptors
    touch would have open.

In the scheme of things the extra call to dup2() is not a big deal. We've lived
with it for 10 years. On my laptop it would take 25,000 invocations of touch
before those unnecessary dup2() calls add up to a second. But touch is called a
lot, on systems all over the world. So maybe you guys care about this?

Cheers,

John




Information forwarded to bug-coreutils <at> gnu.org:
bug#31332; Package coreutils. (Tue, 30 Oct 2018 03:18:02 GMT) Full text and rfc822 format available.

Message #8 received at 31332 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: John Steele Scott <toojays <at> toojays.net>, 31332 <at> debbugs.gnu.org
Subject: Re: bug#31332: touch unnecessarily calls dup2
Date: Mon, 29 Oct 2018 21:17:07 -0600
tags 31332 notabug
close 31332
stop

On 2018-05-01 4:38 a.m., John Steele Scott wrote:
>  From https://stackoverflow.com/questions/40446555/why-does-touch-call-the-dup2-syscall
> 
> jscott <at> citra:/tmp$ touch --version | head -1
> touch (GNU coreutils) 8.25
> jscott <at> citra:/tmp$ strace -ttt touch foo 2>&1 | tail -9
> 1525170579.952032 open("foo", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3
> 1525170579.952080 dup2(3, 0)            = 0
> 1525170579.952119 close(3)              = 0
> 1525170579.952156 utimensat(0, NULL, NULL, 0) = 0
> 1525170579.952209 close(0)              = 0
> 1525170579.952257 close(1)              = 0
> 1525170579.952294 close(2)              = 0
> 1525170579.952333 exit_group(0)         = ?
> 1525170579.952450 +++ exited with 0 +++
> 
> My analysis from that discussion:
> 
>      It's a historical artifact.
> 
>      The open()+dup2() pattern comes from the fd_reopen() function, which is used
>      by several of the programs in the coreutils code base.
> 

Not exactly an "artifact" - but an indented operation.

The goal of "fd_reopen" is not just to open a file,
but to ensure the returned file descriptor (an "int") has a specific
value.

For example, standard input (STDIN) is typically file descriptor value 
0. calling fd_reopen first opens the file (the kernel returns file 
descriptor 3), then it ensures file descriptor 0 is associated with the 
same file (and then, calling "close(3)" gets rid of the other file 
descriptor).

Checking how fd_reopen is used in coreutils, it is almost always used
to ensure STDIN/STDOUT point to the desired file(s):
====
$ git grep fd_reopen
src/csplit.c:  if (! STREQ (name, "-") && fd_reopen (STDIN_FILENO, name, 
O_RDONLY, 0) < 0)
src/dd.c:/* Restart on EINTR from fd_reopen().  */
src/dd.c:ifd_reopen (int desired_fd, char const *file, int flag, mode_t 
mode)
src/dd.c:      ret = fd_reopen (desired_fd, file, flag, mode);
src/dd.c:      if (ifd_reopen (STDIN_FILENO, input_file, O_RDONLY | 
input_flags, 0) < 0)
src/dd.c:           || ifd_reopen (STDOUT_FILENO, output_file, O_RDWR | 
opts, perms) < 0)
src/dd.c:          && (ifd_reopen (STDOUT_FILENO, output_file, O_WRONLY 
| opts, perms)
src/nohup.c:      if (fd_reopen (STDIN_FILENO, "/dev/null", O_WRONLY, 0) 
< 0)
src/nohup.c:                ? fd_reopen (STDOUT_FILENO, file, flags, mode)
src/nohup.c:                        ? fd_reopen (STDOUT_FILENO, in_home, 
flags, mode)
src/split.c:      && fd_reopen (STDIN_FILENO, infile, O_RDONLY, 0) < 0)
src/stty.c:      if (fd_reopen (STDIN_FILENO, device_name, O_RDONLY | 
O_NONBLOCK, 0) < 0)
src/touch.c:      fd = fd_reopen (STDIN_FILENO, file,
===

This means the rest of the program can just operate on STDIN (or STDOUT).


As such, I'm closing this item.
Discussion can continue by replying to this thread.

-assaf






Added tag(s) notabug. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 03:18:02 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 31332 <at> debbugs.gnu.org and John Steele Scott <toojays <at> toojays.net> Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 03:18:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-coreutils <at> gnu.org:
bug#31332; Package coreutils. (Tue, 30 Oct 2018 04:20:01 GMT) Full text and rfc822 format available.

Message #15 received at 31332 <at> debbugs.gnu.org (full text, mbox):

From: John Steele Scott <toojays <at> toojays.net>
To: Assaf Gordon <assafgordon <at> gmail.com>, 31332 <at> debbugs.gnu.org
Subject: Re: bug#31332: touch unnecessarily calls dup2
Date: Tue, 30 Oct 2018 14:37:23 +1030
Hi Assif,

Thanks for your reply.

On 30/10/18 1:47 pm, Assaf Gordon wrote:
> tags 31332 notabug
> close 31332
> stop
>
> On 2018-05-01 4:38 a.m., John Steele Scott wrote:
>>  From https://stackoverflow.com/questions/40446555/why-does-touch-call-the-dup2-syscall
>>
>> jscott <at> citra:/tmp$ touch --version | head -1
>> touch (GNU coreutils) 8.25
>> jscott <at> citra:/tmp$ strace -ttt touch foo 2>&1 | tail -9
>> 1525170579.952032 open("foo", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = 3
>> 1525170579.952080 dup2(3, 0)            = 0
>> 1525170579.952119 close(3)              = 0
>> 1525170579.952156 utimensat(0, NULL, NULL, 0) = 0
>> 1525170579.952209 close(0)              = 0
>> 1525170579.952257 close(1)              = 0
>> 1525170579.952294 close(2)              = 0
>> 1525170579.952333 exit_group(0)         = ?
>> 1525170579.952450 +++ exited with 0 +++
>>
>> My analysis from that discussion:
>>
>>      It's a historical artifact.
>>
>>      The open()+dup2() pattern comes from the fd_reopen() function, which is used
>>      by several of the programs in the coreutils code base.
>>
>
> Not exactly an "artifact" - but an indented operation.
>
I called it a historical artefact because the call to dup2() did not occur when touch started calling fd_reopen(). Prior to http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=e373bb19 fd_reopen() initially did "close(desired_fd); fd = open(...)" which would always do the right thing if stdin was the desired fd. No call to dup2().


> The goal of "fd_reopen" is not just to open a file,
> but to ensure the returned file descriptor (an "int") has a specific
> value.
>
> For example, standard input (STDIN) is typically file descriptor value 0. calling fd_reopen first opens the file (the kernel returns file descriptor 3), then it ensures file descriptor 0 is associated with the same file (and then, calling "close(3)" gets rid of the other file descriptor).
>
> Checking how fd_reopen is used in coreutils, it is almost always used
> to ensure STDIN/STDOUT point to the desired file(s):
> ====
> $ git grep fd_reopen
> src/csplit.c:  if (! STREQ (name, "-") && fd_reopen (STDIN_FILENO, name, O_RDONLY, 0) < 0)
> src/dd.c:/* Restart on EINTR from fd_reopen().  */
> src/dd.c:ifd_reopen (int desired_fd, char const *file, int flag, mode_t mode)
> src/dd.c:      ret = fd_reopen (desired_fd, file, flag, mode);
> src/dd.c:      if (ifd_reopen (STDIN_FILENO, input_file, O_RDONLY | input_flags, 0) < 0)
> src/dd.c:           || ifd_reopen (STDOUT_FILENO, output_file, O_RDWR | opts, perms) < 0)
> src/dd.c:          && (ifd_reopen (STDOUT_FILENO, output_file, O_WRONLY | opts, perms)
> src/nohup.c:      if (fd_reopen (STDIN_FILENO, "/dev/null", O_WRONLY, 0) < 0)
> src/nohup.c:                ? fd_reopen (STDOUT_FILENO, file, flags, mode)
> src/nohup.c:                        ? fd_reopen (STDOUT_FILENO, in_home, flags, mode)
> src/split.c:      && fd_reopen (STDIN_FILENO, infile, O_RDONLY, 0) < 0)
> src/stty.c:      if (fd_reopen (STDIN_FILENO, device_name, O_RDONLY | O_NONBLOCK, 0) < 0)
> src/touch.c:      fd = fd_reopen (STDIN_FILENO, file,
> ===
>
> This means the rest of the program can just operate on STDIN (or STDOUT). 

I haven't looked at all those cases, but my point was that the call to dup2() adds an extra 40us to each invocation of touch. In the case of touch, it really only cares about the fd being STDOUT_FILENO or !STDOUT_FILENO; it doesn't need to be STDIN_FILENO exactly. With a small amount of work touch could use open() instead of fd_reopen(). The code would be somewhat simpler and it would save 40us of execution time.

Or maybe it wouldn't, since there's an extra fd open when the program terminates and we need to wait for the kernel to clean that up anyway? I don't know.

Cheers,

John





Information forwarded to bug-coreutils <at> gnu.org:
bug#31332; Package coreutils. (Tue, 30 Oct 2018 05:39:02 GMT) Full text and rfc822 format available.

Message #18 received at 31332 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: John Steele Scott <toojays <at> toojays.net>,
 Assaf Gordon <assafgordon <at> gmail.com>, 31332 <at> debbugs.gnu.org
Subject: Re: bug#31332: touch unnecessarily calls dup2
Date: Mon, 29 Oct 2018 22:38:36 -0700
John Steele Scott wrote:
> Prior tohttp://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=e373bb19  fd_reopen() initially did "close(desired_fd); fd = open(...)" which would always do the right thing

No, as the old code did the wrong thing for "touch /dev/stdin", whereas the 
current code works.




Information forwarded to bug-coreutils <at> gnu.org:
bug#31332; Package coreutils. (Tue, 30 Oct 2018 06:25:01 GMT) Full text and rfc822 format available.

Message #21 received at 31332 <at> debbugs.gnu.org (full text, mbox):

From: John Steele Scott <toojays <at> toojays.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Assaf Gordon <assafgordon <at> gmail.com>,
 31332 <at> debbugs.gnu.org
Subject: Re: bug#31332: touch unnecessarily calls dup2
Date: Tue, 30 Oct 2018 16:44:45 +1030
On 30/10/18 4:08 pm, Paul Eggert wrote:
> John Steele Scott wrote:
>> Prior tohttp://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=e373bb19  fd_reopen() initially did "close(desired_fd); fd = open(...)" which would always do the right thing
>
> No, as the old code did the wrong thing for "touch /dev/stdin", whereas the current code works.

I stand corrected. Okay, I see that we don't want to close /dev/stdin at the start of touch. Thanks for making me think of this case.

I'd much much obliged if one of you could enlighten me as to why touch needs to treat /dev/stdin as a special case after the file has been opened though. Wouldn't the following work just as well, with one less system call?

--- a/src/touch.c
+++ b/src/touch.c
@@ -132,8 +132,8 @@ touch (const char *file)
   else if (! (no_create || no_dereference))
     {
       /* Try to open FILE, creating it if necessary.  */
-      fd = fd_reopen (STDIN_FILENO, file,
-                      O_WRONLY | O_CREAT | O_NONBLOCK | O_NOCTTY, MODE_RW_UGO);
+      fd = open (file,
+                 O_WRONLY | O_CREAT | O_NONBLOCK | O_NOCTTY, MODE_RW_UGO);
 
       /* Don't save a copy of errno if it's EISDIR, since that would lead
          touch to give a bogus diagnostic for e.g., 'touch /' (assuming
@@ -166,9 +166,9 @@ touch (const char *file)
                      (no_dereference && fd == -1) ? AT_SYMLINK_NOFOLLOW : 0)
         == 0);
 
-  if (fd == STDIN_FILENO)
+  if (fd != STDOUT_FILENO)
     {
-      if (close (STDIN_FILENO) != 0)
+      if (close (fd) != 0)
         {
           error (0, errno, _("failed to close %s"), quoteaf (file));
           return false;

Cheers,

John





Information forwarded to bug-coreutils <at> gnu.org:
bug#31332; Package coreutils. (Mon, 12 Nov 2018 22:24:01 GMT) Full text and rfc822 format available.

Message #24 received at 31332 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: John Steele Scott <toojays <at> toojays.net>,
 Assaf Gordon <assafgordon <at> gmail.com>, 31332 <at> debbugs.gnu.org
Subject: Re: bug#31332: touch unnecessarily calls dup2
Date: Mon, 12 Nov 2018 14:23:04 -0800
John Steele Scott wrote:
> I'd much much obliged if one of you could enlighten me as to why touch needs to treat /dev/stdin as a special case after the file has been opened though. Wouldn't the following work just as well, with one less system call?
> 
> --- a/src/touch.c
> +++ b/src/touch.c
> @@ -132,8 +132,8 @@ touch (const char *file)
>     else if (! (no_create || no_dereference))
>       {
>         /* Try to open FILE, creating it if necessary.  */
> -      fd = fd_reopen (STDIN_FILENO, file,
> -                      O_WRONLY | O_CREAT | O_NONBLOCK | O_NOCTTY, MODE_RW_UGO);
> +      fd = open (file,
> +                 O_WRONLY | O_CREAT | O_NONBLOCK | O_NOCTTY, MODE_RW_UGO);
>   
>         /* Don't save a copy of errno if it's EISDIR, since that would lead
>            touch to give a bogus diagnostic for e.g., 'touch /' (assuming
> @@ -166,9 +166,9 @@ touch (const char *file)
>                        (no_dereference && fd == -1) ? AT_SYMLINK_NOFOLLOW : 0)
>           == 0);
>   
> -  if (fd == STDIN_FILENO)
> +  if (fd != STDOUT_FILENO)
>       {
> -      if (close (STDIN_FILENO) != 0)
> +      if (close (fd) != 0)

That patch has trouble if 'open' returns 1. The resulting fd is never closed and 
for some filesystems (NFS, for example) one needs to close the fd to find out 
whether the fdutimensat call actually worked.




Information forwarded to bug-coreutils <at> gnu.org:
bug#31332; Package coreutils. (Mon, 12 Nov 2018 22:42:01 GMT) Full text and rfc822 format available.

Message #27 received at 31332 <at> debbugs.gnu.org (full text, mbox):

From: John Steele Scott <toojays <at> toojays.net>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Assaf Gordon <assafgordon <at> gmail.com>,
 31332 <at> debbugs.gnu.org
Subject: Re: bug#31332: touch unnecessarily calls dup2
Date: Tue, 13 Nov 2018 09:11:36 +1030
On 13/11/18 8:53 am, Paul Eggert wrote:
> John Steele Scott wrote:
>> I'd much much obliged if one of you could enlighten me as to why touch needs to treat /dev/stdin as a special case after the file has been opened though. Wouldn't the following work just as well, with one less system call?
>>
>> --- a/src/touch.c
>> +++ b/src/touch.c
>> @@ -132,8 +132,8 @@ touch (const char *file)
>>     else if (! (no_create || no_dereference))
>>       {
>>         /* Try to open FILE, creating it if necessary.  */
>> -      fd = fd_reopen (STDIN_FILENO, file,
>> -                      O_WRONLY | O_CREAT | O_NONBLOCK | O_NOCTTY, MODE_RW_UGO);
>> +      fd = open (file,
>> +                 O_WRONLY | O_CREAT | O_NONBLOCK | O_NOCTTY, MODE_RW_UGO);
>>           /* Don't save a copy of errno if it's EISDIR, since that would lead
>>            touch to give a bogus diagnostic for e.g., 'touch /' (assuming
>> @@ -166,9 +166,9 @@ touch (const char *file)
>>                        (no_dereference && fd == -1) ? AT_SYMLINK_NOFOLLOW : 0)
>>           == 0);
>>   -  if (fd == STDIN_FILENO)
>> +  if (fd != STDOUT_FILENO)
>>       {
>> -      if (close (STDIN_FILENO) != 0)
>> +      if (close (fd) != 0)
>
> That patch has trouble if 'open' returns 1. The resulting fd is never closed and for some filesystems (NFS, for example) one needs to close the fd to find out whether the fdutimensat call actually worked.

Thanks Paul,

So if someone did want to wipe out that dup2 call from GNU touch, they'd need to track separately whether they opened the file or whether they are using a passed-in file descriptor, rather than relying on the value of fd to indicate that. Got it.

I'll stop flogging this dead horse now. Thanks both Paul and Assaf for your replies.

Cheers,

John





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 11 Dec 2018 12:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 5 years and 137 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.