GNU bug report logs - #61095
possible misuse of posix_spawn API on non-linux OSes

Previous Next

Package: guile;

Reported by: Omar Polo <op <at> omarpolo.com>

Date: Fri, 27 Jan 2023 11:53:01 UTC

Severity: normal

Tags: patch

Merged with 61079

Done: Ludovic Courtès <ludo <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 61095 in the body.
You can then email your comments to 61095 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-guile <at> gnu.org:
bug#61095; Package guile. (Fri, 27 Jan 2023 11:53:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Omar Polo <op <at> omarpolo.com>:
New bug report received and forwarded. Copy sent to bug-guile <at> gnu.org. (Fri, 27 Jan 2023 11:53:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Omar Polo <op <at> omarpolo.com>
To: bug-guile <at> gnu.org
Subject: possible misuse of posix_spawn API on non-linux OSes
Date: Fri, 27 Jan 2023 12:51:32 +0100
Hello,

I've noticed that test-system-cmds fails on OpenBSD-CURRENT while
testing the update to guile 3.0.9:

    test-system-cmds: system* exit status was 127 rather than 42
    FAIL: test-system-cmds

Here's an excerpt of the ktrace of the child process while executing
that specific test: (the first fork() is the one implicitly done by
posix_spawn(3))

  5590 guile RET   fork 0
  [...]
  5590 guile CALL  dup2(0,3)
  5590 guile RET   dup2 3
  5590 guile CALL  dup2(1,4)
  5590 guile RET   dup2 4
  5590 guile CALL  dup2(2,5)
  5590 guile RET   dup2 5
  5590 guile CALL  dup2(3,0)
  5590 guile RET   dup2 0
  5590 guile CALL  dup2(4,1)
  5590 guile RET   dup2 1
  5590 guile CALL  dup2(5,2)
  5590 guile RET   dup2 2
  5590 guile CALL  close(1023)
  5590 guile RET   close -1 errno 9 Bad file descriptor
  5590 guile CALL  kbind(0x7f7ffffd51f8,24,0x2b5c5ced59893fa9)
  5590 guile RET   kbind 0
  5590 guile CALL  exit(127)

(if you prefer I can provide a full ktrace of guile executing that
test case)

My interpretation is that the sequence of dup2(2) is from
posix_spawn_file_actions_adddup2 in do_spawn, while the strange
close(1023) is from close_inherited_fds_slow.  Such file descriptor is
not open, so close(2) fails with EBADF and the posix_spawn machinery
exits prematurely.  My current RLIMIT_NOFILE is 1024, so the number
would make sense.

On OpenBSD I've tried to use the following patch to work around the
issue:

[[[
Index: libguile/posix.c
--- libguile/posix.c.orig
+++ libguile/posix.c
@@ -1325,6 +1325,7 @@ SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
 static void
 close_inherited_fds_slow (posix_spawn_file_actions_t *actions, int max_fd)
 {
+  max_fd = getdtablecount();
   while (--max_fd > 2)
     posix_spawn_file_actions_addclose (actions, max_fd);
 }
]]]

getdtablecount(2) returns the number of file descriptor currently open
by the process.  unfortunately it doesn't seem to be portable.  (well,
tbf /proc/self/fd is not portable too.)

However, while this pleases the system* test, it breaks the pipe
tests:

    Running popen.test
    FAIL: popen.test: open-input-pipe: echo hello
    FAIL: popen.test: pipeline - arguments: (expected-value ("HELLO WORLD\n" (0 0)) actual-value ("" (127 0)))

the reason seem to be similar:

 74865 guile    CALL  dup2(7,3)
 74865 guile    RET   dup2 3
 74865 guile    CALL  dup2(10,4)
 74865 guile    RET   dup2 4
 74865 guile    CALL  dup2(2,5)
 74865 guile    RET   dup2 5
 74865 guile    CALL  dup2(3,0)
 74865 guile    RET   dup2 0
 74865 guile    CALL  dup2(4,1)
 74865 guile    RET   dup2 1
 74865 guile    CALL  dup2(5,2)
 74865 guile    RET   dup2 2
 74865 guile    CALL  close(8)
 74865 guile    RET   close -1 errno 9 Bad file descriptor
 74865 guile    CALL  kbind(0x7f7ffffcfa88,24,0x2125923bdf2ca9e)
 74865 guile    RET   kbind 0
 74865 guile    CALL  exit(127)

I guess it's trying to close the fd of the pipe that was closed.

I'm not sure what to do from here, I'm not used to the posix_spawn_*
APIs.  I'm happy to help testing diffs or by providing more info if
needed.


Thanks,

Omar Polo




Information forwarded to bug-guile <at> gnu.org:
bug#61095; Package guile. (Fri, 27 Jan 2023 12:26:02 GMT) Full text and rfc822 format available.

Message #8 received at 61095 <at> debbugs.gnu.org (full text, mbox):

From: Omar Polo <op <at> omarpolo.com>
To: 61095 <at> debbugs.gnu.org
Subject: Re: possible misuse of posix_spawn API on non-linux OSes
Date: Fri, 27 Jan 2023 13:25:09 +0100
Actually I can avoid the EBADF by checking that the fd is 'live' with
something like fstat:

[[[
Index: libguile/posix.c
--- libguile/posix.c.orig
+++ libguile/posix.c
@@ -1325,8 +1325,12 @@ SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
 static void
 close_inherited_fds_slow (posix_spawn_file_actions_t *actions, int max_fd)
 {
-  while (--max_fd > 2)
-    posix_spawn_file_actions_addclose (actions, max_fd);
+  struct stat sb;
+  max_fd = getdtablecount();
+  while (--max_fd > 2) {
+    if (fstat(max_fd, &sb) != -1)
+      posix_spawn_file_actions_addclose (actions, max_fd);
+  }
 }
 
 static void

]]]

The regress passes and while this workaround may be temporarly
acceptable I -personally- don't like it much.  There's a reason guile
can't set CLOEXEC for all the file descriptors > 2 obtained via open,
socket, pipe, ... like perl -for example- does?




Merged 61079 61095. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Mon, 27 Mar 2023 13:34:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-guile <at> gnu.org:
bug#61095; Package guile. (Tue, 28 Mar 2023 09:35:02 GMT) Full text and rfc822 format available.

Message #13 received at 61095 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Omar Polo <op <at> omarpolo.com>
Cc: Josselin Poiret <dev <at> jpoiret.xyz>, Andrew Whatson <whatson <at> tailcall.au>,
 61095 <at> debbugs.gnu.org
Subject: Re: bug#61095: possible misuse of posix_spawn API on non-linux OSes
Date: Tue, 28 Mar 2023 11:34:16 +0200
[Message part 1 (text/plain, inline)]
Hi Omar,

Apologies for the late reply.

Omar Polo <op <at> omarpolo.com> skribis:

> I've noticed that test-system-cmds fails on OpenBSD-CURRENT while
> testing the update to guile 3.0.9:
>
>     test-system-cmds: system* exit status was 127 rather than 42
>     FAIL: test-system-cmds

We’re seeing the same failure on GNU/Hurd:

  https://issues.guix.gnu.org/61079

> Actually I can avoid the EBADF by checking that the fd is 'live' with
> something like fstat:
>
> [[[
>
> Index: libguile/posix.c
> --- libguile/posix.c.orig
> +++ libguile/posix.c
> @@ -1325,8 +1325,12 @@ SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
>  static void
>  close_inherited_fds_slow (posix_spawn_file_actions_t *actions, int max_fd)
>  {
> -  while (--max_fd > 2)
> -    posix_spawn_file_actions_addclose (actions, max_fd);
> +  struct stat sb;
> +  max_fd = getdtablecount();
> +  while (--max_fd > 2) {
> +    if (fstat(max_fd, &sb) != -1)
> +      posix_spawn_file_actions_addclose (actions, max_fd);
> +  }
>  }

I came up with the following patch:

[Message part 2 (text/x-patch, inline)]
diff --git a/libguile/posix.c b/libguile/posix.c
index 3a8be94e4..cde199888 100644
--- a/libguile/posix.c
+++ b/libguile/posix.c
@@ -1322,39 +1322,18 @@ SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
 #undef FUNC_NAME
 #endif /* HAVE_FORK */
 
-static void
-close_inherited_fds_slow (posix_spawn_file_actions_t *actions, int max_fd)
-{
-  while (--max_fd > 2)
-    posix_spawn_file_actions_addclose (actions, max_fd);
-}
-
 static void
 close_inherited_fds (posix_spawn_file_actions_t *actions, int max_fd)
 {
-  DIR *dirp;
-  struct dirent *d;
-  int fd;
-
-  /* Try to use the platform-specific list of open file descriptors, so
-     we don't need to use the brute force approach. */
-  dirp = opendir ("/proc/self/fd");
-
-  if (dirp == NULL)
-    return close_inherited_fds_slow (actions, max_fd);
-
-  while ((d = readdir (dirp)) != NULL)
+  while (--max_fd > 2)
     {
-      fd = atoi (d->d_name);
-
-      /* Skip "." and "..", garbage entries, stdin/stdout/stderr. */
-      if (fd <= 2)
-        continue;
-
-      posix_spawn_file_actions_addclose (actions, fd);
+      /* Adding invalid file descriptors to an 'addclose' action leads
+         to 'posix_spawn' failures on some operating systems:
+         <https://bugs.gnu.org/61095>.  Hence the extra check.  */
+      int flags = fcntl (max_fd, F_GETFD, NULL);
+      if ((flags >= 0) && ((flags & FD_CLOEXEC) == 0))
+        posix_spawn_file_actions_addclose (actions, max_fd);
     }
-
-  closedir (dirp);
 }
 
 static pid_t
@@ -1366,6 +1345,26 @@ do_spawn (char *exec_file, char **exec_argv, char **exec_env,
   posix_spawn_file_actions_t actions;
   posix_spawnattr_t *attrp = NULL;
 
+  posix_spawn_file_actions_init (&actions);
+
+  /* Duplicate IN, OUT, and ERR unconditionally to clear their
+     FD_CLOEXEC flag, if any.  */
+  posix_spawn_file_actions_adddup2 (&actions, in, STDIN_FILENO);
+  posix_spawn_file_actions_adddup2 (&actions, out, STDOUT_FILENO);
+  posix_spawn_file_actions_adddup2 (&actions, err, STDERR_FILENO);
+
+  /* TODO: Use 'closefrom' where available.  */
+#if 0
+  /* Version 2.34 of the GNU libc provides this function.  */
+  posix_spawn_file_actions_addclosefrom_np (&actions, 3);
+#else
+  if (in > 2)
+    posix_spawn_file_actions_addclose (&actions, in);
+  if (out > 2 && out != in)
+    posix_spawn_file_actions_addclose (&actions, out);
+  if (err > 2 && err != out && err != in)
+    posix_spawn_file_actions_addclose (&actions, err);
+
   int max_fd = 1024;
 
 #if defined (HAVE_GETRLIMIT) && defined (RLIMIT_NOFILE)
@@ -1376,31 +1375,8 @@ do_spawn (char *exec_file, char **exec_argv, char **exec_env,
   }
 #endif
 
-  posix_spawn_file_actions_init (&actions);
-
-  int free_fd_slots = 0;
-  int fd_slot[3];
-
-  for (int fdnum = 3; free_fd_slots < 3 && fdnum < max_fd; fdnum++)
-    {
-      if (fdnum != in && fdnum != out && fdnum != err)
-        {
-          fd_slot[free_fd_slots] = fdnum;
-          free_fd_slots++;
-        }
-    }
-
-  /* Move the fds out of the way, so that duplicate fds or fds equal
-     to 0, 1, 2 don't trample each other */
-
-  posix_spawn_file_actions_adddup2 (&actions, in, fd_slot[0]);
-  posix_spawn_file_actions_adddup2 (&actions, out, fd_slot[1]);
-  posix_spawn_file_actions_adddup2 (&actions, err, fd_slot[2]);
-  posix_spawn_file_actions_adddup2 (&actions, fd_slot[0], 0);
-  posix_spawn_file_actions_adddup2 (&actions, fd_slot[1], 1);
-  posix_spawn_file_actions_adddup2 (&actions, fd_slot[2], 2);
-
   close_inherited_fds (&actions, max_fd);
+#endif
 
   int res = -1;
   if (spawnp)
[Message part 3 (text/plain, inline)]
Could you confirm that it works on OpenBSD and that there’s no
performance regression?

Andrew: it removes the /proc/self/fd loop you added to fix
<https://bugs.gnu.org/59321>, but it reduces the number of ‘close’ calls
in the child.  Could you check whether that’s okay performance-wise?

Eventually I plan to use ‘posix_spawn_file_actions_addclosefrom_np’ on
glibc >= 2.34, but I have yet to test it.  That will be the best
solution.

Josselin: I simplified the ‘dup2’ logic somewhat.

Feedback welcome!

> The regress passes and while this workaround may be temporarly
> acceptable I -personally- don't like it much.  There's a reason guile
> can't set CLOEXEC for all the file descriptors > 2 obtained via open,
> socket, pipe, ... like perl -for example- does?

Guile does that for file descriptors it opens internally, but
applications using ‘open-file’ without the recently-added “e” flag, or
‘socket’ without ‘SOCK_CLOEXEC’, etc., end up with more file descriptors
that need to be taken care of.

I wish the default were close-on-exec, but we’re not there yet.

Thanks,
Ludo’.

Information forwarded to bug-guile <at> gnu.org:
bug#61095; Package guile. (Tue, 28 Mar 2023 16:11:02 GMT) Full text and rfc822 format available.

Message #16 received at 61095 <at> debbugs.gnu.org (full text, mbox):

From: Josselin Poiret <dev <at> jpoiret.xyz>
To: Ludovic Courtès <ludo <at> gnu.org>, Omar Polo
 <op <at> omarpolo.com>
Cc: Andrew Whatson <whatson <at> tailcall.au>, 61095 <at> debbugs.gnu.org
Subject: Re: bug#61095: possible misuse of posix_spawn API on non-linux OSes
Date: Tue, 28 Mar 2023 18:10:27 +0200
[Message part 1 (text/plain, inline)]
Hi Ludo,

Ludovic Courtès <ludo <at> gnu.org> writes:

> -      posix_spawn_file_actions_addclose (actions, fd);
> +      /* Adding invalid file descriptors to an 'addclose' action leads
> +         to 'posix_spawn' failures on some operating systems:
> +         <https://bugs.gnu.org/61095>.  Hence the extra check.  */
> +      int flags = fcntl (max_fd, F_GETFD, NULL);
> +      if ((flags >= 0) && ((flags & FD_CLOEXEC) == 0))
> +        posix_spawn_file_actions_addclose (actions, max_fd);
>      }

I'm worried about TOCTOU in multi-threaded contexts here, which is why I
opted for the heavy handed-approach here.  In general I don't think we
can know in advance which fdes to close :/ It's a shame that the
posix_spawn actions fails on other kernels, since I don't really see a
way to mitigate this problem (apart from the new
posix_spawn_file_actions_addclosefrom_np as you mentioned).  I don't
know what we could do here.  Maybe not provide spawn?  Or provide it in
spite of the broken fd closing?

> -
> -  closedir (dirp);
>  }
>  
>  static pid_t
> @@ -1366,6 +1345,26 @@ do_spawn (char *exec_file, char **exec_argv, char **exec_env,
>    posix_spawn_file_actions_t actions;
>    posix_spawnattr_t *attrp = NULL;
>  
> +  posix_spawn_file_actions_init (&actions);
> +
> +  /* Duplicate IN, OUT, and ERR unconditionally to clear their
> +     FD_CLOEXEC flag, if any.  */
> +  posix_spawn_file_actions_adddup2 (&actions, in, STDIN_FILENO);
> +  posix_spawn_file_actions_adddup2 (&actions, out, STDOUT_FILENO);
> +  posix_spawn_file_actions_adddup2 (&actions, err, STDERR_FILENO);

This won't work, and actually this was one of the original logic bugs I
was trying to fix.  If err is equal to, let's say, STDIN_FILENO, then
the first call will overwrite the initial file descriptor at
STDIN_FILENO, and the second call won't do what the caller intended.
This is why I was moving them out of the way first, so that they would
not overwrite each other.

> +  /* TODO: Use 'closefrom' where available.  */
> +#if 0
> +  /* Version 2.34 of the GNU libc provides this function.  */
> +  posix_spawn_file_actions_addclosefrom_np (&actions, 3);
> +#else
> +  if (in > 2)
> +    posix_spawn_file_actions_addclose (&actions, in);
> +  if (out > 2 && out != in)
> +    posix_spawn_file_actions_addclose (&actions, out);
> +  if (err > 2 && err != out && err != in)
> +    posix_spawn_file_actions_addclose (&actions, err);

Isn't this unneeded given we call close_inherited_fds below?

> [...]
>
> Josselin: I simplified the ‘dup2’ logic somewhat.

See my comments above.

> Guile does that for file descriptors it opens internally, but
> applications using ‘open-file’ without the recently-added “e” flag, or
> ‘socket’ without ‘SOCK_CLOEXEC’, etc., end up with more file descriptors
> that need to be taken care of.
>
> I wish the default were close-on-exec, but we’re not there yet.

+1

Best,
-- 
Josselin Poiret
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guile <at> gnu.org:
bug#61095; Package guile. (Wed, 29 Mar 2023 22:31:02 GMT) Full text and rfc822 format available.

Message #19 received at 61095 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Josselin Poiret <dev <at> jpoiret.xyz>
Cc: 61095 <at> debbugs.gnu.org, Andrew Whatson <whatson <at> tailcall.au>,
 Omar Polo <op <at> omarpolo.com>
Subject: Re: bug#61095: possible misuse of posix_spawn API on non-linux OSes
Date: Thu, 30 Mar 2023 00:30:04 +0200
Hi!

Josselin Poiret <dev <at> jpoiret.xyz> skribis:

> Ludovic Courtès <ludo <at> gnu.org> writes:
>
>> -      posix_spawn_file_actions_addclose (actions, fd);
>> +      /* Adding invalid file descriptors to an 'addclose' action leads
>> +         to 'posix_spawn' failures on some operating systems:
>> +         <https://bugs.gnu.org/61095>.  Hence the extra check.  */
>> +      int flags = fcntl (max_fd, F_GETFD, NULL);
>> +      if ((flags >= 0) && ((flags & FD_CLOEXEC) == 0))
>> +        posix_spawn_file_actions_addclose (actions, max_fd);
>>      }
>
> I'm worried about TOCTOU in multi-threaded contexts here,

Yes, that’s a problem.  The current /proc/self/fd optimization has that
problem too.  :-/

> which is why I opted for the heavy handed-approach here.  In general I
> don't think we can know in advance which fdes to close :/ It's a shame
> that the posix_spawn actions fails on other kernels, since I don't
> really see a way to mitigate this problem (apart from the new
> posix_spawn_file_actions_addclosefrom_np as you mentioned).  I don't
> know what we could do here.  Maybe not provide spawn?  Or provide it
> in spite of the broken fd closing?

Not providing ‘spawn’ is no longer an option.

We can expect the problem to practically vanish “soon” on GNU variants
with ‘closefrom’ (glibc 2.34 was released in Aug. 2021).

On Linux with glibc < 2.34, we’d keep the current code (maybe without
the /proc/self/fd optimization?).

On other systems, we can have the racy code above as a last resort or
OS-specific tricks, like Omar was suggesting for OpenBSD.  It sucks but
what else can we do?

(BTW,
<https://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_spawn_file_actions_addclose.html>
reads:

  It shall not be considered an error for the fildes argument passed to
  these functions to specify a file descriptor for which the specified
  operation could not be performed at the time of the call. Any such
  error will be detected when the associated file actions object is
  later used during a posix_spawn() or posix_spawnp() operation.

OpenBSD and GNU/Hurd follow this to the letter.

OTOH ‘linux/spawni.c’ in glibc is purposefully more liberal:

  /* Signal errors only for file descriptors out of range.  */
)

>> +  /* Duplicate IN, OUT, and ERR unconditionally to clear their
>> +     FD_CLOEXEC flag, if any.  */
>> +  posix_spawn_file_actions_adddup2 (&actions, in, STDIN_FILENO);
>> +  posix_spawn_file_actions_adddup2 (&actions, out, STDOUT_FILENO);
>> +  posix_spawn_file_actions_adddup2 (&actions, err, STDERR_FILENO);
>
> This won't work, and actually this was one of the original logic bugs I
> was trying to fix.  If err is equal to, let's say, STDIN_FILENO, then
> the first call will overwrite the initial file descriptor at
> STDIN_FILENO, and the second call won't do what the caller intended.
> This is why I was moving them out of the way first, so that they would
> not overwrite each other.

Oh, my bad.

>> +  /* TODO: Use 'closefrom' where available.  */
>> +#if 0
>> +  /* Version 2.34 of the GNU libc provides this function.  */
>> +  posix_spawn_file_actions_addclosefrom_np (&actions, 3);
>> +#else
>> +  if (in > 2)
>> +    posix_spawn_file_actions_addclose (&actions, in);
>> +  if (out > 2 && out != in)
>> +    posix_spawn_file_actions_addclose (&actions, out);
>> +  if (err > 2 && err != out && err != in)
>> +    posix_spawn_file_actions_addclose (&actions, err);
>
> Isn't this unneeded given we call close_inherited_fds below?

No, because of the FD_CLOEXEC selection.

Coming next is an updated patch series addressing this as proposed
above.  Let me know what y’all think!

I tested the ‘posix_spawn_file_actions_addclosefrom_np’ path by building in:

  guix time-machine --branch=core-updates -- shell -CP -D -f guix.scm

… which gives us glibc 2.35.

Ludo’.




Information forwarded to dev <at> jpoiret.xyz, op <at> omarpolo.com, whatson <at> tailcall.au, bug-guile <at> gnu.org:
bug#61095; Package guile. (Wed, 29 Mar 2023 22:32:01 GMT) Full text and rfc822 format available.

Message #22 received at 61095 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 61095 <at> debbugs.gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: [PATCH 1/3] 'spawn' closes only open file descriptors on
 non-GNU/Linux systems.
Date: Thu, 30 Mar 2023 00:30:55 +0200
Fixes <https://bugs.gnu.org/61095>.
Reported by Omar Polo <op <at> omarpolo.com>.

* libguile/posix.c (close_inherited_fds_slow): On systems other than
GNU/Linux, call 'addclose' only when 'fcntl' succeeds on MAX_FD.
---
 libguile/posix.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/libguile/posix.c b/libguile/posix.c
index 3a8be94e4..68e9bfade 100644
--- a/libguile/posix.c
+++ b/libguile/posix.c
@@ -1326,7 +1326,24 @@ static void
 close_inherited_fds_slow (posix_spawn_file_actions_t *actions, int max_fd)
 {
   while (--max_fd > 2)
-    posix_spawn_file_actions_addclose (actions, max_fd);
+    {
+      /* Adding a 'close' action for a file descriptor that is not open
+         causes 'posix_spawn' to fail on GNU/Hurd and on OpenBSD, but
+         not on GNU/Linux: <https://bugs.gnu.org/61095>.  Hence this
+         strategy:
+
+           - On GNU/Linux, close every FD, since that's the only
+             race-free way to make sure the child doesn't inherit one.
+           - On other systems, only close FDs currently open in the
+             parent; it works, but it's racy (XXX).
+
+         The only reliable option is 'addclosefrom'.  */
+#if ! (defined __GLIBC__ && defined __linux__)
+      int flags = fcntl (max_fd, F_GETFD, NULL);
+      if (flags >= 0)
+#endif
+        posix_spawn_file_actions_addclose (actions, max_fd);
+    }
 }
 
 static void

base-commit: e334e59589c3cbfc68d3f7d0d739000e0876b36d
-- 
2.39.2





Information forwarded to dev <at> jpoiret.xyz, op <at> omarpolo.com, whatson <at> tailcall.au, bug-guile <at> gnu.org:
bug#61095; Package guile. (Wed, 29 Mar 2023 22:32:02 GMT) Full text and rfc822 format available.

Message #25 received at 61095 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 61095 <at> debbugs.gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: [PATCH 2/3] Remove racy optimized file descriptor closing loop in
 'spawn'.
Date: Thu, 30 Mar 2023 00:30:56 +0200
This reverts 9332b632407894c2e1951cce1bc678f19e1fa8e4, thereby
reinstating the performance issue in <https://bugs.gnu.org/59321>.

This optimization was subject to race conditions in multi-threaded code:
new file descriptors could pop up at any time and thus leak in the
child.

* libguile/posix.c (close_inherited_fds): Remove.
(close_inherited_fds_slow): Rename to...
(close_inherited_fds): ... this.
---
 libguile/posix.c | 30 +-----------------------------
 1 file changed, 1 insertion(+), 29 deletions(-)

diff --git a/libguile/posix.c b/libguile/posix.c
index 68e9bfade..b5830c43b 100644
--- a/libguile/posix.c
+++ b/libguile/posix.c
@@ -1323,7 +1323,7 @@ SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
 #endif /* HAVE_FORK */
 
 static void
-close_inherited_fds_slow (posix_spawn_file_actions_t *actions, int max_fd)
+close_inherited_fds (posix_spawn_file_actions_t *actions, int max_fd)
 {
   while (--max_fd > 2)
     {
@@ -1346,34 +1346,6 @@ close_inherited_fds_slow (posix_spawn_file_actions_t *actions, int max_fd)
     }
 }
 
-static void
-close_inherited_fds (posix_spawn_file_actions_t *actions, int max_fd)
-{
-  DIR *dirp;
-  struct dirent *d;
-  int fd;
-
-  /* Try to use the platform-specific list of open file descriptors, so
-     we don't need to use the brute force approach. */
-  dirp = opendir ("/proc/self/fd");
-
-  if (dirp == NULL)
-    return close_inherited_fds_slow (actions, max_fd);
-
-  while ((d = readdir (dirp)) != NULL)
-    {
-      fd = atoi (d->d_name);
-
-      /* Skip "." and "..", garbage entries, stdin/stdout/stderr. */
-      if (fd <= 2)
-        continue;
-
-      posix_spawn_file_actions_addclose (actions, fd);
-    }
-
-  closedir (dirp);
-}
-
 static pid_t
 do_spawn (char *exec_file, char **exec_argv, char **exec_env,
           int in, int out, int err, int spawnp)
-- 
2.39.2





Information forwarded to dev <at> jpoiret.xyz, op <at> omarpolo.com, whatson <at> tailcall.au, bug-guile <at> gnu.org:
bug#61095; Package guile. (Wed, 29 Mar 2023 22:32:02 GMT) Full text and rfc822 format available.

Message #28 received at 61095 <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: 61095 <at> debbugs.gnu.org
Cc: Ludovic Courtès <ludo <at> gnu.org>
Subject: [PATCH 3/3] Use 'posix_spawn_file_actions_addclosefrom_np' where
 available.
Date: Thu, 30 Mar 2023 00:30:57 +0200
* configure.ac: Check for 'posix_spawn_file_actions_addclosefrom_np'.
* libguile/posix.c (HAVE_ADDCLOSEFROM): New macro.
(close_inherited_fds): Wrap in #ifdef HAVE_ADDCLOSEFROM.
(do_spawn) [HAVE_ADDCLOSEFROM]: Use 'posix_spawn_file_actions_addclosefrom_np'.
---
 configure.ac     |  4 +++-
 libguile/posix.c | 14 ++++++++++++++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index d5ce1c4ac..4a93be979 100644
--- a/configure.ac
+++ b/configure.ac
@@ -515,6 +515,7 @@ AC_CHECK_HEADERS([crt_externs.h])
 #   sched_getaffinity, sched_setaffinity - GNU extensions (glibc)
 #   sendfile - non-POSIX, found in glibc
 #   pipe2 - non-POSIX, found in glibc (GNU/Linux and GNU/Hurd)
+#   posix_spawn_file_actions_addclosefrom_np - glibc >= 2.34
 #
 AC_CHECK_FUNCS([DINFINITY DQNAN cexp chsize clog clog10 ctermid         \
   fesetround ftime ftruncate fchown fchownat fchmod fchdir readlinkat	\
@@ -528,7 +529,8 @@ AC_CHECK_FUNCS([DINFINITY DQNAN cexp chsize clog clog10 ctermid         \
   index bcopy rindex truncate isblank _NSGetEnviron              \
   strcoll_l strtod_l strtol_l newlocale uselocale utimensat     \
   fstatat futimens openat						\
-  sched_getaffinity sched_setaffinity sendfile pipe2])
+  sched_getaffinity sched_setaffinity sendfile pipe2
+  posix_spawn_file_actions_addclosefrom_np])
 
 # The newlib C library uses _NL_ prefixed locale langinfo constants.
 AC_CHECK_DECLS([_NL_NUMERIC_GROUPING], [], [], [[#include <langinfo.h>]])
diff --git a/libguile/posix.c b/libguile/posix.c
index b5830c43b..3adc743c4 100644
--- a/libguile/posix.c
+++ b/libguile/posix.c
@@ -1322,6 +1322,12 @@ SCM_DEFINE (scm_fork, "primitive-fork", 0, 0, 0,
 #undef FUNC_NAME
 #endif /* HAVE_FORK */
 
+#ifdef HAVE_POSIX_SPAWN_FILE_ACTIONS_ADDCLOSEFROM_NP
+# define HAVE_ADDCLOSEFROM 1
+#endif
+
+#ifndef HAVE_ADDCLOSEFROM
+
 static void
 close_inherited_fds (posix_spawn_file_actions_t *actions, int max_fd)
 {
@@ -1346,6 +1352,8 @@ close_inherited_fds (posix_spawn_file_actions_t *actions, int max_fd)
     }
 }
 
+#endif
+
 static pid_t
 do_spawn (char *exec_file, char **exec_argv, char **exec_env,
           int in, int out, int err, int spawnp)
@@ -1389,7 +1397,13 @@ do_spawn (char *exec_file, char **exec_argv, char **exec_env,
   posix_spawn_file_actions_adddup2 (&actions, fd_slot[1], 1);
   posix_spawn_file_actions_adddup2 (&actions, fd_slot[2], 2);
 
+#ifdef HAVE_ADDCLOSEFROM
+  /* This function appears in glibc 2.34.  It's both free from race
+     conditions and more efficient than the alternative.  */
+  posix_spawn_file_actions_addclosefrom_np (&actions, 3);
+#else
   close_inherited_fds (&actions, max_fd);
+#endif
 
   int res = -1;
   if (spawnp)
-- 
2.39.2





Added tag(s) patch. Request was from Ludovic Courtès <ludo <at> gnu.org> to control <at> debbugs.gnu.org. (Wed, 29 Mar 2023 22:35:01 GMT) Full text and rfc822 format available.

Information forwarded to bug-guile <at> gnu.org:
bug#61095; Package guile. (Thu, 30 Mar 2023 20:22:01 GMT) Full text and rfc822 format available.

Message #33 received at 61095 <at> debbugs.gnu.org (full text, mbox):

From: Josselin Poiret <dev <at> jpoiret.xyz>
To: Ludovic Courtès <ludo <at> gnu.org>
Cc: 61095 <at> debbugs.gnu.org, Andrew Whatson <whatson <at> tailcall.au>,
 Omar Polo <op <at> omarpolo.com>
Subject: Re: bug#61095: possible misuse of posix_spawn API on non-linux OSes
Date: Thu, 30 Mar 2023 22:21:28 +0200
[Message part 1 (text/plain, inline)]
Hi Ludo,

Ludovic Courtès <ludo <at> gnu.org> writes:

> Coming next is an updated patch series addressing this as proposed
> above.  Let me know what y’all think!
>
> I tested the ‘posix_spawn_file_actions_addclosefrom_np’ path by building in:
>
>   guix time-machine --branch=core-updates -- shell -CP -D -f guix.scm

I didn't test, but this LGTM!  Maybe someone on OpenBSD could test this
patchset?

Best,
-- 
Josselin Poiret
[signature.asc (application/pgp-signature, inline)]

Information forwarded to bug-guile <at> gnu.org:
bug#61095; Package guile. (Fri, 31 Mar 2023 17:47:01 GMT) Full text and rfc822 format available.

Message #36 received at 61095 <at> debbugs.gnu.org (full text, mbox):

From: Omar Polo <op <at> omarpolo.com>
To: Josselin Poiret <dev <at> jpoiret.xyz>
Cc: Ludovic Courtès <ludo <at> gnu.org>,
 Andrew Whatson <whatson <at> tailcall.au>, 61095 <at> debbugs.gnu.org
Subject: Re: bug#61095: possible misuse of posix_spawn API on non-linux OSes
Date: Fri, 31 Mar 2023 19:45:55 +0200
On 2023/03/30 22:21:28 +0200, Josselin Poiret <dev <at> jpoiret.xyz> wrote:
> Hi Ludo,
> 
> Ludovic Courtès <ludo <at> gnu.org> writes:
> 
> > Coming next is an updated patch series addressing this as proposed
> > above.  Let me know what y’all think!
> >
> > I tested the ‘posix_spawn_file_actions_addclosefrom_np’ path by building in:
> >
> >   guix time-machine --branch=core-updates -- shell -CP -D -f guix.scm
> 
> I didn't test, but this LGTM!  Maybe someone on OpenBSD could test this
> patchset?

    % gmake check
    <snip />
    gmake[5]: Entering directory '/home/op/w/guile/test-suite/standalone'
    PASS: test-system-cmds

it seems to work on OpenBSD 7.3 :)

but note that our libc doesn't have posix_spawn_file_actions_addclosefrom_np,
so this is using the "racy" code path.

Just for curiosity, as it's outside the scope of the bug, what's the
reason posix_spawn was used instead of a more classic fork() +
closefrom()?


Thanks,

Omar Polo




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Sun, 02 Apr 2023 13:45:02 GMT) Full text and rfc822 format available.

Notification sent to Omar Polo <op <at> omarpolo.com>:
bug acknowledged by developer. (Sun, 02 Apr 2023 13:45:02 GMT) Full text and rfc822 format available.

Message #41 received at 61095-done <at> debbugs.gnu.org (full text, mbox):

From: Ludovic Courtès <ludo <at> gnu.org>
To: Omar Polo <op <at> omarpolo.com>
Cc: Josselin Poiret <dev <at> jpoiret.xyz>, Andrew Whatson <whatson <at> tailcall.au>,
 61095-done <at> debbugs.gnu.org
Subject: Re: bug#61095: possible misuse of posix_spawn API on non-linux OSes
Date: Sun, 02 Apr 2023 15:44:01 +0200
Hi!

Omar Polo <op <at> omarpolo.com> skribis:

> On 2023/03/30 22:21:28 +0200, Josselin Poiret <dev <at> jpoiret.xyz> wrote:
>> Hi Ludo,
>> 
>> Ludovic Courtès <ludo <at> gnu.org> writes:
>> 
>> > Coming next is an updated patch series addressing this as proposed
>> > above.  Let me know what y’all think!
>> >
>> > I tested the ‘posix_spawn_file_actions_addclosefrom_np’ path by building in:
>> >
>> >   guix time-machine --branch=core-updates -- shell -CP -D -f guix.scm
>> 
>> I didn't test, but this LGTM!  Maybe someone on OpenBSD could test this
>> patchset?
>
>     % gmake check
>     <snip />
>     gmake[5]: Entering directory '/home/op/w/guile/test-suite/standalone'
>     PASS: test-system-cmds
>
> it seems to work on OpenBSD 7.3 :)

Awesome!  Pushed as 9cc85a4f52147fcdaa4c52a62bcc87bdb267d0a9.

> but note that our libc doesn't have posix_spawn_file_actions_addclosefrom_np,
> so this is using the "racy" code path.

Yeah, not great.  :-/  I hope that function will be adopted by other
libcs, especially since ‘closefrom’ is already available.

> Just for curiosity, as it's outside the scope of the bug, what's the
> reason posix_spawn was used instead of a more classic fork() +
> closefrom()?

There’s a long discussion at:

  https://issues.guix.gnu.org/52835

Essentially, ‘fork’ is unusable in multi-threaded context, in addition
to being inefficient.

Thanks,
Ludo’.




Reply sent to Ludovic Courtès <ludo <at> gnu.org>:
You have taken responsibility. (Sun, 02 Apr 2023 13:45:02 GMT) Full text and rfc822 format available.

Notification sent to Ludovic Courtès <ludo <at> gnu.org>:
bug acknowledged by developer. (Sun, 02 Apr 2023 13:45:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Mon, 01 May 2023 11:24:12 GMT) Full text and rfc822 format available.

This bug report was last modified 359 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.