GNU bug report logs - #68504
[PATCH] Add copy-on-write support to scm_copy_file.

Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.

Package: guile; Reported by: Tomas Volf <~@wolfsden.cz>; Keywords: patch; dated Tue, 16 Jan 2024 12:49:02 UTC; Maintainer for guile is bug-guile@HIDDEN.

Message received at submit <at> debbugs.gnu.org:


Received: (at submit) by debbugs.gnu.org; 16 Jan 2024 12:48:46 +0000
From debbugs-submit-bounces <at> debbugs.gnu.org Tue Jan 16 07:48:46 2024
Received: from localhost ([127.0.0.1]:48058 helo=debbugs.gnu.org)
	by debbugs.gnu.org with esmtp (Exim 4.84_2)
	(envelope-from <debbugs-submit-bounces <at> debbugs.gnu.org>)
	id 1rPis1-0004WA-HJ
	for submit <at> debbugs.gnu.org; Tue, 16 Jan 2024 07:48:46 -0500
Received: from lists.gnu.org ([2001:470:142::17]:37988)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <~@wolfsden.cz>) id 1rPirz-0004VP-TC
 for submit <at> debbugs.gnu.org; Tue, 16 Jan 2024 07:48:44 -0500
Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <~@wolfsden.cz>) id 1rPirr-0003b2-AF
 for bug-guile@HIDDEN; Tue, 16 Jan 2024 07:48:35 -0500
Received: from wolfsden.cz ([37.205.8.62])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <~@wolfsden.cz>) id 1rPirn-0003la-Tp
 for bug-guile@HIDDEN; Tue, 16 Jan 2024 07:48:35 -0500
Received: by wolfsden.cz (Postfix, from userid 104)
 id 50577256DE4; Tue, 16 Jan 2024 12:48:28 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=wolfsden.cz; s=mail;
 t=1705409308; bh=alskAz+dpfsEoJuYXtXbnk5z/cVM/q0gpTfYVmArpjg=;
 h=From:To:Cc:Subject:Date;
 b=Eb3iz9psQO3TFI2ALRIujHeBEzatfsNEt/V+SpfmINU11WEXM9EbkmlkusQk17nf6
 FcKdIyVzZi7qOjOGNzK5DWhVPxNZpJZXi8tuBbOkrUkcEjiSFmP+BKCBecNHtKgk0K
 pnRka7OyYzr6UECCxVXPuG4pXMAhUU8MxubnoosxRI5xanwg4CW2f1553MIME+SlkH
 eGIE9mMOP8xfLbXE0IHXyiElwAv2fTdT8FKup7GN8wh6IS7+vOzhJm8SWfIhfQWAJM
 fAkbUfRPYf+wHzhY+VR6m1oFX+7RJx0AHqJmXnAz71Le7Xd79qZVvfmnhCwvtu0UOL
 Z7RmwHRgVVy/HJzW4dDoCimEIqcCwaCxSTy19OzymdJSy+gM3BazqgHtJUCqwL0N9C
 i+ghlfDGD3Vk79Qf4HfxzmxVj89rRn1y1cnf+464achw9Nb5raOgNu+ayrmXwGd8nB
 RDfxaZXQaNbEX9SRJQEIfnND3aIWSM2iHj1fN970w4C4ogAKsgpz4srf1SOYe2Ln3w
 18bSZfjPBvrspvhxT9hjXlcd+iEQuJPwOTHDrJOhEzkq+fgavUCX9OrS7VZJo1Knil
 a955iuZSnURKn1U2Kom8sRQLlQ0HYaR3S5eRlW3Zmy+Z6CUatgQ1duaiEV5KSebaNK
 MPq1YBK9WUNJWg/McaOxS8eE=
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on wolfsden
X-Spam-Level: 
X-Spam-Status: No, score=-1.2 required=5.0 tests=ALL_TRUSTED,DKIM_SIGNED,
 DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,T_SCC_BODY_TEXT_LINE,
 URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6
Received: from localhost (unknown [81.17.16.72])
 by wolfsden.cz (Postfix) with ESMTPSA id 905FF256DE3;
 Tue, 16 Jan 2024 12:48:27 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=wolfsden.cz; s=mail;
 t=1705409307; bh=alskAz+dpfsEoJuYXtXbnk5z/cVM/q0gpTfYVmArpjg=;
 h=From:To:Cc:Subject:Date;
 b=wYM7mOjbW02LJpZ3vuxsUAFyaoU+UbEHSzzX1u1hEntfF1ZkcYneHwI54cnN0AJwm
 HHFnZfUVfwWIJueEFZXQoRTZ47GPYYBsF95RNZlmZ8VpTMu1myvNfpmy1RknGawV1C
 C+TRdzOEZdQNEQMiaFEyih8B4/7cwcTr9mPpp8CrQi7IlABVKuKRww5tUDmSrcNjFE
 asdV5yPZ11Yc85A69fY1P96Kb1u0eScCYigqNVm4gJai+O24cTzUsYbQ78qoQkZceZ
 l6C0E1nKAS9oGGuSjOJ2iVHDC6RwaJu7yFfpgC6I3UNjGl4PhIp2hrHuiKQUHipBgY
 r9t/DvfyPiCqUyGEOgjjaWxzH0qhJmkkKWuHHz3aPUjnWNcgicQfLjxj3YnPSTQFxV
 14+4HxH/inRV8or5QJNLa9/SQIMZx1Z6BX1LKxpjxFlKTDequXAa7RjR2k7CY42fnt
 lvIO5TqAfAYhuZ+lDDAXGpM8Z7ouxPhFRb8wqKEVaH0YZTd6UddnDRjeoeFecBhQjc
 XwG7HZtIqzlrRkP0QRcqbFmjXq/FI0obKmBAwwHo+jQkEBV+bCk1G3zx1Bi8lN+rzu
 JV6z3N+zH/11KSGFybJdNwoNmfLueW9z5DRp56oYD/0vWEbWKMqKOpInZiy3l7hC8G
 G8m9wGo/Xk1NfxNb4R07J5Qs=
From: Tomas Volf <~@wolfsden.cz>
To: bug-guile@HIDDEN
Subject: [PATCH] Add copy-on-write support to scm_copy_file.
Date: Tue, 16 Jan 2024 13:48:17 +0100
Message-ID: <20240116124817.14680-1-~@wolfsden.cz>
X-Mailer: git-send-email 2.41.0
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=37.205.8.62; envelope-from=~@wolfsden.cz;
 helo=wolfsden.cz
X-Spam_score_int: -20
X-Spam_score: -2.1
X-Spam_bar: --
X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1,
 DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_PASS=-0.001,
 SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-Spam-Score: 1.0 (+)
X-Debbugs-Envelope-To: submit
Cc: Tomas Volf <~@wolfsden.cz>
X-BeenThere: debbugs-submit <at> debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
List-Id: <debbugs-submit.debbugs.gnu.org>
List-Unsubscribe: <https://debbugs.gnu.org/cgi-bin/mailman/options/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=unsubscribe>
List-Archive: <https://debbugs.gnu.org/cgi-bin/mailman/private/debbugs-submit/>
List-Post: <mailto:debbugs-submit <at> debbugs.gnu.org>
List-Help: <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=help>
List-Subscribe: <https://debbugs.gnu.org/cgi-bin/mailman/listinfo/debbugs-submit>, 
 <mailto:debbugs-submit-request <at> debbugs.gnu.org?subject=subscribe>
Errors-To: debbugs-submit-bounces <at> debbugs.gnu.org
Sender: "Debbugs-submit" <debbugs-submit-bounces <at> debbugs.gnu.org>
X-Spam-Score: -0.0 (/)

On modern file-systems (BTRFS, ZFS) it is possible to copy a file using
copy-on-write method.  For large files it has the advantage of being
much faster and saving disk space (since identical extents are not
duplicated).  This feature is stable and for example coreutils' `cp'
does use it automatically (see --reflink).

This commit adds support for this feature into our
copy-file (scm_copy_file) procedure.  Same as `cp', it defaults to
'auto, meaning the copy-on-write is attempted, and in case of failure
the regular copy is performed.

No tests are provided, because the behavior depends on the system,
underlying file-system and its configuration.  That makes it challenging
to write a test for it.  Manual testing was performed instead:

    $ btrfs filesystem du /tmp/cow*
         Total   Exclusive  Set shared  Filename
      36.00KiB    36.00KiB       0.00B  /tmp/cow

    $ cat cow-test.scm
    (copy-file "/tmp/cow" "/tmp/cow-unspecified")
    (copy-file "/tmp/cow" "/tmp/cow-always" #:copy-on-write 'always)
    (copy-file "/tmp/cow" "/tmp/cow-auto" #:copy-on-write 'auto)
    (copy-file "/tmp/cow" "/tmp/cow-never" #:copy-on-write 'never)
    (copy-file "/tmp/cow" "/dev/shm/cow-unspecified")
    (copy-file "/tmp/cow" "/dev/shm/cow-auto" #:copy-on-write 'auto)
    (copy-file "/tmp/cow" "/dev/shm/cow-never" #:copy-on-write 'never)
    $ ./meta/guile -s cow-test.scm

    $ btrfs filesystem du /tmp/cow*
         Total   Exclusive  Set shared  Filename
      36.00KiB       0.00B    36.00KiB  /tmp/cow
      36.00KiB       0.00B    36.00KiB  /tmp/cow-always
      36.00KiB       0.00B    36.00KiB  /tmp/cow-auto
      36.00KiB    36.00KiB       0.00B  /tmp/cow-never
      36.00KiB       0.00B    36.00KiB  /tmp/cow-unspecified

    $ sha1sum /tmp/cow* /dev/shm/cow*
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-always
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-auto
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-never
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /tmp/cow-unspecified
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-auto
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-never
    4c665f87b5dc2e7d26279c4b48968d085e1ace32  /dev/shm/cow-unspecified

This commit also adds to new failure modes for (copy-file).

Failure to copy-on-write when 'always was passed in:

    scheme@(guile-user)> (copy-file "/tmp/cow" "/dev/shm/cow" #:copy-on-write 'always)
    ice-9/boot-9.scm:1676:22: In procedure raise-exception:
    In procedure copy-file: copy-on-write failed: Invalid cross-device link

Passing in invalid value for the #:copy-on-write keyword argument:

    scheme@(guile-user)> (copy-file "/tmp/cow" "/dev/shm/cow" #:copy-on-write 'nevr)
    ice-9/boot-9.scm:1676:22: In procedure raise-exception:
    In procedure copy-file: invalid value for #:copy-on-write: nevr

* NEWS: Add note for copy-file supporting copy-on-write.
* configure.ac: Check for linux/fs.h.
* doc/ref/posix.texi (File System)[copy-file]: Document the new
signature.
* libguile/filesys.c (clone_file): New function cloning a file using
FICLONE, if supported.
(k_copy_on_write): New keyword.
(sym_always, sym_auto, sym_never): New symbols.
(scm_copy_file): New #:copy-on-write keyword argument.  Attempt
copy-on-write copy by default.
* libguile/filesys.h: Update signature for scm_copy_file.
---
 NEWS               |  9 ++++++
 configure.ac       |  1 +
 doc/ref/posix.texi |  9 +++++-
 libguile/filesys.c | 74 +++++++++++++++++++++++++++++++++++++++-------
 libguile/filesys.h |  2 +-
 5 files changed, 82 insertions(+), 13 deletions(-)

diff --git a/NEWS b/NEWS
index b319404d7..9147098c9 100644
--- a/NEWS
+++ b/NEWS
@@ -21,6 +21,15 @@ definitely unused---this is notably the case for modules that are only
 used at macro-expansion time, such as (srfi srfi-26).  In those cases,
 the compiler reports it as "possibly unused".
 
+** copy-file now supports copy-on-write
+
+The copy-file procedure now takes an additional keyword argument,
+#:copy-on-write, specifying whether copy-on-write should be done, if the
+underlying file-system supports it.  Possible values are 'always, 'auto
+and 'never, with 'auto being the default.
+
+This speeds up copying large files a lot while saving the disk space.
+
 * Bug fixes
 
 ** (ice-9 suspendable-ports) incorrect UTF-8 decoding
diff --git a/configure.ac b/configure.ac
index d0a2dc79b..c46586e9b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -418,6 +418,7 @@ AC_SUBST([SCM_I_GSC_HAVE_STRUCT_DIRENT64])
 #   sys/sendfile.h - non-POSIX, found in glibc
 #
 AC_CHECK_HEADERS([complex.h fenv.h io.h memory.h process.h \
+linux/fs.h \
 sys/dir.h sys/ioctl.h sys/select.h \
 sys/time.h sys/timeb.h sys/times.h sys/stdtypes.h sys/types.h \
 sys/utime.h unistd.h utime.h pwd.h grp.h sys/utsname.h \
diff --git a/doc/ref/posix.texi b/doc/ref/posix.texi
index fec42d061..d26808d91 100644
--- a/doc/ref/posix.texi
+++ b/doc/ref/posix.texi
@@ -896,10 +896,17 @@ of @code{delete-file}.  Why doesn't POSIX have a @code{rmdirat} function
 for this instead?  No idea!
 @end deffn
 
-@deffn {Scheme Procedure} copy-file oldfile newfile
+@deffn {Scheme Procedure} copy-file @var{oldfile} @var{newfile} @
+       [#:copy-on-write='auto]
 @deffnx {C Function} scm_copy_file (oldfile, newfile)
 Copy the file specified by @var{oldfile} to @var{newfile}.
 The return value is unspecified.
+
+@code{#:copy-on-write} keyword argument determines whether copy-on-write
+copy should be attempted and the behavior in case of failure.  Possible
+values are @code{'always} (attempt the copy-on-write, return error if it
+fails), @code{'auto} (attempt the copy-on-write, fallback to regular
+copy if it fails) and @code{'never} (perform the regular copy).
 @end deffn
 
 @deffn {Scheme Procedure} sendfile out in count [offset]
diff --git a/libguile/filesys.c b/libguile/filesys.c
index 1f0bba556..4fb8b9831 100644
--- a/libguile/filesys.c
+++ b/libguile/filesys.c
@@ -67,6 +67,11 @@
 # include <sys/sendfile.h>
 #endif
 
+#if defined(HAVE_SYS_IOCTL_H) && defined(HAVE_LINUX_FS_H)
+# include <linux/fs.h>
+# include <sys/ioctl.h>
+#endif
+
 #include "async.h"
 #include "boolean.h"
 #include "dynwind.h"
@@ -75,6 +80,7 @@
 #include "fports.h"
 #include "gsubr.h"
 #include "iselect.h"
+#include "keywords.h"
 #include "list.h"
 #include "load.h"	/* for scm_i_mirror_backslashes */
 #include "modules.h"
@@ -1255,20 +1261,49 @@ SCM_DEFINE (scm_readlink, "readlink", 1, 0, 0,
 }
 #undef FUNC_NAME
 
-SCM_DEFINE (scm_copy_file, "copy-file", 2, 0, 0,
-            (SCM oldfile, SCM newfile),
+static int
+clone_file (int oldfd, int newfd)
+{
+#ifdef FICLONE
+  return ioctl (newfd, FICLONE, oldfd);
+#else
+  (void)oldfd;
+  (void)newfd;
+  errno = EOPNOTSUPP;
+  return -1;
+#endif
+}
+
+SCM_KEYWORD (k_copy_on_write, "copy-on-write");
+SCM_SYMBOL (sym_always, "always");
+SCM_SYMBOL (sym_auto, "auto");
+SCM_SYMBOL (sym_never, "never");
+
+SCM_DEFINE (scm_copy_file, "copy-file", 2, 0, 1,
+            (SCM oldfile, SCM newfile, SCM rest),
 	    "Copy the file specified by @var{oldfile} to @var{newfile}.\n"
-	    "The return value is unspecified.")
+	    "The return value is unspecified.\n"
+            "\n"
+            "@code{#:copy-on-write} keyword argument determines whether "
+            "copy-on-write copy should be attempted and the "
+            "behavior in case of failure.  Possible values are "
+            "@code{'always} (attempt the copy-on-write, return error if "
+            "it fails), @code{'auto} (attempt the copy-on-write, "
+            "fallback to regular copy if it fails) and @code{'never} "
+            "(perform the regular copy)."
+            )
 #define FUNC_NAME s_scm_copy_file
 {
   char *c_oldfile, *c_newfile;
   int oldfd, newfd;
   int n, rv;
+  SCM cow = sym_auto;
+  int clone_res;
   char buf[BUFSIZ];
   struct stat_or_stat64 oldstat;
 
   scm_dynwind_begin (0);
-  
+
   c_oldfile = scm_to_locale_string (oldfile);
   scm_dynwind_free (c_oldfile);
   c_newfile = scm_to_locale_string (newfile);
@@ -1292,13 +1327,30 @@ SCM_DEFINE (scm_copy_file, "copy-file", 2, 0, 0,
       SCM_SYSERROR;
     }
 
-  while ((n = read (oldfd, buf, sizeof buf)) > 0)
-    if (write (newfd, buf, n) != n)
-      {
-	close (oldfd);
-	close (newfd);
-	SCM_SYSERROR;
-      }
+  scm_c_bind_keyword_arguments ("copy-file", rest, 0,
+                                k_copy_on_write, &cow,
+                                SCM_UNDEFINED);
+
+  if (scm_is_eq (cow, sym_always) || scm_is_eq (cow, sym_auto))
+    clone_res = clone_file(oldfd, newfd);
+  else if (scm_is_eq (cow, sym_never))
+    clone_res = -1;
+  else
+    scm_misc_error ("copy-file",
+                    "invalid value for #:copy-on-write: ~S",
+                    scm_list_1 (cow));
+
+  if (scm_is_eq (cow, sym_always) && clone_res)
+    scm_syserror ("copy-file: copy-on-write failed");
+
+  if (clone_res)
+    while ((n = read (oldfd, buf, sizeof buf)) > 0)
+      if (write (newfd, buf, n) != n)
+        {
+          close (oldfd);
+          close (newfd);
+          SCM_SYSERROR;
+        }
   close (oldfd);
   if (close (newfd) == -1)
     SCM_SYSERROR;
diff --git a/libguile/filesys.h b/libguile/filesys.h
index 1ce50d30e..4f620dfef 100644
--- a/libguile/filesys.h
+++ b/libguile/filesys.h
@@ -73,7 +73,7 @@ SCM_API SCM scm_symlink (SCM oldpath, SCM newpath);
 SCM_API SCM scm_symlinkat (SCM dir, SCM oldpath, SCM newpath);
 SCM_API SCM scm_readlink (SCM path);
 SCM_API SCM scm_lstat (SCM str);
-SCM_API SCM scm_copy_file (SCM oldfile, SCM newfile);
+SCM_API SCM scm_copy_file (SCM oldfile, SCM newfile, SCM rest);
 SCM_API SCM scm_mkstemp (SCM tmpl);
 SCM_API SCM scm_mkdtemp (SCM tmpl);
 SCM_API SCM scm_dirname (SCM filename);
-- 
2.41.0





Acknowledgement sent to Tomas Volf <~@wolfsden.cz>:
New bug report received and forwarded. Copy sent to bug-guile@HIDDEN. Full text available.
Report forwarded to bug-guile@HIDDEN:
bug#68504; Package guile. Full text available.
Please note: This is a static page, with minimal formatting, updated once a day.
Click here to see this page with the latest information and nicer formatting.
Last modified: Sat, 20 Jan 2024 12:30:02 UTC

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997 nCipher Corporation Ltd, 1994-97 Ian Jackson.