GNU bug report logs -
#63850
cp fails for files > 2 GB if copy offload is unsupported
Previous Next
Reported by: Sam James <sam <at> gentoo.org>
Date: Fri, 2 Jun 2023 15:50:02 UTC
Severity: normal
Done: Paul Eggert <eggert <at> cs.ucla.edu>
Bug is archived. No further changes may be made.
To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 63850 in the body.
You can then email your comments to 63850 AT debbugs.gnu.org in the normal way.
Toggle the display of automated, internal messages from the tracker.
Report forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Fri, 02 Jun 2023 15:50:02 GMT)
Full text and
rfc822 format available.
Acknowledgement sent
to
Sam James <sam <at> gentoo.org>
:
New bug report received and forwarded. Copy sent to
bug-coreutils <at> gnu.org
.
(Fri, 02 Jun 2023 15:50:02 GMT)
Full text and
rfc822 format available.
Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Hello,
Forwarding a downstream report of a behaviour change between
coreutils-9.1 and coreutils-9.3 from https://bugs.gentoo.org/907474.
The reporter bisected it to 093a8b4bfaba60005f14493ce7ef11ed665a0176
("copy: fix --reflink=auto to fallback in more cases", see bug#62404)
and gave strace output showing:
```
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = 2147479552
copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = -1 EINVAL
(Invalid argument)
```
"""
When I try to copy a large file (> 2 GB) like so:
cp --debug file_a file_b
output looks like this:
'file_a' -> 'file_b'
cp: error copying 'file_a' to 'file_b': Invalid argument
copy offload: unsupported, reflink: unsupported, sparse detection: no
Afterwards file_b has a size of 2147479552 bytes (= 2G - 4K).
On another system (with newer kernel version) it looks like this:
cp --debug file_a file_b
'file_a' -> 'file_b'
copy offload: yes, reflink: unsupported, sparse detection: no
"""
Let me know if you need further information, although there's
some more on the downstream Gentoo bug I linked.
Apparently this is only happening w/ the 4.19.x kernels.
Thanks!
[signature.asc (application/pgp-signature, inline)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Fri, 02 Jun 2023 16:32:02 GMT)
Full text and
rfc822 format available.
Message #8 received at 63850 <at> debbugs.gnu.org (full text, mbox):
On 02/06/2023 16:44, Sam James wrote:
> Hello,
>
> Forwarding a downstream report of a behaviour change between
> coreutils-9.1 and coreutils-9.3 from https://bugs.gentoo.org/907474.
>
> The reporter bisected it to 093a8b4bfaba60005f14493ce7ef11ed665a0176
> ("copy: fix --reflink=auto to fallback in more cases", see bug#62404)
> and gave strace output showing:
> ```
> fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
> copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = 2147479552
> copy_file_range(3, NULL, 4, NULL, 9223372035781033984, 0) = -1 EINVAL
> (Invalid argument)
> ```
>
> """
> When I try to copy a large file (> 2 GB) like so:
>
> cp --debug file_a file_b
>
> output looks like this:
>
> 'file_a' -> 'file_b'
> cp: error copying 'file_a' to 'file_b': Invalid argument
> copy offload: unsupported, reflink: unsupported, sparse detection: no
>
> Afterwards file_b has a size of 2147479552 bytes (= 2G - 4K).
>
> On another system (with newer kernel version) it looks like this:
>
> cp --debug file_a file_b
> 'file_a' -> 'file_b'
> copy offload: yes, reflink: unsupported, sparse detection: no
> """
>
> Let me know if you need further information, although there's
> some more on the downstream Gentoo bug I linked.
>
> Apparently this is only happening w/ the 4.19.x kernels.
I'm not sure it was working correctly before 9.3 either.
Before 9.3 we would have switched from copy_file_range() to read()/write()
upon receiving the EINVAL, which might have worked, but also I'm not sure
the file offsets would be correct in that case. Could you show the output with:
diff --git a/src/copy.c b/src/copy.c
index 0dd059d2e..35c54b905 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -363,7 +363,16 @@ sparse_copy (int src_fd, int dest_fd, char **abuf, size_t buf_size,
edge case where the file is made immutable after creating,
in which case the (more accurate) error is still shown. */
if (*total_n_read == 0 && is_CLONENOTSUP (errno))
- break;
+ {
+ if (*total_n_read != 0)
+ {
+ off_t clone_read_offset = lseek (src_fd, 0, SEEK_CUR);
+ off_t clone_write_offset = lseek (dest_fd, 0, SEEK_CUR);
+ printf ("switching to standard copy at :%"PRIdMAX" read=%"PRIdMAX" write=%"PRIdMAX"\n",
+ *total_n_read, clone_read_offset, clone_write_offset);
+ }
+ break;
+ }
/* ENOENT was seen sometimes across CIFS shares, resulting in
no data being copied, but subsequent standard copies succeed. */
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Sat, 03 Jun 2023 05:26:01 GMT)
Full text and
rfc822 format available.
Message #11 received at 63850 <at> debbugs.gnu.org (full text, mbox):
On 03/06/2023 02:02, Mike Gilbert wrote:
> On Fri, Jun 02, 2023 at 05:31:50PM +0100, Pádraig Brady wrote:
>> I'm not sure it was working correctly before 9.3 either.
>> Before 9.3 we would have switched from copy_file_range() to read()/write()
>> upon receiving the EINVAL, which might have worked, but also I'm not sure
>> the file offsets would be correct in that case. Could you show the output with:
>>
>> diff --git a/src/copy.c b/src/copy.c
>> index 0dd059d2e..35c54b905 100644
>> --- a/src/copy.c
>> +++ b/src/copy.c
>> @@ -363,7 +363,16 @@ sparse_copy (int src_fd, int dest_fd, char **abuf, size_t buf_size,
>> edge case where the file is made immutable after creating,
>> in which case the (more accurate) error is still shown. */
>> if (*total_n_read == 0 && is_CLONENOTSUP (errno))
>> - break;
>> + {
>> + if (*total_n_read != 0)
>> + {
>> + off_t clone_read_offset = lseek (src_fd, 0, SEEK_CUR);
>> + off_t clone_write_offset = lseek (dest_fd, 0, SEEK_CUR);
>> + printf ("switching to standard copy at :%"PRIdMAX" read=%"PRIdMAX" write=%"PRIdMAX"\n",
>> + *total_n_read, clone_read_offset, clone_write_offset);
>> + }
>> + break;
>> + }
>>
>> /* ENOENT was seen sometimes across CIFS shares, resulting in
>> no data being copied, but subsequent standard copies succeed. */
>
> I don't think this patch will do anything useful: *total_n_read cannot be 0
> and not 0 simultaneously, so the new block of code will never be
> executed. Maybe you meant to insert this block somewhere else?
Sorry was rushing.
Yes I meant to remove the first total_n_read check
as done in the following.
In any case an external check would be as useful.
I suppose we would have heard at this stage
but it would be good to have verification with md5sum or similar
on the source and destination files that the copy worked fine
on such a file >2G.
cheers,
Pádraig.
diff --git a/src/copy.c b/src/copy.c
index 0dd059d2e..296707c39 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -362,8 +362,17 @@ sparse_copy (int src_fd, int dest_fd, char **abuf, size_t buf_size,
also occur for immutable files, but that would only be in the
edge case where the file is made immutable after creating,
in which case the (more accurate) error is still shown. */
- if (*total_n_read == 0 && is_CLONENOTSUP (errno))
- break;
+ if (is_CLONENOTSUP (errno))
+ {
+ if (*total_n_read != 0)
+ {
+ off_t clone_read_offset = lseek (src_fd, 0, SEEK_CUR);
+ off_t clone_write_offset = lseek (dest_fd, 0, SEEK_CUR);
+ printf ("switching to standard copy at :%"PRIdMAX" read=%"PRIdMAX" write=%"PRIdMAX"\n",
+ *total_n_read, clone_read_offset, clone_write_offset);
+ }
+ break;
+ }
/* ENOENT was seen sometimes across CIFS shares, resulting in
no data being copied, but subsequent standard copies succeed. */
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Sat, 03 Jun 2023 06:06:02 GMT)
Full text and
rfc822 format available.
Message #14 received at 63850 <at> debbugs.gnu.org (full text, mbox):
On 2023-06-02 09:31, Pádraig Brady wrote:
> I'm not sure it was working correctly before 9.3 either.
> Before 9.3 we would have switched from copy_file_range() to read()/write()
Actually, cp shouldn't have been using copy_file_range at all, as the
code is supposed to never use copy_file_range unless the Linux kernel
version is 5.3 or later. See m4/copy-file-range.m4 and
lib/copy-file-range.c.
Since the bug is being reported against kernel 4.19, someone needs to
investigate why the Gentoo build is using the copy_file_range syscall on
that kernel. Either the Gentoo build isn't properly compiling the
replacement function in coreutils/lib/copy-file-range.c, or the
replacement function is incorrectly deciding that the kernel is new
enough, or something like that.
We shouldn't need to fiddle with src/copy.c on this.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Sat, 03 Jun 2023 07:15:02 GMT)
Full text and
rfc822 format available.
Message #17 received at 63850 <at> debbugs.gnu.org (full text, mbox):
On Fri, Jun 02, 2023 at 05:31:50PM +0100, Pádraig Brady wrote:
> I'm not sure it was working correctly before 9.3 either.
> Before 9.3 we would have switched from copy_file_range() to read()/write()
> upon receiving the EINVAL, which might have worked, but also I'm not sure
> the file offsets would be correct in that case. Could you show the output with:
>
> diff --git a/src/copy.c b/src/copy.c
> index 0dd059d2e..35c54b905 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -363,7 +363,16 @@ sparse_copy (int src_fd, int dest_fd, char **abuf, size_t buf_size,
> edge case where the file is made immutable after creating,
> in which case the (more accurate) error is still shown. */
> if (*total_n_read == 0 && is_CLONENOTSUP (errno))
> - break;
> + {
> + if (*total_n_read != 0)
> + {
> + off_t clone_read_offset = lseek (src_fd, 0, SEEK_CUR);
> + off_t clone_write_offset = lseek (dest_fd, 0, SEEK_CUR);
> + printf ("switching to standard copy at :%"PRIdMAX" read=%"PRIdMAX" write=%"PRIdMAX"\n",
> + *total_n_read, clone_read_offset, clone_write_offset);
> + }
> + break;
> + }
>
> /* ENOENT was seen sometimes across CIFS shares, resulting in
> no data being copied, but subsequent standard copies succeed. */
I don't think this patch will do anything useful: *total_n_read cannot be 0
and not 0 simultaneously, so the new block of code will never be
executed. Maybe you meant to insert this block somewhere else?
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Sat, 03 Jun 2023 16:44:03 GMT)
Full text and
rfc822 format available.
Message #20 received at 63850 <at> debbugs.gnu.org (full text, mbox):
On Fri, Jun 02, 2023 at 11:05:24PM -0700, Paul Eggert wrote:
> On 2023-06-02 09:31, Pádraig Brady wrote:
> > I'm not sure it was working correctly before 9.3 either.
> > Before 9.3 we would have switched from copy_file_range() to read()/write()
>
> Actually, cp shouldn't have been using copy_file_range at all, as the
> code is supposed to never use copy_file_range unless the Linux kernel
> version is 5.3 or later. See m4/copy-file-range.m4 and
> lib/copy-file-range.c.
>
> Since the bug is being reported against kernel 4.19, someone needs to
> investigate why the Gentoo build is using the copy_file_range syscall on
> that kernel. Either the Gentoo build isn't properly compiling the
> replacement function in coreutils/lib/copy-file-range.c, or the
> replacement function is incorrectly deciding that the kernel is new
> enough, or something like that.
>
> We shouldn't need to fiddle with src/copy.c on this.
The macro in copy-file-range.m4 performs a build time version check
against the installed linux headers (/usr/include/linux).
In this case, headers from linux-6.1 are being used at build time.
However, the code is being run on a linux-4.19 kernel.
Generally speaking, syscall checks must be done at run time on Linux,
not build time.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Sat, 03 Jun 2023 17:18:02 GMT)
Full text and
rfc822 format available.
Message #23 received at 63850 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Mike Gilbert <floppym <at> gentoo.org> writes:
> The macro in copy-file-range.m4 performs a build time version check
> against the installed linux headers (/usr/include/linux).
>
> In this case, headers from linux-6.1 are being used at build time.
> However, the code is being run on a linux-4.19 kernel.
>
> Generally speaking, syscall checks must be done at run time on Linux,
> not build time.
Right, the replacement should always be emitted (perhaps with glibc stub
detection, and omitted if one is found, though). linux-headers is a
weird library, in that it makes no implication about the compatibility
level of your resulting executable, so, no assumption about runtime
versions or behavior can be made from static checks.
--
Arsen Arsenović
[signature.asc (application/pgp-signature, inline)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Sat, 03 Jun 2023 18:41:02 GMT)
Full text and
rfc822 format available.
Message #26 received at 63850 <at> debbugs.gnu.org (full text, mbox):
On 03/06/2023 07:05, Paul Eggert wrote:
> On 2023-06-02 09:31, Pádraig Brady wrote:
>> I'm not sure it was working correctly before 9.3 either.
>> Before 9.3 we would have switched from copy_file_range() to read()/write()
>
> Actually, cp shouldn't have been using copy_file_range at all, as the
> code is supposed to never use copy_file_range unless the Linux kernel
> version is 5.3 or later. See m4/copy-file-range.m4 and
> lib/copy-file-range.c.
>
> Since the bug is being reported against kernel 4.19, someone needs to
> investigate why the Gentoo build is using the copy_file_range syscall on
> that kernel. Either the Gentoo build isn't properly compiling the
> replacement function in coreutils/lib/copy-file-range.c, or the
> replacement function is incorrectly deciding that the kernel is new
> enough, or something like that.
>
> We shouldn't need to fiddle with src/copy.c on this.
Yes good call on the kernel version checks.
Though I think we should be a bit more accommodating for edge cases
in code as fundamental as coreutils copy routines.
Personally I would have a preference for accommodating the partial failure
as originally proposed in the patch at https://bugs.gnu.org/62404#11
Personally I would definitely reinstate the _runtime_ kernel version check,
for this bug and all the other copy_file_range() issues on older kernels.
The runtime to build time change was originally discussed at:
https://lists.gnu.org/archive/html/bug-gnulib/2022-01/msg00118.html
cheers,
Pádraig
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Tue, 06 Jun 2023 04:49:02 GMT)
Full text and
rfc822 format available.
Message #29 received at 63850 <at> debbugs.gnu.org (full text, mbox):
On 2023-06-03 06:54, Mike Gilbert wrote:
> In this case, headers from linux-6.1 are being used at build time.
> However, the code is being run on a linux-4.19 kernel.
Gnulib doesn't support that. If you build with headers from a particular
version of the operating system, you can't necessarily run on older
versions. The reasons for this sort of restriction should be obvious.
If Gentoo builds are regularly targeting older kernels or libraries than
the platform they are building on, then surely that's a problem in
general, not just here.
I'll cc this to bug-gnulib to give them a heads-up about the issue. For
gnulib readers, the original bug report is here:
https://bugs.gnu.org/63850
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Tue, 06 Jun 2023 05:08:01 GMT)
Full text and
rfc822 format available.
Message #32 received at 63850 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
If we really need to cater to the practice of building for an older
system than the build host (which to my mind is asking for trouble),
then instead of having Gnulib try to work around the bugs in
copy_file_range in Linux kernels before 5.3, it should simply not use
copy_file_range in these older kernels.
In other words, something like the attached patch should be more
reliable than the other band-aids proposed.
[0001-copy-file-range-support-building-for-older-kernels.patch (text/x-patch, attachment)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Tue, 06 Jun 2023 05:15:01 GMT)
Full text and
rfc822 format available.
Message #35 received at 63850 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Paul Eggert <eggert <at> cs.ucla.edu> writes:
> On 2023-06-03 06:54, Mike Gilbert wrote:
>> In this case, headers from linux-6.1 are being used at build time.
>> However, the code is being run on a linux-4.19 kernel.
>
> Gnulib doesn't support that. If you build with headers from a
> particular version of the operating system, you can't necessarily run
> on older versions. The reasons for this sort of restriction should be
> obvious.
>
This is a principle that other core parts of userland have no issue
with. For example, util-linux has various fallbacks based on the
*runtime* kernel version.
This doesn't square with reality, anyway: if I install linux-6.1
and its headers, then I downgrade, I need to then rebuild every
piece of the userland I built against the new headers. Tracking
that as a user is nontrivial.
> If Gentoo builds are regularly targeting older kernels or libraries
> than the platform they are building on, then surely that's a problem
> in general, not just here.
Now, continuing from what I said above, it's not feasible to *require*
users to use a kernel from the package manager. Not only do users want
to be able to run their own kernel (sometimes even just for a quick
test), but this is completely incompatible with having multiple kernels
installed in parallel, as you can't have multiple versions of the
same linux-headers in /usr/include.
Going further: are we really suggesting that someone who was using
say, Linux 6.1 for a few days, then downgrades to Linux 5.15 to test
something is in an unsupported configuration?
This isn't a practical position to have. This assumption *barely*
holds for binary distributions where you "upgrade the world" all
at once, and as I said, it's questionable there. And it's completely
incompatible with source-based distributions.
>
> I'll cc this to bug-gnulib to give them a heads-up about the
> issue. For gnulib readers, the original bug report is here:
>
> https://bugs.gnu.org/63850
[signature.asc (application/pgp-signature, inline)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Tue, 06 Jun 2023 05:29:01 GMT)
Full text and
rfc822 format available.
Message #38 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Sam James <sam <at> gentoo.org> writes:
> [[PGP Signed Part:Undecided]]
>
> Paul Eggert <eggert <at> cs.ucla.edu> writes:
>
>> On 2023-06-03 06:54, Mike Gilbert wrote:
>>> In this case, headers from linux-6.1 are being used at build time.
>>> However, the code is being run on a linux-4.19 kernel.
>>
>> Gnulib doesn't support that. If you build with headers from a
>> particular version of the operating system, you can't necessarily run
>> on older versions. The reasons for this sort of restriction should be
>> obvious.
>>
>
> This is a principle that other core parts of userland have no issue
> with. For example, util-linux has various fallbacks based on the
> *runtime* kernel version.
>
> This doesn't square with reality, anyway: if I install linux-6.1
> and its headers, then I downgrade, I need to then rebuild every
> piece of the userland I built against the new headers. Tracking
> that as a user is nontrivial.
>
>> If Gentoo builds are regularly targeting older kernels or libraries
>> than the platform they are building on, then surely that's a problem
>> in general, not just here.
>
> Now, continuing from what I said above, it's not feasible to *require*
> users to use a kernel from the package manager. Not only do users want
> to be able to run their own kernel (sometimes even just for a quick
> test), but this is completely incompatible with having multiple kernels
> installed in parallel, as you can't have multiple versions of the
> same linux-headers in /usr/include.
>
> Going further: are we really suggesting that someone who was using
> say, Linux 6.1 for a few days, then downgrades to Linux 5.15 to test
> something is in an unsupported configuration?
>
> This isn't a practical position to have. This assumption *barely*
> holds for binary distributions where you "upgrade the world" all
> at once, and as I said, it's questionable there. And it's completely
> incompatible with source-based distributions.
Just to be clear: I totally agree this isn't feasible for general
libraries (and their headers). It's just that linux-headers is
a special case, because you can't easily juggle different versions of
it, and it's expected that users may vary their kernel version.
Also, glibc says the latest should always be fine:
https://sourceware.org/glibc/wiki/FAQ#What_version_of_the_Linux_kernel_headers_should_be_used.3F.
Of course, gnulib isn't glibc, but I'm pointing out that this is
established practice.
[signature.asc (application/pgp-signature, inline)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Tue, 06 Jun 2023 05:29:02 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Tue, 06 Jun 2023 05:49:02 GMT)
Full text and
rfc822 format available.
Message #44 received at 63850 <at> debbugs.gnu.org (full text, mbox):
On 2023-06-05 22:26, Sam James wrote:
> It's just that linux-headers is
> a special case
Indeed it is. And apparently glibc avoids the copy_file_range bugs by
never, ever using copy_file_range internally - which explains why glibc
hasn't run into this backward compatibility issue. Presumably once glibc
starts assuming kernel 5.3 or later, it can start using copy_file_range
internally.
Anyway, thanks for explaining. I installed the patch I mentioned into
Gnulib and have updated coreutils accordingly on Savannah master. Please
give it a try.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Tue, 06 Jun 2023 09:54:02 GMT)
Full text and
rfc822 format available.
Message #47 received at 63850 <at> debbugs.gnu.org (full text, mbox):
On 06/06/2023 06:48, Paul Eggert wrote:
> On 2023-06-05 22:26, Sam James wrote:
>> It's just that linux-headers is
>> a special case
>
> Indeed it is. And apparently glibc avoids the copy_file_range bugs by
> never, ever using copy_file_range internally - which explains why glibc
> hasn't run into this backward compatibility issue. Presumably once glibc
> starts assuming kernel 5.3 or later, it can start using copy_file_range
> internally.
Yes, glibc has the --enable-kernel configure option
to allow setting the compatibility range independently
from the build host (kernel headers).
In my experience a distro needs to support quite a range of kernel versions
mainly due to certain hardware platforms being tied to older kernels.
> Anyway, thanks for explaining. I installed the patch I mentioned into
> Gnulib and have updated coreutils accordingly on Savannah master. Please
> give it a try.
Thanks for making the change Paul.
Pádraig
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Tue, 06 Jun 2023 10:04:02 GMT)
Full text and
rfc822 format available.
Message #50 received at 63850 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Paul Eggert <eggert <at> cs.ucla.edu> writes:
> On 2023-06-05 22:26, Sam James wrote:
>> It's just that linux-headers is
>> a special case
>
> Indeed it is. And apparently glibc avoids the copy_file_range bugs by
> never, ever using copy_file_range internally - which explains why
> glibc hasn't run into this backward compatibility issue. Presumably
> once glibc starts assuming kernel 5.3 or later, it can start using
> copy_file_range internally.
>
> Anyway, thanks for explaining. I installed the patch I mentioned into
> Gnulib and have updated coreutils accordingly on Savannah
> master. Please give it a try.
Thanks Paul. Do you know if there's any other cases of this in gnulib?
(or whether, by chance, if you know off the top of your head if any
other gnulib consumers definitely use copy_file_range?)
[signature.asc (application/pgp-signature, inline)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Tue, 06 Jun 2023 10:28:02 GMT)
Full text and
rfc822 format available.
Message #53 received at 63850 <at> debbugs.gnu.org (full text, mbox):
On 06/06/2023 11:02, Sam James wrote:
>
> Paul Eggert <eggert <at> cs.ucla.edu> writes:
>
>> On 2023-06-05 22:26, Sam James wrote:
>>> It's just that linux-headers is
>>> a special case
>>
>> Indeed it is. And apparently glibc avoids the copy_file_range bugs by
>> never, ever using copy_file_range internally - which explains why
>> glibc hasn't run into this backward compatibility issue. Presumably
>> once glibc starts assuming kernel 5.3 or later, it can start using
>> copy_file_range internally.
>>
>> Anyway, thanks for explaining. I installed the patch I mentioned into
>> Gnulib and have updated coreutils accordingly on Savannah
>> master. Please give it a try.
>
> Thanks Paul. Do you know if there's any other cases of this in gnulib?
>
> (or whether, by chance, if you know off the top of your head if any
> other gnulib consumers definitely use copy_file_range?)
Both google and cs.github.com suggest only coreutils uses copy-file-range from gnulib:
https://www.google.com/search?q=filetype%3Aconf+%22copy-file-range%22
cheers,
Pádraig
Reply sent
to
Paul Eggert <eggert <at> cs.ucla.edu>
:
You have taken responsibility.
(Wed, 07 Jun 2023 19:14:01 GMT)
Full text and
rfc822 format available.
Notification sent
to
Sam James <sam <at> gentoo.org>
:
bug acknowledged by developer.
(Wed, 07 Jun 2023 19:14:02 GMT)
Full text and
rfc822 format available.
Message #58 received at 63850-done <at> debbugs.gnu.org (full text, mbox):
On 2023-06-06 03:02, Sam James wrote:
> Thanks Paul. Do you know if there's any other cases of this in gnulib?
There's no other use of linux/version.h in Gnulib. However, this doesn't
mean there are no other instances of the problem, as it's common to
assume that you'll run on a host no older than the build host. Every use
of Autoconf macros like AC_RUN_IFELSE assume this sort of thing, and I
would not be surprised if there are other examples of this more-general
problem when some builders (unwisely) build apps intended to run on
platforms older than the build platform. (For what it's worth, I count
331 uses of AC_RUN_IFELSE in Gnulib; it'd be a pain to audit them all.)
It's OK to build glibc for Linux kernels somewhat older than the build
kernel, since glibc explicitly exports that. But it's not OK to build
other packages that way, since they don't support it. You'll get away
with it in many cases, but when it doesn't work then the problem is on
the builder, not on the packages being built.
> (or whether, by chance, if you know off the top of your head if any
> other gnulib consumers definitely use copy_file_range?)
Other coreutils programs like 'install' use copy_file_range.
Emacs uses it. I just now installed the following patch into
bleeding-edge Emacs to work around the problem, by updating to current
Gnulib. This workaround should appear in Emacs 30, whenever that happens
(not for a year or two I'd think, as Emacs 29 isn't out yet).
https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=a902156068ab071f93cc9bbd34cd320919b74064
PS. I'm boldly marking the coreutils bug as fixed.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Wed, 07 Jun 2023 19:27:01 GMT)
Full text and
rfc822 format available.
Message #61 received at submit <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Paul Eggert <eggert <at> cs.ucla.edu> writes:
> On 2023-06-06 03:02, Sam James wrote:
>> Thanks Paul. Do you know if there's any other cases of this in gnulib?
>
> There's no other use of linux/version.h in Gnulib. However, this
> doesn't mean there are no other instances of the problem, as it's
> common to assume that you'll run on a host no older than the build
> host. Every use of Autoconf macros like AC_RUN_IFELSE assume this sort
> of thing, and I would not be surprised if there are other examples of
> this more-general problem when some builders (unwisely) build apps
> intended to run on platforms older than the build platform. (For what
> it's worth, I count 331 uses of AC_RUN_IFELSE in Gnulib; it'd be a
> pain to audit them all.)
>
> It's OK to build glibc for Linux kernels somewhat older than the build
> kernel, since glibc explicitly exports that. But it's not OK to build
> other packages that way, since they don't support it. You'll get away
> with it in many cases, but when it doesn't work then the problem is on
> the builder, not on the packages being built.
I don't see how this is compatible with what I explained, given you
can't have parallel linux-headers installs.
Nobody uses glibc in a vacuum, either, so it's not particularly
meaningful to say you can use glibc but nothing else in that way.
If you object to this, please bring it up on the glibc mailing list
to discuss revising the guidance. But again, I genuinely don't see
how anything other than the status quo could work. We can't enforce
what kernels users are running, and we can't simultaneously make the
users install the headers for the kernel they just built manually while
also having things owned and controlled by the package manager.
We're also very much not the only distribution doing this - Arch
usually ships the latest linux-headers, and Alpine does as well.
And we've done it for years with very, very few issues.
>
>
>> (or whether, by chance, if you know off the top of your head if any
>> other gnulib consumers definitely use copy_file_range?)
>
> Other coreutils programs like 'install' use copy_file_range.
>
> Emacs uses it. I just now installed the following patch into
> bleeding-edge Emacs to work around the problem, by updating to current
> Gnulib. This workaround should appear in Emacs 30, whenever that
> happens (not for a year or two I'd think, as Emacs 29 isn't out yet).
>
> https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=a902156068ab071f93cc9bbd34cd320919b74064
>
Thanks!
> PS. I'm boldly marking the coreutils bug as fixed.
Works for me, thanks.
[signature.asc (application/pgp-signature, inline)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Wed, 07 Jun 2023 19:27:02 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Wed, 07 Jun 2023 19:45:02 GMT)
Full text and
rfc822 format available.
Message #67 received at submit <at> debbugs.gnu.org (full text, mbox):
On 2023-06-07 12:21, Sam James wrote:
> I don't see how this is compatible with what I explained, given you
> can't have parallel linux-headers installs.
>
> Nobody uses glibc in a vacuum, either, so it's not particularly
> meaningful to say you can use glibc but nothing else in that way.
>
> If you object to this, please bring it up on the glibc mailing list
> to discuss revising the guidance.
There must be some miscommunication here. I'm not objecting to how glibc
is built, and I don't see why the glibc mailing list would be interested
in this problem as it's not their problem.
> We can't enforce
> what kernels users are running, and we can't simultaneously make the
> users install the headers for the kernel they just built manually while
> also having things owned and controlled by the package manager.
>
> We're also very much not the only distribution doing this - Arch
> usually ships the latest linux-headers, and Alpine does as well.
> And we've done it for years with very, very few issues.
That's not surprising, as this sort of problem arises only when building
for a newer platform yields a package that will run incorrectly on an
older platform. Problems like these are relatively rare if the only such
mismatch is the Linux kernel version, because current glibc explicitly
supports building for Linux kernel>=3.2, even when glibc is built on
newer kernels, these days nobody doing this sort of thing cares about
kernels older than 3.2, and most packages rely entirely on glibc to
access the kernel. There are exceptional packages, though, and you may
run into problems with those exceptions.
If users build for newer platforms and it usually works on older
platforms, that's great! But you might want to document that it might
not work. Because it might not.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Wed, 07 Jun 2023 19:45:03 GMT)
Full text and
rfc822 format available.
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Thu, 08 Jun 2023 11:46:02 GMT)
Full text and
rfc822 format available.
Message #73 received at 63850 <at> debbugs.gnu.org (full text, mbox):
[Message part 1 (text/plain, inline)]
Paul Eggert <eggert <at> cs.ucla.edu> writes:
> That's not surprising, as this sort of problem arises only when building for a
> newer platform yields a package that will run incorrectly on an older
> platform. Problems like these are relatively rare if the only such mismatch is
> the Linux kernel version, because current glibc explicitly supports building
> for Linux kernel>=3.2, even when glibc is built on newer kernels, these days
> nobody doing this sort of thing cares about kernels older than 3.2, and most
> packages rely entirely on glibc to access the kernel. There are exceptional
> packages, though, and you may run into problems with those exceptions.
>
> If users build for newer platforms and it usually works on older platforms,
> that's great! But you might want to document that it might not work. Because it
> might not.
In this instance, we have a configure-time test that makes a false
assumption (namely, that linux-headers version matches the linux
version), preventing a very valid runtime test from being emitted. The
cost for enabling this check is removing a dozen or so lines of code in
copy-file-range.m4 (and the subsequent addition of a few instructions
and a single syscall that is executed once).
It is entirely reasonable to expect that a user will roll back a kernel
update, as there are many reasons one might have to do so.
This is not quite comparable to downgrading libraries (which will,
reasonably, break as a result), because the kernel<->userspace boundary
is of very different nature to the library<->* boundary (namely, the
former is far more 'loose', in the sense that failure to implement
something manifests as a runtime failure rather than a build-time, or
even rtld-time failure). It is no coincidence that glibc explicitly
supports kernels older than the linux-headers it's being built with.
I'd understand not wanting to support this use-case in the cases where
doing so is difficult, but here, it's more difficult to not support this
use-case than to support it.
Please reconsider dropping the configure-time version check.
Thanks in advance, have a lovely day.
--
Arsen Arsenović
[signature.asc (application/pgp-signature, inline)]
Information forwarded
to
bug-coreutils <at> gnu.org
:
bug#63850
; Package
coreutils
.
(Thu, 08 Jun 2023 17:45:02 GMT)
Full text and
rfc822 format available.
Message #76 received at 63850 <at> debbugs.gnu.org (full text, mbox):
On 6/8/23 04:28, Arsen Arsenović wrote:
> Please reconsider dropping the configure-time version check.
I already dropped it, so this discussion is more about the general
principle than this particular case.
> It is entirely reasonable to expect that a user will roll back a kernel
> update, as there are many reasons one might have to do so.
Sure, but if they do so, and if programs use configure-time checks whose
results depend on the kernel version, users will have to rebuild and
reconfigure these programs.
That's just a fact of life. This is not just a Gnulib issue. Many
programs do lots of configure-time checks. Nobody that I know of has
time to audit them all.
*Usually* rolling back the kernel will work, because *usually* it's a
small rollback and programs built for a newer kernel won't exercise the
small area where the kernels differ in a way that will cause
user-visible symptoms.
But in general it does not work, and cannot reasonably be expected to
work.
So: when in doubt, rebuild. Of course if you've carefully audited the
program and know that its configure-time tests are valid for the older
kernel, you can skip the rebuild. Or if building speed is more important
to you than reliable applications, you can also skip the rebuild.
bug archived.
Request was from
Debbugs Internal Request <help-debbugs <at> gnu.org>
to
internal_control <at> debbugs.gnu.org
.
(Fri, 07 Jul 2023 11:24:08 GMT)
Full text and
rfc822 format available.
This bug report was last modified 1 year and 271 days ago.
Previous Next
GNU bug tracking system
Copyright (C) 1999 Darren O. Benham,
1997,2003 nCipher Corporation Ltd,
1994-97 Ian Jackson.