GNU bug report logs - #61386
[PATCH] cp,mv,install: Disable sparse copy on macOS

Previous Next

Package: coreutils;

Reported by: George Valkov <gvalkov <at> gmail.com>

Date: Thu, 9 Feb 2023 09:36:03 UTC

Severity: normal

Tags: patch

Done: Pádraig Brady <P <at> draigBrady.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 61386 in the body.
You can then email your comments to 61386 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Thu, 09 Feb 2023 09:36:03 GMT) Full text and rfc822 format available.

Acknowledgement sent to George Valkov <gvalkov <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Thu, 09 Feb 2023 09:36:03 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Thu, 9 Feb 2023 11:20:37 +0200
Due to a bug in macOS, sparse copies are corrupted on virtual disks
formatted with APFS. HFS is not affected. Affected are coreutils
install, and gcp when compiled with SEEK_HOLE, as well as macOS Finder.

While reading the entire file returns valid data, scanning for
allocated segments may return holes where valid data is present.
In this case a sparse copy does not contain these segments and returns
zeroes instead. Once the virtual disk is dismounted and then
mounted again, a sparse copy produces correct results.

This breaks OpenWRT build on macOS. Details:
https://github.com/openwrt/openwrt/pull/11960
https://github.com/openwrt/openwrt/pull/11960#issuecomment-1423185579

Signed-off-by: Georgi Valkov <gvalkov <at> gmail.com>
---
 src/copy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/copy.c b/src/copy.c
index e16fedb28..4f56138a6 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1065,7 +1065,7 @@ infer_scantype (int fd, struct stat const *sb,
       return PLAIN_SCANTYPE;
     }
 
-#ifdef SEEK_HOLE
+#if defined(SEEK_HOLE) && !defined(__APPLE__)
   scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
   if (0 <= scan_inference->ext_start || errno == ENXIO)
     return LSEEK_SCANTYPE;
-- 
2.39.1


The issue is most likely within APFS, So we need to bring it to Apple's attention.
Since it is triggered by building `gcc`, it would be helpful
if they can shine some light how their tools are build
and what trigger this, in order to create PoC code.

**steps to reproduce**

_Create a sparse disk image with APFS_
- open Disk Utility
- press Command-N to create a new virtual disk
- Image Format: sparse disk image
- save as: ~/vhd/test
- Name: test
- Size: 50 GB
- Format: APFS (Case-sensitive)
- Encryption: none
- Partitions: Single partition - GUID Partition Map
- Save
- the image is mounted under `/Volumes/test`

_clone and build coreutils_
``` bash
cd /Volumes/test
# patched with sparse support disabled on macOS
git clone https://github.com/httpstorm/coreutils.git
mv coreutils coreutils-no-sparse
cd coreutils-no-sparse
git checkout macos_disable-sparse-copy
git submodule foreach git pull origin master
./bootstrap
make -j 16
cd ..

# original code
git clone https://github.com/coreutils/coreutils.git
cd coreutils
git submodule foreach git pull origin master
./bootstrap
make -j 16
```

_download OpenWRT and build gcc_
``` bash
cd /Volumes/test
git clone https://github.com/openwrt/openwrt.git
cd openwrt
./scripts/feeds update -a
./scripts/feeds install -a
make defconfig
make menuconfig
# Target System (Marvell EBU Armada)
# Target Profile (Linksys WRT3200ACM)
# Save, Exit
make tools/compile -j 16
make toolchain/gcc/initial/compile -j 16
```

proof of concept
``` bash
cd /Volumes/test/openwrt
cd build_dir/toolchain-arm_cortex-a9+vfpv3-d16_gcc-12.2.0_musl_eabi/gcc-12.2.0-initial/gcc

/Volumes/test/coreutils/src/cp cc1 cc1-ori
./cc1-ori --help
-bash: ./cc1-ori: cannot execute binary file: Exec format error

/Volumes/test/coreutils-no-sparse/src/cp cc1 cc1-fix
./cc1-fix --help
# works correctly

sha1sum cc1
# fcf5d9458f54d5588efc5a76a89388925ca04eda  cc1
sha1sum cc1-fix 
# fcf5d9458f54d5588efc5a76a89388925ca04eda  cc1-fix
sha1sum cc1-ori
# 7b447132f56f3b549ef62a4780945e82d26f7d2c  cc1-ori
```


Kind regards,

Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Thu, 09 Feb 2023 11:57:02 GMT) Full text and rfc822 format available.

Message #8 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: George Valkov <gvalkov <at> gmail.com>, 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Thu, 9 Feb 2023 11:56:04 +0000
On 09/02/2023 09:20, George Valkov wrote:
> Due to a bug in macOS, sparse copies are corrupted on virtual disks
> formatted with APFS. HFS is not affected. Affected are coreutils
> install, and gcp when compiled with SEEK_HOLE, as well as macOS Finder.
> 
> While reading the entire file returns valid data, scanning for
> allocated segments may return holes where valid data is present.
> In this case a sparse copy does not contain these segments and returns
> zeroes instead. Once the virtual disk is dismounted and then
> mounted again, a sparse copy produces correct results.
> 
> This breaks OpenWRT build on macOS. Details:
> https://github.com/openwrt/openwrt/pull/11960
> https://github.com/openwrt/openwrt/pull/11960#issuecomment-1423185579
> 
> Signed-off-by: Georgi Valkov <gvalkov <at> gmail.com>
> ---
>   src/copy.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/copy.c b/src/copy.c
> index e16fedb28..4f56138a6 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -1065,7 +1065,7 @@ infer_scantype (int fd, struct stat const *sb,
>         return PLAIN_SCANTYPE;
>       }
>   
> -#ifdef SEEK_HOLE
> +#if defined(SEEK_HOLE) && !defined(__APPLE__)
>     scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
>     if (0 <= scan_inference->ext_start || errno == ENXIO)
>       return LSEEK_SCANTYPE;


Thanks for the detailed report.
The patch might very well be appropriate.
We avoided copy_file_range() for a long time on Linux for example
until it became usable.

Thanks for confirming the latest git is also impacted.

I see you're on macOS 12.
macOS 13 supports fclonefileat() which may
avoid the issue, or also be impacted?

I wonder might an fdatasync(fd) avoid your issue,
which we might do if defined(__APPLE__)?

cheers,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Thu, 09 Feb 2023 16:33:02 GMT) Full text and rfc822 format available.

Message #11 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: George Valkov <gvalkov <at> gmail.com>, 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Thu, 9 Feb 2023 16:32:15 +0000
On 09/02/2023 15:57, George Valkov wrote:
> 
> 
>> On 2023-02-09, at 1:56 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>
>> On 09/02/2023 09:20, George Valkov wrote:
>>> Due to a bug in macOS, sparse copies are corrupted on virtual disks
>>> formatted with APFS. HFS is not affected. Affected are coreutils
>>> install, and gcp when compiled with SEEK_HOLE, as well as macOS Finder.
>>> While reading the entire file returns valid data, scanning for
>>> allocated segments may return holes where valid data is present.
>>> In this case a sparse copy does not contain these segments and returns
>>> zeroes instead. Once the virtual disk is dismounted and then
>>> mounted again, a sparse copy produces correct results.
>>> This breaks OpenWRT build on macOS. Details:
>>> https://github.com/openwrt/openwrt/pull/11960
>>> https://github.com/openwrt/openwrt/pull/11960#issuecomment-1423185579
>>> Signed-off-by: Georgi Valkov <gvalkov <at> gmail.com>
>>> ---
>>>   src/copy.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>> diff --git a/src/copy.c b/src/copy.c
>>> index e16fedb28..4f56138a6 100644
>>> --- a/src/copy.c
>>> +++ b/src/copy.c
>>> @@ -1065,7 +1065,7 @@ infer_scantype (int fd, struct stat const *sb,
>>>         return PLAIN_SCANTYPE;
>>>       }
>>>   -#ifdef SEEK_HOLE
>>> +#if defined(SEEK_HOLE) && !defined(__APPLE__)
>>>     scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
>>>     if (0 <= scan_inference->ext_start || errno == ENXIO)
>>>       return LSEEK_SCANTYPE;
>>
>>
>> Thanks for the detailed report.
>> The patch might very well be appropriate.
> 
> Hi! Let’s test the ideas you have first, and fall-back to the patch.
> 
> In October 2021 macOS Finder was also affected, and that points directly at Apple.
> I tested again today, they have fixed Finder. After performing a copy in Finder,
> coreutils cp produces a good copy. I have to run
> make toolchain/gcc/initial/{clean,compile} -j 16
> before I can reproduce it again with the same file.
> 
> So while Apple didn't fix the underlaying issue with APFS, they did
> provide a solution for Finder. And we can make coreutils work correctly too.

That suggests that Finder may have sync'd the file.
Now syncing has overheads of course, so not an option to take lightly.
We may be safer just doing the normal copy on __APPLE__ as per your orig patch.
A dtruss of Finder might be instructive BTW.
Why I suspect syncing is that old fiemap code we used to have in copy
needed the FIEMAP_FLAG_SYNC flag to avoid bugs like this on some file systems.

>> We avoided copy_file_range() for a long time on Linux for example
>> until it became usable.
>>
>> Thanks for confirming the latest git is also impacted.
> 
> Yes, I tested on master today.
> 
>> I see you're on macOS 12.
>> macOS 13 supports fclonefileat() which may
>> avoid the issue, or also be impacted?
> 
> Yes, I use macOS 12. There are too many complains about 13.
> I can boot macOS 13 recovery, so that might be an option,
> if I can setup a working build environment.
> Can you invite someone with macOS 13 to test the new API?
> Github has hosted runners. But I’ve never used one. And I think they use macOS 12.

OK let's assume fclonefileat() on macOS 13 is fine.
That's just a thing to consider if one is testing on macOS 13,
rather than jumping through hoops to test it.

>> I wonder might an fdatasync(fd) avoid your issue,
>> which we might do if defined(__APPLE__)?
> 
> Send me patches, and I will test them.
> 
> #ifdef SEEK_HOLE
>    fdatasync(fd); // this did not work, is fd writable? Do we need to call it on the disk image?
>     scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
>     if (0 <= scan_inference->ext_start || errno == ENXIO)
>       return LSEEK_SCANTYPE;


Oh right interesting. When you say doesn't work, do you mean the sync
fails, or just doesn't help.  Note the separate coreutils sync util
only opens with write mode on AIX and Cygwin.

Relatedly you could just try that external `sync file && cp file dest` also to test the sync hypothesis.

If the sync doesn't work, or requires write mode
then that might be enough to decide to just avoid SEEK_DATA on __APPLE__ for now.

> It would be best if we test on an Intel Mac with T2 chip. I use MBP 2019 16”.
> That security chip and the encrypted physical disk with APFS might also affect our results.
> 
> Georgi Valkov
> httpstorm.com
> nano RTOS
> 
> off-topic: can you please point me at API to take a disk offline or lock it for exclusive access?
> I wrote a disk cloning tool and I have that functionality on Windows, I’d like to add it to Linux, BSD and macOS.


Note it's best to keep this on list.

cheers,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Thu, 09 Feb 2023 17:39:02 GMT) Full text and rfc822 format available.

Message #14 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Thu, 9 Feb 2023 19:23:57 +0200

> On 2023-02-09, at 6:32 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
> On 09/02/2023 15:57, George Valkov wrote:
>>> On 2023-02-09, at 1:56 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>> 
>>> On 09/02/2023 09:20, George Valkov wrote:
>>>> Due to a bug in macOS, sparse copies are corrupted on virtual disks
>>>> formatted with APFS. HFS is not affected. Affected are coreutils
>>>> install, and gcp when compiled with SEEK_HOLE, as well as macOS Finder.
>>>> While reading the entire file returns valid data, scanning for
>>>> allocated segments may return holes where valid data is present.
>>>> In this case a sparse copy does not contain these segments and returns
>>>> zeroes instead. Once the virtual disk is dismounted and then
>>>> mounted again, a sparse copy produces correct results.
>>>> This breaks OpenWRT build on macOS. Details:
>>>> https://github.com/openwrt/openwrt/pull/11960
>>>> https://github.com/openwrt/openwrt/pull/11960#issuecomment-1423185579
>>>> Signed-off-by: Georgi Valkov <gvalkov <at> gmail.com>
>>>> ---
>>>>  src/copy.c | 2 +-
>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>> diff --git a/src/copy.c b/src/copy.c
>>>> index e16fedb28..4f56138a6 100644
>>>> --- a/src/copy.c
>>>> +++ b/src/copy.c
>>>> @@ -1065,7 +1065,7 @@ infer_scantype (int fd, struct stat const *sb,
>>>>        return PLAIN_SCANTYPE;
>>>>      }
>>>>  -#ifdef SEEK_HOLE
>>>> +#if defined(SEEK_HOLE) && !defined(__APPLE__)
>>>>    scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
>>>>    if (0 <= scan_inference->ext_start || errno == ENXIO)
>>>>      return LSEEK_SCANTYPE;
>>> 
>>> 
>>> Thanks for the detailed report.
>>> The patch might very well be appropriate.
>> Hi! Let’s test the ideas you have first, and fall-back to the patch.
>> In October 2021 macOS Finder was also affected, and that points directly at Apple.
>> I tested again today, they have fixed Finder. After performing a copy in Finder,
>> coreutils cp produces a good copy. I have to run
>> make toolchain/gcc/initial/{clean,compile} -j 16
>> before I can reproduce it again with the same file.
>> So while Apple didn't fix the underlaying issue with APFS, they did
>> provide a solution for Finder. And we can make coreutils work correctly too.
> 
> That suggests that Finder may have sync'd the file.
> Now syncing has overheads of course, so not an option to take lightly.

#include "stdio.h"
#include "unistd.h"
#include "fcntl.h"

int main(int argc, char ** argv)
{
    sync();

    int fd = open("cc1", O_RDWR);
    //int fd = open("/Users/g/vhd/coreutils.sparseimage", O_RDWR);
    int a = fdatasync(fd);
    int b = fsync(fd);
    int c = close(fd);

    printf("fdatasync %u  fsync %u  close %u\n", a, b, c);

    return 0;
}

fdatasync 0  fsync 0  close 0

I agree. It takes about one second to run this code.
All calls are successful. The sparse copy after that is still corrupted.
I also tried doing this on the sparse image while it is mounted.

> We may be safer just doing the normal copy on __APPLE__ as per your orig patch.
> A dtruss of Finder might be instructive BTW.

Yes, sync is definitely not good for performance. What is ‘dtruss'?

> Why I suspect syncing is that old fiemap code we used to have in copy
> needed the FIEMAP_FLAG_SYNC flag to avoid bugs like this on some file systems.

I’m not familiar with the coreutils code. Check an old commit.


>>> We avoided copy_file_range() for a long time on Linux for example
>>> until it became usable.
>>> 
>>> Thanks for confirming the latest git is also impacted.
>> Yes, I tested on master today.
>>> I see you're on macOS 12.
>>> macOS 13 supports fclonefileat() which may
>>> avoid the issue, or also be impacted?
>> Yes, I use macOS 12. There are too many complains about 13.
>> I can boot macOS 13 recovery, so that might be an option,
>> if I can setup a working build environment.
>> Can you invite someone with macOS 13 to test the new API?
>> Github has hosted runners. But I’ve never used one. And I think they use macOS 12.
> 
> OK let's assume fclonefileat() on macOS 13 is fine.
> That's just a thing to consider if one is testing on macOS 13,
> rather than jumping through hoops to test it.
> 
>>> I wonder might an fdatasync(fd) avoid your issue,
>>> which we might do if defined(__APPLE__)?
>> Send me patches, and I will test them.
>> #ifdef SEEK_HOLE
>>   fdatasync(fd); // this did not work, is fd writable? Do we need to call it on the disk image?
>>    scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
>>    if (0 <= scan_inference->ext_start || errno == ENXIO)
>>      return LSEEK_SCANTYPE;
> 
> 
> Oh right interesting. When you say doesn't work, do you mean the sync
> fails, or just doesn't help.  Note the separate coreutils sync util
> only opens with write mode on AIX and Cygwin.

fdatasync returns 0, which means success
The sparse copy after that is still corrupted.


> Relatedly you could just try that external `sync file && cp file dest` also to test the sync hypothesis.

I tried these on cc1 and on the sparse image file, the copy still gets corrupted.
sync -d cc1
sync -f cc1
sync


> If the sync doesn't work, or requires write mode
> then that might be enough to decide to just avoid SEEK_DATA on __APPLE__ for now.

manpage: On some UNIX systems (but not Linux), fd must be a writable file descriptor.
I’m not sure about macOS. The call succeeds if we open with O_RDONLY,
so I guess we don’t need a writable file descriptor.


>> It would be best if we test on an Intel Mac with T2 chip. I use MBP 2019 16”.
>> That security chip and the encrypted physical disk with APFS might also affect our results.
>> Georgi Valkov
>> httpstorm.com
>> nano RTOS
>> off-topic: can you please point me at API to take a disk offline or lock it for exclusive access?
>> I wrote a disk cloning tool and I have that functionality on Windows, I’d like to add it to Linux, BSD and macOS.
> 
> 
> Note it's best to keep this on list.

I apologise, I accidentally used replay instead of replay all.
Is there a safe way to add the group once a message is sent?

Good luck!


Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Thu, 09 Feb 2023 21:36:02 GMT) Full text and rfc822 format available.

Message #17 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Thu, 9 Feb 2023 21:35:27 +0000
On 09/02/2023 17:23, George Valkov wrote:
> 
> 
>> On 2023-02-09, at 6:32 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>
>> On 09/02/2023 15:57, George Valkov wrote:
>>>> On 2023-02-09, at 1:56 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>>>
>>>> On 09/02/2023 09:20, George Valkov wrote:
>>>>> Due to a bug in macOS, sparse copies are corrupted on virtual disks
>>>>> formatted with APFS. HFS is not affected. Affected are coreutils
>>>>> install, and gcp when compiled with SEEK_HOLE, as well as macOS Finder.
>>>>> While reading the entire file returns valid data, scanning for
>>>>> allocated segments may return holes where valid data is present.
>>>>> In this case a sparse copy does not contain these segments and returns
>>>>> zeroes instead. Once the virtual disk is dismounted and then
>>>>> mounted again, a sparse copy produces correct results.
>>>>> This breaks OpenWRT build on macOS. Details:
>>>>> https://github.com/openwrt/openwrt/pull/11960
>>>>> https://github.com/openwrt/openwrt/pull/11960#issuecomment-1423185579
>>>>> Signed-off-by: Georgi Valkov <gvalkov <at> gmail.com>
>>>>> ---
>>>>>   src/copy.c | 2 +-
>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>> diff --git a/src/copy.c b/src/copy.c
>>>>> index e16fedb28..4f56138a6 100644
>>>>> --- a/src/copy.c
>>>>> +++ b/src/copy.c
>>>>> @@ -1065,7 +1065,7 @@ infer_scantype (int fd, struct stat const *sb,
>>>>>         return PLAIN_SCANTYPE;
>>>>>       }
>>>>>   -#ifdef SEEK_HOLE
>>>>> +#if defined(SEEK_HOLE) && !defined(__APPLE__)
>>>>>     scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
>>>>>     if (0 <= scan_inference->ext_start || errno == ENXIO)
>>>>>       return LSEEK_SCANTYPE;
>>>>
>>>>
>>>> Thanks for the detailed report.
>>>> The patch might very well be appropriate.
>>> Hi! Let’s test the ideas you have first, and fall-back to the patch.
>>> In October 2021 macOS Finder was also affected, and that points directly at Apple.
>>> I tested again today, they have fixed Finder. After performing a copy in Finder,
>>> coreutils cp produces a good copy. I have to run
>>> make toolchain/gcc/initial/{clean,compile} -j 16
>>> before I can reproduce it again with the same file.
>>> So while Apple didn't fix the underlaying issue with APFS, they did
>>> provide a solution for Finder. And we can make coreutils work correctly too.
>>
>> That suggests that Finder may have sync'd the file.
>> Now syncing has overheads of course, so not an option to take lightly.
> 
> #include "stdio.h"
> #include "unistd.h"
> #include "fcntl.h"
> 
> int main(int argc, char ** argv)
> {
>      sync();
> 
>      int fd = open("cc1", O_RDWR);
>      //int fd = open("/Users/g/vhd/coreutils.sparseimage", O_RDWR);
>      int a = fdatasync(fd);
>      int b = fsync(fd);
>      int c = close(fd);
> 
>      printf("fdatasync %u  fsync %u  close %u\n", a, b, c);
> 
>      return 0;
> }
> 
> fdatasync 0  fsync 0  close 0
> 
> I agree. It takes about one second to run this code.
> All calls are successful. The sparse copy after that is still corrupted.
> I also tried doing this on the sparse image while it is mounted.

Thanks for confirming the sync doesn't help.

>> We may be safer just doing the normal copy on __APPLE__ as per your orig patch.
>> A dtruss of Finder might be instructive BTW.
> 
> Yes, sync is definitely not good for performance. What is ‘dtruss'?

That would show the system calls that Finder is using to perform the copy.
Perhaps Finder is also just avoiding the SEEK_DATA on APFS?

Without further info, I'll apply your original avoidance patch for __APPLE__.

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 10 Feb 2023 01:18:01 GMT) Full text and rfc822 format available.

Message #20 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 10 Feb 2023 03:03:21 +0200

> On 2023-02-09, at 11:35 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
> On 09/02/2023 17:23, George Valkov wrote:
>>> On 2023-02-09, at 6:32 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>> 
>>> On 09/02/2023 15:57, George Valkov wrote:
>>>>> On 2023-02-09, at 1:56 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>>>> 
>>>>> On 09/02/2023 09:20, George Valkov wrote:
>>>>>> Due to a bug in macOS, sparse copies are corrupted on virtual disks
>>>>>> formatted with APFS. HFS is not affected. Affected are coreutils
>>>>>> install, and gcp when compiled with SEEK_HOLE, as well as macOS Finder.
>>>>>> While reading the entire file returns valid data, scanning for
>>>>>> allocated segments may return holes where valid data is present.
>>>>>> In this case a sparse copy does not contain these segments and returns
>>>>>> zeroes instead. Once the virtual disk is dismounted and then
>>>>>> mounted again, a sparse copy produces correct results.
>>>>>> This breaks OpenWRT build on macOS. Details:
>>>>>> https://github.com/openwrt/openwrt/pull/11960
>>>>>> https://github.com/openwrt/openwrt/pull/11960#issuecomment-1423185579
>>>>>> Signed-off-by: Georgi Valkov <gvalkov <at> gmail.com>
>>>>>> ---
>>>>>>  src/copy.c | 2 +-
>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>> diff --git a/src/copy.c b/src/copy.c
>>>>>> index e16fedb28..4f56138a6 100644
>>>>>> --- a/src/copy.c
>>>>>> +++ b/src/copy.c
>>>>>> @@ -1065,7 +1065,7 @@ infer_scantype (int fd, struct stat const *sb,
>>>>>>        return PLAIN_SCANTYPE;
>>>>>>      }
>>>>>>  -#ifdef SEEK_HOLE
>>>>>> +#if defined(SEEK_HOLE) && !defined(__APPLE__)
>>>>>>    scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
>>>>>>    if (0 <= scan_inference->ext_start || errno == ENXIO)
>>>>>>      return LSEEK_SCANTYPE;
>>>>> 
>>>>> 
>>>>> Thanks for the detailed report.
>>>>> The patch might very well be appropriate.
>>>> Hi! Let’s test the ideas you have first, and fall-back to the patch.
>>>> In October 2021 macOS Finder was also affected, and that points directly at Apple.
>>>> I tested again today, they have fixed Finder. After performing a copy in Finder,
>>>> coreutils cp produces a good copy. I have to run
>>>> make toolchain/gcc/initial/{clean,compile} -j 16
>>>> before I can reproduce it again with the same file.
>>>> So while Apple didn't fix the underlaying issue with APFS, they did
>>>> provide a solution for Finder. And we can make coreutils work correctly too.
>>> 
>>> That suggests that Finder may have sync'd the file.
>>> Now syncing has overheads of course, so not an option to take lightly.
>> #include "stdio.h"
>> #include "unistd.h"
>> #include "fcntl.h"
>> int main(int argc, char ** argv)
>> {
>>     sync();
>>     int fd = open("cc1", O_RDWR);
>>     //int fd = open("/Users/g/vhd/coreutils.sparseimage", O_RDWR);
>>     int a = fdatasync(fd);
>>     int b = fsync(fd);
>>     int c = close(fd);
>>     printf("fdatasync %u  fsync %u  close %u\n", a, b, c);
>>     return 0;
>> }
>> fdatasync 0  fsync 0  close 0
>> I agree. It takes about one second to run this code.
>> All calls are successful. The sparse copy after that is still corrupted.
>> I also tried doing this on the sparse image while it is mounted.
> 
> Thanks for confirming the sync doesn't help.
> 
>>> We may be safer just doing the normal copy on __APPLE__ as per your orig patch.
>>> A dtruss of Finder might be instructive BTW.
>> Yes, sync is definitely not good for performance. What is ‘dtruss'?
> 
> That would show the system calls that Finder is using to perform the copy.
> Perhaps Finder is also just avoiding the SEEK_DATA on APFS?

I ran 3 traces, let me know if they are useful? If not, I will have to disable
integrity protection, and collect new data tomorrow. I got errors:
dtruss: system integrity protection is on, some features will not be available
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/dtruss/

You did not mention which flags you want with dtruss?


> Without further info, I'll apply your original avoidance patch for __APPLE__.

It’s up to you, as long as you have ideas, I can test them.
I can also invite you on Parsec or Anydesk. Working together is more efficient.

Cheers, mate!

Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 10 Feb 2023 03:58:01 GMT) Full text and rfc822 format available.

Message #23 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: George Valkov <gvalkov <at> gmail.com>, 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Thu, 9 Feb 2023 19:57:44 -0800
[Message part 1 (text/plain, inline)]
On 2/9/23 01:20, George Valkov wrote:
> -#ifdef SEEK_HOLE
> +#if defined(SEEK_HOLE) && !defined(__APPLE__)

Instead of always disabling the SEEK_HOLE optimization, how about doing 
it only on APFS files? Something like the attached, perhaps (this is 
against Savannah master).

If APFS is pretty much universal now on Apple platforms, this patch is 
overkill and we should use your much-simpler patch.
[0001-cp-work-around-APFS-bug.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 10 Feb 2023 09:15:01 GMT) Full text and rfc822 format available.

Message #26 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 10 Feb 2023 09:14:00 +0000
On 10/02/2023 01:03, George Valkov wrote:
> 
> 
>> On 2023-02-09, at 11:35 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>
>> On 09/02/2023 17:23, George Valkov wrote:
>>>> On 2023-02-09, at 6:32 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>>>
>>>> On 09/02/2023 15:57, George Valkov wrote:
>>>>>> On 2023-02-09, at 1:56 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>>>>>
>>>>>> On 09/02/2023 09:20, George Valkov wrote:
>>>>>>> Due to a bug in macOS, sparse copies are corrupted on virtual disks
>>>>>>> formatted with APFS. HFS is not affected. Affected are coreutils
>>>>>>> install, and gcp when compiled with SEEK_HOLE, as well as macOS Finder.
>>>>>>> While reading the entire file returns valid data, scanning for
>>>>>>> allocated segments may return holes where valid data is present.
>>>>>>> In this case a sparse copy does not contain these segments and returns
>>>>>>> zeroes instead. Once the virtual disk is dismounted and then
>>>>>>> mounted again, a sparse copy produces correct results.
>>>>>>> This breaks OpenWRT build on macOS. Details:
>>>>>>> https://github.com/openwrt/openwrt/pull/11960
>>>>>>> https://github.com/openwrt/openwrt/pull/11960#issuecomment-1423185579
>>>>>>> Signed-off-by: Georgi Valkov <gvalkov <at> gmail.com>
>>>>>>> ---
>>>>>>>   src/copy.c | 2 +-
>>>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>> diff --git a/src/copy.c b/src/copy.c
>>>>>>> index e16fedb28..4f56138a6 100644
>>>>>>> --- a/src/copy.c
>>>>>>> +++ b/src/copy.c
>>>>>>> @@ -1065,7 +1065,7 @@ infer_scantype (int fd, struct stat const *sb,
>>>>>>>         return PLAIN_SCANTYPE;
>>>>>>>       }
>>>>>>>   -#ifdef SEEK_HOLE
>>>>>>> +#if defined(SEEK_HOLE) && !defined(__APPLE__)
>>>>>>>     scan_inference->ext_start = lseek (fd, 0, SEEK_DATA);
>>>>>>>     if (0 <= scan_inference->ext_start || errno == ENXIO)
>>>>>>>       return LSEEK_SCANTYPE;
>>>>>>
>>>>>>
>>>>>> Thanks for the detailed report.
>>>>>> The patch might very well be appropriate.
>>>>> Hi! Let’s test the ideas you have first, and fall-back to the patch.
>>>>> In October 2021 macOS Finder was also affected, and that points directly at Apple.
>>>>> I tested again today, they have fixed Finder. After performing a copy in Finder,
>>>>> coreutils cp produces a good copy. I have to run
>>>>> make toolchain/gcc/initial/{clean,compile} -j 16
>>>>> before I can reproduce it again with the same file.
>>>>> So while Apple didn't fix the underlaying issue with APFS, they did
>>>>> provide a solution for Finder. And we can make coreutils work correctly too.
>>>>
>>>> That suggests that Finder may have sync'd the file.
>>>> Now syncing has overheads of course, so not an option to take lightly.
>>> #include "stdio.h"
>>> #include "unistd.h"
>>> #include "fcntl.h"
>>> int main(int argc, char ** argv)
>>> {
>>>      sync();
>>>      int fd = open("cc1", O_RDWR);
>>>      //int fd = open("/Users/g/vhd/coreutils.sparseimage", O_RDWR);
>>>      int a = fdatasync(fd);
>>>      int b = fsync(fd);
>>>      int c = close(fd);
>>>      printf("fdatasync %u  fsync %u  close %u\n", a, b, c);
>>>      return 0;
>>> }
>>> fdatasync 0  fsync 0  close 0
>>> I agree. It takes about one second to run this code.
>>> All calls are successful. The sparse copy after that is still corrupted.
>>> I also tried doing this on the sparse image while it is mounted.
>>
>> Thanks for confirming the sync doesn't help.
>>
>>>> We may be safer just doing the normal copy on __APPLE__ as per your orig patch.
>>>> A dtruss of Finder might be instructive BTW.
>>> Yes, sync is definitely not good for performance. What is ‘dtruss'?
>>
>> That would show the system calls that Finder is using to perform the copy.
>> Perhaps Finder is also just avoiding the SEEK_DATA on APFS?
> 
> I ran 3 traces, let me know if they are useful? If not, I will have to disable
> integrity protection, and collect new data tomorrow. I got errors:
> dtruss: system integrity protection is on, some features will not be available
> https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/dtruss/
> 
> You did not mention which flags you want with dtruss?
> 
> 
>> Without further info, I'll apply your original avoidance patch for __APPLE__.
> 
> It’s up to you, as long as you have ideas, I can test them.
> I can also invite you on Parsec or Anydesk. Working together is more efficient.

I've only used dtruss a couple of times (with sudo)
as I'm not a macOS user, sorry.

Those 3 traces only had a single lseek() each,
so I'm assuming for now that Finder is not using SEEK_DATA.

Will apply your patch later.

thanks,
Pádraig





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 10 Feb 2023 09:19:02 GMT) Full text and rfc822 format available.

Message #29 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, George Valkov <gvalkov <at> gmail.com>,
 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 10 Feb 2023 09:18:45 +0000
On 10/02/2023 03:57, Paul Eggert wrote:
> On 2/9/23 01:20, George Valkov wrote:
>> -#ifdef SEEK_HOLE
>> +#if defined(SEEK_HOLE) && !defined(__APPLE__)
> 
> Instead of always disabling the SEEK_HOLE optimization, how about doing
> it only on APFS files? Something like the attached, perhaps (this is
> against Savannah master).
> 
> If APFS is pretty much universal now on Apple platforms, this patch is
> overkill and we should use your much-simpler patch.

Thanks for doing that Paul.
I think the simpler patch is best though
since APFS is the default macOS file system since 2017,
and upon upgrade earlier file systems are upgraded to that.

Also SEEK_DATA doesn't seem to be a well supported feature
on macOS anyway as there was also this issue:
https://bugs.gnu.org/51857

I'll apply the simple patch later I think.

cheers,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 10 Feb 2023 12:14:01 GMT) Full text and rfc822 format available.

Message #32 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 10 Feb 2023 14:13:20 +0200
> On 2023-02-10, at 11:18 AM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
> On 10/02/2023 03:57, Paul Eggert wrote:
>> On 2/9/23 01:20, George Valkov wrote:
>>> -#ifdef SEEK_HOLE
>>> +#if defined(SEEK_HOLE) && !defined(__APPLE__)
>> Instead of always disabling the SEEK_HOLE optimization, how about doing
>> it only on APFS files? Something like the attached, perhaps (this is
>> against Savannah master).
>> If APFS is pretty much universal now on Apple platforms, this patch is
>> overkill and we should use your much-simpler patch.

Hi Paul,
I compiled your patch against git://git.savannah.gnu.org/coreutils.git
and it fixes the problem. But as I mention later, I think we should not use it.
Thank you!

> Thanks for doing that Paul.
> I think the simpler patch is best though
> since APFS is the default macOS file system since 2017,
> and upon upgrade earlier file systems are upgraded to that.

I agree. macOS supports HFS+ and APFS.
According to Wikipedia, HFS+ does not support sparse files.
On APFS we should not use SEEK_DATA since it is broken.
So the simple patch is better.


> Also SEEK_DATA doesn't seem to be a well supported feature
> on macOS anyway as there was also this issue:
> https://bugs.gnu.org/51857
> 
> I'll apply the simple patch later I think.

I have an interesting idea: If I copy a large file, say the 16 GB disk image
where I compiled OpenWRT. So I copy this on the same filesystem, and check
disk usage before an after: no change. Also the copy is instant. That’s because
APFS supports copy-on-write. Initially both files share the same sectors on disk,
any changes made after that are copied to new sectors.
This is the most efficient way to copy files on APFS, and should produce no corruption.
Let’s implement it. I would assume there is a system cal that does the entire copy,
And we don’t need to read and write any data.
Does the trace contain any interesting calls that might be related to that?


Regarding dtruss, on macOS there is integrity protection which prevents you
from debugging applications that came with macOS such as fined. I read
that in order to do this, integrity protection has to be disabled from recovery,
then copy the application to a non-system location, e.g. ~/ and finally
remove code signature. That’s why the traces I provided are limited, and it gets
tricky to do all of this, considering Finder runs all the time and the system
will restart it if it gets killed. But we need to run the copy. With that said, if you want,
I will give it a try.

dtruss [-acdefholLs] [-t syscall] { -p PID | -n name | command | -W name }
-p PID          # examine this PID
-n name         # examine this process name
-t syscall      # examine this syscall only
-W name         # wait for a process matching this name
-a              # print all details
-c              # print syscall counts
-d              # print relative times (us)
-e              # print elapsed times (us)
-f              # follow children
-l              # force printing pid/lwpid
-o              # print on cpu times
-s              # print stack backtraces
-L              # don't print pid/lwpid
-b bufsize      # dynamic variable buf size


Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 10 Feb 2023 14:03:01 GMT) Full text and rfc822 format available.

Message #35 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: George Valkov <gvalkov <at> gmail.com>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 10 Feb 2023 14:02:19 +0000
On 10/02/2023 12:13, George Valkov wrote:
> 
>> On 2023-02-10, at 11:18 AM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>
>> I'll apply the simple patch later I think.
> 
> I have an interesting idea: If I copy a large file, say the 16 GB disk image
> where I compiled OpenWRT. So I copy this on the same filesystem, and check
> disk usage before an after: no change. Also the copy is instant. That’s because
> APFS supports copy-on-write. Initially both files share the same sectors on disk,
> any changes made after that are copied to new sectors.
> This is the most efficient way to copy files on APFS, and should produce no corruption.
> Let’s implement it. I would assume there is a system cal that does the entire copy,
> And we don’t need to read and write any data.
> Does the trace contain any interesting calls that might be related to that?

When you say "I copy a large file", is that with gcp or something else?
Since coreutils 9.1 we try the CoW by default with fclonefileat()
which is available since macOS 10.12 (2016).
Note that works only when src and dest are within the same file system,
but that is the case for you if I'm reading your original report correctly.
When I mentioned that earlier I thought your macOS version was too old to support that,
but in fact your 12.6.3 should support fclonefileat() fine.
So that's another variable. Is that call failing completely for you?
You might be quicker to add a couple of printfs around the
fclonefileat() calls in the coreutils from latest git you compliled.
Note there is new error handling related to that call in the latest git,
compared to what was released in coreutils 9.1.
Note also I don't see any fclonefileat() syscalls in your Finder logs at least.

cheers,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 10 Feb 2023 17:25:02 GMT) Full text and rfc822 format available.

Message #38 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 10 Feb 2023 19:24:02 +0200
> On 2023-02-10, at 4:02 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
> On 10/02/2023 12:13, George Valkov wrote:
>>> On 2023-02-10, at 11:18 AM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>> 
>>> I'll apply the simple patch later I think.
>> I have an interesting idea: If I copy a large file, say the 16 GB disk image
>> where I compiled OpenWRT. So I copy this on the same filesystem, and check
>> disk usage before an after: no change. Also the copy is instant. That’s because
>> APFS supports copy-on-write. Initially both files share the same sectors on disk,
>> any changes made after that are copied to new sectors.
>> This is the most efficient way to copy files on APFS, and should produce no corruption.
>> Let’s implement it. I would assume there is a system cal that does the entire copy,
>> And we don’t need to read and write any data.
>> Does the trace contain any interesting calls that might be related to that?
> 
> When you say "I copy a large file", is that with gcp or something else?

Finder: option + drag in the same directory

> Since coreutils 9.1 we try the CoW by default with fclonefileat()
> which is available since macOS 10.12 (2016).

I can confirms fclonefileat works on macOS 12.6.3 and solves the issue.
File attributes are also preserved. To clarify: these observations are from
the sample below. I don’t have internal experience with cloreutils, but I can
test it if you provide some description.


> Note that works only when src and dest are within the same file system,
> but that is the case for you if I'm reading your original report correctly.

Correct. fclonefileat only works on the same file system.
If we attempt to clone to another volume, fclonefileat fails 18 Cross-device link.
When building OpenWRT everything works on the same file system, so
fclonefileat is applicable.


> When I mentioned that earlier I thought your macOS version was too old to support that,
> but in fact your 12.6.3 should support fclonefileat() fine.

Thankfully it is supported. The latest security update came in January 2023.


> So that's another variable. Is that call failing completely for you?
> You might be quicker to add a couple of printfs around the
> fclonefileat() calls in the coreutils from latest git you compliled.
> Note there is new error handling related to that call in the latest git,
> compared to what was released in coreutils 9.1.

Coreutils tests on master
git clone git://git.savannah.gnu.org/coreutils.git
mv coreutils coreutils-clone
cd coreutils-clone
git submodule foreach git pull origin master
./bootstrap
./configure
make -j 16
git log
commit d374d32ccf12f8cf33c8f823d573498b7c8b27a4 (HEAD -> clone, origin/master, origin/HEAD, master)
cd ..

./coreutils-clone/src/cp cc1 cc1-test

printf("HAVE_FCLONEFILEAT %u  USE_XATTR %u\n", HAVE_FCLONEFILEAT, USE_XATTR);
HAVE_FCLONEFILEAT 1  USE_XATTR 0

      int fc_flags = x->preserve_ownership ? 0 : CLONE_NOOWNERCOPY;
		printf(
			"data_copy_required      %u  x->reflink_mode        %u\n"
			"x->preserve_mode        %u  x->preserve_timestamps %u\n"
			"x->preserve_ownership   %u  CLONE_NOOWNERCOPY      %u\n",
			data_copy_required, x->reflink_mode,
			x->preserve_mode, x->preserve_timestamps,
			x->preserve_ownership, CLONE_NOOWNERCOPY
		);
      if (data_copy_required && x->reflink_mode
          && x->preserve_mode && x->preserve_timestamps
          && (x->preserve_ownership || CLONE_NOOWNERCOPY))
        {
			int a = fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags);
			int e = errno;
			printf("fclonefileat %i %i\n", a, e);
          if (a == 0)
            goto close_src_desc;

data_copy_required      1  x->reflink_mode        1
x->preserve_mode        0  x->preserve_timestamps 0
x->preserve_ownership   0  CLONE_NOOWNERCOPY      2

fclonefileat is not used because x->preserve_mode = 0, x->preserve_timestamps = 0
That design seems wrong. The fact than no one requested to preserve that data,
doesn’t mean preserving it via fclonefileat is a bad thing. I think we should
always call fclonefileat and proceed to other means if it fails. It is more efficient, since
it requires no additional space or data to be copied. And always produces a valid copy.


> Note also I don't see any fclonefileat() syscalls in your Finder logs at least.

Great news: fclonefileat works on the same file system, and if we
need to copy outside, then just disable sparse copy according to my patch.
One option is to start blindly with fclonefileat, and if that fails, fall back to
normal copy. If you can use the trace to find what Finder is doing, we can try it.
It might bring some improvements. Else, disable sparse and do a regular copy.

It was easier for me to write a small sample:

// clone.c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/clonefile.h>

int main(int argc, char ** argv)
{
	int fd = open("cc1", O_RDONLY);
	int dir = open("./", O_RDONLY);
	int a = fclonefileat(fd, dir, "cc1-clone", 0);
	close(fd);
	close(dir);

	printf("fd %i  dir %i  fclonefileat %i\n", fd, dir, a);

	return 0;
}

./coreutils/src/cp cc1 cc1-ori-before
gcc clone.c && ./a.out
./coreutils/src/cp cc1 cc1-ori-after

ll cc1*
-rwxr-xr-x  1 g  staff  27551296 Feb 10 17:35 cc1*
-rwxr-xr-x  1 g  staff  27551296 Feb 10 17:35 cc1-clone*
-rwxr-xr-x  1 g  staff  27551296 Feb 10 17:39 cc1-ori-after*
-rwxr-xr-x  1 g  staff  27551296 Feb 10 17:39 cc1-ori-before*

sha1sum
812cb0591b000f56bdf75c0f3f94c622e371b970  cc1
812cb0591b000f56bdf75c0f3f94c622e371b970  cc1-clone
812cb0591b000f56bdf75c0f3f94c622e371b970  cc1-ori-after
7b447132f56f3b549ef62a4780945e82d26f7d2c  cc1-ori-before

df -h
Filesystem       Size   Used  Avail Capacity iused     ifree %iused  Mounted on
/dev/disk1s1    1.1Ti  1.0Ti   58Gi    95% 2400403 605165920    0%   /System/Volumes/Data

# clone a large file 15 GB using fclonefileat
./a.out

# copy a large file 15 GB using Finder: option + drag in the same directory
# both complete instantly

df -h
Filesystem       Size   Used  Avail Capacity iused     ifree %iused  Mounted on
/dev/disk1s1    1.1Ti  1.0Ti   58Gi    95% 2400401 605207240    0%   /System/Volumes/Data

ll -h /Users/g/vhd/OpenWRT.wrt3200*
-rw-r--r--@ 1 g  staff    15G Feb 10 04:06 /Users/g/vhd/OpenWRT.wrt3200-clone.sparseimage
-rw-r--r--@ 1 g  staff    15G Feb 10 04:06 /Users/g/vhd/OpenWRT.wrt3200-finder.sparseimage
-rw-r--r--@ 1 g  staff    15G Feb 10 04:06 /Users/g/vhd/OpenWRT.wrt3200.sparseimage


Cheers mate!

Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 10 Feb 2023 18:59:02 GMT) Full text and rfc822 format available.

Message #41 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: George Valkov <gvalkov <at> gmail.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 10 Feb 2023 18:58:33 +0000
[Message part 1 (text/plain, inline)]
On 10/02/2023 17:24, George Valkov wrote:
> 
>> On 2023-02-10, at 4:02 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>
>> On 10/02/2023 12:13, George Valkov wrote:
>>>> On 2023-02-10, at 11:18 AM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>>>
>>>> I'll apply the simple patch later I think.
>>> I have an interesting idea: If I copy a large file, say the 16 GB disk image
>>> where I compiled OpenWRT. So I copy this on the same filesystem, and check
>>> disk usage before an after: no change. Also the copy is instant. That’s because
>>> APFS supports copy-on-write. Initially both files share the same sectors on disk,
>>> any changes made after that are copied to new sectors.
>>> This is the most efficient way to copy files on APFS, and should produce no corruption.
>>> Let’s implement it. I would assume there is a system cal that does the entire copy,
>>> And we don’t need to read and write any data.
>>> Does the trace contain any interesting calls that might be related to that?
>>
>> When you say "I copy a large file", is that with gcp or something else?
> 
> Finder: option + drag in the same directory
> 
>> Since coreutils 9.1 we try the CoW by default with fclonefileat()
>> which is available since macOS 10.12 (2016).
> 
> I can confirms fclonefileat works on macOS 12.6.3 and solves the issue.
> File attributes are also preserved. To clarify: these observations are from
> the sample below. I don’t have internal experience with cloreutils, but I can
> test it if you provide some description.
> 
> 
>> Note that works only when src and dest are within the same file system,
>> but that is the case for you if I'm reading your original report correctly.
> 
> Correct. fclonefileat only works on the same file system.
> If we attempt to clone to another volume, fclonefileat fails 18 Cross-device link.
> When building OpenWRT everything works on the same file system, so
> fclonefileat is applicable.
> 
> 
>> When I mentioned that earlier I thought your macOS version was too old to support that,
>> but in fact your 12.6.3 should support fclonefileat() fine.
> 
> Thankfully it is supported. The latest security update came in January 2023.
> 
> 
>> So that's another variable. Is that call failing completely for you?
>> You might be quicker to add a couple of printfs around the
>> fclonefileat() calls in the coreutils from latest git you compliled.
>> Note there is new error handling related to that call in the latest git,
>> compared to what was released in coreutils 9.1.
> 
> Coreutils tests on master
> git clone git://git.savannah.gnu.org/coreutils.git
> mv coreutils coreutils-clone
> cd coreutils-clone
> git submodule foreach git pull origin master
> ./bootstrap
> ./configure
> make -j 16
> git log
> commit d374d32ccf12f8cf33c8f823d573498b7c8b27a4 (HEAD -> clone, origin/master, origin/HEAD, master)
> cd ..
> 
> ./coreutils-clone/src/cp cc1 cc1-test
> 
> printf("HAVE_FCLONEFILEAT %u  USE_XATTR %u\n", HAVE_FCLONEFILEAT, USE_XATTR);
> HAVE_FCLONEFILEAT 1  USE_XATTR 0
> 
>        int fc_flags = x->preserve_ownership ? 0 : CLONE_NOOWNERCOPY;
> 		printf(
> 			"data_copy_required      %u  x->reflink_mode        %u\n"
> 			"x->preserve_mode        %u  x->preserve_timestamps %u\n"
> 			"x->preserve_ownership   %u  CLONE_NOOWNERCOPY      %u\n",
> 			data_copy_required, x->reflink_mode,
> 			x->preserve_mode, x->preserve_timestamps,
> 			x->preserve_ownership, CLONE_NOOWNERCOPY
> 		);
>        if (data_copy_required && x->reflink_mode
>            && x->preserve_mode && x->preserve_timestamps
>            && (x->preserve_ownership || CLONE_NOOWNERCOPY))
>          {
> 			int a = fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags);
> 			int e = errno;
> 			printf("fclonefileat %i %i\n", a, e);
>            if (a == 0)
>              goto close_src_desc;
> 
> data_copy_required      1  x->reflink_mode        1
> x->preserve_mode        0  x->preserve_timestamps 0
> x->preserve_ownership   0  CLONE_NOOWNERCOPY      2
> 
> fclonefileat is not used because x->preserve_mode = 0, x->preserve_timestamps = 0
> That design seems wrong. The fact than no one requested to preserve that data,
> doesn’t mean preserving it via fclonefileat is a bad thing. I think we should
> always call fclonefileat and proceed to other means if it fails. It is more efficient, since
> it requires no additional space or data to be copied. And always produces a valid copy.
> 
> 
>> Note also I don't see any fclonefileat() syscalls in your Finder logs at least.
> 
> Great news: fclonefileat works on the same file system, and if we
> need to copy outside, then just disable sparse copy according to my patch.
> One option is to start blindly with fclonefileat, and if that fails, fall back to
> normal copy. If you can use the trace to find what Finder is doing, we can try it.
> It might bring some improvements. Else, disable sparse and do a regular copy.
> 
> It was easier for me to write a small sample:
> 
> // clone.c
> #include <stdio.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <sys/clonefile.h>
> 
> int main(int argc, char ** argv)
> {
> 	int fd = open("cc1", O_RDONLY);
> 	int dir = open("./", O_RDONLY);
> 	int a = fclonefileat(fd, dir, "cc1-clone", 0);
> 	close(fd);
> 	close(dir);
> 
> 	printf("fd %i  dir %i  fclonefileat %i\n", fd, dir, a);
> 
> 	return 0;
> }
> 
> ./coreutils/src/cp cc1 cc1-ori-before
> gcc clone.c && ./a.out
> ./coreutils/src/cp cc1 cc1-ori-after
> 
> ll cc1*
> -rwxr-xr-x  1 g  staff  27551296 Feb 10 17:35 cc1*
> -rwxr-xr-x  1 g  staff  27551296 Feb 10 17:35 cc1-clone*
> -rwxr-xr-x  1 g  staff  27551296 Feb 10 17:39 cc1-ori-after*
> -rwxr-xr-x  1 g  staff  27551296 Feb 10 17:39 cc1-ori-before*
> 
> sha1sum
> 812cb0591b000f56bdf75c0f3f94c622e371b970  cc1
> 812cb0591b000f56bdf75c0f3f94c622e371b970  cc1-clone
> 812cb0591b000f56bdf75c0f3f94c622e371b970  cc1-ori-after
> 7b447132f56f3b549ef62a4780945e82d26f7d2c  cc1-ori-before
> 
> df -h
> Filesystem       Size   Used  Avail Capacity iused     ifree %iused  Mounted on
> /dev/disk1s1    1.1Ti  1.0Ti   58Gi    95% 2400403 605165920    0%   /System/Volumes/Data
> 
> # clone a large file 15 GB using fclonefileat
> ./a.out
> 
> # copy a large file 15 GB using Finder: option + drag in the same directory
> # both complete instantly
> 
> df -h
> Filesystem       Size   Used  Avail Capacity iused     ifree %iused  Mounted on
> /dev/disk1s1    1.1Ti  1.0Ti   58Gi    95% 2400401 605207240    0%   /System/Volumes/Data
> 
> ll -h /Users/g/vhd/OpenWRT.wrt3200*
> -rw-r--r--@ 1 g  staff    15G Feb 10 04:06 /Users/g/vhd/OpenWRT.wrt3200-clone.sparseimage
> -rw-r--r--@ 1 g  staff    15G Feb 10 04:06 /Users/g/vhd/OpenWRT.wrt3200-finder.sparseimage
> -rw-r--r--@ 1 g  staff    15G Feb 10 04:06 /Users/g/vhd/OpenWRT.wrt3200.sparseimage

Cool that clears up the fclonefileat() questions.
Attached is a patch to attempt the clone unless
we've explicitly specified --no-preserve=mode.
I was considering "touch"ing the timestamps after also,
but it's better to just maintain them as we're
pointing to the same data after all.

So with this patch we should operate quickly within a file system.
With your original patch we should operate robustly otherwise.

cheers,
Pádraig
[macos-more-cow.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 10 Feb 2023 20:07:01 GMT) Full text and rfc822 format available.

Message #44 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 10 Feb 2023 22:06:22 +0200
[Message part 1 (text/plain, inline)]
> On 2023-02-10, at 8:58 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
> On 10/02/2023 17:24, George Valkov wrote:
>>> On 2023-02-10, at 4:02 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>> 
>>> On 10/02/2023 12:13, George Valkov wrote:
>>>>> On 2023-02-10, at 11:18 AM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>>>> 
>>>>> I'll apply the simple patch later I think.
>>>> I have an interesting idea: If I copy a large file, say the 16 GB disk image
>>>> where I compiled OpenWRT. So I copy this on the same filesystem, and check
>>>> disk usage before an after: no change. Also the copy is instant. That’s because
>>>> APFS supports copy-on-write. Initially both files share the same sectors on disk,
>>>> any changes made after that are copied to new sectors.
>>>> This is the most efficient way to copy files on APFS, and should produce no corruption.
>>>> Let’s implement it. I would assume there is a system cal that does the entire copy,
>>>> And we don’t need to read and write any data.
>>>> Does the trace contain any interesting calls that might be related to that?
>>> 
>>> When you say "I copy a large file", is that with gcp or something else?
>> Finder: option + drag in the same directory
>>> Since coreutils 9.1 we try the CoW by default with fclonefileat()
>>> which is available since macOS 10.12 (2016).
>> I can confirms fclonefileat works on macOS 12.6.3 and solves the issue.
>> File attributes are also preserved. To clarify: these observations are from
>> the sample below. I don’t have internal experience with cloreutils, but I can
>> test it if you provide some description.
>>> Note that works only when src and dest are within the same file system,
>>> but that is the case for you if I'm reading your original report correctly.
>> Correct. fclonefileat only works on the same file system.
>> If we attempt to clone to another volume, fclonefileat fails 18 Cross-device link.
>> When building OpenWRT everything works on the same file system, so
>> fclonefileat is applicable.
>>> When I mentioned that earlier I thought your macOS version was too old to support that,
>>> but in fact your 12.6.3 should support fclonefileat() fine.
>> Thankfully it is supported. The latest security update came in January 2023.
>>> So that's another variable. Is that call failing completely for you?
>>> You might be quicker to add a couple of printfs around the
>>> fclonefileat() calls in the coreutils from latest git you compliled.
>>> Note there is new error handling related to that call in the latest git,
>>> compared to what was released in coreutils 9.1.
>> Coreutils tests on master
>> git clone git://git.savannah.gnu.org/coreutils.git
>> mv coreutils coreutils-clone
>> cd coreutils-clone
>> git submodule foreach git pull origin master
>> ./bootstrap
>> ./configure
>> make -j 16
>> git log
>> commit d374d32ccf12f8cf33c8f823d573498b7c8b27a4 (HEAD -> clone, origin/master, origin/HEAD, master)
>> cd ..
>> ./coreutils-clone/src/cp cc1 cc1-test
>> printf("HAVE_FCLONEFILEAT %u  USE_XATTR %u\n", HAVE_FCLONEFILEAT, USE_XATTR);
>> HAVE_FCLONEFILEAT 1  USE_XATTR 0
>>       int fc_flags = x->preserve_ownership ? 0 : CLONE_NOOWNERCOPY;
>> 		printf(
>> 			"data_copy_required      %u  x->reflink_mode        %u\n"
>> 			"x->preserve_mode        %u  x->preserve_timestamps %u\n"
>> 			"x->preserve_ownership   %u  CLONE_NOOWNERCOPY      %u\n",
>> 			data_copy_required, x->reflink_mode,
>> 			x->preserve_mode, x->preserve_timestamps,
>> 			x->preserve_ownership, CLONE_NOOWNERCOPY
>> 		);
>>       if (data_copy_required && x->reflink_mode
>>           && x->preserve_mode && x->preserve_timestamps
>>           && (x->preserve_ownership || CLONE_NOOWNERCOPY))
>>         {
>> 			int a = fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags);
>> 			int e = errno;
>> 			printf("fclonefileat %i %i\n", a, e);
>>           if (a == 0)
>>             goto close_src_desc;
>> data_copy_required      1  x->reflink_mode        1
>> x->preserve_mode        0  x->preserve_timestamps 0
>> x->preserve_ownership   0  CLONE_NOOWNERCOPY      2
>> fclonefileat is not used because x->preserve_mode = 0, x->preserve_timestamps = 0
>> That design seems wrong. The fact than no one requested to preserve that data,
>> doesn’t mean preserving it via fclonefileat is a bad thing. I think we should
>> always call fclonefileat and proceed to other means if it fails. It is more efficient, since
>> it requires no additional space or data to be copied. And always produces a valid copy.
>>> Note also I don't see any fclonefileat() syscalls in your Finder logs at least.
>> Great news: fclonefileat works on the same file system, and if we
>> need to copy outside, then just disable sparse copy according to my patch.
>> One option is to start blindly with fclonefileat, and if that fails, fall back to
>> normal copy. If you can use the trace to find what Finder is doing, we can try it.
>> It might bring some improvements. Else, disable sparse and do a regular copy.
>> It was easier for me to write a small sample:
>> // clone.c
>> #include <stdio.h>
>> #include <fcntl.h>
>> #include <unistd.h>
>> #include <sys/clonefile.h>
>> int main(int argc, char ** argv)
>> {
>> 	int fd = open("cc1", O_RDONLY);
>> 	int dir = open("./", O_RDONLY);
>> 	int a = fclonefileat(fd, dir, "cc1-clone", 0);
>> 	close(fd);
>> 	close(dir);
>> 	printf("fd %i  dir %i  fclonefileat %i\n", fd, dir, a);
>> 	return 0;
>> }
>> ./coreutils/src/cp cc1 cc1-ori-before
>> gcc clone.c && ./a.out
>> ./coreutils/src/cp cc1 cc1-ori-after
>> ll cc1*
>> -rwxr-xr-x  1 g  staff  27551296 Feb 10 17:35 cc1*
>> -rwxr-xr-x  1 g  staff  27551296 Feb 10 17:35 cc1-clone*
>> -rwxr-xr-x  1 g  staff  27551296 Feb 10 17:39 cc1-ori-after*
>> -rwxr-xr-x  1 g  staff  27551296 Feb 10 17:39 cc1-ori-before*
>> sha1sum
>> 812cb0591b000f56bdf75c0f3f94c622e371b970  cc1
>> 812cb0591b000f56bdf75c0f3f94c622e371b970  cc1-clone
>> 812cb0591b000f56bdf75c0f3f94c622e371b970  cc1-ori-after
>> 7b447132f56f3b549ef62a4780945e82d26f7d2c  cc1-ori-before
>> df -h
>> Filesystem       Size   Used  Avail Capacity iused     ifree %iused  Mounted on
>> /dev/disk1s1    1.1Ti  1.0Ti   58Gi    95% 2400403 605165920    0%   /System/Volumes/Data
>> # clone a large file 15 GB using fclonefileat
>> ./a.out
>> # copy a large file 15 GB using Finder: option + drag in the same directory
>> # both complete instantly
>> df -h
>> Filesystem       Size   Used  Avail Capacity iused     ifree %iused  Mounted on
>> /dev/disk1s1    1.1Ti  1.0Ti   58Gi    95% 2400401 605207240    0%   /System/Volumes/Data
>> ll -h /Users/g/vhd/OpenWRT.wrt3200*
>> -rw-r--r--@ 1 g  staff    15G Feb 10 04:06 /Users/g/vhd/OpenWRT.wrt3200-clone.sparseimage
>> -rw-r--r--@ 1 g  staff    15G Feb 10 04:06 /Users/g/vhd/OpenWRT.wrt3200-finder.sparseimage
>> -rw-r--r--@ 1 g  staff    15G Feb 10 04:06 /Users/g/vhd/OpenWRT.wrt3200.sparseimage
> 
> Cool that clears up the fclonefileat() questions.
> Attached is a patch to attempt the clone unless
> we've explicitly specified --no-preserve=mode.
> I was considering "touch"ing the timestamps after also,
> but it's better to just maintain them as we're
> pointing to the same data after all.
> 
> So with this patch we should operate quickly within a file system.
> With your original patch we should operate robustly otherwise.

Thank you, Pádriag! Nice teamwork. I added x-> and Tested-by to your patch.
The copy was instant and accurate. Both apply to cc1 as well as the 15 GB file.
Please let me know when there is a new release, so I can push it to OpenWRT.

[0001-copy-on-macOS-try-COW-even-if-not-preserving-mode.patch (application/octet-stream, attachment)]
[Message part 3 (text/plain, inline)]

./coreutils-clone/src/cp cc1 cc1-test
c9b79253a4b690c7be03ebc3187f1f09834f7472  cc1
c9b79253a4b690c7be03ebc3187f1f09834f7472  cc1-test
-rwxr-xr-x    1 g     staff  27551296 Feb 10 21:45 cc1*
-rwxr-xr-x    1 g     staff  27551296 Feb 10 21:45 cc1-test*

Build log in case you feel like fixing unrelated sprintf deprecated warnings later
to improve quality.
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/build-log-d374d32ccf12f8cf33c8f823d573498b7c8b27a4.txt

Cheers!

Georgi Valkov
httpstorm.com
nano RTOS


Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 10 Feb 2023 20:46:01 GMT) Full text and rfc822 format available.

Message #47 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>,
 George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 10 Feb 2023 12:45:10 -0800
On 2/10/23 10:58, Pádraig Brady wrote:
> I was considering "touch"ing the timestamps after also,
> but it's better to just maintain them as we're
> pointing to the same data after all.

For POSIX conformance we must touch if the user has specified only POSIX 
options (and has not specified -p).

And it's not just a POSIX conformance issue. Ordinary users will be 
surprised if plain 'cp A B' creates a file B with a timestamp from last 
year.

Likewise for B's modes.

There's another complication: recent macOS versions have CLONE_ACL, and 
we're not using that.





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 10 Feb 2023 21:36:01 GMT) Full text and rfc822 format available.

Message #50 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 10 Feb 2023 23:35:12 +0200
> On 2023-02-10, at 10:45 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> 
> On 2/10/23 10:58, Pádraig Brady wrote:
>> I was considering "touch"ing the timestamps after also,
>> but it's better to just maintain them as we're
>> pointing to the same data after all.
> 
> For POSIX conformance we must touch if the user has specified only POSIX options (and has not specified -p).
> 
> And it's not just a POSIX conformance issue. Ordinary users will be surprised if plain 'cp A B' creates a file B with a timestamp from last year.
> 
> Likewise for B's modes.
> 
> There's another complication: recent macOS versions have CLONE_ACL, and we're not using that.

I personally prefer using CoW as default (no parameters), because it is way more efficient:
could save a lot of time and space, especially when long builds or large files are involved.
Of course I also understand that coreutils are expected to meet a certain standard behaviour
among all architectures. Since the source and it’s clone have separate metadata,
it should be possible to change it on the clone, to comply with standards. On the other hand
It feels more natural when the metadata is kept intact for the two files. This is a good
indication that no changes have occurred since the clone or copy, and they are the same.
This approach is also consistent with how macOS Finder copies files and directories:
all metadata is preserved.

Here is a good example: I create a tarball on my OpenWRT router. Extract on the Mac.
Copy to the build directory and create new firmware. This preserves all permissions and
metadata. So there is no need for any special tools or command arguments, it just works.
In this scenario, the permissions are critical to preserve. Timestamp is a good indication of
when the file was last changed, setting a different value on the copy is confusing and makes
it very hard to track which files in a copied directory has changed since.

That’s just my personal opinion for default operation and may not match standards,
but it makes work natural and productive. Command line arguments should be respected.

I found this link in one of the mailing lists you sent, it explains what was wrong
with the original SEEK_DATA and SEEK_HOLE approach on macOS, and why
we can’t relay on them:
https://lists.gnu.org/archive/html/bug-gnulib/2018-09/msg00054.html

Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 10 Feb 2023 21:47:02 GMT) Full text and rfc822 format available.

Message #53 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 10 Feb 2023 21:46:35 +0000
On 10/02/2023 20:45, Paul Eggert wrote:
> On 2/10/23 10:58, Pádraig Brady wrote:
>> I was considering "touch"ing the timestamps after also,
>> but it's better to just maintain them as we're
>> pointing to the same data after all.
> 
> For POSIX conformance we must touch if the user has specified only POSIX
> options (and has not specified -p).
> 
> And it's not just a POSIX conformance issue. Ordinary users will be
> surprised if plain 'cp A B' creates a file B with a timestamp from last
> year.

Maybe. Though POSIX says cp "shall copy" and we're not making a copy, we're making a reflink.
So technically we're violating POSIX already in that regard.
One might take the view that the fact we write no new data
means we should not update the data modification time etc. by default,
and this may be more signal to a user that new data has not in fact been written.

> Likewise for B's modes.

Yes agreed on this. We'll have to honor umask etc.

> There's another complication: recent macOS versions have CLONE_ACL, and
> we're not using that.

Oh right, we should consider that too.

thanks,
Pádraig





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 10 Feb 2023 21:51:02 GMT) Full text and rfc822 format available.

Message #56 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 10 Feb 2023 13:50:46 -0800
[Message part 1 (text/plain, inline)]
On 2/10/23 13:35, George Valkov wrote:

> Since the source and it’s clone have separate metadata,
> it should be possible to change it on the clone, to comply with standards.

Attached is a hacky patch to do that. It also uses the new CLONE_ACL 
flag. The old code apparently mishandled ACLs; the new code tries to do 
a better job.

This whole area is a mess, but the patch should improve correctness on 
macOS, while also being efficient in the usual case.


> It feels more natural when the metadata is kept intact for the two files.

That depends on the application. The longstanding tradition for cp is to 
preserve metadata if you use 'cp -p', and to not preserve with plain cp. 
For interactive use you can use an alias or a little script if you 
prefer -p to be enabled by default.


> Here is a good example: I create a tarball on my OpenWRT router. Extract on the Mac.

That's fine, because 'tar' historically has preserved the timestamp and 
(if you're root) permissions. Plain cp has not.


> I found this link in one of the mailing lists you sent, it explains what was wrong
> with the original SEEK_DATA and SEEK_HOLE approach on macOS

GNU cp does a pass over the file with SEEK_HOLE and SEEK_DATA, so if I 
understand things correctly it shouldn't run into the problem mentioned 
there.

In other words, that email doesn't appear to explain our current problem.
[0001-cp-improve-use-of-fclonefileat.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 10 Feb 2023 22:04:01 GMT) Full text and rfc822 format available.

Message #59 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 10 Feb 2023 22:02:54 +0000
On 10/02/2023 21:50, Paul Eggert wrote:
> On 2/10/23 13:35, George Valkov wrote:
> 
>> Since the source and it’s clone have separate metadata,
>> it should be possible to change it on the clone, to comply with standards.
> 
> Attached is a hacky patch to do that. It also uses the new CLONE_ACL
> flag. The old code apparently mishandled ACLs; the new code tries to do
> a better job.

Yes this is looking much better.
I'm not fully convinced that we should also update the times,
but not against it either.

> This whole area is a mess, but the patch should improve correctness on
> macOS, while also being efficient in the usual case.

Yes it's a pity there is no fd (data) only interface for this,
rather than dealing with names (metadata) too.

thanks,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 10 Feb 2023 22:43:01 GMT) Full text and rfc822 format available.

Message #62 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>,
 George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 10 Feb 2023 14:42:44 -0800
On 2/10/23 13:46, Pádraig Brady wrote:

> Maybe. Though POSIX says cp "shall copy" and we're not making a copy, 
> we're making a reflink.

If that were an important reason not to clone, then cp should not have 
made --reflink=auto the default, as clones would not be considered to be 
copies.

However, I'm comfortable with the idea that clones satisfy the POSIX 
definition of copying. The ordinary interpretation of "clone" means a 
copy, and at the POSIX level users can't tell the difference so the 
as-if rule applies. And if the file system does deduplication+CoW at the 
block level, with something like VDO say, there's no good way GNU 'cp' 
can force a copy rather than cloning regardless.




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 10 Feb 2023 23:49:02 GMT) Full text and rfc822 format available.

Message #65 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sat, 11 Feb 2023 01:48:38 +0200

> On 2023-02-10, at 11:46 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
> On 10/02/2023 20:45, Paul Eggert wrote:
>> On 2/10/23 10:58, Pádraig Brady wrote:
>>> I was considering "touch"ing the timestamps after also,
>>> but it's better to just maintain them as we're
>>> pointing to the same data after all.
>> For POSIX conformance we must touch if the user has specified only POSIX
>> options (and has not specified -p).
>> And it's not just a POSIX conformance issue. Ordinary users will be
>> surprised if plain 'cp A B' creates a file B with a timestamp from last
>> year.
> 
> Maybe. Though POSIX says cp "shall copy" and we're not making a copy, we're making a reflink.
> So technically we're violating POSIX already in that regard.

A hard link is when we have two or more names for the same file.
We can read or write and it affects the same disk content.
That would violate POSIX shall copy, but we are not doing a hard link.
A clone behaves exactly like a copy. Reads and writes affect only
the selected file. It’s rather an optimised copy.


> One might take the view that the fact we write no new data
> means we should not update the data modification time etc. by default,
> and this may be more signal to a user that new data has not in fact been written.

We can describe the changes in the documentation. From a user perspective,
everything works exactly as before, only faster. Some users might get surprised
the first time they copy several gigabytes and we finish instantly. If will check:
the copy is there and it works. So they’ll get used to it and be happy about the change.
I always wanted a tool that does CoW. I knew APFS supports it.

Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sat, 11 Feb 2023 00:50:02 GMT) Full text and rfc822 format available.

Message #68 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: George Valkov <gvalkov <at> gmail.com>, Pádraig Brady
 <P <at> draigBrady.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 10 Feb 2023 16:48:54 -0800
I want to emphasize the fact that we need to get to the bottom of the 
SEEK_HOLE / SEEK_DATA situation on macOS. Core programs other than 'cp' 
use these options, and working around the bug in 'cp' won't fix the bug 
elsewhere.

One possibility is to modify Gnulib so that SEEK_HOLE and SEEK_DATA are 
never used by any Gnulib-using application. This will help somewhat. But 
not every program uses Gnulib.

The bug is really at the macOS level and it needs to get fixed there. 
Has anyone filed a bug report? (I'm not a macOS user and so can't file one.)




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sat, 11 Feb 2023 01:54:02 GMT) Full text and rfc822 format available.

Message #71 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sat, 11 Feb 2023 03:21:44 +0200


> On 2023-02-10, at 11:50 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> 
> On 2/10/23 13:35, George Valkov wrote:
> 
>> Since the source and it’s clone have separate metadata,
>> it should be possible to change it on the clone, to comply with standards.
> 
> Attached is a hacky patch to do that. It also uses the new CLONE_ACL flag. The old code apparently mishandled ACLs; the new code tries to do a better job.

Hint: remember to trace all code paths errors.
The program may and does steer in directions other than what you expected.

Without CLONE_ACL, UNIX permissions and macOS extended attributes
from the source file are preserved, timestamp is updated:
cp A B
-rw-------@   1 g     staff       332 Feb 11 01:57 A
-rw-------@   1 g     staff       332 Feb 11 02:19 B

With CLONE_ACL, I get 22 Invalid argument. Then falls-back to something else,
UNIX permissions and timestamp are preserved, extended attributes are lost:
cp -p A B
-rw-------@   1 g     staff       332 Feb 11 01:57 A
-rw-------    1 g     staff       332 Feb 11 01:57 B

fclonefileat also requires that the target does not exist, if it does we have to remove it,
otherwise it fails: 17 File exists. With your patch, cp does not print anything.

Now the question here is:
do we want to go the quick route of unlinking the destination and creating a fresh clone?
Or do we consider the possibility of multiple hard-links in which case if we want to update
a hard-link, we must overwrite the entire file and not use unlink, fclonefileat.
If we are absolutely certain that only one link exists, the first approach is safe
and more efficient.
If more than one link exist, we must always overwrite the existing data, to keep
hard-links consistent.


> This whole area is a mess, but the patch should improve correctness on macOS, while also being efficient in the usual case.

We need to resolve the issues described above first.


>> It feels more natural when the metadata is kept intact for the two files.
> 
> That depends on the application. The longstanding tradition for cp is to preserve metadata if you use 'cp -p', and to not preserve with plain cp. For interactive use you can use an alias or a little script if you prefer -p to be enabled by default.

Thank you for clarifying!
I used the sample I wrote in a previous message to test fclonefileat independently
of coreutils: the default behaviour is to preserve everything.
The manpage installed on macOS is from June 3, 2021. While it does describe the
behaviour of CLONE_ACL, reality shows it is not supported. CLONE_ACL fails with
22 Invalid argument. Which means additional code would be required to
create what cp thinks as default metadata, which might be prone to errors


>> Here is a good example: I create a tarball on my OpenWRT router. Extract on the Mac.
> 
> That's fine, because 'tar' historically has preserved the timestamp and (if you're root) permissions. Plain cp has not.

Fair enough. For comparison, here is the behaviour of coreutils 9.1 cp release:

copy UNIX permissions, without extended attributes or timestamp
cp A B
-rw-------@   1 g     staff       332 Feb 11 01:57 A
-rw-------    1 g     staff       332 Feb 11 02:40 B

copy UNIX permissions, extended attributes and timestamp
cp -p A B
-rw-------@   1 g     staff       332 Feb 11 01:57 A
-rw-------@   1 g     staff       332 Feb 11 01:57 B


Good luck!

Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sat, 11 Feb 2023 01:56:02 GMT) Full text and rfc822 format available.

Message #74 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sat, 11 Feb 2023 03:48:00 +0200
> On 2023-02-11, at 2:48 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> 
> I want to emphasize the fact that we need to get to the bottom of the SEEK_HOLE / SEEK_DATA situation on macOS. Core programs other than 'cp' use these options, and working around the bug in 'cp' won't fix the bug elsewhere.
> 
> One possibility is to modify Gnulib so that SEEK_HOLE and SEEK_DATA are never used by any Gnulib-using application. This will help somewhat. But not every program uses Gnulib.

Would it help, if I provide access to macOS?


> The bug is really at the macOS level and it needs to get fixed there. Has anyone filed a bug report? (I'm not a macOS user and so can't file one.)

I wrote to product-security <at> apple.com on 2021-10-08.
> I have reviewed your report, and we are investigating.
No further replay. I also tried phone and development support.
They are extremely hard to work with.
They always keep us in the dark and never admit to anything.
30 months of repair services, and erasing my data every time.
Perhaps we can try in public here? https://github.com/apple

It helps me value and appreciate working with OpenWRT and you kind people.


Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sat, 11 Feb 2023 01:59:01 GMT) Full text and rfc822 format available.

Message #77 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sat, 11 Feb 2023 03:27:25 +0200
> On 2023-02-11, at 12:02 AM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
> On 10/02/2023 21:50, Paul Eggert wrote:
>> On 2/10/23 13:35, George Valkov wrote:
>>> Since the source and it’s clone have separate metadata,
>>> it should be possible to change it on the clone, to comply with standards.
>> Attached is a hacky patch to do that. It also uses the new CLONE_ACL
>> flag. The old code apparently mishandled ACLs; the new code tries to do
>> a better job.
> 
> Yes this is looking much better.
> I'm not fully convinced that we should also update the times,
> but not against it either.

I’d prefer a timestamp consistent with source. This is the default when
we clone a file. So no action needed. If we fall-back to copy, we need to set it.


Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sat, 11 Feb 2023 18:21:01 GMT) Full text and rfc822 format available.

Message #80 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: George Valkov <gvalkov <at> gmail.com>, Pádraig Brady
 <P <at> draigBrady.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sat, 11 Feb 2023 10:19:55 -0800
On 2023-02-10 17:27, George Valkov wrote:
> I’d prefer a timestamp consistent with source.

You can get that with 'cp -p' (or with 'cp --preserve=timestamps' etc.). 
So this is not an issue of whether you can get what you prefer. It's 
merely an issue about what cp's default behavior is, when the user 
doesn't specify options.

POSIX is reasonably clear that if you use plain cp to create a file, the 
file's last-modified time should be the current time. And it's not just 
a question of standards conformance: lots of real-world uses of 'cp' 
expect this behavior. For example, people use plain 'cp' in Makefiles, 
and 'make' relies on the last-modified time being the current time. If 
cp changed this behavior, it'd break a lot of builds.




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sat, 11 Feb 2023 20:40:02 GMT) Full text and rfc822 format available.

Message #83 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sat, 11 Feb 2023 22:39:29 +0200
> On 2023-02-11, at 8:19 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> 
> On 2023-02-10 17:27, George Valkov wrote:
>> I’d prefer a timestamp consistent with source.
> 
> You can get that with 'cp -p' (or with 'cp --preserve=timestamps' etc.). So this is not an issue of whether you can get what you prefer. It's merely an issue about what cp's default behavior is, when the user doesn't specify options.

Thanks!


> POSIX is reasonably clear that if you use plain cp to create a file, the file's last-modified time should be the current time. And it's not just a question of standards conformance: lots of real-world uses of 'cp' expect this behavior. For example, people use plain 'cp' in Makefiles, and 'make' relies on the last-modified time being the current time. If cp changed this behavior, it'd break a lot of builds.

Ah, yes, make relays on timestamps. So we should update timestamp after clone by default.
Let me know when you have a patch without CLONE_ACL and I will test it.


Georgi Valkov
httpstorm.com
nano RTOS




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sat, 11 Feb 2023 21:54:01 GMT) Full text and rfc822 format available.

Message #86 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sat, 11 Feb 2023 13:53:43 -0800
[Message part 1 (text/plain, inline)]
On 2023-02-10 17:21, George Valkov wrote:

> remember to trace all code paths errors.

Yes, I've been doing that. However, I don't use macOS and the two of us 
are debugging remotely where I write the code and you debug it but 
neither of us know how macOS works. Of course it would be much more 
efficient if we weren't operating with these obstacles, but here we are.

Because the macOS behavior is poorly documented and we lack the source, 
if we can't figure this out fairly quickly coreutils should simply stop 
using fclonefileat because it's unreliable for users and a time sink for 
us. I'm hoping, though, we can get something working with another round 
or two.


> With CLONE_ACL, I get 22 Invalid argument.

OK, this means that although Apple has documented CLONE_ACL and it's in 
your include files, it doesn't work in your version of macOS, because 
either it's not supported by the kernel, or by the filesystem code, or 
by the particular filesystem instance, or for some other reason. EINVAL 
hints that it's the kernel but that's by no means certain.

One possible way to defend against this macOS misbehavior is for cp to 
try to use CLONE_ACL, and if that fails with errno == EINVAL try again 
without CLONE_ACL. I attempted to implement this in the attached patch; 
please give it a try.


> do we want to go the quick route of unlinking the destination and creating a fresh clone?

No, POSIX is clear that plain cp should not unlink and re-create the 
destination. Of course we can add a non-POSIX flag to do that, and we 
already have: cp --remove-destination.
[0001-cp-improve-use-of-fclonefileat.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sun, 12 Feb 2023 00:39:01 GMT) Full text and rfc822 format available.

Message #89 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sun, 12 Feb 2023 02:38:39 +0200
> On 2023-02-11, at 11:53 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> 
> On 2023-02-10 17:21, George Valkov wrote:
> 
>> remember to trace all code paths errors.
> 
> Yes, I've been doing that. However, I don't use macOS and the two of us are debugging remotely where I write the code and you debug it but neither of us know how macOS works. Of course it would be much more efficient if we weren't operating with these obstacles, but here we are.

Yes, it’s true. It takes a lot of energy from both of us. Especially the time to document observations in hope to be helpful. There is a saying that two people working together is more than each on their own.
Let’s work together! I have a good experience with mutiplatform development: macOS, Linux, BSD, Windows. For the most part it’s the same as Linux with minor differences, so pretty easy to support according to my experience. Build environment and debugging tools are all set. That will keep you comfortable and productive. Just drop me a private mail and we can set a conference call. Then test for standard compliance according to manpage.


> Because the macOS behavior is poorly documented and we lack the source, if we can't figure this out fairly quickly coreutils should simply stop using fclonefileat because it's unreliable for users and a time sink for us. I'm hoping, though, we can get something working with another round or two.

Sad but true. This might help:
https://github.com/apple/darwin-xnu/blob/main/bsd/sys/clonefile.h
The kernel should be open source. We might be able to find the implementation there.
I hope we can keep fclonefileat.


>> With CLONE_ACL, I get 22 Invalid argument.
> 
> OK, this means that although Apple has documented CLONE_ACL and it's in your include files, it doesn't work in your version of macOS, because either it's not supported by the kernel, or by the filesystem code, or by the particular filesystem instance, or for some other reason. EINVAL hints that it's the kernel but that's by no means certain.
> 
> One possible way to defend against this macOS misbehavior is for cp to try to use CLONE_ACL, and if that fails with errno == EINVAL try again without CLONE_ACL. I attempted to implement this in the attached patch; please give it a try.

Good approach. Tour patch confirms it works. copy.c says it has been introduced in 12.6, I’m using 12.6.3. That suggests it is available in headers, but not available in the kernel yet. I will check if it is implemented in the recovery environment for macOS 13.

I added some code for tracing. The check works as intended.

int s = fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags);
int a = s ? errno : 0;
if (s != 0 && (fc_flags & CLONE_ACL) != 0 && errno == EINVAL)
  {
    printf("s %2i  errno %2i %s  fc_flags %x !!\n", s, a, strerror(a), fc_flags);
    fc_flags &= ~CLONE_ACL;
    s = fclonefileat (source_desc, dst_dirfd, dst_relname, fc_flags);
    a = s ? errno : 0;
  }
  printf("s %2i  errno %2i %s  fc_flags %x\n", s, a, strerror(a), fc_flags);

./coreutils-clone/src/cp A B
s  0  errno  0 Success  fc_flags 2

./coreutils-clone/src/cp -p A C
s -1  errno 22 Invalid argument  fc_flags 4 !!
s  0  errno  0 Success  fc_flags 0

Extended attributes are preserved in both copies: note the @ flag.
-rw-------@    1 g     staff       415 Feb 11 02:10 A
-rw-------@    1 g     staff       415 Feb 12 01:30 B
-rw-------@    1 g     staff       415 Feb 11 02:10 C


Good luck, Paul!

Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sun, 12 Feb 2023 00:48:01 GMT) Full text and rfc822 format available.

Message #92 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sat, 11 Feb 2023 16:47:24 -0800
On 2023-02-11 16:38, George Valkov wrote:
> This might help:
> https://github.com/apple/darwin-xnu/blob/main/bsd/sys/clonefile.h

It doesn't help, because it doesn't mention CLONE_ACL.

> The kernel should be open source.

No, the macOS kernel is not free software. Nor is iOS's kernel.




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sun, 12 Feb 2023 02:19:01 GMT) Full text and rfc822 format available.

Message #95 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sun, 12 Feb 2023 04:18:17 +0200
> On 2023-02-12, at 2:47 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> 
> On 2023-02-11 16:38, George Valkov wrote:
>> This might help:
>> https://github.com/apple/darwin-xnu/blob/main/bsd/sys/clonefile.h
> 
> It doesn't help, because it doesn't mention CLONE_ACL.

Here is what I found: The version of vfs_syscalls.c on that repository is 3 years old and does not support CLONE_ACL. Still it should provide a good idea about the implementation before this flags was introduced.

https://github.com/apple/darwin-xnu/blob/2ff845c2e033bd0ff64b5b6aa6063a1f8f65aa32/bsd/vfs/vfs_syscalls.c#L8201

if (uap->flags & ~(CLONE_NOFOLLOW | CLONE_NOOWNERCOPY)) {
	return EINVAL;
}

I was able to run my sample on macOS 13 recovery environment. The CLONE_ACL flag is supported there. I don’t see any difference in the final result with or without the flag. Both clones have UNIX permissions, extended attributes and time stamp from the source.

Darwin gMac.lan 22.3.0 Darwin Kernel Version 22.3.0: Thu Jan  5 20:53:49 PST 2023; root:xnu-8792.81.2~2/RELEASE_X86_64 x86_64

fd 3  dir 4
fclonefileat  0   0 Undefined error: 0
fclonefileat  0   0 Undefined error: 0 CLONE_ACL

-rw-------@  1 501   staff    553 12 Feb 00:50 A
-rw-------@  1 501   staff    553 12 Feb 00:50 B
-rw-------@  1 501   staff    553 12 Feb 00:50 CLONE_ACL


I tried running cp with your patch there, but it depends on a dynamic library and fails to run. My attempt to use chroot failed, probably due to file signatures: Killed 9.


Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sun, 12 Feb 2023 14:16:01 GMT) Full text and rfc822 format available.

Message #98 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sun, 12 Feb 2023 14:15:45 +0000
On 11/02/2023 21:53, Paul Eggert wrote:
> On 2023-02-10 17:21, George Valkov wrote:
> 
>> remember to trace all code paths errors.
> 
> Yes, I've been doing that. However, I don't use macOS and the two of us
> are debugging remotely where I write the code and you debug it but
> neither of us know how macOS works. Of course it would be much more
> efficient if we weren't operating with these obstacles, but here we are.
> 
> Because the macOS behavior is poorly documented and we lack the source,
> if we can't figure this out fairly quickly coreutils should simply stop
> using fclonefileat because it's unreliable for users and a time sink for
> us. I'm hoping, though, we can get something working with another round
> or two.
> 
> 
>> With CLONE_ACL, I get 22 Invalid argument.
> 
> OK, this means that although Apple has documented CLONE_ACL and it's in
> your include files, it doesn't work in your version of macOS, because
> either it's not supported by the kernel, or by the filesystem code, or
> by the particular filesystem instance, or for some other reason. EINVAL
> hints that it's the kernel but that's by no means certain.
> 
> One possible way to defend against this macOS misbehavior is for cp to
> try to use CLONE_ACL, and if that fails with errno == EINVAL try again
> without CLONE_ACL. I attempted to implement this in the attached patch;
> please give it a try.

I tested this on macOS 13.2 and the first fclonefileat(..., CLONE_ACL) succeeds.
I tested with various umask, --no-preserve=mode, -p, ... combinations
and didn't find any inconsistencies in permissions with --reflink=never copies.
Actually scratch that. There was an older xcode installed so CLONE_ACL was 0.
I've not got access at the moment, so will need to retest later with newer xcode.

One divergence with fclonefileat() is that it writes through a dangling symlink,
where a standard --reflink=never copy will fail.
I.e. `make TESTS=tests/cp/thru-dangling.sh SUBDIRS=. check` fails.
Now there is nothing wrong with writing through the dangling symlink,
and in fact standard cp will also with POSIXLY_CORRECT set in the environment.
So I may just need need to adjust the test to work / skip in this case.

cheers,
Pádraig





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sun, 12 Feb 2023 20:37:02 GMT) Full text and rfc822 format available.

Message #101 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sun, 12 Feb 2023 20:36:24 +0000
On 12/02/2023 14:15, Pádraig Brady wrote:
> On 11/02/2023 21:53, Paul Eggert wrote:
>> On 2023-02-10 17:21, George Valkov wrote:
>>
>>> remember to trace all code paths errors.
>>
>> Yes, I've been doing that. However, I don't use macOS and the two of us
>> are debugging remotely where I write the code and you debug it but
>> neither of us know how macOS works. Of course it would be much more
>> efficient if we weren't operating with these obstacles, but here we are.
>>
>> Because the macOS behavior is poorly documented and we lack the source,
>> if we can't figure this out fairly quickly coreutils should simply stop
>> using fclonefileat because it's unreliable for users and a time sink for
>> us. I'm hoping, though, we can get something working with another round
>> or two.
>>
>>
>>> With CLONE_ACL, I get 22 Invalid argument.
>>
>> OK, this means that although Apple has documented CLONE_ACL and it's in
>> your include files, it doesn't work in your version of macOS, because
>> either it's not supported by the kernel, or by the filesystem code, or
>> by the particular filesystem instance, or for some other reason. EINVAL
>> hints that it's the kernel but that's by no means certain.
>>
>> One possible way to defend against this macOS misbehavior is for cp to
>> try to use CLONE_ACL, and if that fails with errno == EINVAL try again
>> without CLONE_ACL. I attempted to implement this in the attached patch;
>> please give it a try.
> 
> I tested this on macOS 13.2 and the first fclonefileat(..., CLONE_ACL) succeeds.
> I tested with various umask, --no-preserve=mode, -p, ... combinations
> and didn't find any inconsistencies in permissions with --reflink=never copies.
> Actually scratch that. There was an older xcode installed so CLONE_ACL was 0.
> I've not got access at the moment, so will need to retest later with newer xcode.

Good news is that with newer xcode supporting macOS 13.1
CLONE_ACL is set and that all works as expected, with the
first fclonefileat() succeeding.
I checked with actual ACLs (set with `chmod +a "user:$USER allow readattr" file`),
and they were copied or dropped as expected. Note ls -l on macOS will list
ACLs with a "+" rather than a "@", as was mentioned elsewhere in this thread.

> One divergence with fclonefileat() is that it writes through a dangling symlink,
> where a standard --reflink=never copy will fail.
> I.e. `make TESTS=tests/cp/thru-dangling.sh SUBDIRS=. check` fails.
> Now there is nothing wrong with writing through the dangling symlink,
> and in fact standard cp will also with POSIXLY_CORRECT set in the environment.
> So I may just need need to adjust the test to work / skip in this case.

Another divergence I noticed is wrt umask and setuid handling.
The following transcript shows that with fclonefileat() we do _not_
preserve setuid when it is preserved without.
Also with fclonefileat() the umask is not honored,
while it is honored without.

# id -u
0

# echo create setuid file
# rm src/cp.s; src/cp src/cp src/cp.s; chmod u+s src/cp.s

# umask; for reflink in always never; do
>  for preserve in mode timestamps; do
>   rm -f src/cp.s.$reflink.$preserve
>   src/cp --reflink=$reflink --preserve=$preserve src/cp.s src/cp.s.$reflink.$preserve
>  done
> done
> ls -le src/cp.s*
0022
-rwsrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s
-rwxrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.always.mode
-rwxrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.always.timestamps
-rwsrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.never.mode
-rwxr-xr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.never.timestamps

cheers,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sun, 12 Feb 2023 23:38:02 GMT) Full text and rfc822 format available.

Message #104 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Mon, 13 Feb 2023 01:37:49 +0200
> On 2023-02-12, at 10:36 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
> On 12/02/2023 14:15, Pádraig Brady wrote:
>> On 11/02/2023 21:53, Paul Eggert wrote:
>>> On 2023-02-10 17:21, George Valkov wrote:
>>> 
>>>> remember to trace all code paths errors.
>>> 
>>> Yes, I've been doing that. However, I don't use macOS and the two of us
>>> are debugging remotely where I write the code and you debug it but
>>> neither of us know how macOS works. Of course it would be much more
>>> efficient if we weren't operating with these obstacles, but here we are.
>>> 
>>> Because the macOS behavior is poorly documented and we lack the source,
>>> if we can't figure this out fairly quickly coreutils should simply stop
>>> using fclonefileat because it's unreliable for users and a time sink for
>>> us. I'm hoping, though, we can get something working with another round
>>> or two.
>>> 
>>> 
>>>> With CLONE_ACL, I get 22 Invalid argument.
>>> 
>>> OK, this means that although Apple has documented CLONE_ACL and it's in
>>> your include files, it doesn't work in your version of macOS, because
>>> either it's not supported by the kernel, or by the filesystem code, or
>>> by the particular filesystem instance, or for some other reason. EINVAL
>>> hints that it's the kernel but that's by no means certain.
>>> 
>>> One possible way to defend against this macOS misbehavior is for cp to
>>> try to use CLONE_ACL, and if that fails with errno == EINVAL try again
>>> without CLONE_ACL. I attempted to implement this in the attached patch;
>>> please give it a try.
>> I tested this on macOS 13.2 and the first fclonefileat(..., CLONE_ACL) succeeds.
>> I tested with various umask, --no-preserve=mode, -p, ... combinations
>> and didn't find any inconsistencies in permissions with --reflink=never copies.
>> Actually scratch that. There was an older xcode installed so CLONE_ACL was 0.
>> I've not got access at the moment, so will need to retest later with newer xcode.
> 
> Good news is that with newer xcode supporting macOS 13.1
> CLONE_ACL is set and that all works as expected, with the
> first fclonefileat() succeeding.

I can confirm that. CLONE_ACL is 4 when building on macOS 12.6.3. I rand the compiled sample on the recovery environment, which is equivalent to macOS 13.3. The flag is supported, though I did not observe any difference with or without it. The sample was ran as root, while the source and destination were set to g:staff 600 @.


> I checked with actual ACLs (set with `chmod +a "user:$USER allow readattr" file`),
> and they were copied or dropped as expected. Note ls -l on macOS will list
> ACLs with a "+" rather than a "@", as was mentioned elsewhere in this thread.

@ is shown on files that have extended attributes. I can remove them with
xattr -cr . # recursive
xattr -cr file # single item

+ means ACL

ll -e -@
drwx------@   3 g     staff     96 May 16  2021 Applications/
	com.apple.quarantine	   21
drwx------+   8 g     staff    256 May 16  2021 Music/
 0: group:everyone deny delete


>> One divergence with fclonefileat() is that it writes through a dangling symlink,
>> where a standard --reflink=never copy will fail.
>> I.e. `make TESTS=tests/cp/thru-dangling.sh SUBDIRS=. check` fails.
>> Now there is nothing wrong with writing through the dangling symlink,
>> and in fact standard cp will also with POSIXLY_CORRECT set in the environment.
>> So I may just need need to adjust the test to work / skip in this case.

The same test fails on macOS 12.6.3. See test-01*.txt.
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/

I ran your test, followed by make check-TESTS for both original (8 failed) and patched (12 filed).


> Another divergence I noticed is wrt umask and setuid handling.
> The following transcript shows that with fclonefileat() we do _not_
> preserve setuid when it is preserved without.
> Also with fclonefileat() the umask is not honored,
> while it is honored without.
> 
> # id -u
> 0
> 
> # echo create setuid file
> # rm src/cp.s; src/cp src/cp src/cp.s; chmod u+s src/cp.s
> 
> # umask; for reflink in always never; do
> >  for preserve in mode timestamps; do
> >   rm -f src/cp.s.$reflink.$preserve
> >   src/cp --reflink=$reflink --preserve=$preserve src/cp.s src/cp.s.$reflink.$preserve
> >  done
> > done
> > ls -le src/cp.s*
> 0022
> -rwsrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s
> -rwxrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.always.mode
> -rwxrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.always.timestamps
> -rwsrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.never.mode
> -rwxr-xr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.never.timestamps

The manpage is clonefile.2.txt, use the link above.


Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Mon, 13 Feb 2023 14:06:01 GMT) Full text and rfc822 format available.

Message #107 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Mon, 13 Feb 2023 14:05:40 +0000
On 12/02/2023 20:36, Pádraig Brady wrote:
> On 12/02/2023 14:15, Pádraig Brady wrote:
>> On 11/02/2023 21:53, Paul Eggert wrote:
>>> On 2023-02-10 17:21, George Valkov wrote:
>>>
>>>> remember to trace all code paths errors.
>>>
>>> Yes, I've been doing that. However, I don't use macOS and the two of us
>>> are debugging remotely where I write the code and you debug it but
>>> neither of us know how macOS works. Of course it would be much more
>>> efficient if we weren't operating with these obstacles, but here we are.
>>>
>>> Because the macOS behavior is poorly documented and we lack the source,
>>> if we can't figure this out fairly quickly coreutils should simply stop
>>> using fclonefileat because it's unreliable for users and a time sink for
>>> us. I'm hoping, though, we can get something working with another round
>>> or two.
>>>
>>>
>>>> With CLONE_ACL, I get 22 Invalid argument.
>>>
>>> OK, this means that although Apple has documented CLONE_ACL and it's in
>>> your include files, it doesn't work in your version of macOS, because
>>> either it's not supported by the kernel, or by the filesystem code, or
>>> by the particular filesystem instance, or for some other reason. EINVAL
>>> hints that it's the kernel but that's by no means certain.
>>>
>>> One possible way to defend against this macOS misbehavior is for cp to
>>> try to use CLONE_ACL, and if that fails with errno == EINVAL try again
>>> without CLONE_ACL. I attempted to implement this in the attached patch;
>>> please give it a try.
>>
>> I tested this on macOS 13.2 and the first fclonefileat(..., CLONE_ACL) succeeds.
>> I tested with various umask, --no-preserve=mode, -p, ... combinations
>> and didn't find any inconsistencies in permissions with --reflink=never copies.
>> Actually scratch that. There was an older xcode installed so CLONE_ACL was 0.
>> I've not got access at the moment, so will need to retest later with newer xcode.
> 
> Good news is that with newer xcode supporting macOS 13.1
> CLONE_ACL is set and that all works as expected, with the
> first fclonefileat() succeeding.
> I checked with actual ACLs (set with `chmod +a "user:$USER allow readattr" file`),
> and they were copied or dropped as expected. Note ls -l on macOS will list
> ACLs with a "+" rather than a "@", as was mentioned elsewhere in this thread.
> 
>> One divergence with fclonefileat() is that it writes through a dangling symlink,
>> where a standard --reflink=never copy will fail.
>> I.e. `make TESTS=tests/cp/thru-dangling.sh SUBDIRS=. check` fails.
>> Now there is nothing wrong with writing through the dangling symlink,
>> and in fact standard cp will also with POSIXLY_CORRECT set in the environment.
>> So I may just need need to adjust the test to work / skip in this case.
> 
> Another divergence I noticed is wrt umask and setuid handling.
> The following transcript shows that with fclonefileat() we do _not_
> preserve setuid when it is preserved without.
> Also with fclonefileat() the umask is not honored,
> while it is honored without.
> 
> # id -u
> 0
> 
> # echo create setuid file
> # rm src/cp.s; src/cp src/cp src/cp.s; chmod u+s src/cp.s
> 
> # umask; for reflink in always never; do
>   >  for preserve in mode timestamps; do
>   >   rm -f src/cp.s.$reflink.$preserve
>   >   src/cp --reflink=$reflink --preserve=$preserve src/cp.s src/cp.s.$reflink.$preserve
>   >  done
>   > done
>   > ls -le src/cp.s*
> 0022
> -rwsrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s
> -rwxrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.always.mode
> -rwxrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.always.timestamps
> -rwsrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.never.mode
> -rwxr-xr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.never.timestamps

This amendment seems to make the treatment of umask and setuid consistent
between using fclonefileat() and not.

diff --git a/src/copy.c b/src/copy.c
index 782fab98d..8370f55bd 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1280,10 +1280,11 @@ copy_reg (char const *src_name, char const *dst_name,
                       return_val = false;
                     }
                 }
-              mode_already_preserved = (fc_flags & CLONE_ACL) != 0;
+              mode_already_preserved = (fc_flags & CLONE_ACL) != 0
+                                       && cloned_mode == src_mode;
               dest_desc = -1;
               omitted_permissions = dst_mode & ~cloned_mode;
-              extra_permissions = 0;
+              extra_permissions = dst_mode & ~ cached_umask ();
               goto set_dest_mode;
             }
           else if (! handle_clone_fail (dst_dirfd, dst_relname, src_name,

With this, all tests pass (apart from thru-dangling.sh as mentioned above
which I'll skip in another patch).
Paul I'm happy to merge this amendment, mention the improvement in NEWS,
and push in your name if you're OK with it.

cheers,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Mon, 13 Feb 2023 14:16:02 GMT) Full text and rfc822 format available.

Message #110 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: George Valkov <gvalkov <at> gmail.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Mon, 13 Feb 2023 14:15:22 +0000
On 12/02/2023 23:37, George Valkov wrote:
> 
> I can confirm that. CLONE_ACL is 4 when building on macOS 12.6.3. I rand the compiled sample on the recovery environment, which is equivalent to macOS 13.3. The flag is supported, though I did not observe any difference with or without it. The sample was ran as root, while the source and destination were set to g:staff 600 @.

I guess the CLONE_ACL value is more dependent on xcode rather than macOS version.
I confirmed again on macOS 13.2 that CLONE_ACL is honored and behaves as expected.
I guess on 12.6.3 it's failing with EINVAL and falling back to the CLONE_ACL==0 case.

cheers,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Mon, 13 Feb 2023 17:17:01 GMT) Full text and rfc822 format available.

Message #113 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Mon, 13 Feb 2023 19:16:11 +0200
[Message part 1 (text/plain, inline)]
> On 2023-02-13, at 4:05 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
> On 12/02/2023 20:36, Pádraig Brady wrote:
>> On 12/02/2023 14:15, Pádraig Brady wrote:
>>> On 11/02/2023 21:53, Paul Eggert wrote:
>>>> On 2023-02-10 17:21, George Valkov wrote:
>>>> 
>>>>> remember to trace all code paths errors.
>>>> 
>>>> Yes, I've been doing that. However, I don't use macOS and the two of us
>>>> are debugging remotely where I write the code and you debug it but
>>>> neither of us know how macOS works. Of course it would be much more
>>>> efficient if we weren't operating with these obstacles, but here we are.
>>>> 
>>>> Because the macOS behavior is poorly documented and we lack the source,
>>>> if we can't figure this out fairly quickly coreutils should simply stop
>>>> using fclonefileat because it's unreliable for users and a time sink for
>>>> us. I'm hoping, though, we can get something working with another round
>>>> or two.
>>>> 
>>>> 
>>>>> With CLONE_ACL, I get 22 Invalid argument.
>>>> 
>>>> OK, this means that although Apple has documented CLONE_ACL and it's in
>>>> your include files, it doesn't work in your version of macOS, because
>>>> either it's not supported by the kernel, or by the filesystem code, or
>>>> by the particular filesystem instance, or for some other reason. EINVAL
>>>> hints that it's the kernel but that's by no means certain.
>>>> 
>>>> One possible way to defend against this macOS misbehavior is for cp to
>>>> try to use CLONE_ACL, and if that fails with errno == EINVAL try again
>>>> without CLONE_ACL. I attempted to implement this in the attached patch;
>>>> please give it a try.
>>> 
>>> I tested this on macOS 13.2 and the first fclonefileat(..., CLONE_ACL) succeeds.
>>> I tested with various umask, --no-preserve=mode, -p, ... combinations
>>> and didn't find any inconsistencies in permissions with --reflink=never copies.
>>> Actually scratch that. There was an older xcode installed so CLONE_ACL was 0.
>>> I've not got access at the moment, so will need to retest later with newer xcode.
>> Good news is that with newer xcode supporting macOS 13.1
>> CLONE_ACL is set and that all works as expected, with the
>> first fclonefileat() succeeding.
>> I checked with actual ACLs (set with `chmod +a "user:$USER allow readattr" file`),
>> and they were copied or dropped as expected. Note ls -l on macOS will list
>> ACLs with a "+" rather than a "@", as was mentioned elsewhere in this thread.
>>> One divergence with fclonefileat() is that it writes through a dangling symlink,
>>> where a standard --reflink=never copy will fail.
>>> I.e. `make TESTS=tests/cp/thru-dangling.sh SUBDIRS=. check` fails.
>>> Now there is nothing wrong with writing through the dangling symlink,
>>> and in fact standard cp will also with POSIXLY_CORRECT set in the environment.
>>> So I may just need need to adjust the test to work / skip in this case.
>> Another divergence I noticed is wrt umask and setuid handling.
>> The following transcript shows that with fclonefileat() we do _not_
>> preserve setuid when it is preserved without.
>> Also with fclonefileat() the umask is not honored,
>> while it is honored without.
>> # id -u
>> 0
>> # echo create setuid file
>> # rm src/cp.s; src/cp src/cp src/cp.s; chmod u+s src/cp.s
>> # umask; for reflink in always never; do
>>  >  for preserve in mode timestamps; do
>>  >   rm -f src/cp.s.$reflink.$preserve
>>  >   src/cp --reflink=$reflink --preserve=$preserve src/cp.s src/cp.s.$reflink.$preserve
>>  >  done
>>  > done
>>  > ls -le src/cp.s*
>> 0022
>> -rwsrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s
>> -rwxrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.always.mode
>> -rwxrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.always.timestamps
>> -rwsrwxr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.never.mode
>> -rwxr-xr-x  1 root  staff  234032 12 Feb 20:21 src/cp.s.never.timestamps
> 
> This amendment seems to make the treatment of umask and setuid consistent
> between using fclonefileat() and not.
> 
> diff --git a/src/copy.c b/src/copy.c
> index 782fab98d..8370f55bd 100644
> --- a/src/copy.c
> +++ b/src/copy.c
> @@ -1280,10 +1280,11 @@ copy_reg (char const *src_name, char const *dst_name,
>                       return_val = false;
>                     }
>                 }
> -              mode_already_preserved = (fc_flags & CLONE_ACL) != 0;
> +              mode_already_preserved = (fc_flags & CLONE_ACL) != 0
> +                                       && cloned_mode == src_mode;
>               dest_desc = -1;
>               omitted_permissions = dst_mode & ~cloned_mode;
> -              extra_permissions = 0;
> +              extra_permissions = dst_mode & ~ cached_umask ();
>               goto set_dest_mode;
>             }
>           else if (! handle_clone_fail (dst_dirfd, dst_relname, src_name,
> 
> With this, all tests pass (apart from thru-dangling.sh as mentioned above
> which I'll skip in another patch).
> Paul I'm happy to merge this amendment, mention the improvement in NEWS,
> and push in your name if you're OK with it.

To summarise:
1. We apply my original patch to disable sparse-copy on macOS. Otherwise since fclonefileat is not used whenever we overwrite a file, corruption will still occur. An updated copy of the patch is attached below:
By doing this we fail the following test so you have to disable it on macOS:
FAIL: tests/cp/sparse-perf.sh
[0001-cp-mv-install-Disable-sparse-copy-on-macOS.patch (application/octet-stream, attachment)]
[Message part 3 (text/plain, inline)]


2. Next we add support for fclonefileat using cp: improve use of fclonefileat
A few more tests fail.

3. We fix those tests using the patch from your last mail. Then we have to disable the following test on macOS:
FAIL: tests/cp/thru-dangling.sh

Final results: copying sparse-files on macOS no longer produces corrupted copies. Performance is greatly optimised by cloning instead of copying, where possible. The following two tests now fail on macOS and will be disabled:
FAIL: tests/cp/sparse-perf.sh
FAIL: tests/cp/thru-dangling.sh

Looks good to me. Yes, please add me to the commit history. Also feel free to add my web site if appropriate. Finally please let me know when the patches are pushed, so that I can update OpenWRT to use the latest version of coreutils.

Pádraig and Paul, thank you very much for your kind help! It was nice working with you and learning new experience. Cheers, mates!


Georgi Valkov
httpstorm.com
nano RTOS


Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Mon, 13 Feb 2023 17:22:02 GMT) Full text and rfc822 format available.

Message #116 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Mon, 13 Feb 2023 19:21:04 +0200
Test results on macOS 12.6.3
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-02-d374d32ccf12f8cf33c8f823d573498b7c8b27a4-clone.txt


Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Mon, 13 Feb 2023 18:05:02 GMT) Full text and rfc822 format available.

Message #119 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: George Valkov <gvalkov <at> gmail.com>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Mon, 13 Feb 2023 18:04:18 +0000
On 13/02/2023 17:16, George Valkov wrote:
> 
>> On 2023-02-13, at 4:05 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
>> This amendment seems to make the treatment of umask and setuid consistent
>> between using fclonefileat() and not.
>>
>> diff --git a/src/copy.c b/src/copy.c
>> index 782fab98d..8370f55bd 100644
>> --- a/src/copy.c
>> +++ b/src/copy.c
>> @@ -1280,10 +1280,11 @@ copy_reg (char const *src_name, char const *dst_name,
>>                        return_val = false;
>>                      }
>>                  }
>> -              mode_already_preserved = (fc_flags & CLONE_ACL) != 0;
>> +              mode_already_preserved = (fc_flags & CLONE_ACL) != 0
>> +                                       && cloned_mode == src_mode;
>>                dest_desc = -1;
>>                omitted_permissions = dst_mode & ~cloned_mode;
>> -              extra_permissions = 0;
>> +              extra_permissions = dst_mode & ~ cached_umask ();
>>                goto set_dest_mode;
>>              }
>>            else if (! handle_clone_fail (dst_dirfd, dst_relname, src_name,
>>
>> With this, all tests pass (apart from thru-dangling.sh as mentioned above
>> which I'll skip in another patch).
>> Paul I'm happy to merge this amendment, mention the improvement in NEWS,
>> and push in your name if you're OK with it.
> 
> To summarise:
> 1. We apply my original patch to disable sparse-copy on macOS. Otherwise since fclonefileat is not used whenever we overwrite a file, corruption will still occur. An updated copy of the patch is attached below:
> By doing this we fail the following test so you have to disable it on macOS:
> FAIL: tests/cp/sparse-perf.sh

Right. I'll adjust your patch to also get
tests/seek-data-capable to return false on Darwin,
which will then skip that test.

> 2. Next we add support for fclonefileat using cp: improve use of fclonefileat
> A few more tests fail.
> 
> 3. We fix those tests using the patch from your last mail. Then we have to disable the following test on macOS:
> FAIL: tests/cp/thru-dangling.sh
> 
> Final results: copying sparse-files on macOS no longer produces corrupted copies. Performance is greatly optimised by cloning instead of copying, where possible. The following two tests now fail on macOS and will be disabled:
> FAIL: tests/cp/sparse-perf.sh

handled above

> FAIL: tests/cp/thru-dangling.sh

I'll fix this

> Looks good to me. Yes, please add me to the commit history. Also feel free to add my web site if appropriate. Finally please let me know when the patches are pushed, so that I can update OpenWRT to use the latest version of coreutils.
> 
> Pádraig and Paul, thank you very much for your kind help! It was nice working with you and learning new experience. Cheers, mates!

Summary sounds good.

Thanks for all the testing.

Pádraig





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Tue, 14 Feb 2023 06:13:01 GMT) Full text and rfc822 format available.

Message #122 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: George Valkov <gvalkov <at> gmail.com>, Pádraig Brady
 <P <at> draigBrady.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Mon, 13 Feb 2023 22:12:43 -0800
[Message part 1 (text/plain, inline)]
On 2023-02-13 09:16, George Valkov wrote:
> 1. We apply my original patch to disable sparse-copy on macOS. Otherwise since fclonefileat is not used whenever we overwrite a file, corruption will still occur.

I'm not entirely sold on this patch, because I still don't understand 
what's happening. The original bug report in 
<https://github.com/openwrt/openwrt/pull/11960> basically says only "it 
doesn't work", and I'd like to know more.

Part of the reason I'm hesitating is that there are multiple ways of 
interpreting what SEEK_HOLE and SEEK_DATA mean, and perhaps it's just 
that Apple has come up with yet another way to interpret it, which we 
need to know about.

Another reason to hesitate is that GNU coreutils is not the only core 
program that uses SEEK_HOLE and SEEK_DATA. If this really is a generic 
Apple problem we need to know which Apple releases have it, so that we 
can program around it at the Gnulib level instead of mucking about with 
each individual program.

Yet another reason is that perhaps the bug was introduced by this pair 
of changes introduced to Gnulib in November 2021 to try to address 
<https://bugs.gnu.org/51857>:

https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=4db8db34112b86ddf8bac48f16b5acff732b5fa9
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=1a268176fbb184e393c98575e61fe692264c7d91

These Gnulib patches are attached. If you revert these, does the problem 
go away?

And yet another reason is that perhaps the bug was introduced by this 
pair of Coreutils changes made in October 2021 to try to address 
<https://bugs.gnu.org/51433>:

https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=a9e31457bf6a63072b54a9324cdf59a09441a21f
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=1753012b8d8cdd0817155f4756f5bf9bfe347816

These Coreutils patches are also attached. Does reverting them help?

All things considered, I'd like a copy of the dtruss output for 
unmodified coreutils 9.1 cp on the failing case. This is really helpful 
for debugging this sort of thing. It's what helped us debug Bug#51857.
[0001-lseek-port-around-macOS-SEEK_DATA-glitch.patch (text/x-patch, attachment)]
[0002-lseek-port-around-macOS-SEEK_HOLE-glitch.patch (text/x-patch, attachment)]
[0001-cp-defend-better-against-FreeBSD-9.1-zfs-bug.patch (text/x-patch, attachment)]
[0002-cp-revert-unnecessary-FreeBSD-workaround.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Tue, 14 Feb 2023 07:05:01 GMT) Full text and rfc822 format available.

Message #125 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Tue, 14 Feb 2023 09:04:02 +0200
> On 2023-02-14, at 8:12 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> 
> On 2023-02-13 09:16, George Valkov wrote:
>> 1. We apply my original patch to disable sparse-copy on macOS. Otherwise since fclonefileat is not used whenever we overwrite a file, corruption will still occur.
> 
> I'm not entirely sold on this patch, because I still don't understand what's happening. The original bug report in <https://github.com/openwrt/openwrt/pull/11960> basically says only "it doesn't work", and I'd like to know more.
> 
> Part of the reason I'm hesitating is that there are multiple ways of interpreting what SEEK_HOLE and SEEK_DATA mean, and perhaps it's just that Apple has come up with yet another way to interpret it, which we need to know about.
> 
> Another reason to hesitate is that GNU coreutils is not the only core program that uses SEEK_HOLE and SEEK_DATA. If this really is a generic Apple problem we need to know which Apple releases have it, so that we can program around it at the Gnulib level instead of mucking about with each individual program.
> 
> Yet another reason is that perhaps the bug was introduced by this pair of changes introduced to Gnulib in November 2021 to try to address <https://bugs.gnu.org/51857>:
> 
> https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=4db8db34112b86ddf8bac48f16b5acff732b5fa9
> https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=1a268176fbb184e393c98575e61fe692264c7d91
> 
> These Gnulib patches are attached. If you revert these, does the problem go away?
> 
> And yet another reason is that perhaps the bug was introduced by this pair of Coreutils changes made in October 2021 to try to address <https://bugs.gnu.org/51433>:
> 
> https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=a9e31457bf6a63072b54a9324cdf59a09441a21f
> https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=1753012b8d8cdd0817155f4756f5bf9bfe347816
> 
> These Coreutils patches are also attached. Does reverting them help?
> 
> All things considered, I'd like a copy of the dtruss output for unmodified coreutils 9.1 cp on the failing case. This is really helpful for debugging this sort of thing. It's what helped us debug Bug#51857.<0001-lseek-port-around-macOS-SEEK_DATA-glitch.patch><0002-lseek-port-around-macOS-SEEK_HOLE-glitch.patch><0001-cp-defend-better-against-FreeBSD-9.1-zfs-bug.patch><0002-cp-revert-unnecessary-FreeBSD-workaround.patch>


Hi Paul, I can check later today. One thing I notices is that the sparse-copy corruption has been present on 2021-10-06. (October 6th). As I recall it was not present on coreutils-8, it was introduced as soon as coreutils-9 came, precisely ever since sparse-copy was enabled. I will have to check if I can find any old logs or investigate again. But I don’t think reverting these patches will make any difference, since the issue was present before them.

Can we bring someone from the gcc team here? They know best how the build system for gcc works, and might be able to create a simple PoC to craft a sparse file that gets corrupted on copy. Currently I need to rebuild gcc, which takes 2-3 minutes and we have no clue how it is crafted.

Any idea how to make Apple aware? Last time I contacted the product-security <at> apple.com team on 2021-10-06, the replayed two days later that they are investigating, but did not bother fixing the underlaying issue. Perhaps if you contact them as a member of GNU/coreutils, they might be more open to work? If you do, refer them to Follow-up: 782620861.

I’ll come back to test your requests, at a late point today.

Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Tue, 14 Feb 2023 12:13:01 GMT) Full text and rfc822 format available.

Message #128 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Tue, 14 Feb 2023 12:12:45 +0000
[Message part 1 (text/plain, inline)]
On 14/02/2023 06:12, Paul Eggert wrote:
> On 2023-02-13 09:16, George Valkov wrote:
>> 1. We apply my original patch to disable sparse-copy on macOS. Otherwise since fclonefileat is not used whenever we overwrite a file, corruption will still occur.
> 
> I'm not entirely sold on this patch, because I still don't understand
> what's happening. The original bug report in
> <https://github.com/openwrt/openwrt/pull/11960> basically says only "it
> doesn't work", and I'd like to know more.
> 
> Part of the reason I'm hesitating is that there are multiple ways of
> interpreting what SEEK_HOLE and SEEK_DATA mean, and perhaps it's just
> that Apple has come up with yet another way to interpret it, which we
> need to know about.

I agree it would be good to know more.
However given it works on HFS but fails on APFS suggests a file system specific issue,
rather than an API mismatch issue (over what we're already catering for on macOS).
I suspect it's a similar issue to the one that openzfs had:
https://github.com/openzfs/zfs/issues/11900

Given how closed / uncommunicative Apple are in general
and specifically for this already reported bug,
I'm inclined to disable SEEK_DATA on __APPLE__ for the upcoming release.

Also we've mitigated the impact somewhat by enabling fclonefileat() in more cases.

> Another reason to hesitate is that GNU coreutils is not the only core
> program that uses SEEK_HOLE and SEEK_DATA. If this really is a generic
> Apple problem we need to know which Apple releases have it, so that we
> can program around it at the Gnulib level instead of mucking about with
> each individual program.

Yes that would be best if possible.

I've attached the 3 latest patches I'm considering in this area.
I presume you're OK with your amended fclonefileat() improvement one?

thanks,
Pádraig
[0001-copy-improve-use-of-fclonefileat.patch (text/x-patch, attachment)]
[0002-tests-allow-copying-through-dangling-symlinks-on-mac.patch (text/x-patch, attachment)]
[0003-copy-disable-sparse-copy-on-macOS.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Tue, 14 Feb 2023 19:03:01 GMT) Full text and rfc822 format available.

Message #131 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>,
 George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Tue, 14 Feb 2023 11:02:15 -0800
On 2023-02-14 04:12, Pádraig Brady wrote:

> I suspect it's a similar issue to the one that openzfs had:
> https://github.com/openzfs/zfs/issues/11900

Yes, quite likely.


> Given how closed / uncommunicative Apple are in general
> and specifically for this already reported bug,
> I'm inclined to disable SEEK_DATA on __APPLE__ for the upcoming release.

If there's an imminent release that's the only way to go. However, if we 
do that we should also disable fclonefileat since the latest edition of 
the patch isn't right either. It messes up the modes in some cases and 
there's a good reason we don't otherwise allow writing through dangling 
symlinks (see the length email discussions when that was put in) so that 
would have to be addressed too.

I'll see if I can come up with an improved version of the fclonefileat 
patch that would handle these issues.


> I've attached the 3 latest patches I'm considering in this area.
> I presume you're OK with your amended fclonefileat() improvement one?

Not yet, because it doesn't work correctly in some cases (e.g., 
x->explicit_no_preserve_mode) and it mishandles the dangling symlink cases.

The dangling symlink patch might be OK; depends on how fclonefileat 
patch ends up. It's OK to copy through a dangling symlink with 
fclonefileat only if we don't need to fix up permissions or timestamps 
afterwards (e.g., cp -p).

The disabling SEEK_HOLE on Apple is OK if a release is imminent. 
Otherwise I'd like to get to the bottom of it first. This can be done by 
having George use dtrace or (if that's not possible) adding debugging 
printfs.




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Wed, 15 Feb 2023 09:45:01 GMT) Full text and rfc822 format available.

Message #134 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Wed, 15 Feb 2023 09:43:51 +0000
On 14/02/2023 19:02, Paul Eggert wrote:
> On 2023-02-14 04:12, Pádraig Brady wrote:
> 
>> I suspect it's a similar issue to the one that openzfs had:
>> https://github.com/openzfs/zfs/issues/11900
> 
> Yes, quite likely.
> 
> 
>> Given how closed / uncommunicative Apple are in general
>> and specifically for this already reported bug,
>> I'm inclined to disable SEEK_DATA on __APPLE__ for the upcoming release.
> 
> If there's an imminent release that's the only way to go. However, if we
> do that we should also disable fclonefileat since the latest edition of
> the patch isn't right either. It messes up the modes in some cases and
> there's a good reason we don't otherwise allow writing through dangling
> symlinks (see the length email discussions when that was put in) so that
> would have to be addressed too.
> 
> I'll see if I can come up with an improved version of the fclonefileat
> patch that would handle these issues.
> 
> 
>> I've attached the 3 latest patches I'm considering in this area.
>> I presume you're OK with your amended fclonefileat() improvement one?
> 
> Not yet, because it doesn't work correctly in some cases (e.g.,
> x->explicit_no_preserve_mode) and it mishandles the dangling symlink cases.
> 
> The dangling symlink patch might be OK; depends on how fclonefileat
> patch ends up. It's OK to copy through a dangling symlink with
> fclonefileat only if we don't need to fix up permissions or timestamps
> afterwards (e.g., cp -p).
> 
> The disabling SEEK_HOLE on Apple is OK if a release is imminent.
> Otherwise I'd like to get to the bottom of it first. This can be done by
> having George use dtrace or (if that's not possible) adding debugging
> printfs.

Cool.
The release is 1 - 2 weeks away anyway.

thanks for looking at this Paul.

Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Wed, 15 Feb 2023 10:57:02 GMT) Full text and rfc822 format available.

Message #137 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 61386 <at> debbugs.gnu.org, George Valkov <gvalkov <at> gmail.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Wed, 15 Feb 2023 02:56:35 -0800
[Message part 1 (text/plain, inline)]
Attached is an updated proposal for the fclonefileat patch.

CVE-2021-30995 confirmed my suspicion that coreutils 9.1 and the current 
bleeding-edge coreutils on Savannah both have an exploitable security 
bug on macOS. Although I hope this patch fixes the bug, it could use 
more pairs of eyes, given that Apple had so many problems fixing its own 
security bug involving fclonefileat, and given that the coreutils code 
has so many flags and conditions.
[0001-cp-fclonefileat-security-fix-CLONE_ACL-fixups.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Wed, 15 Feb 2023 14:06:01 GMT) Full text and rfc822 format available.

Message #140 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>,
 Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Wed, 15 Feb 2023 16:05:22 +0200
[Message part 1 (text/plain, inline)]
> On 2023-02-14, at 2:12 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
> On 14/02/2023 06:12, Paul Eggert wrote:
>> On 2023-02-13 09:16, George Valkov wrote:
>>> 1. We apply my original patch to disable sparse-copy on macOS. Otherwise since fclonefileat is not used whenever we overwrite a file, corruption will still occur.
>> I'm not entirely sold on this patch, because I still don't understand
>> what's happening. The original bug report in
>> <https://github.com/openwrt/openwrt/pull/11960> basically says only "it
>> doesn't work", and I'd like to know more.
>> Part of the reason I'm hesitating is that there are multiple ways of
>> interpreting what SEEK_HOLE and SEEK_DATA mean, and perhaps it's just
>> that Apple has come up with yet another way to interpret it, which we
>> need to know about.

I created a sparse copy sample that traces all calls. It relays only on the manpage for lseek. I should note that I am not familiar with the implementation used by coreutils, which mages it a good ground for independent research on the topic. The code is attached as d.c Here are my observations:
1. Given a non-sparse file A, it produces an exact copy B.
gcc d.c && ./a.out
src 3  dst 4
c 553  p 553  h 553  d 0
total bytes copied 553 / 553

a2d8f888d5c88f6eef83a3bf4ef2434a85a64792  A
a2d8f888d5c88f6eef83a3bf4ef2434a85a64792  B

2. Given cc1, it produces a corrupted copy cc1-sparse, that matches the checksum of the copy cc-ori created by coreutils/src/cp. So unless you can find a flaw in my implementation, we can safely assume that SEEK_DATA skips valid data blocks and hence it should not be used on macOS because it is broken:
./coreutils/src/cp cc1 cc1-ori
gcc d.c && ./a.out
src 3  dst 4
c 14053376  p 20344832  h 20344832  d 6291456
total bytes copied 14053376 / 27551296
e8eb21ec118ff3a8fce3ad85d5f9a72d47a257c6  cc1
7b447132f56f3b549ef62a4780945e82d26f7d2c  cc1-ori
7b447132f56f3b549ef62a4780945e82d26f7d2c  cc1-sparse
-rwxr-xr-x     1 g     staff  27551296 Feb 15 09:59 cc1*
-rwxr-xr-x     1 g     staff  27551296 Feb 15 14:29 cc1-ori*
-rwx------     1 g     staff  27551296 Feb 15 14:29 cc1-sparse*

Further research: it would be interesting to find how my sparse copy sample, as well as coreutils/src/cp handle custom crafted sparse files. My first idea would be a file with the same size as cc1 (27 551 296 bytes) with a couple of data blocks in the middle. My second idea is to read the entire cc1, then call ftruncate to set the size of the copy and write only those bytes that are not 0.


> I agree it would be good to know more.
> However given it works on HFS but fails on APFS suggests a file system specific issue,
> rather than an API mismatch issue (over what we're already catering for on macOS).
> I suspect it's a similar issue to the one that openzfs had:
> https://github.com/openzfs/zfs/issues/11900

According to Wikipedia:
> Apple's HFS+ does not provide support for sparse files, but in OS X, the virtual file system layer supports storing them in any supported file system, including HFS+.
https://en.wikipedia.org/wiki/Sparse_file

Hence the reason we don’t observe any issues on HFS+ is most likely that the files we try to copy from it are not sparse in first place.


> Given how closed / uncommunicative Apple are in general
> and specifically for this already reported bug,
> I'm inclined to disable SEEK_DATA on __APPLE__ for the upcoming release.
> 
> Also we've mitigated the impact somewhat by enabling fclonefileat() in more cases.

I agree. We can try to send them one last message with a link to this group, and be done with it if they don’t relay. I just found a feedback ticket from 2022-09-16, where they replayed asking me to test macOS 13 Beta 8. At that time Finder was still broken, and now it isn’t. So they might have at least partially addressed the issue.
https://feedbackassistant.apple.com/feedback/11522527


>> Another reason to hesitate is that GNU coreutils is not the only core
>> program that uses SEEK_HOLE and SEEK_DATA. If this really is a generic
>> Apple problem we need to know which Apple releases have it, so that we
>> can program around it at the Gnulib level instead of mucking about with
>> each individual program.

We need to start somewhere. Fixing cp and install will certainly be helpful to users, until you address the issue globally.
It might be possible to install VMs with various versions of macOS, though I suspect it will take time.

Pádraig, did your try cp on
build_dir/toolchain-arm_cortex-a9+vfpv3-d16_gcc-12.2.0_musl_eabi/gcc-12.2.0-initial/gcc/cc1
after building gcc in OpenWRT build root?
make toolchain/gcc/initial/{clean,compile} -j 16

Here is a step by step article how to setup the build environment:
https://httpstorm.com/share/.openwrt/help/OpenWRT%20build%20on%20macOS.htm


> I've attached the 3 latest patches I'm considering in this area.
> I presume you're OK with your amended fclonefileat() improvement one?

Thank you! Yes, I’m all for the performance benefits of using fclonefileat. I applied your patches on top of cf80f988eeb97cc3f8c65ae58e735d36f865277b:
Both rewriting an existing file and cloning to a new one produce a correct copies of a sparse source. Here are the test results:
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-03-cf80f988eeb97cc3f8c65ae58e735d36f865277b-clone.txt

I wrote a sparse copy sample
[d.c (application/octet-stream, attachment)]
[Message part 3 (text/plain, inline)]


Cheers!

Georgi Valkov
httpstorm.com
nano RTOS



Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Wed, 15 Feb 2023 15:27:02 GMT) Full text and rfc822 format available.

Message #143 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Wed, 15 Feb 2023 17:26:21 +0200
> On 2023-02-15, at 12:56 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> 
> The disabling SEEK_HOLE on Apple is OK if a release is imminent. Otherwise I'd like to get to the bottom of it first. This can be done by having George use dtrace or (if that's not possible) adding debugging printfs.

Hi Paul! Sure, send me some code, describe the tests you need and I will come back with the results. Since we are tracing regular programs and not system components, dtruss can be used. Here is a trace of the sparse copy sample I wrote earlier, when it tries to copy cc1 to cc1-sparse, resulting in a corrupted copy:
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/dtruss-sparse-copy.txt


> Attached is an updated proposal for the fclonefileat patch.
> 
> CVE-2021-30995 confirmed my suspicion that coreutils 9.1 and the current bleeding-edge coreutils on Savannah both have an exploitable security bug on macOS. Although I hope this patch fixes the bug, it could use more pairs of eyes, given that Apple had so many problems fixing its own security bug involving fclonefileat, and given that the coreutils code has so many flags and conditions.
> <0001-cp-fclonefileat-security-fix-CLONE_ACL-fixups.patch>

I tested your patch: both overwrite existing and clone to new produce a working copy. Here are the test results:
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-04-cf80f988eeb97cc3f8c65ae58e735d36f865277b-clone.txt

In the case when a dangling symlink is involved and depending on the arguments passed to cp, would it be possible to use CLONE_NOFOLLOW with fclonefileat or fall-back to a normal copy? Does dangling refer to source or destination?


Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Wed, 15 Feb 2023 19:27:02 GMT) Full text and rfc822 format available.

Message #146 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: George Valkov <gvalkov <at> gmail.com>, Pádraig Brady
 <P <at> draigBrady.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Wed, 15 Feb 2023 11:26:36 -0800
[Message part 1 (text/plain, inline)]
On 2023-02-15 06:05, George Valkov wrote:
> gcc d.c && ./a.out
> src 3  dst 4
> c 14053376  p 20344832  h 20344832  d 6291456
> total bytes copied 14053376 / 27551296

Thanks, this is due to a known incompatibility in macOS lseek that 
coreutils is supposed to work around. See 
<https://www.gnu.org/software/gnulib/manual/html_node/lseek.html>, which 
says, "On some platforms, lseek (fd, offset, SEEK_DATA) returns a value 
greater than offset even when offset addresses data: macOS 12".

I guess that somehow, the way you're building coreutils defeats the 
workaround. If so, we'll need to change how coreutils is built in your 
environment, or fix coreutils 'configure' so that the workaround isn't 
defeated in your environment. Although in 
<https://bugs.gnu.org/61386#128> Pádraig was dubious about this guess, 
his reasoning that the bug is likely specific to APFS rather than an API 
mismatch could be wrong, as I think HFS doesn't support SEEK_DATA at all 
or reports trivial answers, so coreutils is not likely to run into the 
problem on HFS even if the bug is an API issue.

Here are some things we can do to test this guess.

1. Please try the attached program e.c in place of your d.c program. e.c 
is like d.c, except it attempts to use the coreutils workaround. What 
symptoms do you observe? If e.c works then it's almost surely a problem 
in how coreutils is built (compiler options or whatnot), not in the 
coreutils workaround. If e.c does not work it's likely that the Gnulib 
workaround does not suffice on your macOS platform, in which case we 
need to improve the workaround by hacking further on e.c and porting the 
result back to Gnulib. (There are other possibilities.)

2. Please verify that coreutils cp is using the Gnulib workaround. In 
the src directory, the shell command "nm -o *.o | grep lseek" should 
output only lines containing "rpl_lseek"; there shouldn't be any lines 
saying just "lseek". Also, please run the command "objdump -d 
lib/libcoreutils_a-lseek.o" and verify that the replacement lseek is 
actually doing something nontrivial (you should get maybe three dozen 
lines of assembly language; if it's much less then this is the problem).

3. Please confirm that _DARWIN_C_SOURCE is defined to 1 in lib/config.h.

4. What is the output of the following commands, in the coreutils build 
directory?

rm lib/libcoreutils_a-lseek.o
make V=1 lib/libcoreutils_a-lseek.o
gcc -E -Ilib lib/lseek.c
[e.c (text/x-csrc, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Wed, 15 Feb 2023 19:49:02 GMT) Full text and rfc822 format available.

Message #149 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Wed, 15 Feb 2023 11:48:32 -0800
On 2023-02-15 07:26, George Valkov wrote:
> I tested your patch: both overwrite existing and clone to new produce a working copy. Here are the test results:
> https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-04-cf80f988eeb97cc3f8c65ae58e735d36f865277b-clone.txt

I see some test failures there, involving cp. Do you get the same set of 
test failures without the patch?

> In the case when a dangling symlink is involved and depending on the arguments passed to cp, would it be possible to use CLONE_NOFOLLOW with fclonefileat or fall-back to a normal copy? Does dangling refer to source or destination?

Dangling refers to the destination. The latest proposed patch does use 
CLONE_NOFOLLOW, which means fclonefileat should not follow symlinks to 
the destination. (This behavior is not documented, unfortunately, but 
it's the only behavior that makes sense.)




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Wed, 15 Feb 2023 22:42:02 GMT) Full text and rfc822 format available.

Message #152 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Thu, 16 Feb 2023 00:40:49 +0200
[Message part 1 (text/plain, inline)]
> On 2023-02-15, at 9:26 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> 
> On 2023-02-15 06:05, George Valkov wrote:
>> gcc d.c && ./a.out
>> src 3  dst 4
>> c 14053376  p 20344832  h 20344832  d 6291456
>> total bytes copied 14053376 / 27551296
> 
> Thanks, this is due to a known incompatibility in macOS lseek that coreutils is supposed to work around. See <https://www.gnu.org/software/gnulib/manual/html_node/lseek.html>, which says, "On some platforms, lseek (fd, offset, SEEK_DATA) returns a value greater than offset even when offset addresses data: macOS 12".

Hi Paul! There are two possible solutions here:
1. Do not use lseek(SEEK_DATA) on macOS.
2. Update the cached list of valid sectors which is used by lseek. See §§§ below. There could be some API to achieve this. A workaround is to fclonefileat to a temporary file and then remove it. Internally fclonefileat updates that cache. We might be able to find how it’s done.

manpage
If whence is SEEK_DATA, the offset is set to the start of the next non-hole file region greater than or equal to the supplied offset.

I believe lseek operates correctly, however the underlaying cache it uses is not current after certain conditions occur.


> I guess that somehow, the way you're building coreutils defeats the workaround. If so, we'll need to change how coreutils is built in your environment, or fix coreutils 'configure' so that the workaround isn't defeated in your environment.

Here are the build commands I use:
git clone https://github.com/coreutils/coreutils.git

cd
 coreutils
git submodule foreach git pull origin master
./bootstrap
./configure
make -j 16

Here is a tarball of coreutils after building:
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/coreutils-cf80f988eeb97cc3f8c65ae58e735d36f865277b.tgz


> Although in <https://bugs.gnu.org/61386#128> Pádraig was dubious about this guess, his reasoning that the bug is likely specific to APFS rather than an API mismatch could be wrong, as I think HFS doesn't support SEEK_DATA at all or reports trivial answers, so coreutils is not likely to run into the problem on HFS even if the bug is an API issue.



> Here are some things we can do to test this guess.
> 
> 1. Please try the attached program e.c in place of your d.c program. e.c is like d.c, except it attempts to use the coreutils workaround. What symptoms do you observe? If e.c works then it's almost surely a problem in how coreutils is built (compiler options or whatnot), not in the coreutils workaround. If e.c does not work it's likely that the Gnulib workaround does not suffice on your macOS platform, in which case we need to improve the workaround by hacking further on e.c and porting the result back to Gnulib. (There are other possibilities.)

It produces the same corrupted copy as the original sample:
gcc e.c && ./a.out 
__APPLE__ 1  __MACH__ 1  SEEK_DATA 4
src 3  dst 4
c 14053376  p 20344832  h 20344832  d 6291456
total bytes copied 14053376 / 27551296

In hex
c d67000  p 1367000  h 1367000  d 600000
total bytes copied d67000 / 1a46640

419494dd4527ebb22c7bf2b388316a4beb5c73d2  cc1
7b447132f56f3b549ef62a4780945e82d26f7d2c  cc1-sparse

The after I use clone to update the cached list of valid sectors for cc1
./coreutils-2023-02-15/src/cp cc1 cc1-clone

It returns a bunch of errors, but produces a valid copy. There is no point in using this workaround on macOS, it does not offer any improvements. I would recommend making sure that lseek sees an updated list of valid sectors.
gcc e.c && ./a.out 
__APPLE__ 1  __MACH__ 1  SEEK_DATA 4
src 3  dst 4
c 1000  p 1000  h 1000  d 0
c 1a44640  p 1a46640  h 1a46640  d 2000
lseek failed  ENXIO 6  EBADF 9  EINVAL 22  EOVERFLOW 84  ESPIPE 29
  p 1a46640
  d 1a46640  6 Device not configured
  h ffffffffffffffff  6 Device not configured
  a 1a46640  6 Device not configured
  b 1a46640  6 Device not configured
total bytes copied 1a45640 / 1a46640

419494dd4527ebb22c7bf2b388316a4beb5c73d2  cc1
419494dd4527ebb22c7bf2b388316a4beb5c73d2  cc1-clone
419494dd4527ebb22c7bf2b388316a4beb5c73d2  cc1-sparse


> 2. Please verify that coreutils cp is using the Gnulib workaround. In the src directory, the shell command "nm -o *.o | grep lseek" should output only lines containing "rpl_lseek"; there shouldn't be any lines saying just "lseek". Also, please run the command "objdump -d lib/libcoreutils_a-lseek.o" and verify that the replacement lseek is actually doing something nontrivial (you should get maybe three dozen lines of assembly language; if it's much less then this is the problem).

nm -o *.o | grep lseek
cat.o:                  U _rpl_lseek
copy.o:                  U _rpl_lseek
dd.o:                  U _rpl_lseek
head.o:                  U _rpl_lseek
selinux.o: no symbols
shred.o:                  U _rpl_lseek
shuf.o:                  U _rpl_lseek
split.o:                  U _rpl_lseek
tac.o:                  U _rpl_lseek
tail.o:                  U _rpl_lseek
tail.o: 0000000000002d80 t _xlseek
truncate.o:                  U _rpl_lseek
wc.o:                  U _rpl_lseek


objdump -d lib/libcoreutils_a-lseek.o

lib/libcoreutils_a-lseek.o:	file format mach-o 64-bit x86-64

Disassembly of section __TEXT,__text:

0000000000000000 <_rpl_lseek>:
       0: 55                           	pushq	%rbp
       1: 48 89 e5                     	movq	%rsp, %rbp
       4: 41 57                        	pushq	%r15
       6: 41 56                        	pushq	%r14
       8: 53                           	pushq	%rbx
       9: 50                           	pushq	%rax
       a: 49 89 f7                     	movq	%rsi, %r15
       d: 41 89 fe                     	movl	%edi, %r14d
      10: 83 fa 04                     	cmpl	$4, %edx
      13: 75 23                        	jne	0x38 <_rpl_lseek+0x38>
      15: 44 89 f7                     	movl	%r14d, %edi
      18: 4c 89 fe                     	movq	%r15, %rsi
      1b: ba 03 00 00 00               	movl	$3, %edx
      20: e8 00 00 00 00               	callq	0x25 <_rpl_lseek+0x25>
      25: 48 89 c3                     	movq	%rax, %rbx
      28: 48 85 c0                     	testq	%rax, %rax
      2b: 78 20                        	js	0x4d <_rpl_lseek+0x4d>
      2d: 31 d2                        	xorl	%edx, %edx
      2f: 4c 39 fb                     	cmpq	%r15, %rbx
      32: 0f 94 c2                     	sete	%dl
      35: c1 e2 02                     	shll	$2, %edx
      38: 44 89 f7                     	movl	%r14d, %edi
      3b: 4c 89 fe                     	movq	%r15, %rsi
      3e: 48 83 c4 08                  	addq	$8, %rsp
      42: 5b                           	popq	%rbx
      43: 41 5e                        	popq	%r14
      45: 41 5f                        	popq	%r15
      47: 5d                           	popq	%rbp
      48: e9 00 00 00 00               	jmp	0x4d <_rpl_lseek+0x4d>
      4d: e8 00 00 00 00               	callq	0x52 <_rpl_lseek+0x52>
      52: 83 38 06                     	cmpl	$6, (%rax)
      55: 49 0f 44 df                  	cmoveq	%r15, %rbx
      59: 48 89 d8                     	movq	%rbx, %rax
      5c: 48 83 c4 08                  	addq	$8, %rsp
      60: 5b                           	popq	%rbx
      61: 41 5e                        	popq	%r14
      63: 41 5f                        	popq	%r15
      65: 5d                           	popq	%rbp
      66: c3                           	retq


> 3. Please confirm that _DARWIN_C_SOURCE is defined to 1 in lib/config.h.

Yes
Lib/config.h : line 3640
/* Enable general extensions on macOS.  */
#ifndef _DARWIN_C_SOURCE
# define _DARWIN_C_SOURCE 1
#endif


> 4. What is the output of the following commands, in the coreutils build directory?
> 
> rm lib/libcoreutils_a-lseek.o
> make V=1 lib/libcoreutils_a-lseek.o
> gcc -E -Ilib lib/lseek.c<e.c>

rm lib/libcoreutils_a-lseek.o
make V=1 lib/libcoreutils_a-lseek.o
gcc  -I. -I./lib  -Ilib -I./lib -Isrc -I./src -I/usr/local/include   -Wno-cast-qual -Wno-conversion -Wno-float-equal -Wno-sign-compare -Wno-undef -Wno-unused-function -Wno-unused-parameter -Wno-float-conversion -Wimplicit-fallthrough -Wno-pedantic -Wno-sign-conversion -Wno-type-limits -g -O2 -MT lib/libcoreutils_a-lseek.o -MD -MP -MF lib/.deps/libcoreutils_a-lseek.Tpo -c -o lib/libcoreutils_a-lseek.o `test -f 'lib/lseek.c' || echo './'`lib/lseek.c
mv -f lib/.deps/libcoreutils_a-lseek.Tpo lib/.deps/libcoreutils_a-lseek.Po
gcc -E -Ilib lib/lseek.c < e.c > epp.txt
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/epp.txt


§§§
I updated d.c to print in hex and use O_TRUNC on dst. I wrote another sample to read the entire source and write non-blank sectors creating a sparse file.

Copy cc1 to B
gcc f.c && ./a.out 
src 3  dst 4
00001000 skip
total bytes copied 1a45640 / 1a46640 1a46640

Copy B to cc1-sparse
src 3  dst 4
c 1000  p 1000  h 1000  d 0
c 1a44640  p 1a46640  h 1a46640  d 2000
total bytes copied 1a45640 / 1a46640

c6daa8af9a75ab24e03252e63c6a855979f04095  cc1
c6daa8af9a75ab24e03252e63c6a855979f04095  B
c6daa8af9a75ab24e03252e63c6a855979f04095  cc1-sparse

I also tried playing with smaller values for BLOCK_SIZE down to 1. The output from d.c will always be the same because cc1 has only one sector that is a hole 0x1000-0x2000. The resulting sparse file B is then copied correctly by d.c, which suggests that lseek(SEEK_DATA) operates correctly when the file-system is in a good state, however it is likely that it returns cached data that could become invalid under certain conditions. If I unmount the sparse disk image and mount it again or use fclonefileat on cc1, the cache is updated and lseek returns good results. It would be interesting to trace the creation of cc1 during gcc/initial/compile.

Copy cc1 to cc1-sparse before the cached list of valid sectors is updated
src 3  dst 4
c d67000  p 1367000  h 1367000  d 600000
total bytes copied d67000 / 1a46640
7b447132f56f3b549ef62a4780945e82d26f7d2c  cc1-sparse

Update the cached list of valid sectors for cc1
./coreutils-2023-02-15/src/cp cc1 cc1-clone

Copy cc1 to cc1-sparse after the cached list of valid sectors is updated
gcc d.c && ./a.out 
src 3  dst 4
c 1000  p 1000  h 1000  d 0
c 1a44640  p 1a46640  h 1a46640  d 2000
total bytes copied 1a45640 / 1a46640
c6daa8af9a75ab24e03252e63c6a855979f04095  cc1-sparse

For a normal file, the list of valid sectors is up to date
gcc d.c && ./a.out 
src 3  dst 4
c 229  p 229  h 229  d 0
total bytes copied 229 / 229

The sector size on the internal NVME disk is 4 KB (0x1000).


[d.c (application/octet-stream, attachment)]
[e.c (application/octet-stream, attachment)]
[f.c (application/octet-stream, attachment)]
[Message part 5 (text/plain, inline)]

Georgi Valkov
httpstorm.com
nano RTOS


Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Wed, 15 Feb 2023 22:54:02 GMT) Full text and rfc822 format available.

Message #155 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Thu, 16 Feb 2023 00:53:30 +0200
> On 2023-02-15, at 9:48 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> 
> On 2023-02-15 07:26, George Valkov wrote:
>> I tested your patch: both overwrite existing and clone to new produce a working copy. Here are the test results:
>> https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-04-cf80f988eeb97cc3f8c65ae58e735d36f865277b-clone.txt
> 
> I see some test failures there, involving cp. Do you get the same set of test failures without the patch?

Yes. Note that Pádraig disabled two tests which fail after the patch:
tests/cp/thru-dangling.sh, tests/seek-data-capable (the actual test that failed was related to sparse, I think sparse-extents.sh)

https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-01-cf80f988eeb97cc3f8c65ae58e735d36f865277b-ori.txt
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-04-cf80f988eeb97cc3f8c65ae58e735d36f865277b-clone.txt


>> In the case when a dangling symlink is involved and depending on the arguments passed to cp, would it be possible to use CLONE_NOFOLLOW with fclonefileat or fall-back to a normal copy? Does dangling refer to source or destination?
> 
> Dangling refers to the destination. The latest proposed patch does use CLONE_NOFOLLOW, which means fclonefileat should not follow symlinks to the destination. (This behavior is not documented, unfortunately, but it's the only behavior that makes sense.)

Ok, thank you for clarifying!
Good luck!


Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Thu, 16 Feb 2023 19:42:02 GMT) Full text and rfc822 format available.

Message #158 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>,
 Pádraig Brady <P <at> draigBrady.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Thu, 16 Feb 2023 21:41:41 +0200
[Message part 1 (text/plain, inline)]
Gentlemen, I found a solution, as well as the reason why lseek(SEEK_DATA) returns invalid data.

I found a way to craft a file that gets corrupted when we try to make a sparse copy. It’s based on mmap:
1. Open src and dst.
2. Obtain the size of src, and truncate dst to the same size.
3. Use mmap to map dst to RAM.
4. Read src and memcpy all non-blank pages to dst.
5. Use munmap and then close src and dst.
6. If src contained any blank pages, dst is now sparse.
7. Since we did not call msync, the changes are still cached in RAM and lseek(SEEK_DATA) is not aware that some of the pages contain valid data.

Workaround:
1. open(dst, O_RDONLY)
2. Obtain the file size = lseek(dst, 0, SEEK_END)
3. Use mmap to map dst to RAM.
4. Call msync.
5. Use munmap and then close dst.
6. The memory view of dst is now synchronised with the underlaying file-system.
7. lseek(SEEK_DATA) returns valid data.

Proof of Concept:
1. Start by extracting cc1
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/cc1.tgz

2. The extracted file is not sparse, we need to craft a sparse copy cc1-mmap without synchronising the filesystem with the memory view
gcc m.c && ./a.out
src 3  dst 4
00001000 skip
total bytes copied 1a45640 / 1a46640 1a46640

3. Now sparse copy cc1-mmap to cc1-sparse
gcc d.c && ./a.out
src 3  dst 4
c a46640  p 1a46640  h 1a46640  d 1000000
total bytes copied a46640 / 1a46640

4. cc1-sparse is corrupted
sha1sum cc1*
16d835378ab973a114082a585cc76958bdbccec0  cc1
16d835378ab973a114082a585cc76958bdbccec0  cc1-mmap
75e6f6cb303cb5d3909c6d7830c417fc6ed658c3  cc1-sparse

5. Use msync to update the filesystem
gcc n.c && ./a.out
dst 3  MS_ASYNC 1  MS_INVALIDATE 2  MS_SYNC 16
size bytes 1a46640  msync ok

6. Make sparse copy again cc1-mmap to cc1-sparse
gcc d.c && ./a.out
src 3  dst 4
c 1000  p 1000  h 1000  d 0
c 1a44640  p 1a46640  h 1a46640  d 2000
total bytes copied 1a45640 / 1a46640

7. cc1-sparse is good
sha1sum cc1*
16d835378ab973a114082a585cc76958bdbccec0  cc1
16d835378ab973a114082a585cc76958bdbccec0  cc1-mmap
16d835378ab973a114082a585cc76958bdbccec0  cc1-sparse


PoC samples
[d.c (application/octet-stream, attachment)]
[m.c (application/octet-stream, attachment)]
[n.c (application/octet-stream, attachment)]
[Message part 5 (text/plain, inline)]


Cheers!

Georgi Valkov
httpstorm.com
nano RTOS


Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Thu, 16 Feb 2023 21:33:01 GMT) Full text and rfc822 format available.

Message #161 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org, George Valkov <gvalkov <at> gmail.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Thu, 16 Feb 2023 21:32:01 +0000
On 15/02/2023 10:56, Paul Eggert wrote:
> Attached is an updated proposal for the fclonefileat patch.
> 
> CVE-2021-30995 confirmed my suspicion that coreutils 9.1 and the current
> bleeding-edge coreutils on Savannah both have an exploitable security
> bug on macOS. Although I hope this patch fixes the bug, it could use
> more pairs of eyes, given that Apple had so many problems fixing its own
> security bug involving fclonefileat, and given that the coreutils code
> has so many flags and conditions.

The logic looks good to me, and all tests pass here on macOS 13.2

I see the added `if (! (cloned_mode & ~desired_mode))` restriction I presume
for security reasons, so that we only _add_ permission bits to those copied by fclonefileat(),
and never create an overly accessible file and then remove permission bits.
That wasn't obvious to me at least so I'd suggest adding comments from the diff below.

For the record this restriction will mean we can't reflink in some cases.
For example the following would fail with ENOSUP on macOS:

  touch file;
  chmod g+w file
  umask 0022
  cp --reflink=always file file.copy

Using -p in the above cp command would work though.

thanks,
Pádraig.

diff --git a/src/copy.c b/src/copy.c
index c5f2cc186..7c866e4fb 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -1284,6 +1284,7 @@ copy_reg (char const *src_name, char const *dst_name,
                : x->set_mode ? x->mode
                : ((x->explicit_no_preserve_mode ? MODE_RW_UGO : dst_mode)
                   & ~ cached_umask ()));
+          /* If fclonefileat() would not set undesired extra permissions.  */
           if (! (cloned_mode & ~desired_mode))
             {
               int fc_flags
@@ -1318,7 +1319,9 @@ copy_reg (char const *src_name, char const *dst_name,
                         }
                     }

+                  /* any desired permissions that weren't cloned.  */
                   extra_permissions = desired_mode & ~cloned_mode;
+
                   if (!extra_permissions
                       && (!x->preserve_mode || (fc_flags & CLONE_ACL)
                           || !fd_has_acl (source_desc)))





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Thu, 16 Feb 2023 23:42:01 GMT) Full text and rfc822 format available.

Message #164 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 61386 <at> debbugs.gnu.org, George Valkov <gvalkov <at> gmail.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Thu, 16 Feb 2023 15:41:23 -0800
On 2/16/23 13:32, Pádraig Brady wrote:
> +                  /* any desired permissions that weren't cloned.  */
>                     extra_permissions = desired_mode & ~cloned_mode;

That comment is a tad redundant, but I did add some commentary along 
those lines and installed the patch into Savannah master.

I haven't had time to look into the SEEK_HOLE stuff but hope to find time.




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sun, 19 Feb 2023 06:10:01 GMT) Full text and rfc822 format available.

Message #167 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: George Valkov <gvalkov <at> gmail.com>
Cc: Pádraig Brady <P <at> draigBrady.com>, 61386 <at> debbugs.gnu.org,
 Gnulib bugs <bug-gnulib <at> gnu.org>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sun, 19 Feb 2023 00:08:53 -0600
[Message part 1 (text/plain, inline)]
George, given what you've written I suppose we should give up the idea 
of copying sparse files efficiently on macOS (and on FreeBSD 
13.0-RELEASE, as it has a similar bug with SEEK_HOLE and SEEK_DATA), in 
cases where fclonefileat does not work.

Please try the attached Gnulib patch; it uses your idea of disabling 
SEEK_HOLE and SEEK_DATA on macOS, except on steroids because it disables 
these two macros everywhere, for all Gnulib-using applications, and it 
also disables them on FreeBSD < 14. For coreutils you need only this 
patch's change to lib/unistd.in.h. If this works for you I suppose we 
can install it into Gnulib and propagate that into coreutils etc.

Also, you reported the bug against macOS 12.6. Can you check whether the 
same bug occurs on macOS 13? If it's still there I suppose the attached 
patch will need to be updated since it guesses the bug is fixed in macOS 13.

I'll cc this email to bug-gnulib to give them a heads-up. (Gnulib 
readers can see <https://bugs.gnu.org/61386> for context.)

Really, Apple should fix this serious data corruption bug in APFS and macOS.
[0001-lseek-avoid-SEEK_HOLE-bugs-in-FreeBSD-macOS.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sun, 19 Feb 2023 16:32:01 GMT) Full text and rfc822 format available.

Message #170 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org, Gnulib bugs <bug-gnulib <at> gnu.org>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sun, 19 Feb 2023 16:31:37 +0000
On 19/02/2023 06:08, Paul Eggert wrote:
> George, given what you've written I suppose we should give up the idea
> of copying sparse files efficiently on macOS (and on FreeBSD
> 13.0-RELEASE, as it has a similar bug with SEEK_HOLE and SEEK_DATA), in
> cases where fclonefileat does not work.
> 
> Please try the attached Gnulib patch; it uses your idea of disabling
> SEEK_HOLE and SEEK_DATA on macOS, except on steroids because it disables
> these two macros everywhere, for all Gnulib-using applications, and it
> also disables them on FreeBSD < 14. For coreutils you need only this
> patch's change to lib/unistd.in.h. If this works for you I suppose we
> can install it into Gnulib and propagate that into coreutils etc.
> 
> Also, you reported the bug against macOS 12.6. Can you check whether the
> same bug occurs on macOS 13? If it's still there I suppose the attached
> patch will need to be updated since it guesses the bug is fixed in macOS 13.
> 
> I'll cc this email to bug-gnulib to give them a heads-up. (Gnulib
> readers can see <https://bugs.gnu.org/61386> for context.)
> 
> Really, Apple should fix this serious data corruption bug in APFS and macOS.

Yes I concur.
I'll adjust the coreutils tests to avoid
using SEEK_DATA if not available in a more dynamic manner.

thank you,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Mon, 20 Feb 2023 10:22:01 GMT) Full text and rfc822 format available.

Message #173 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Pádraig Brady <P <at> draigBrady.com>, 61386 <at> debbugs.gnu.org,
 Gnulib bugs <bug-gnulib <at> gnu.org>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Mon, 20 Feb 2023 12:21:27 +0200
[Message part 1 (text/plain, inline)]
> On 2023-02-19, at 8:08 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> 
> George, given what you've written I suppose we should give up the idea of copying sparse files efficiently on macOS (and on FreeBSD 13.0-RELEASE, as it has a similar bug with SEEK_HOLE and SEEK_DATA), in cases where fclonefileat does not work.

I agree, Paul. I just ran some benchmarks: the time it takes to call msync unconditionally hurts the performance: 502 us if the memory view is already synced seems fine, however 28177-36585 us whenever a sync is required is more than the time it takes to performing a full-copy 7042-21528 us.


> Please try the attached Gnulib patch; it uses your idea of disabling SEEK_HOLE and SEEK_DATA on macOS, except on steroids because it disables these two macros everywhere, for all Gnulib-using applications, and it also disables them on FreeBSD < 14. For coreutils you need only this patch's change to lib/unistd.in.h. If this works for you I suppose we can install it into Gnulib and propagate that into coreutils etc.

What is the correct way to apply your patch? I tried
git apply 0001-lseek-avoid-SEEK_HOLE-bugs-in-FreeBSD-macOS.patch 
error: patch failed: ChangeLog:1
error: ChangeLog: patch does not apply
error: doc/posix-functions/lseek.texi: No such file or directory
error: lib/unistd.in.h: wrong type

Then I made the changes to lib/unistd.in.h manually, and ran the tests:
* before: corrupted
* after: fixed

Regarding FreeBSD, I have a FreeBSD 13.1-RELEASE VM running on Hyper-V, so I compiled and ran my samples: m.c, d.c: the copy was correct, however d.c reports that there are no segments to skip, which means that the file created by m.c using mmap is not sparse even though it should contain one blank segment and be sparse. Next I installed a fresh copy of FreeBSD 13.0-RELEASE and observed the same behaviour. Both installations use ZFS and the tests were ran as root.
If you know what conditions are required to reproduce the issue on FreeBSD, let me know and I will test it for you. Otherwise I am not sure if we need to make any changes to FreeBSD, since I did not observe anything wrong there.
Is it ok if I continue testing on FreeBSD 13.1, because the 13.0 I just installed lacks any configuration and is quite uncomfortable to use. It also rejects my password over SSH, so I’m forced to use the console. I can test pretty much anything that runs on Hyper-V.


> Also, you reported the bug against macOS 12.6. Can you check whether the same bug occurs on macOS 13? If it's still there I suppose the attached patch will need to be updated since it guesses the bug is fixed in macOS 13.

Yes it does. I just ran my samples under macOS Internet recovery, which reports as Ventura with kernel compiled on the 30th of January 2023. See the attached log:
[ventura-internet-recovery.txt (text/plain, attachment)]
[Message part 3 (text/plain, inline)]


> I'll cc this email to bug-gnulib to give them a heads-up. (Gnulib readers can see <https://bugs.gnu.org/61386> for context.)

Thanks, I’ll check it.


> Really, Apple should fix this serious data corruption bug in APFS and macOS.
> <0001-lseek-avoid-SEEK_HOLE-bugs-in-FreeBSD-macOS.patch>

I agree, I opened a new ticket with Apple on 2023-02-16. Both via e-mail, and https://security.apple.com/reports/OE1928608366012

This was at the same time I reported about mmap and msync here. I got this replay on the following day:
> Thanks for the additional information. We are reviewing it and will follow up if we have any questions or need additional details.

Feel free to write them at product-security <at> apple.com and include this line
OE0928602070811 - please include this ID in replies to this thread.

Introduce your self as part of GNU/coreutils and let them know it has been 18 months since the issue was reported privately to them, so we’ve gone public now. CC me <same as gmail> at abv dot bg.


Georgi Valkov
httpstorm.com
nano RTOS


Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Mon, 20 Feb 2023 15:04:02 GMT) Full text and rfc822 format available.

Message #176 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Pádraig Brady <P <at> draigBrady.com>, 61386 <at> debbugs.gnu.org,
 Gnulib bugs <bug-gnulib <at> gnu.org>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Mon, 20 Feb 2023 17:02:58 +0200
Hi Paul, the following tests fail after your patch:
FAIL: tests/split/filter.sh
FAIL: tests/split/b-chunk.sh
FAIL: tests/split/l-chunk.sh
FAIL: tests/cp/sparse-perf.sh

Before patch
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-05-95f4ee0577dd836de523f46999777fbbbe9d2772-ori.txt

After patch
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-05-95f4ee0577dd836de523f46999777fbbbe9d2772-patch.txt


Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Mon, 20 Feb 2023 17:50:01 GMT) Full text and rfc822 format available.

Message #179 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: George Valkov <gvalkov <at> gmail.com>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org, Gnulib bugs <bug-gnulib <at> gnu.org>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Mon, 20 Feb 2023 17:49:19 +0000
On 20/02/2023 15:02, George Valkov wrote:
> Hi Paul, the following tests fail after your patch:
> FAIL: tests/split/filter.sh
> FAIL: tests/split/b-chunk.sh
> FAIL: tests/split/l-chunk.sh

These look unrelated and may be due to some other change in your environment.
Specifically lseek() is failing on /dev/null and returning:
split: /dev/null: cannot determine file size

> FAIL: tests/cp/sparse-perf.sh

As expected.
I've a fix for this locally
which leverages another patch to determine
dynamically if we support SEEK_HOLE.

cheers,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Mon, 20 Feb 2023 19:36:01 GMT) Full text and rfc822 format available.

Message #182 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Gnulib bugs <bug-gnulib <at> gnu.org>, Paul Eggert <eggert <at> cs.ucla.edu>,
 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Mon, 20 Feb 2023 21:35:08 +0200
> On 2023-02-20, at 7:49 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
> On 20/02/2023 15:02, George Valkov wrote:
>> Hi Paul, the following tests fail after your patch:
>> FAIL: tests/split/filter.sh
>> FAIL: tests/split/b-chunk.sh
>> FAIL: tests/split/l-chunk.sh
> 
> These look unrelated and may be due to some other change in your environment.
> Specifically lseek() is failing on /dev/null and returning:
> split: /dev/null: cannot determine file size

I deleted the lines that were introduced by the patch in unistd.in.h, then make clean; make -j 16 and ran all tests: back to 5 failed. Then I restored the deleted lines, rebuild and I got 9 failed tests again.


>> FAIL: tests/cp/sparse-perf.sh
> 
> As expected.
> I've a fix for this locally
> which leverages another patch to determine
> dynamically if we support SEEK_HOLE.

All right! Cheers mate!


Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Mon, 20 Feb 2023 21:15:01 GMT) Full text and rfc822 format available.

Message #185 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: George Valkov <gvalkov <at> gmail.com>, Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: 61386 <at> debbugs.gnu.org, Gnulib bugs <bug-gnulib <at> gnu.org>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Mon, 20 Feb 2023 21:14:36 +0000
On 20/02/2023 19:35, George Valkov wrote:
> 
>> On 2023-02-20, at 7:49 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>
>> On 20/02/2023 15:02, George Valkov wrote:
>>> Hi Paul, the following tests fail after your patch:
>>> FAIL: tests/split/filter.sh
>>> FAIL: tests/split/b-chunk.sh
>>> FAIL: tests/split/l-chunk.sh
>>
>> These look unrelated and may be due to some other change in your environment.
>> Specifically lseek() is failing on /dev/null and returning:
>> split: /dev/null: cannot determine file size
> 
> I deleted the lines that were introduced by the patch in unistd.in.h, then make clean; make -j 16 and ran all tests: back to 5 failed. Then I restored the deleted lines, rebuild and I got 9 failed tests again.

Oh right sorry.
I think I see what's happening.
Since https://github.com/coreutils/gnulib/commit/4db8db34
gnulib has been unconditionally replacing lseek() on macos.
Now with SEEK_DATA undefined that replaced lseek()
will run the code intended for BeOS.
So either the lseek.m4 or lseek.c needs adjusting accordingly.

cheers,
Pádraig





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Mon, 20 Feb 2023 21:26:01 GMT) Full text and rfc822 format available.

Message #188 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Gnulib bugs <bug-gnulib <at> gnu.org>, Paul Eggert <eggert <at> CS.UCLA.EDU>,
 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Mon, 20 Feb 2023 23:25:26 +0200


> On 2023-02-20, at 11:14 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
> On 20/02/2023 19:35, George Valkov wrote:
>>> On 2023-02-20, at 7:49 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
>>> 
>>> On 20/02/2023 15:02, George Valkov wrote:
>>>> Hi Paul, the following tests fail after your patch:
>>>> FAIL: tests/split/filter.sh
>>>> FAIL: tests/split/b-chunk.sh
>>>> FAIL: tests/split/l-chunk.sh
>>> 
>>> These look unrelated and may be due to some other change in your environment.
>>> Specifically lseek() is failing on /dev/null and returning:
>>> split: /dev/null: cannot determine file size
>> I deleted the lines that were introduced by the patch in unistd.in.h, then make clean; make -j 16 and ran all tests: back to 5 failed. Then I restored the deleted lines, rebuild and I got 9 failed tests again.
> 
> Oh right sorry.
> I think I see what's happening.
> Since https://github.com/coreutils/gnulib/commit/4db8db34
> gnulib has been unconditionally replacing lseek() on macos.
> Now with SEEK_DATA undefined that replaced lseek()
> will run the code intended for BeOS.
> So either the lseek.m4 or lseek.c needs adjusting accordingly.

As I reported in one of my previous comments, there is no need for that hack in lseek.c. Either use the original lseek from macOS and make sure it doesn't return cached data, or disable sparse support until it gets fixed by Apple. The more people report it to them the higher the chance for them to take any action.
Let me know when you fix it.

Good luck!


Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Wed, 22 Feb 2023 14:00:02 GMT) Full text and rfc822 format available.

Message #191 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>,
 Paul Eggert <eggert <at> CS.UCLA.EDU>
Cc: 61386 <at> debbugs.gnu.org, Gnulib bugs <bug-gnulib <at> gnu.org>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Wed, 22 Feb 2023 15:59:38 +0200
I received an update from Apple
> We reproduced the issue and are investigating.
> Our engineers are investigating the root cause of the issue you reported. If we need more information from you, we’ll add a comment and send you an email.


Georgi Valkov
httpstorm.com
nano RTOS




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Thu, 23 Feb 2023 22:24:01 GMT) Full text and rfc822 format available.

Message #194 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>,
 George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org, Gnulib bugs <bug-gnulib <at> gnu.org>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Thu, 23 Feb 2023 14:23:29 -0800
[Message part 1 (text/plain, inline)]
On 2/20/23 13:14, Pádraig Brady wrote:
> Since https://github.com/coreutils/gnulib/commit/4db8db34
> gnulib has been unconditionally replacing lseek() on macos.
> Now with SEEK_DATA undefined that replaced lseek()
> will run the code intended for BeOS.
> So either the lseek.m4 or lseek.c needs adjusting accordingly.

Good catch, thanks. I updated the Gnulib patch accordingly; see attached.

On 2/20/23 02:21, George Valkov wrote:
> What is the correct way to apply your patch?

Sounds like patching is a bit of a hassle, so to make things easier I 
installed the attached patch into Gnulib, and propagated this into 
Coreutils.

You should be able to get the latest version of Coreutils from 
savannah.gnu.org, and then run './bootstrap; ./configure; make' if you 
have suitable development tools like Autoconf. You can run ./bootstrap 
on a GNU/Linux platform and then copy the directory to macOS and run 
'./configure; make' on macOS. Please give it a try.

> If you know what conditions are required to reproduce the issue on 
FreeBSD,

I don't. I'm relying on FreeBSD bug report 256205. I cited that in the 
attached patch.

> Is it ok if I continue testing on FreeBSD 13.1

Sure, that's fine. Possibly the bug is fixed in FreeBSD 13.1; if so, 
perhaps we can improve performance on 13.1 by changing "__FreeBSD__ < 
14" to something else.

On 2/20/23 13:25, George Valkov wrote:
> there is no need for that hack in lseek.c.

Yes, we don't need any new hack in lseek.c. And strictly speaking, even 
the existing macOS-specific hack in lseek.c (look for __APPLE__) is no 
longer needed, since (with the changes I just installed) lseek.c is no 
longer compiled for macOS.

However, assuming Apple fixes the macOS bug in (say) macOS 14, so that 
we change the "99990000" to "140000" in Gnulib's lib/unistd.in.h and 
m4/lseek.m4, we'll likely need that hack back because it works around an 
incompatibility between GNU-or-FreeBSD-or-Solaris SEEK_DATA and macOS 
SEEK_DATA; see 
<https://www.gnu.org/software/gnulib/manual/html_node/lseek.html>. So I 
thought it nicer to leave it in for now; it has zero runtime cost on all 
platforms.


[0001-lseek-avoid-SEEK_HOLE-bugs-in-FreeBSD-macOS.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 24 Feb 2023 00:52:01 GMT) Full text and rfc822 format available.

Message #197 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org, Gnulib bugs <bug-gnulib <at> gnu.org>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 24 Feb 2023 00:51:08 +0000
On 23/02/2023 22:23, Paul Eggert wrote:
> On 2/20/23 13:14, Pádraig Brady wrote:
>> Since https://github.com/coreutils/gnulib/commit/4db8db34
>> gnulib has been unconditionally replacing lseek() on macos.
>> Now with SEEK_DATA undefined that replaced lseek()
>> will run the code intended for BeOS.
>> So either the lseek.m4 or lseek.c needs adjusting accordingly.
> 
> Good catch, thanks. I updated the Gnulib patch accordingly; see attached.

Cool.

> On 2/20/23 02:21, George Valkov wrote:
>   > What is the correct way to apply your patch?
> 
> Sounds like patching is a bit of a hassle, so to make things easier I
> installed the attached patch into Gnulib, and propagated this into
> Coreutils.

The latest coreutils that should include all fixes for this issue is at
https://www.pixelbeat.org/cu/coreutils-9.1.160-5c8c2.tar.gz
That should be buildable with the standard ./configure && make combo

thanks!,
Pádraig





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 24 Feb 2023 14:34:02 GMT) Full text and rfc822 format available.

Message #200 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Gnulib bugs <bug-gnulib <at> gnu.org>, 61386 <at> debbugs.gnu.org,
 Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 24 Feb 2023 16:33:20 +0200
> On 2023-02-24, at 12:23 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> 
> On 2/20/23 13:14, Pádraig Brady wrote:
>> Since https://github.com/coreutils/gnulib/commit/4db8db34
>> gnulib has been unconditionally replacing lseek() on macos.
>> Now with SEEK_DATA undefined that replaced lseek()
>> will run the code intended for BeOS.
>> So either the lseek.m4 or lseek.c needs adjusting accordingly.
> 
> Good catch, thanks. I updated the Gnulib patch accordingly; see attached.

Nice: I just downloaded a fresh copy of Savannah, and it already includes the attached patch, so no action needed on my side.

The copy created by cp is good. I noticed that you have added a —debug switch, so I gave it a test:
1. If the target exists I get this output. I would assume zeroes means that all data is read from the source, but pages containing only zeroes are skipped, while pages containing data are written to the target, which is fine. I would guess 4 KB page granularity or some form of detection takes place.

./coreutils-2023-02-24/src/cp --debug cc1-mmap cc1-ori
'cc1-mmap' -> 'cc1-ori'
copy offload: avoided, reflink: unsupported, sparse detection: zeros

2. If the target does not exist, the file is cloned. You might want to report something about that in the debug output. Otherwise the clone is good.

./coreutils-2023-02-24/src/cp --debug cc1-mmap cc1-ori
'cc1-mmap' -> 'cc1-ori’


My first attempt to run the tests stopped here, so I had to interrupt it 10 minutes later with Control+C.
make check-TESTS
make[1]: Entering directory '/Volumes/coreutils/coreutils-2023-02-24'
PASS: tests/misc/help-version.sh
PASS: tests/misc/help-version-getopt.sh

Then I ran it again, and I can see the tests run very slowly now. This time it hanged after:

PASS: tests/rm/isatty.sh
^Cmake[1]: *** Deleting file 'tests/rm/empty-inacc.log'
make[1]: *** [Makefile:21399: tests/rm/empty-inacc.log] Error 130
make: *** [Makefile:21381: check-TESTS] Interrupt: 2

make check-TESTS

make[1]: Entering directory '/Volumes/coreutils/coreutils-2023-02-24'
PASS: tests/misc/help-version.sh
PASS: tests/misc/help-version-getopt.sh
^Cmake[1]: *** Deleting file 'tests/tail-2/inotify-race.log'
^C^C^C^C

Now it hangs here and I cannot interrupt it. I don’t see any CPU usage related to the tests. My laptop is idle. Attempting to close the Terminal window gave me this warning:

Do you want to terminate running processes in this tab?
Closing this tab will terminate the running processes: gdb, bash, make (2), sh (4).

That’s a good culprit. Two days ago brew updated gdb to version 13.1 and there was a message asking me to sign it with some entitlements to make it more functional. So, I signed gdb. At first though make check-TESTS does not play nice when gdb is signed, so I removed the signature, but that did not help. Next I restored version 12.1_2 and ran make check-TESTS again, which still hangs at the same points. So gdb signature and version have nothing to do with this new issue.

Next I ran the tests on top of my previous unpatched build directory coreutils cf80f988eeb97cc3f8c65ae58e735d36f865277b, gnulib 32c16c45d7378b014d9aac6130104c4d02a9acdb and it works fine, so I would assume something related to the tests has been broken recently in coreutils or gnulib. I restored gdb 13.1 signed and the tests completed again.

Back to latest unmodified coreutils 5c8c2a5161c0b6f84212778f694c526105f10dab, gnulib 7352d9fec59398c76c3bb8a2ef86ba58818f0342, the tests hang.

make check-TESTS
make[1]: Entering directory '/Volumes/coreutils/coreutils-2023-02-24'
PASS: tests/misc/help-version.sh
PASS: tests/misc/help-version-getopt.sh
^Cmake[1]: *** Deleting file 'tests/tail-2/inotify-race.log'
^C^C^C^C^C^C

killall gdb
ps -A |grep gdb
29584 ??         0:00.09 gdb -nx --batch-silent --eval-command=break 1475 --eval-command=run --pid=29583 -f file --eval-command=quit tail
23269 ttys010    0:00.09 gdb -nx --batch-silent --eval-command=break 1475 --eval-command=run --pid=23268 -f file --eval-command=quit tail

killall -9 gdb
ps -A |grep gdb

Killing gdb allowed the tests to continue, I had to do it twice:
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-06-5c8c2a5161c0b6f84212778f694c526105f10dab-ori.txt


> On 2/20/23 02:21, George Valkov wrote:
> > What is the correct way to apply your patch?
> 
> Sounds like patching is a bit of a hassle, so to make things easier I installed the attached patch into Gnulib, and propagated this into Coreutils.

I’m starting to think I made a mistake by trying to apply your patch inside coreutils, since it already had links to the affected files. I guess I should have applied the patch inside coreutils/gnulib?

So my question was: what command do you use to apply your patch and in which directory do you run it?


> You should be able to get the latest version of Coreutils from savannah.gnu.org, and then run './bootstrap; ./configure; make' if you have suitable development tools like Autoconf. You can run ./bootstrap on a GNU/Linux platform and then copy the directory to macOS and run './configure; make' on macOS. Please give it a try.

All the development tools should already be installed on the Mac using brew. I can confirm autoconf is installed:

brew list | grep autoconf
autoconf

I always start with these commands:

git clone https://github.com/coreutils/coreutils.git
cd coreutils
git submodule foreach git pull origin master
./bootstrap
./configure
make -j 16
git checkout -b 2023-02-24

Next I apply the changes you want, make clean ; make -j 16
Finally I proceed with testing.


> > If you know what conditions are required to reproduce the issue on FreeBSD,
> 
> I don't. I'm relying on FreeBSD bug report 256205. I cited that in the attached patch.

I see they reproduce the bug on both arm64 and x64. Based on the conversation their issue is very similar: they use a linker to craft a file, that linker is likely to use mmap. But their issue is fixed with fsync. And they also talk about nulls. macOS is indeed based on BSD.

Also if I read correctly, they do not experience the issue when working with the root file-system on FreeBSD, its the same on macOS: I need to mount an APFS sparse disk image, they use nullfs.

Comments say they seem to have fixed it partiality in 42881526d401e7a9c09241e392b7ffa18cfe11d6, and then completely later. I am too busy to play with FreeBSD at the moment, as it takes more advanced preparation.


> On 2023-02-24, at 2:51 AM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
>> On 2/20/23 02:21, George Valkov wrote:
>>  > What is the correct way to apply your patch?
>> Sounds like patching is a bit of a hassle, so to make things easier I
>> installed the attached patch into Gnulib, and propagated this into
>> Coreutils.
> 
> The latest coreutils that should include all fixes for this issue is at
> https://www.pixelbeat.org/cu/coreutils-9.1.160-5c8c2.tar.gz
> That should be buildable with the standard ./configure && make combo

Pádraig, I compiled from your archive, cp produces a good copy, however make check-TESTS hangs the same way as with the clone I compiled on the latest Savannah master.


Good luck!


Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 24 Feb 2023 15:44:02 GMT) Full text and rfc822 format available.

Message #203 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: George Valkov <gvalkov <at> gmail.com>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org, Gnulib bugs <bug-gnulib <at> gnu.org>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 24 Feb 2023 15:43:39 +0000
[Message part 1 (text/plain, inline)]
On 24/02/2023 14:33, George Valkov wrote:
> 
>> On 2023-02-24, at 12:23 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>>
>> On 2/20/23 13:14, Pádraig Brady wrote:
>>> Since https://github.com/coreutils/gnulib/commit/4db8db34
>>> gnulib has been unconditionally replacing lseek() on macos.
>>> Now with SEEK_DATA undefined that replaced lseek()
>>> will run the code intended for BeOS.
>>> So either the lseek.m4 or lseek.c needs adjusting accordingly.
>>
>> Good catch, thanks. I updated the Gnulib patch accordingly; see attached.
> 
> Nice: I just downloaded a fresh copy of Savannah, and it already includes the attached patch, so no action needed on my side.
> 
> The copy created by cp is good. I noticed that you have added a —debug switch, so I gave it a test:
> 1. If the target exists I get this output. I would assume zeroes means that all data is read from the source, but pages containing only zeroes are skipped, while pages containing data are written to the target, which is fine. I would guess 4 KB page granularity or some form of detection takes place.

Exactly.

> ./coreutils-2023-02-24/src/cp --debug cc1-mmap cc1-ori
> 'cc1-mmap' -> 'cc1-ori'
> copy offload: avoided, reflink: unsupported, sparse detection: zeros
> 
> 2. If the target does not exist, the file is cloned. You might want to report something about that in the debug output. Otherwise the clone is good.
> 
> ./coreutils-2023-02-24/src/cp --debug cc1-mmap cc1-ori
> 'cc1-mmap' -> 'cc1-ori’

Yes that was a mistake.
Should be fixed with
https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=65bb27656

> My first attempt to run the tests stopped here, so I had to interrupt it 10 minutes later with Control+C.

> ^Cmake[1]: *** Deleting file 'tests/tail-2/inotify-race.log'
> ^C^C^C^C^C^C
> 
> killall gdb
> ps -A |grep gdb
> 29584 ??         0:00.09 gdb -nx --batch-silent --eval-command=break 1475 --eval-command=run --pid=29583 -f file --eval-command=quit tail
> 23269 ttys010    0:00.09 gdb -nx --batch-silent --eval-command=break 1475 --eval-command=run --pid=23268 -f file --eval-command=quit tail
> 
> killall -9 gdb
> ps -A |grep gdb
> 
> Killing gdb allowed the tests to continue, I had to do it twice:

That's awkward.
That test documents the issue with protecting the gdb invocation with timeout(1).
I suppose we could restrict this test to inotify capable systems,
which would have the side effect of avoiding its use on non linux systems.
The attached patch does that.

thanks for all the testing.

Pádraig
[macos-gdb-hang.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 24 Feb 2023 19:35:02 GMT) Full text and rfc822 format available.

Message #206 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, Gnulib bugs <bug-gnulib <at> gnu.org>,
 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 24 Feb 2023 21:34:02 +0200
> On 2023-02-24, at 5:43 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
> On 24/02/2023 14:33, George Valkov wrote:
>>> On 2023-02-24, at 12:23 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>>> 
>>> On 2/20/23 13:14, Pádraig Brady wrote:
>>>> Since https://github.com/coreutils/gnulib/commit/4db8db34
>>>> gnulib has been unconditionally replacing lseek() on macos.
>>>> Now with SEEK_DATA undefined that replaced lseek()
>>>> will run the code intended for BeOS.
>>>> So either the lseek.m4 or lseek.c needs adjusting accordingly.
>>> 
>>> Good catch, thanks. I updated the Gnulib patch accordingly; see attached.
>> Nice: I just downloaded a fresh copy of Savannah, and it already includes the attached patch, so no action needed on my side.
>> The copy created by cp is good. I noticed that you have added a —debug switch, so I gave it a test:
>> 1. If the target exists I get this output. I would assume zeroes means that all data is read from the source, but pages containing only zeroes are skipped, while pages containing data are written to the target, which is fine. I would guess 4 KB page granularity or some form of detection takes place.
> 
> Exactly.
> 
>> ./coreutils-2023-02-24/src/cp --debug cc1-mmap cc1-ori
>> 'cc1-mmap' -> 'cc1-ori'
>> copy offload: avoided, reflink: unsupported, sparse detection: zeros
>> 2. If the target does not exist, the file is cloned. You might want to report something about that in the debug output. Otherwise the clone is good.
>> ./coreutils-2023-02-24/src/cp --debug cc1-mmap cc1-ori
>> 'cc1-mmap' -> 'cc1-ori’
> 
> Yes that was a mistake.
> Should be fixed with
> https://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commitdiff;h=65bb27656

Looks good
commit 65bb2765646df18488b266e6c1851593d3f9c966

./coreutils-2023-02-24.b/src/cp --debug cc1-mmap cc1-ori 
'cc1-mmap' -> 'cc1-ori'
copy offload: unknown, reflink: yes, sparse detection: unknown


>> My first attempt to run the tests stopped here, so I had to interrupt it 10 minutes later with Control+C.
> 
>> ^Cmake[1]: *** Deleting file 'tests/tail-2/inotify-race.log'
>> ^C^C^C^C^C^C
>> killall gdb
>> ps -A |grep gdb
>> 29584 ??         0:00.09 gdb -nx --batch-silent --eval-command=break 1475 --eval-command=run --pid=29583 -f file --eval-command=quit tail
>> 23269 ttys010    0:00.09 gdb -nx --batch-silent --eval-command=break 1475 --eval-command=run --pid=23268 -f file --eval-command=quit tail
>> killall -9 gdb
>> ps -A |grep gdb
>> Killing gdb allowed the tests to continue, I had to do it twice:
> 
> That's awkward.
> That test documents the issue with protecting the gdb invocation with timeout(1).
> I suppose we could restrict this test to inotify capable systems,
> which would have the side effect of avoiding its use on non linux systems.
> The attached patch does that.

Still hanging out there after
PASS: tests/rm/isatty.sh

There were no gdb instances to kill, but if I press enter, it continues

empty-inacc.sh: set-up failure: 
ERROR: tests/rm/empty-inacc.sh
PASS: tests/rm/empty-name.pl
…

https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-07-65bb2765646df18488b266e6c1851593d3f9c966-patch.txt


coreutils: git checkout cf80f988eeb97cc3f8c65ae58e735d36f865277b
gnulib: git checkout 32c16c45d7378b014d9aac6130104c4d02a9acdb

./bootstrap && ./configure && make clean && make -j 16
make check-TESTS # still hangs: gdb

git checkout -b cf80-macos-gdb-hang
git am < macos-gdb-hang.patch
make clean && make -j 16
make check-TESTS # completes successfully

Clone and configure another fresh copy; gnulib at master; test various coreutils commits, applying macos-gdb-hang.patch on top of them; make clean && make -j 16

cf80f988eeb97cc3f8c65ae58e735d36f865277b hangs:  gdb

I would suspect either some change in gnulib or some of the other repositories is causing these. I cannot checkout random gnulib commits, since the build breaks completely.

Check various commits of gnulib
gnulib 7352d9fec59398c76c3bb8a2ef86ba58818f0342 master hangs: ENTER
gnulib bb3fd10e6309f017618a12b2c10d3bfb813bfc08 hangs: ENTER
gnulib f77a31de60963c994cd9b42c8088be0e734962d7 aclocal-1.16: error: aclocal: file 'm4/build-to-host.m4' does not exist

Trying to revert some commits and go back in time:
git revert f77a31de60963c994cd9b42c8088be0e734962d7 fails
git revert 1e29238e40d118d4f769f7516700dd4fc494bfcd fails


> thanks for all the testing.

Look Pádraig, I’m glad to help, but this is really taking a lot of energy, these tests took another full day, and I’d be thankful if we can make everything work sooner. I’ve been spending many hours each day for a few weeks now. I need to finish my own tasks and find a job.

In the old build directory if I ./bootstrap && ./configure and make -j 16, the tests complete. It is using these checkpoints:
coreutils: git checkout cf80f988eeb97cc3f8c65ae58e735d36f865277b
gnulib: git checkout 32c16c45d7378b014d9aac6130104c4d02a9acdb

However if I clone a fresh copy ./bootstrap && ./configure, then check the same commits, ./configure again and make -j 16, I need to also apply your gdb patch, otherwise it hangs. And on master I need your patch and I need to press enter after PASS: tests/rm/isatty.sh.


Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 24 Feb 2023 22:07:01 GMT) Full text and rfc822 format available.

Message #209 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, Gnulib bugs <bug-gnulib <at> gnu.org>,
 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sat, 25 Feb 2023 00:06:38 +0200
If I revert a0803c4bad6f8e11bb05effcc42ef433f4fc3f9b, the requirement to press enter after PASS: tests/rm/isatty.sh is fixed.

Sometimes it might randomly hang: gdb after
PASS: tests/rm/r-4.sh
^Cmake[1]: *** Deleting file 'tests/rm/r-root.log'
^C^C^C

ps -A |grep gdb
55051 ttys020    0:00.09 gdb -nx --batch-silent -return-child-result --eval-command=set exec-wrapper\011\011\011\011\011                       env 'LD_PRELOAD=:./k.so'  --eval-command=break '/Volumes/coreutils/coreutils-2023-02-24.d/src/remove.c:377' --eval-command=source bp.py --eval-command=run -rv --one-file-system dir --eval-command=quit rm

even though I have applied macos-gdb-hang.patch. Other times it goes past that:
r-root.sh: skipped test: internal test failure: maybe LD_PRELOAD or gdb doesn't work?
and eventually the tests complete.

If it hangs and I killall -9 gdb, it continues
r-root.sh: skipped test: internal test failure: maybe LD_PRELOAD or gdb doesn't work?
SKIP: tests/rm/r-root.sh
PASS: tests/rm/readdir-bug.sh
…


Georgi Valkov
httpstorm.com
nano RTOS





Reply sent to Pádraig Brady <P <at> draigBrady.com>:
You have taken responsibility. (Fri, 24 Feb 2023 22:48:02 GMT) Full text and rfc822 format available.

Notification sent to George Valkov <gvalkov <at> gmail.com>:
bug acknowledged by developer. (Fri, 24 Feb 2023 22:48:02 GMT) Full text and rfc822 format available.

Message #214 received at 61386-done <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: George Valkov <gvalkov <at> gmail.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, Gnulib bugs <bug-gnulib <at> gnu.org>,
 61386-done <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Fri, 24 Feb 2023 22:47:52 +0000
On 24/02/2023 22:06, George Valkov wrote:
> If I revert a0803c4bad6f8e11bb05effcc42ef433f4fc3f9b, the requirement to press enter after PASS: tests/rm/isatty.sh is fixed.
> 

Ah very useful info.
I'll test this on my macOS 13.2 system.

> Sometimes it might randomly hang: gdb after
> PASS: tests/rm/r-4.sh
> ^Cmake[1]: *** Deleting file 'tests/rm/r-root.log'
> ^C^C^C

This is the same gdb issue, which I'll skip similarly
to the tail-2/inotify tests.


Since all issues related to sparse copy on macOS are now addressed,
I'm marking this bug as done.

The above bugs are unrelated, and I'll take them from here.

thanks again for all the testing.

Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Fri, 24 Feb 2023 23:25:02 GMT) Full text and rfc822 format available.

Message #217 received at 61386-done <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, Gnulib bugs <bug-gnulib <at> gnu.org>,
 61386-done <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sat, 25 Feb 2023 01:23:52 +0200
> On 2023-02-25, at 12:47 AM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
> On 24/02/2023 22:06, George Valkov wrote:
>> If I revert a0803c4bad6f8e11bb05effcc42ef433f4fc3f9b, the requirement to press enter after PASS: tests/rm/isatty.sh is fixed.
> 
> Ah very useful info.
> I'll test this on my macOS 13.2 system.
> 
>> Sometimes it might randomly hang: gdb after
>> PASS: tests/rm/r-4.sh
>> ^Cmake[1]: *** Deleting file 'tests/rm/r-root.log'
>> ^C^C^C
> 
> This is the same gdb issue, which I'll skip similarly
> to the tail-2/inotify tests.
> 
> 
> Since all issues related to sparse copy on macOS are now addressed,
> I'm marking this bug as done.
> 
> The above bugs are unrelated, and I'll take them from here.
> 
> thanks again for all the testing.


Thank you Pádraig! You and Paul did a lot of work, and I’m happy the issue is resolved. I’ll let you know if I get any update from Apple.

Are we in a state where I can update OpenWRT to the latest coreutils master or perhaps it would be better to wait until you fix the tests and coreutils-9.2 is released? You mentioned the release is coming soon?

Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sat, 25 Feb 2023 12:28:02 GMT) Full text and rfc822 format available.

Message #220 received at 61386-done <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: George Valkov <gvalkov <at> gmail.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, Gnulib bugs <bug-gnulib <at> gnu.org>,
 61386-done <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sat, 25 Feb 2023 12:27:11 +0000
On 24/02/2023 23:23, George Valkov wrote:
> Are we in a state where I can update OpenWRT to the latest coreutils master or perhaps it would be better to wait until you fix the tests and coreutils-9.2 is released? You mentioned the release is coming soon?

Probably best to wait for the 9.2 release at this stage.
Best guess is 1.5 weeks away.

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Mon, 06 Mar 2023 07:38:02 GMT) Full text and rfc822 format available.

Message #223 received at 61386-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>,
 George Valkov <gvalkov <at> gmail.com>
Cc: Gnulib bugs <bug-gnulib <at> gnu.org>, 61386-done <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sun, 5 Mar 2023 23:37:00 -0800
I recall reading somewhere in this thread that a 'split' test was 
failing on macOS because it doesn't let you lseek on /dev/null. I fixed 
that problem here:

https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=aa266f1b3dc4e12acdc46cc0f562adc03c2c0b8f

and fixed some other 'split' issues in adjacent patches, while I was in 
the neighborhood.




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Mon, 06 Mar 2023 11:14:01 GMT) Full text and rfc822 format available.

Message #226 received at 61386-done <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, George Valkov <gvalkov <at> gmail.com>
Cc: Gnulib bugs <bug-gnulib <at> gnu.org>, 61386-done <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Mon, 6 Mar 2023 11:13:08 +0000
On 06/03/2023 07:37, Paul Eggert wrote:
> I recall reading somewhere in this thread that a 'split' test was
> failing on macOS because it doesn't let you lseek on /dev/null. I fixed
> that problem here:
> 
> https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=aa266f1b3dc4e12acdc46cc0f562adc03c2c0b8f
> 
> and fixed some other 'split' issues in adjacent patches, while I was in
> the neighborhood.

The lseek /dev/null issue was only in your undef SEEK_DATA test patch,
and already addressed in your final gnulib patch in the area, as discussed at:
https://lists.gnu.org/archive/html/bug-coreutils/2023-02/msg00081.html

Also immediately rejecting input where we can't determine the size is a feature.
I.e. the following is the expected behavior:

  $ : | split -n l/1
  split: -: cannot determine file size

With the changes we now have:

  $ : | split -n l/1  # Creates an empty file
  $ yes | split -n l/1
  split: -: cannot determine file size

This is inconsistent, and an insidious issue that users may introduce to scripts,
that will only fail once input data hits a certain size.

Also there are a few `make syntax-check` issues in the new split code.

Would it be possible to revert this change in isolation?

thanks,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Mon, 06 Mar 2023 21:50:01 GMT) Full text and rfc822 format available.

Message #229 received at 61386-done <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: Gnulib bugs <bug-gnulib <at> gnu.org>,
 Pádraig Brady <P <at> draigBrady.com>, 61386-done <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Mon, 6 Mar 2023 23:49:47 +0200
> On 2023-03-06, at 9:37 AM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> 
> I recall reading somewhere in this thread that a 'split' test was failing on macOS because it doesn't let you lseek on /dev/null. I fixed that problem here:
> 
> https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=aa266f1b3dc4e12acdc46cc0f562adc03c2c0b8f
> 
> and fixed some other 'split' issues in adjacent patches, while I was in the neighborhood.

I would assume the test are irrelevant, but since I already ran then, here you go:
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-09-e30af1e5840fdcabe3856f9ffcd6354b94d0a7ee-ori.txt


Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Mon, 06 Mar 2023 23:49:02 GMT) Full text and rfc822 format available.

Message #232 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>,
 George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Mon, 6 Mar 2023 15:48:07 -0800
[Message part 1 (text/plain, inline)]
On 3/6/23 03:13, Pádraig Brady wrote:
> On 06/03/2023 07:37, Paul Eggert wrote:
> The lseek /dev/null issue was only in your undef SEEK_DATA test patch,
> and already addressed in your final gnulib patch in the area, as 
> discussed at:
> https://lists.gnu.org/archive/html/bug-coreutils/2023-02/msg00081.html

Sorry,  I got confused about that issue versus other 'split' issues.


> Also immediately rejecting input where we can't determine the size is a 
> feature.
> I.e. the following is the expected behavior:
> 
>    $ : | split -n l/1
>    split: -: cannot determine file size

But 'split' can easily determine the size there: it's zero. 'split' 
doesn't need to use lseek to do that; a single 'read' will do.


> With the changes we now have:
> 
>    $ : | split -n l/1  # Creates an empty file
>    $ yes | split -n l/1
>    split: -: cannot determine file size
> 
> This is inconsistent, and an insidious issue that users may introduce to 
> scripts,
> that will only fail once input data hits a certain size.

That's OK, as lots of standard utilities already have that issue and 
users are OK with this. Users can't reasonably expect 'split -n' to work 
on infinitely-sized input such as the second example. At some point 
there is a limit.

It's a bit like 'sort' if you feed 'sort' longer and longer input lines, 
eventually it will fail and give you a diagnostic.

It'd be nicer if the limit of 'split' were larger than what's in current 
Git. If you'd prefer that we raise the limit further, by copying stdin 
into a temporary file first, I can write a patch to do that.


> Also there are a few `make syntax-check` issues in the new split code.

Ouch, sorry, I'm always forgetting that. Fixed by the attached patch. 
(Two of the issues were in 'split', one was elsewhere.)


> Would it be possible to revert this change in isolation?

We could revert the patch's effect (not sure if it's a simple revert, 
but I could easily generate such a patch). I'm hoping, though, that we 
can reach consensus on extending split's functionality instead, perhaps 
along the abovementioned lines.

[dropping bug-gnulib since this is no longer relevant to Gnulib.]
[0001-maint-pacify-make-syntax-check.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Tue, 07 Mar 2023 00:32:02 GMT) Full text and rfc822 format available.

Message #235 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>, George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Tue, 7 Mar 2023 00:31:33 +0000
On 06/03/2023 23:48, Paul Eggert wrote:
> On 3/6/23 03:13, Pádraig Brady wrote:
>> On 06/03/2023 07:37, Paul Eggert wrote:
>> The lseek /dev/null issue was only in your undef SEEK_DATA test patch,
>> and already addressed in your final gnulib patch in the area, as
>> discussed at:
>> https://lists.gnu.org/archive/html/bug-coreutils/2023-02/msg00081.html
> 
> Sorry,  I got confused about that issue versus other 'split' issues.
> 
> 
>> Also immediately rejecting input where we can't determine the size is a
>> feature.
>> I.e. the following is the expected behavior:
>>
>>     $ : | split -n l/1
>>     split: -: cannot determine file size
> 
> But 'split' can easily determine the size there: it's zero. 'split'
> doesn't need to use lseek to do that; a single 'read' will do.
> 
> 
>> With the changes we now have:
>>
>>     $ : | split -n l/1  # Creates an empty file
>>     $ yes | split -n l/1
>>     split: -: cannot determine file size
>>
>> This is inconsistent, and an insidious issue that users may introduce to
>> scripts,
>> that will only fail once input data hits a certain size.
> 
> That's OK, as lots of standard utilities already have that issue and
> users are OK with this. Users can't reasonably expect 'split -n' to work
> on infinitely-sized input such as the second example. At some point
> there is a limit.
> 
> It's a bit like 'sort' if you feed 'sort' longer and longer input lines,
> eventually it will fail and give you a diagnostic.
> 
> It'd be nicer if the limit of 'split' were larger than what's in current
> Git. If you'd prefer that we raise the limit further, by copying stdin
> into a temporary file first, I can write a patch to do that.

>> Also there are a few `make syntax-check` issues in the new split code.
> 
> Ouch, sorry, I'm always forgetting that. Fixed by the attached patch.
> (Two of the issues were in 'split', one was elsewhere.)

syntax check now looks good thanks.

>> Would it be possible to revert this change in isolation?
> 
> We could revert the patch's effect (not sure if it's a simple revert,
> but I could easily generate such a patch). I'm hoping, though, that we
> can reach consensus on extending split's functionality instead, perhaps
> along the abovementioned lines.
> 
> [dropping bug-gnulib since this is no longer relevant to Gnulib.]

I see this as a more fundamental operational thing than a limit issue.
`split -n` needing the size up front alludes to the operation,
so it's obvious that `yes | split -n l/4 ...` doesn't do round robin for example
and fails immediately.

Also the main point of only supporting a small amount of data
before failing is a bad gotcha for users.

So options I see are to persist implicitly with a temp file patch,
or leave as is and have users explicitly persist the data to split.
Advantages of leaving as is, is it's obvious to users that
all data needs to be read before splitting starts.
Also given split is often used with large data,
explicit control is often desired over where and how it's persisted.

So yes a temp file patch would be better / required,
but I'm thinking a revert would still be better again.

cheers,
Pádraig.




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Tue, 07 Mar 2023 21:54:01 GMT) Full text and rfc822 format available.

Message #238 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 61386 <at> debbugs.gnu.org, George Valkov <gvalkov <at> gmail.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Tue, 7 Mar 2023 13:53:35 -0800
[Message part 1 (text/plain, inline)]
On 3/6/23 16:31, Pádraig Brady wrote:

> syntax check now looks good thanks.

You're welcome. I'm getting a failure on tests/du/threshold.sh, though, 
on Fedora 37 x86-64. Are you seeing that? See attached (compressed) log.


> I see this as a more fundamental operational thing than a limit issue.
> `split -n` needing the size up front alludes to the operation,

To ameliorate a bit, we can document this (done in the attached patch).

I don't see this as being significantly more serious than the other 
utilities that read all their input (possibly from a pipe) before 
outputting anything. They all need to deal with exhausting temporary 
space, and 'split' is merely joining the company of 'sort' and 'tac'.


> given split is often used with large data,
> explicit control is often desired over where and how it's persisted.

Anybody needing that control can copy the data themselves to a temp 
file, before calling 'split'. (Like 'sort' and 'tac'.)

Since the change doesn't invalidate existing uses, it sounds like you're 
worried that users will want 'split' to work even on enormous pipe 
inputs, inputs so large that /tmp fills up. I think this unlikely in 
practice, but if it turns into a real concern I can work around most of 
the problem by using the first output file as the temporary. However, 
I'd prefer to avoid doing this unless it's necessary, as it seems overkill.

I installed the attached patch so that 'split' uses a temp file in this 
situation, so it is no longer limited to the 128 KiB size that it was 
before the attached patch. Hope this is good enough; if not please let 
me know.
[make-check-log.txt.gz (application/gzip, attachment)]
[0001-split-support-split-n-on-larger-pipe-input.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Tue, 07 Mar 2023 23:17:01 GMT) Full text and rfc822 format available.

Message #241 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org, George Valkov <gvalkov <at> gmail.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Tue, 7 Mar 2023 23:16:26 +0000
On 07/03/2023 21:53, Paul Eggert wrote:
> On 3/6/23 16:31, Pádraig Brady wrote:
> 
>> syntax check now looks good thanks.
> 
> You're welcome. I'm getting a failure on tests/du/threshold.sh, though,
> on Fedora 37 x86-64. Are you seeing that? See attached (compressed) log.
> 
> 
>> I see this as a more fundamental operational thing than a limit issue.
>> `split -n` needing the size up front alludes to the operation,
> 
> To ameliorate a bit, we can document this (done in the attached patch).
> 
> I don't see this as being significantly more serious than the other
> utilities that read all their input (possibly from a pipe) before
> outputting anything. They all need to deal with exhausting temporary
> space, and 'split' is merely joining the company of 'sort' and 'tac'.
> 
> 
>> given split is often used with large data,
>> explicit control is often desired over where and how it's persisted.
> 
> Anybody needing that control can copy the data themselves to a temp
> file, before calling 'split'. (Like 'sort' and 'tac'.)
> 
> Since the change doesn't invalidate existing uses, it sounds like you're
> worried that users will want 'split' to work even on enormous pipe
> inputs, inputs so large that /tmp fills up. I think this unlikely in
> practice, but if it turns into a real concern I can work around most of
> the problem by using the first output file as the temporary. However,
> I'd prefer to avoid doing this unless it's necessary, as it seems overkill.
> 
> I installed the attached patch so that 'split' uses a temp file in this
> situation, so it is no longer limited to the 128 KiB size that it was
> before the attached patch. Hope this is good enough; if not please let
> me know.

Thanks for doing that Paul.
The implementation looks good.

I'll look at the du test later.

cheers,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sat, 11 Mar 2023 07:30:02 GMT) Full text and rfc822 format available.

Message #244 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>,
 Pádraig Brady <P <at> draigBrady.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sat, 11 Mar 2023 09:29:07 +0200
> On 2023-03-07, at 11:53 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> 
> On 3/6/23 16:31, Pádraig Brady wrote:
> 
>> syntax check now looks good thanks.
> 
> You're welcome. I'm getting a failure on tests/du/threshold.sh, though, on Fedora 37 x86-64. Are you seeing that? See attached (compressed) log.
> 
> 
>> I see this as a more fundamental operational thing than a limit issue.
>> `split -n` needing the size up front alludes to the operation,
> 
> To ameliorate a bit, we can document this (done in the attached patch).
> 
> I don't see this as being significantly more serious than the other utilities that read all their input (possibly from a pipe) before outputting anything. They all need to deal with exhausting temporary space, and 'split' is merely joining the company of 'sort' and 'tac'.
> 
> 
>> given split is often used with large data,
>> explicit control is often desired over where and how it's persisted.
> 
> Anybody needing that control can copy the data themselves to a temp file, before calling 'split'. (Like 'sort' and 'tac'.)
> 
> Since the change doesn't invalidate existing uses, it sounds like you're worried that users will want 'split' to work even on enormous pipe inputs, inputs so large that /tmp fills up. I think this unlikely in practice, but if it turns into a real concern I can work around most of the problem by using the first output file as the temporary. However, I'd prefer to avoid doing this unless it's necessary, as it seems overkill.
> 
> I installed the attached patch so that 'split' uses a temp file in this situation, so it is no longer limited to the 128 KiB size that it was before the attached patch. Hope this is good enough; if not please let me know.<make-check-log.txt.gz><0001-split-support-split-n-on-larger-pipe-input.patch>

Hi Paul!
Here are the latest test results on macOS 12.6.3:
threshold.sh: skipped test: block size of a directory is smaller than 4 bytes
SKIP: tests/du/threshold.sh
It was also skipped on macOS before the patch, so no change.

https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-10-16000805eb1bcdf25360471b8bbc8ec3f025e035-ori.txt

The folks at OpenWRT asked for an update on the conreutils-9.2 release date?
I will let them know the sparse copy fix is already on Savannah master.


Good luck!

Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Sat, 11 Mar 2023 14:29:02 GMT) Full text and rfc822 format available.

Message #247 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: George Valkov <gvalkov <at> gmail.com>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Sat, 11 Mar 2023 14:28:15 +0000
On 11/03/2023 07:29, George Valkov wrote:
> Hi Paul!
> Here are the latest test results on macOS 12.6.3:
> threshold.sh: skipped test: block size of a directory is smaller than 4 bytes
> SKIP: tests/du/threshold.sh
> It was also skipped on macOS before the patch, so no change.
> 
> https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-10-16000805eb1bcdf25360471b8bbc8ec3f025e035-ori.txt
> 
> The folks at OpenWRT asked for an update on the conreutils-9.2 release date?
> I will let them know the sparse copy fix is already on Savannah master.

Best guess is 9.2 release is 5 days away.

cheers,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Tue, 21 Mar 2023 11:56:01 GMT) Full text and rfc822 format available.

Message #250 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Tue, 21 Mar 2023 13:55:04 +0200
> On 2023-03-11, at 4:28 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
> On 11/03/2023 07:29, George Valkov wrote:
>> Hi Paul!
>> Here are the latest test results on macOS 12.6.3:
>> threshold.sh: skipped test: block size of a directory is smaller than 4 bytes
>> SKIP: tests/du/threshold.sh
>> It was also skipped on macOS before the patch, so no change.
>> https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-10-16000805eb1bcdf25360471b8bbc8ec3f025e035-ori.txt
>> The folks at OpenWRT asked for an update on the conreutils-9.2 release date?
>> I will let them know the sparse copy fix is already on Savannah master.
> 
> Best guess is 9.2 release is 5 days away.

Thank you for your hard work on coreutils, Pádraig!

I opened a PR for OpenWRT here:
https://github.com/openwrt/openwrt/pull/12233

I remember you made changes to gnulib and m4, but I do not see a new release tag for m4, and I see that OpenWRT builds tools/m4 as a prerequisite for tools/coreutils. The version of m4 currently in use seems outdated. It is downloaded from
https://ftp.gnu.org/gnu/m4/m4-1.4.19.tar.xz
or one of the many @GNU mirrors. The latest version is from 2021-05-28 17:55. What is the current status for m4? I suppose we should instead clone and build gnulib? Is there a source download link or should be clone from the Savanna git repo? Is there a specific release of gnulib or should we checkout the commit selected by coreutils? Or perhaps we should automate the build using:
./bootstrap && ./configure && make

coreutils-9.2 test results on macOS 12.6.3:
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/test-20-9.2-release.txt

Cheers, mate!

Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Tue, 21 Mar 2023 14:50:01 GMT) Full text and rfc822 format available.

Message #253 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: George Valkov <gvalkov <at> gmail.com>, Pádraig Brady
 <P <at> draigBrady.com>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Tue, 21 Mar 2023 07:49:01 -0700
On 3/21/23 06:55, George Valkov wrote:
> Is there a source download link or should be clone from the Savanna git repo?

Easiest might be to build from the latest release:

https://ftp.gnu.org/pub/gnu/coreutils/coreutils-9.2.tar.xz

If you do that, you shouldn't need to run m4, and all the gnulib files 
will be there already.





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Tue, 21 Mar 2023 15:28:01 GMT) Full text and rfc822 format available.

Message #256 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Tue, 21 Mar 2023 17:27:18 +0200
> On 2023-03-21, at 4:49 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> Easiest might be to build from the latest release:
> https://ftp.gnu.org/pub/gnu/coreutils/coreutils-9.2.tar.xz
> If you do that, you shouldn't need to run m4, and all the gnulib files will be there already.


Thanks, Paul!

Yes, this is what we do, if you look at tools/coreutils/Makefile (link below), I changed PKG_VERSION and PKG_HASH.

make tools/coreutils/{clean,compile} V=sc
downloads the file you mentioned from one of the @GNU mirrors, the SHA2 hash matches 6885ff47b9cdb211de47d368c17853f406daaf98b148aaecdf10de29cc04b0b3, so it proceeds with the build.
https://github.com/httpstorm/openwrt/tree/coreutils-9.2/tools/coreutils
https://github.com/openwrt/openwrt/pull/12233/files

However since the old m4-1.4.19 in under tools/m4, coreutils builds with the legacy version:
https://github.com/httpstorm/openwrt/tree/coreutils-9.2/tools/m4

I tried removing tools/m4, and I get the following error
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/2023-03-21-openwrt-build-coreutils-9.2.txt

make tools/coreutils/{clean,compile} V=sc
…
config.status: creating po/POTFILES
config.status: creating po/Makefile
touch /Volumes/wrt3200/openwrt/build_dir/host/coreutils-9.2/.configured
CFLAGS="-O2 -I/Volumes/wrt3200/openwrt/staging_dir/host/include " CPPFLAGS="-I/Volumes/wrt3200/openwrt/staging_dir/host/include " CXXFLAGS="" LDFLAGS="-L/Volumes/wrt3200/openwrt/staging_dir/host/lib " make  -C /Volumes/wrt3200/openwrt/build_dir/host/coreutils-9.2 PROGRAMS="src/date src/readlink src/touch src/ln src/chown src/ginstall" LIBRARIES= MANS= SUBDIRS=. 
make[3]: Entering directory '/Volumes/wrt3200/openwrt/build_dir/host/coreutils-9.2'
CDPATH="${ZSH_VERSION+.}:" && cd . && /usr/bin/env bash '/Volumes/wrt3200/openwrt/build_dir/host/coreutils-9.2/build-aux/missing' aclocal-1.16 -I m4
autom4te: error: need GNU m4 1.4 or later: /Volumes/wrt3200/openwrt/staging_dir/host/bin/m4
aclocal.real: error: autom4te failed with exit status: 1
make[3]: *** [Makefile:8458: aclocal.m4] Error 1
make[3]: Leaving directory '/Volumes/wrt3200/openwrt/build_dir/host/coreutils-9.2'
make[2]: *** [Makefile:45: /Volumes/wrt3200/openwrt/build_dir/host/coreutils-9.2/.built] Error 2
make[2]: Leaving directory '/Volumes/wrt3200/openwrt/tools/coreutils'
time: tools/coreutils/compile#58.69#31.58#154.08
    ERROR: tools/coreutils failed to build.
make[1]: *** [tools/Makefile:219: tools/coreutils/compile] Error 1
make[1]: Leaving directory '/Volumes/wrt3200/openwrt'
make: *** [/Volumes/wrt3200/openwrt/include/toplevel.mk:231: tools/coreutils/compile] Error 2

Any ideas?


Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Tue, 21 Mar 2023 17:33:01 GMT) Full text and rfc822 format available.

Message #259 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: George Valkov <gvalkov <at> gmail.com>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Tue, 21 Mar 2023 10:32:09 -0700
On 3/21/23 08:27, George Valkov wrote:
> However since the old m4-1.4.19 in under tools/m4, coreutils builds with the legacy version:

If you're building from that tarball, coreutils doesn't need m4. I just 
now double-checked that, in an environment that didn't have m4. 
"./configure && make check" succeeded. So the m4 version should be 
irrelevant.




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Tue, 21 Mar 2023 18:10:02 GMT) Full text and rfc822 format available.

Message #262 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org, Pádraig Brady <P <at> draigBrady.com>
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Tue, 21 Mar 2023 20:08:57 +0200
> On 2023-03-21, at 7:32 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
> 
> On 3/21/23 08:27, George Valkov wrote:
>> However since the old m4-1.4.19 in under tools/m4, coreutils builds with the legacy version:
> 
> If you're building from that tarball, coreutils doesn't need m4. I just now double-checked that, in an environment that didn't have m4. "./configure && make check" succeeded. So the m4 version should be irrelevant.

Yes, I can confirm that a stand-alone build using your instructions works. I do not have enough experience with the OpenWRT build system, but what I can see is that at some point it tries to run staging_dir/host/bin/m4
And that’s why I had the impression that the legacy version of m4 is used. I might be wrong, and maybe it just needs that executable, since when tools/m4 is present, tools/coreutils builds and works. If the m4 directory from the tarball is used, then everything’s fine, we can keep tools/m4 as it is.
staging_dir/host/bin/m4 is compiled from tools/m4 and does not exist without tools/m4, so the build fails if I remove it. If I create a link to /usr/local/Cellar/m4/1.4.19/bin/m4, that part of the build succeeds, but then we also need to apply the following patch:
https://github.com/openwrt/openwrt/blob/master/tools/coreutils/patches/001-m4.patch
Do you know if the patch is applied upstream in coreutils?

Thanks Paul!

Georgi Valkov
httpstorm.com
nano RTOS





Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Tue, 21 Mar 2023 18:22:01 GMT) Full text and rfc822 format available.

Message #265 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: George Valkov <gvalkov <at> gmail.com>, Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Tue, 21 Mar 2023 18:21:49 +0000
On 21/03/2023 18:08, George Valkov wrote:
> 
>> On 2023-03-21, at 7:32 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>>
>> On 3/21/23 08:27, George Valkov wrote:
>>> However since the old m4-1.4.19 in under tools/m4, coreutils builds with the legacy version:
>>
>> If you're building from that tarball, coreutils doesn't need m4. I just now double-checked that, in an environment that didn't have m4. "./configure && make check" succeeded. So the m4 version should be irrelevant.
> 
> Yes, I can confirm that a stand-alone build using your instructions works. I do not have enough experience with the OpenWRT build system, but what I can see is that at some point it tries to run staging_dir/host/bin/m4
> And that’s why I had the impression that the legacy version of m4 is used. I might be wrong, and maybe it just needs that executable, since when tools/m4 is present, tools/coreutils builds and works. If the m4 directory from the tarball is used, then everything’s fine, we can keep tools/m4 as it is.
> staging_dir/host/bin/m4 is compiled from tools/m4 and does not exist without tools/m4, so the build fails if I remove it. If I create a link to /usr/local/Cellar/m4/1.4.19/bin/m4, that part of the build succeeds, but then we also need to apply the following patch:
> https://github.com/openwrt/openwrt/blob/master/tools/coreutils/patches/001-m4.patch
> Do you know if the patch is applied upstream in coreutils?

Yes if that m4 file is patched then m4 will need to run to regen stuff.
Though the existing m4 tool should be fine for that.
I would look at figuring out why that m4 tool is not available in your env.
Perhaps you need to define an M4 env variable or something.

The openwrt m4 patch originates from
https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=c5608227
and that pattern seems to have originated from:
https://github.com/openwrt/packages/issues/14120#issuecomment-749438742
For the patch not to be needed it would need to be applied to gnulib,
but I don't know how general it is.

cheers,
Pádraig




Information forwarded to bug-coreutils <at> gnu.org:
bug#61386; Package coreutils. (Tue, 21 Mar 2023 18:49:01 GMT) Full text and rfc822 format available.

Message #268 received at 61386 <at> debbugs.gnu.org (full text, mbox):

From: George Valkov <gvalkov <at> gmail.com>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: Paul Eggert <eggert <at> cs.ucla.edu>, 61386 <at> debbugs.gnu.org
Subject: Re: bug#61386: [PATCH] cp,mv,install: Disable sparse copy on macOS
Date: Tue, 21 Mar 2023 20:48:41 +0200
> On 2023-03-21, at 8:21 PM, Pádraig Brady <P <at> draigBrady.com> wrote:
> 
> On 21/03/2023 18:08, George Valkov wrote:
>>> On 2023-03-21, at 7:32 PM, Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>>> 
>>> On 3/21/23 08:27, George Valkov wrote:
>>>> However since the old m4-1.4.19 in under tools/m4, coreutils builds with the legacy version:
>>> 
>>> If you're building from that tarball, coreutils doesn't need m4. I just now double-checked that, in an environment that didn't have m4. "./configure && make check" succeeded. So the m4 version should be irrelevant.
>> Yes, I can confirm that a stand-alone build using your instructions works. I do not have enough experience with the OpenWRT build system, but what I can see is that at some point it tries to run staging_dir/host/bin/m4
>> And that’s why I had the impression that the legacy version of m4 is used. I might be wrong, and maybe it just needs that executable, since when tools/m4 is present, tools/coreutils builds and works. If the m4 directory from the tarball is used, then everything’s fine, we can keep tools/m4 as it is.
>> staging_dir/host/bin/m4 is compiled from tools/m4 and does not exist without tools/m4, so the build fails if I remove it. If I create a link to /usr/local/Cellar/m4/1.4.19/bin/m4, that part of the build succeeds, but then we also need to apply the following patch:
>> https://github.com/openwrt/openwrt/blob/master/tools/coreutils/patches/001-m4.patch
>> Do you know if the patch is applied upstream in coreutils?
> 
> Yes if that m4 file is patched then m4 will need to run to regen stuff.
> Though the existing m4 tool should be fine for that.
> I would look at figuring out why that m4 tool is not available in your env.
> Perhaps you need to define an M4 env variable or something.
> 
> The openwrt m4 patch originates from
> https://git.openwrt.org/?p=openwrt/openwrt.git;a=commitdiff;h=c5608227
> and that pattern seems to have originated from:
> https://github.com/openwrt/packages/issues/14120#issuecomment-749438742
> For the patch not to be needed it would need to be applied to gnulib,
> but I don't know how general it is.

Thanks, Pádraig!

The OpenWRT build system is designed to work on many platforms. To achieve this, some helper host tools are compiled and used later. These tools are installed in staging_dir/host/bin, and they are used during the build instead of whatever is installed on the system. On macOS for example the default version of m4 is old and does not support the --gnu switch, so it is not suitable for the build. Hence OpenWRT compiles a working version and uses it during the build.

Without the patch, the OpenWRT tools/coreutils build fails. Build log:
https://httpstorm.com/share/.openwrt/test/2023-02-06_coreutils-9.1/2023-03-21-openwrt-build-coreutils-9.2-no-patch.txt

With the patch it builds correctly. The target cross-platform build for coreutils does not use the patch and builds correctly. A regular build outside of OpenWRT also does not need the patch. This seems like improperly configured include search path or environment on OpenWRT side. So I guess it would be best if we can fix that and go without the patch.

At this point I believe that toos/m4 is only used to produce the executable, and so it does not require any changes since the m4 directory that comes with coreutils is used.


Cheers mate!

Georgi Valkov
httpstorm.com
nano RTOS





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 19 Apr 2023 11:24:22 GMT) Full text and rfc822 format available.

This bug report was last modified 1 year and 6 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.