GNU bug report logs - #28506
coreutils 8.28 test suite hangs on APFS filesystem

Package: coreutils;

Reported by: Jack Howarth <howarth.mailing.lists <at> gmail.com>

Date: Mon, 18 Sep 2017 20:19:01 UTC

Severity: normal

Tags: fixed

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 28506 in the body.
You can then email your comments to 28506 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-coreutils <at> gnu.org:
bug#28506; Package coreutils. (Mon, 18 Sep 2017 20:19:01 GMT) Full text and rfc822 format available.

Acknowledgement sent to Jack Howarth <howarth.mailing.lists <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-coreutils <at> gnu.org. (Mon, 18 Sep 2017 20:19:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Jack Howarth <howarth.mailing.lists <at> gmail.com>
To: bug-coreutils <at> gnu.org
Subject: coreutils 8.28 test suite hangs on APFS filesystem
Date: Mon, 18 Sep 2017 16:18:37 -0400
[Message part 1 (text/plain, inline)]
The coreutils 8.28 release, when built on macOS 10.13 under the new APFS
filesystem, produces a hang during the test suite run. The hang appears to
occur in the execution of coreutils-8.28/tests/split/filter.sh at...

+ yes
+ head -n200K
+ split -b1G '--filter=head -c1 >/dev/null'
+ for mode in ''\'''\''' ''\''r/'\'''
+ FILE = -

according to the filter.log generated from executing the section of
split/filter.sh containing...

yes | head -n200K | split -b1G --filter='head -c1 >/dev/null' || fail=1

# Ensure that "endless" input is ignored when all filters finish
for mode in '' 'r/'; do
  FILE = '-'
  if test "$mode" = ''; then
    FILE = 'zero.in'
    truncate -s10T "$FILE" || continue
  fi
  for N in 1 2; do
    rm -f x??.n || framework_failure_
    timeout 10 sh -c \
      "yes | split --filter='head -c1 >\$FILE.n' -n $mode$N $FILE" || fail=1
    # Also ensure we get appropriate output from each filter
    seq 1 $N | tr '0-9' 1 > stat.exp
    stat -c%s x??.n > stat.out || framework_failure_
    compare stat.exp stat.out || fail=1
  done
done

I haven't opened a radar report yet, as the Apple engineers can't look
directly at the source code for coreutils due to the GPLv3 licensing, and
the test suite seems to be tangled up with the makefiles, making it
impossible to extract a stand-alone reproducer to attach to a
radar bug report.
      Jack
ps Again, the hang seems to occur at the tail end of the log after it
emits...

+ FILE = -

Any suggestions on how to reduce this to a simpler test case? I would note
that the new APFS filesystem produces a failure in the python test suite...

https://bugs.python.org/issue31380

which is due to APFS not allowing files to be created with filenames that
contain unassigned codepoints in the Unicode 9.0 standard, whereas HFS+
does. So perhaps the coreutils hang might be a similar issue?

Information forwarded to bug-coreutils <at> gnu.org:
bug#28506; Package coreutils. (Mon, 18 Sep 2017 21:09:02 GMT) Full text and rfc822 format available.

Message #8 received at 28506 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Jack Howarth <howarth.mailing.lists <at> gmail.com>
Cc: 28506 <at> debbugs.gnu.org
Subject: Re: bug#28506: coreutils 8.28 test suite hangs on APFS filesystem
Date: Mon, 18 Sep 2017 14:08:32 -0700
Thank you for the testing and for the report.

Is there any chance your failing test was via a python2 framework? I'm
asking (on Pádraig's behalf) because there is a known problem whereby
SIGPIPE is mishandled in that case, and that might explain this
failure, since the data-generation phase relies on SIGPIPE killing
this test's "yes" command.
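
The SIGPIPE dependency mentioned above can be sketched as follows. This is an illustration only, assuming a POSIX shell with yes(1) and head(1) on PATH; the variable name writer_status is hypothetical. Once head exits after one line, the next write by yes hits a closed pipe. With the default SIGPIPE disposition, yes is killed by the signal (shells typically report that as 128+13 = 141); if SIGPIPE is being ignored, yes sees a write error instead and normally exits with a failure status.

```shell
#!/bin/sh
# Report how the writer side of "yes | head" terminates.
# fd 3 carries the writer's exit status out of the pipeline.
writer_status=$(
  { { yes; echo "$?" >&3; } | head -n 1 >/dev/null; } 3>&1
)

case $writer_status in
  141) echo "yes was killed by SIGPIPE (status 141)" ;;
  *)   echo "yes exited with status $writer_status (SIGPIPE may be ignored)" ;;
esac
```

If the writer never terminates at all, a pipeline like this hangs, which is the failure mode the question is probing for.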




Information forwarded to bug-coreutils <at> gnu.org:
bug#28506; Package coreutils. (Mon, 18 Sep 2017 23:27:01 GMT) Full text and rfc822 format available.

Message #11 received at 28506 <at> debbugs.gnu.org (full text, mbox):

From: Jack Howarth <howarth.mailing.lists <at> gmail.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 28506 <at> debbugs.gnu.org
Subject: Re: bug#28506: coreutils 8.28 test suite hangs on APFS filesystem
Date: Mon, 18 Sep 2017 19:26:20 -0400
[Message part 1 (text/plain, inline)]
On Mon, Sep 18, 2017 at 5:08 PM, Jim Meyering <jim <at> meyering.net> wrote:

> Thank you for the testing and for the report.
>
> Is there any chance your failing test was via a python2 framework? I'm
> asking (on Pádraig's behalf) because there is a known problem whereby
> SIGPIPE is mishandled in that case, and that might explain this
> failure, since the data-generation phase relies on SIGPIPE killing
> this test's "yes" command.
>

I doubt it as the hang doesn't happen under 10.13 when run on a JHFS
formatted volume.
              Jack

Information forwarded to bug-coreutils <at> gnu.org:
bug#28506; Package coreutils. (Mon, 18 Sep 2017 23:42:02 GMT) Full text and rfc822 format available.

Message #14 received at 28506 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Jack Howarth <howarth.mailing.lists <at> gmail.com>
Cc: 28506 <at> debbugs.gnu.org
Subject: Re: bug#28506: coreutils 8.28 test suite hangs on APFS filesystem
Date: Mon, 18 Sep 2017 16:40:50 -0700
On Mon, Sep 18, 2017 at 4:26 PM, Jack Howarth
<howarth.mailing.lists <at> gmail.com> wrote:
> On Mon, Sep 18, 2017 at 5:08 PM, Jim Meyering <jim <at> meyering.net> wrote:
...
>> Is there any chance your failing test was via a python2 framework? I'm
>> asking (on Pádraig's behalf) because there is a known problem whereby
>> SIGPIPE is mishandled in that case, and that might explain this
>> failure, since the data-generation phase relies on SIGPIPE killing
>> this test's "yes" command.
>
> I doubt it as the hang doesn't happen under 10.13 when run on a JHFS
> formatted volume.

How did you run the tests?




Information forwarded to bug-coreutils <at> gnu.org:
bug#28506; Package coreutils. (Tue, 19 Sep 2017 01:08:01 GMT) Full text and rfc822 format available.

Message #17 received at 28506 <at> debbugs.gnu.org (full text, mbox):

From: Jack Howarth <howarth.mailing.lists <at> gmail.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 28506 <at> debbugs.gnu.org
Subject: Re: bug#28506: coreutils 8.28 test suite hangs on APFS filesystem
Date: Mon, 18 Sep 2017 21:07:32 -0400
[Message part 1 (text/plain, inline)]
On Mon, Sep 18, 2017 at 7:40 PM, Jim Meyering <jim <at> meyering.net> wrote:

> On Mon, Sep 18, 2017 at 4:26 PM, Jack Howarth
> <howarth.mailing.lists <at> gmail.com> wrote:
> > On Mon, Sep 18, 2017 at 5:08 PM, Jim Meyering <jim <at> meyering.net> wrote:
> ...
> >> Is there any chance your failing test was via a python2 framework? I'm
> >> asking (on Pádraig's behalf) because there is a known problem whereby
> >> SIGPIPE is mishandled in that case, and that might explain this
> >> failure, since the data-generation phase relies on SIGPIPE killing
> >> this test's "yes" command.
> >
> > I doubt it as the hang doesn't happen under 10.13 when run on a JHFS
> > formatted volume.
>
> How did you run the tests?
>

Actually, I forgot to mention that the coreutils test suite hang only
occurred on the APFS volumes when coreutils was built against the gettext
and libiconv from fink. A build outside of fink, which didn't build against
those packages, didn't show the hang in the coreutils test suite. The fink
gettext and libiconv packages that I am using are those from...

https://sourceforge.net/p/fink/package-submissions/4955/

and

https://sourceforge.net/p/fink/package-submissions/5004/

which are both patched for the format string strictness in High Sierra. I
found that using --disable-nls in configuring coreutils was insufficient to
suppress the test suite hang which I assume is due to the presence of...

#define HAVE_LIBINTL_H 1

in the generated ./lib/config.h

despite the presence of...

/* #undef HAVE_DCGETTEXT */
/* #undef HAVE_GETTEXT */

when --disable-nls is used, so it still could be a Unicode-related change in
APFS, no?
      Jack
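
A quick way to see which of these macros are actually active in a generated header is sketched below. It is only an illustration: a temporary file holding the three lines quoted above stands in for the real ./lib/config.h, and the variable names are hypothetical.

```shell
#!/bin/sh
# Stand-in for ./lib/config.h, containing only the lines quoted above.
cfg=$(mktemp) || exit 1
cat >"$cfg" <<'EOF'
#define HAVE_LIBINTL_H 1
/* #undef HAVE_DCGETTEXT */
/* #undef HAVE_GETTEXT */
EOF

# Active macros are real #define lines; autoconf leaves unset ones
# as commented-out #undef lines.
active=$(sed -n 's/^#define \(HAVE_[A-Z_]*\).*/\1/p' "$cfg")
inactive=$(sed -n 's|^/\* #undef \(HAVE_[A-Z_]*\) \*/$|\1|p' "$cfg")

echo "defined:   $active"
echo "undefined:" $inactive
rm -f "$cfg"
```

Pointing the two sed commands at the real header instead of the temporary file gives the same split for a full build.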

Information forwarded to bug-coreutils <at> gnu.org:
bug#28506; Package coreutils. (Thu, 21 Sep 2017 05:21:03 GMT) Full text and rfc822 format available.

Message #20 received at 28506 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jack Howarth <howarth.mailing.lists <at> gmail.com>
Cc: 28506 <at> debbugs.gnu.org
Subject: Re: bug#28506: coreutils 8.28 test suite hangs on APFS filesystem
Date: Wed, 20 Sep 2017 22:20:06 -0700
[Message part 1 (text/plain, inline)]
On 18/09/17 18:07, Jack Howarth wrote:
> Actually, I forgot to mention that the coreutils test suite hang only
> occurred on the APFS volumes when the coreutils built against the gettext
> and libiconv from fink. A build outside of fink which didn't build against
> those packages didn't show the hang in the coreutils test suite. The fink
> gettext and libiconv packages that I am using are those from...
> 
> https://sourceforge.net/p/fink/package-submissions/4955/
> 
> and
> 
> https://sourceforge.net/p/fink/package-submissions/5004/
> 
> which are both patched for the format string strictness in High Sierra. I
> found that using --disable-nls in configuring coreutils was insufficient to
> suppress the test suite hang which I assume is due to the presence of...
> 
> #define HAVE_LIBINTL_H 1
> 
> in the generated ./lib/config.h
> 
> despite the presence of...
> 
> /* #undef HAVE_DCGETTEXT */
> /* #undef HAVE_GETTEXT */
> 
> when --disable-nls is used so it still could be a Unicode related change in
> APFS, no?
>       Jack

The libintl bit reminded me of https://lists.gnu.org/archive/html/bug-gnulib/2014-10/msg00014.html
I.e., on OSX, enabling those libs creates implicit threads, I think.
Perhaps that's messing with SIGPIPE handling and only the implicit
thread gets it, thus not killing the main yes(1) thread.
However, the yes(1) is also protected with a timeout(1) call.
Perhaps timeout(1) is a silent no-op. We should support OSX through DYLD_INSERT_LIBRARIES,
but perhaps there is something preventing that on your system?
But then, would the timeout tests fail? Could you check the timeout tests with:

  make SUBDIRS=. TESTS=tests/misc/filter.sh check

In any case we should protect calls to timeout(1) to ensure it's supported.
The attached does that at least.

cheers,
Pádraig.
[require_timeout.patch (text/x-patch, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#28506; Package coreutils. (Thu, 21 Sep 2017 06:03:02 GMT) Full text and rfc822 format available.

Message #23 received at 28506 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Pádraig Brady <P <at> draigbrady.com>
Cc: 28506 <at> debbugs.gnu.org, Jack Howarth <howarth.mailing.lists <at> gmail.com>
Subject: Re: bug#28506: coreutils 8.28 test suite hangs on APFS filesystem
Date: Wed, 20 Sep 2017 23:02:07 -0700
On Wed, Sep 20, 2017 at 10:20 PM, Pádraig Brady <P <at> draigbrady.com> wrote:
> The libintl bit reminded me of https://lists.gnu.org/archive/html/bug-gnulib/2014-10/msg00014.html
> I.E. on OSX enabling those libs creates implicit threads I think.
> Perhaps that's messing with SIGPIPE handling and only the implicit
> thread gets it, thus not killing the main yes(1) thread.
> However the yes(1) is also protected with a timeout(1) call.
> Perhaps timeout(1) is a silent noop. We should support OSX through DYLD_INSERT_LIBRARIES,
> but perhaps there is something preventing that on your system?
> But then would the timeout tests fail. Could you check the timeout tests with:
>
>   make SUBDIRS=. TESTS=tests/misc/filter.sh check
>
> In any case we should protect calls to timeout(1) to ensure it's supported.
> The attached does that at least.

Good idea.
Do you think there should be a syntax-check rule to ensure that any
timeout-using test first calls require_timeout_? This makes me wonder
if we should make timeout a function that does that job (the first
time only), and then exec's the real timeout command.
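
One way the idea above might look is sketched here. It is a rough illustration, not the change that was committed: the function body, the timeout_checked_ variable and the use of exit status 77 (the Automake convention for a skipped test) are assumptions, and the sketch forwards with "command" rather than a literal exec.

```shell
#!/bin/sh
# Hypothetical wrapper: verify once that an external timeout(1) works,
# then forward every call to it. "command" bypasses this function.
timeout_checked_=

timeout() {
  if test -z "$timeout_checked_"; then
    # Probe: a working timeout(1) runs "true" and returns 0.
    if ! command timeout 10 true 2>/dev/null; then
      echo "timeout(1) missing or not functional; skipping" >&2
      return 77   # Automake's "skipped test" status (assumed policy)
    fi
    timeout_checked_=1
  fi
  command timeout "$@"
}

timeout 5 true
echo "status: $?"
```

Because the check lives in the function, no test can forget it, which removes the need for a separate syntax-check rule. Note that the cached flag is lost when the function runs in a subshell, so the probe would simply repeat there.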




Information forwarded to bug-coreutils <at> gnu.org:
bug#28506; Package coreutils. (Fri, 22 Sep 2017 00:24:01 GMT) Full text and rfc822 format available.

Message #26 received at 28506 <at> debbugs.gnu.org (full text, mbox):

From: Jack Howarth <howarth.mailing.lists <at> gmail.com>
To: Pádraig Brady <P <at> draigbrady.com>
Cc: 28506 <at> debbugs.gnu.org
Subject: Re: bug#28506: coreutils 8.28 test suite hangs on APFS filesystem
Date: Thu, 21 Sep 2017 20:23:11 -0400
[Message part 1 (text/plain, inline)]
On Thu, Sep 21, 2017 at 1:20 AM, Pádraig Brady <P <at> draigbrady.com> wrote:

> The libintl bit reminded me of
> https://lists.gnu.org/archive/html/bug-gnulib/2014-10/msg00014.html
> I.E. on OSX enabling those libs creates implicit threads I think.
> Perhaps that's messing with SIGPIPE handling and only the implicit
> thread gets it, thus not killing the main yes(1) thread.
> However the yes(1) is also protected with a timeout(1) call.
> Perhaps timeout(1) is a silent noop. We should support OSX through
> DYLD_INSERT_LIBRARIES,
> but perhaps there is something preventing that on your system?
> But then would the timeout tests fail. Could you check the timeout tests
> with:
>
>   make SUBDIRS=. TESTS=tests/misc/filter.sh check
>
> In any case we should protect calls to timeout(1) to ensure it's supported.
> The attached does that at least.
>
> cheers,
> Pádraig.
>

Pádraig,
     The hang on APFS volumes doesn't seem to be related to CoreFoundation
threading. If I repeat the steps that I used to track down a similar issue
in make 4.0/4.1, rebuilding libiconv with --disable-nls and coreutils
with the same --disable-nls so that neither is linked against
CoreFoundation, the test suite hang still occurs. Also, for the stock
build, adding your proposed timeout changes doesn't eliminate the hang in
the test suite either.
            Jack

Information forwarded to bug-coreutils <at> gnu.org:
bug#28506; Package coreutils. (Sat, 23 Sep 2017 03:05:02 GMT) Full text and rfc822 format available.

Message #29 received at 28506 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jim Meyering <jim <at> meyering.net>
Cc: 28506 <at> debbugs.gnu.org
Subject: Re: bug#28506: coreutils 8.28 test suite hangs on APFS filesystem
Date: Fri, 22 Sep 2017 20:04:02 -0700
On 20/09/17 23:02, Jim Meyering wrote:
> Good idea.
> Do you think there should be a syntax-check rule to ensure that any
> timeout-using test first calls require_timeout_? This makes me wonder
> if we should make timeout a function that does that job (the first
> time only), and then exec's the real timeout command.

Yes that would be better.
Also functions for sleep, printf etc. would be useful in
avoiding the need for explicit env and giving greater test coverage.
I'll work on that.





Information forwarded to bug-coreutils <at> gnu.org:
bug#28506; Package coreutils. (Sat, 23 Sep 2017 03:08:01 GMT) Full text and rfc822 format available.

Message #32 received at 28506 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jack Howarth <howarth.mailing.lists <at> gmail.com>
Cc: 28506 <at> debbugs.gnu.org
Subject: Re: bug#28506: coreutils 8.28 test suite hangs on APFS filesystem
Date: Fri, 22 Sep 2017 20:07:10 -0700
On 21/09/17 17:23, Jack Howarth wrote:
> Pádraig,
>      The hang on APFS volumes doesn't seem to be related to CoreFoundation
> threading. If I repeat the steps that I used to track down a similar issue
> in make 4.0/4.1 by rebuilding libiconv with --disable-nls and coreutils
> with the same --disable-nls so that neither are linked against
> CoreFoundation, the test suite hang still occurs. Also, for the stock
> build, adding your proposed timeout changes doesn't eliminate the hang in
> the test suite either.

Is it a wait or a CPU spin?
Could you use the equivalent of strace on your platform to see what's happening?

thanks,
Pádraig





Information forwarded to bug-coreutils <at> gnu.org:
bug#28506; Package coreutils. (Sun, 24 Sep 2017 02:48:02 GMT) Full text and rfc822 format available.

Message #35 received at 28506 <at> debbugs.gnu.org (full text, mbox):

From: Pádraig Brady <P <at> draigBrady.com>
To: Jack Howarth <howarth.mailing.lists <at> gmail.com>
Cc: 28506 <at> debbugs.gnu.org
Subject: Re: bug#28506: coreutils 8.28 test suite hangs on APFS filesystem
Date: Sat, 23 Sep 2017 19:47:28 -0700
[Message part 1 (text/plain, inline)]
On 22/09/17 20:07, Pádraig Brady wrote:
> Is it a wait or a CPU spin?
> Could you use the equivalent of strace on your platform to see what's happening?

Off-list, Jack sent a profile showing /usr/bin/FILE was waiting on input.
That was the result of a silly typo in the script, which the attached
should fix.  I don't know what that command does, nor why it's specifically
a problem on APFS, but hopefully this fixes things.

cheers,
Pádraig.

[filter-test-hang-macos.patch (text/x-patch, attachment)]
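
The trace line "+ FILE = -" suggests the script ran a command named FILE rather than assigning a variable, presumably because of the spaces around "=". The difference can be shown in any POSIX shell. This is a sketch: the FILE function below is a hypothetical stand-in for whatever program a PATH lookup of FILE resolves to, and it is not the attached patch.

```shell
#!/bin/sh
# Hypothetical stand-in for the program found when "FILE" is looked up.
FILE() { printf 'ran FILE with %d args: %s\n' "$#" "$*"; }

# With spaces, the shell runs a command named FILE with two arguments.
as_command=$(FILE = '-')
echo "$as_command"

# Without spaces, it is a plain variable assignment.
unset -f FILE
FILE='-'
echo "FILE is now: $FILE"
```

On a system where no command named FILE can be found, the spaced form just fails with "command not found" and the script carries on, which would explain why the typo went unnoticed elsewhere.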

Information forwarded to bug-coreutils <at> gnu.org:
bug#28506; Package coreutils. (Sun, 24 Sep 2017 06:15:01 GMT) Full text and rfc822 format available.

Message #38 received at 28506 <at> debbugs.gnu.org (full text, mbox):

From: Andreas Schwab <schwab <at> linux-m68k.org>
To: Pádraig Brady <P <at> draigBrady.com>
Cc: 28506 <at> debbugs.gnu.org, Jack Howarth <howarth.mailing.lists <at> gmail.com>
Subject: Re: bug#28506: coreutils 8.28 test suite hangs on APFS filesystem
Date: Sun, 24 Sep 2017 08:14:11 +0200
On Sep 23 2017, Pádraig Brady <P <at> draigBrady.com> wrote:

> Offlist Jack sent a profile showing /usr/bin/FILE was waiting on input.
> That was the result of a silly typo in the script, which the attached
> should fix.  I don't know what that command does,

That's file(1) trying to analyze '-'.

> nor why it's specifically a problem on APFS,

Presumably APFS is case insensitive.
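
A minimal sketch of the parsing difference behind the hang. It assumes the
fix simply removes the spaces around '=' (the attached patch is not
reproduced in this log). The shell function named FILE is a hypothetical
stand-in for whatever the shell finds under that name; on a case-insensitive
volume the lookup can resolve to /usr/bin/file, which then reads stdin
because its argument is '-'.

```shell
#!/bin/sh
# Hypothetical stand-in for the command the shell resolves "FILE" to.
# It only reports the arguments it was called with.
FILE() { printf 'command FILE called with: %s %s\n' "$1" "$2"; }

# With spaces around '=', the line is a command invocation:
# the command is FILE and its two arguments are '=' and '-'.
typo_output=$(FILE = '-')

# Without spaces, the line is a variable assignment and runs no command.
FILE='-'
fixed_output="variable FILE is: $FILE"

printf '%s\n' "$typo_output" "$fixed_output"
```

On a case-sensitive filesystem the typo would normally just produce a
"command not found" error, which is presumably why the hang went unnoticed
elsewhere.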

Andreas.

-- 
Andreas Schwab, schwab <at> linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."




Information forwarded to bug-coreutils <at> gnu.org:
bug#28506; Package coreutils. (Sun, 24 Sep 2017 17:17:02 GMT) Full text and rfc822 format available.

Message #41 received at 28506 <at> debbugs.gnu.org (full text, mbox):

From: Jack Howarth <howarth.mailing.lists <at> gmail.com>
To: Pádraig Brady <P <at> draigbrady.com>
Cc: 28506 <at> debbugs.gnu.org
Subject: Re: bug#28506: coreutils 8.28 test suite hangs on APFS filesystem
Date: Sun, 24 Sep 2017 13:16:13 -0400
[Message part 1 (text/plain, inline)]
On Sat, Sep 23, 2017 at 10:47 PM, Pádraig Brady <P <at> draigbrady.com> wrote:

> On 22/09/17 20:07, Pádraig Brady wrote:
> > Is it a wait or a CPU spin?
> > Could you use the equivalent of strace on your platform to see what's
> > happening?
>
> Offlist Jack sent a profile showing /usr/bin/FILE was waiting on input.
> That was the result of a silly typo in the script, which the attached
> should fix.  I don't know what that command does, nor why it's specifically
> a problem on APFS, but hopefully this fixes things.
>
> cheers,
> Pádraig.
>
>
Pádraig,
    Thanks. I can confirm that this eliminates the test suite hang seen on
10.13 with APFS volumes. FYI, the stock APFS is still case-insensitive on
darwin17.
        Jack
P.S. The only failure seen in the test suite is...

FAIL: tests/touch/trailing-slash.sh
[Message part 2 (text/html, inline)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#28506; Package coreutils. (Sun, 24 Sep 2017 17:35:02 GMT) Full text and rfc822 format available.

Message #44 received at 28506 <at> debbugs.gnu.org (full text, mbox):

From: Jack Howarth <howarth.mailing.lists <at> gmail.com>
To: Pádraig Brady <P <at> draigbrady.com>
Cc: 28506 <at> debbugs.gnu.org
Subject: Re: bug#28506: coreutils 8.28 test suite hangs on APFS filesystem
Date: Sun, 24 Sep 2017 13:34:02 -0400
[Message part 1 (text/plain, inline)]
On Sun, Sep 24, 2017 at 1:16 PM, Jack Howarth <
howarth.mailing.lists <at> gmail.com> wrote:

>
>
> On Sat, Sep 23, 2017 at 10:47 PM, Pádraig Brady <P <at> draigbrady.com> wrote:
>
>> On 22/09/17 20:07, Pádraig Brady wrote:
>> > Is it a wait or a CPU spin?
>> > Could you use the equivalent of strace on your platform to see what's
>> > happening?
>>
>> Offlist Jack sent a profile showing /usr/bin/FILE was waiting on input.
>> That was the result of a silly typo in the script, which the attached
>> should fix.  I don't know what that command does, nor why it's specifically
>> a problem on APFS, but hopefully this fixes things.
>>
>> cheers,
>> Pádraig.
>>
>>
> Pádraig,
>     Thanks. I can confirm that this eliminates the test suite hang seen on
> 10.13 with APFS volumes. FYI, the stock APFS is still case-insensitive on
> darwin17.
>         Jack
> P.S. The only failure seen in the test suite is...
>
> FAIL: tests/touch/trailing-slash.sh
>
>
>
Pádraig,
    Attached are the tests/touch/trailing-slash.log and
tests/touch/trailing-slash.trs files generated from a build on an APFS
volume running 10.13 in case you can identify why that test is failing.
          Jack
[Message part 2 (text/html, inline)]
[trailing-slash.log (application/octet-stream, attachment)]
[trailing-slash.trs (application/octet-stream, attachment)]

Information forwarded to bug-coreutils <at> gnu.org:
bug#28506; Package coreutils. (Mon, 25 Sep 2017 17:16:01 GMT) Full text and rfc822 format available.

Message #47 received at 28506 <at> debbugs.gnu.org (full text, mbox):

From: Jim Meyering <jim <at> meyering.net>
To: Jack Howarth <howarth.mailing.lists <at> gmail.com>
Cc: Pádraig Brady <P <at> draigbrady.com>, 28506 <at> debbugs.gnu.org
Subject: Re: bug#28506: coreutils 8.28 test suite hangs on APFS filesystem
Date: Mon, 25 Sep 2017 10:14:37 -0700
On Sun, Sep 24, 2017 at 10:34 AM, Jack Howarth
<howarth.mailing.lists <at> gmail.com> wrote:
> On Sun, Sep 24, 2017 at 1:16 PM, Jack Howarth <
...

>     Attached are the tests/touch/trailing-slash.log and
> tests/touch/trailing-slash.trs files generated from a build on an APFS
> volume running 10.13 in case you can identify why that test is failing.

That test is failing because your system allows "touch
symlink-to-file-specified-with-trailing-slash/" to succeed. Here is how
it is supposed to work; on your system, touch (mistakenly) succeeds
instead:

$ : > k && ln -s k j && touch j/
touch: setting times of 'j/': Not a directory

When a non-directory name is specified with a trailing slash, many
interfaces are required by POSIX to fail with ENOTDIR. It looks like
one of those on your system goes ahead and performs the requested
operation as if that slash were not present.

We can probably teach gnulib to detect and work around this flaw.
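
The check above can be wrapped in a small script, shown here as a sketch
only. The scratch directory and the variable name "result" are illustrative
choices and are not taken from the coreutils test itself. On a system that
follows the POSIX trailing-slash rule, touch should fail and the script
reports "rejected"; on the system described in this report it would report
"accepted".

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are touched.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

: > k        # create an empty regular file
ln -s k j    # create a symlink pointing at it

# A trailing slash on a name that resolves to a non-directory
# is expected to fail with ENOTDIR.
if touch j/ 2>/dev/null; then
  result='accepted'
else
  result='rejected'
fi
echo "touch j/ was $result"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```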




Information forwarded to bug-coreutils <at> gnu.org:
bug#28506; Package coreutils. (Tue, 30 Oct 2018 01:17:01 GMT) Full text and rfc822 format available.

Message #50 received at 28506 <at> debbugs.gnu.org (full text, mbox):

From: Assaf Gordon <assafgordon <at> gmail.com>
To: 28506 <at> debbugs.gnu.org
Subject: Re: bug#28506: coreutils 8.28 test suite hangs on APFS filesystem
Date: Mon, 29 Oct 2018 19:16:35 -0600
tags 28506 fixed
close 28506
stop

(triaging old bugs)

On 2017-09-23 8:47 p.m., Pádraig Brady wrote:
> On 22/09/17 20:07, Pádraig Brady wrote:
>> Is it a wait or a CPU spin?
>> Could you use the equivalent of strace on your platform to see what's happening?
> 
> Offlist Jack sent a profile showing /usr/bin/FILE was waiting on input.
> That was the result of a silly typo in the script, which the attached
> should fix.  I don't know what that command does, nor why it's specifically
> a problem on APFS, but hopefully this fixes things.

Pushed here:
https://git.savannah.gnu.org/cgit/coreutils.git/commit/?id=63d2f05f5283c88f6c60ebe6de7a26ce6b9e4ee8

so closing as "fixed".

-assaf





Added tag(s) fixed. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 01:17:02 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 28506 <at> debbugs.gnu.org and Jack Howarth <howarth.mailing.lists <at> gmail.com> Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Tue, 30 Oct 2018 01:17:02 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Tue, 27 Nov 2018 12:24:04 GMT) Full text and rfc822 format available.

This bug report was last modified 5 years and 122 days ago.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.