GNU bug report logs - #27666
[grep on GPFS filesystem] SEEK_HOLE problem


Package: grep;

Reported by: Moyard John <John.Moyard <at> cnes.fr>

Date: Wed, 12 Jul 2017 11:58:02 UTC

Severity: normal

Done: Paul Eggert <eggert <at> cs.ucla.edu>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 27666 in the body.
You can then email your comments to 27666 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-grep <at> gnu.org:
bug#27666; Package grep. (Wed, 12 Jul 2017 11:58:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Moyard John <John.Moyard <at> cnes.fr>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Wed, 12 Jul 2017 11:58:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Moyard John <John.Moyard <at> cnes.fr>
To: "bug-grep <at> gnu.org" <bug-grep <at> gnu.org>
Subject: [grep on GPFS filesystem] SEEK_HOLE problem
Date: Wed, 12 Jul 2017 09:27:50 +0000
Hi,

I use the GPFS file system, and I sometimes have an issue with the grep command.
When the issue occurs, grep prints the message "Binary file <myfile> matches",
but "<myfile>" is an ASCII file, not a binary one.
The problem seems to involve the lseek(SEEK_HOLE) call and a file that is not completely flushed after close.
It could take several seconds to save the entire file to disk.

So could grep use another way to determine whether an input file is binary or ASCII, instead of using lseek(SEEK_HOLE)?

Best regards
john



Information forwarded to bug-grep <at> gnu.org:
bug#27666; Package grep. (Wed, 12 Jul 2017 14:11:01 GMT) Full text and rfc822 format available.

Message #8 received at 27666 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Moyard John <John.Moyard <at> cnes.fr>, 27666 <at> debbugs.gnu.org
Subject: Re: bug#27666: [grep on GPFS filesystem] SEEK_HOLE problem
Date: Wed, 12 Jul 2017 09:10:50 -0500
On 07/12/2017 04:27 AM, Moyard John wrote:
> Hi,
> 
> I use the GPFS file system, and I sometimes have an issue with the grep command.
> When the issue occurs, grep prints the message "Binary file <myfile> matches",
> but "<myfile>" is an ASCII file, not a binary one.
> The problem seems to involve the lseek(SEEK_HOLE) call and a file that is not completely flushed after close.

If lseek(SEEK_HOLE) returns a mid-file offset when the file is first
created, but not later after the file has been synced, then that is a
bug in the filesystem which should be reported to the appropriate
filesystem/kernel folks.  SEEK_HOLE is only allowed to return a mid-file
offset if reading the file at that point in time would read NUL bytes,
and NUL bytes are indeed binary data.

> It could take several seconds to save the entire file on the disk.

Does running 'sync' prior to grep solve the problem?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
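
To make the point about mid-file offsets concrete, here is a minimal sketch of the kind of check being discussed. It is an illustration only, not grep's actual code: the function names `first_hole_offset` and `reports_mid_file_hole` are hypothetical, and it assumes a platform where Python exposes `os.SEEK_HOLE` (for example Linux).

```python
import os


def first_hole_offset(path):
    """Return the offset of the first hole the file system reports.

    For a file without holes this should be the file size, because
    every file has an implicit hole at end-of-file.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        if os.fstat(fd).st_size == 0:
            return 0  # lseek(SEEK_HOLE) fails with ENXIO on an empty file
        return os.lseek(fd, 0, os.SEEK_HOLE)
    finally:
        os.close(fd)


def reports_mid_file_hole(path):
    """True if the file system claims a hole before end-of-file.

    A hole must read back as NUL bytes, so a program may treat such
    a file as containing binary data without reading the hole.
    """
    return first_hole_offset(path) < os.path.getsize(path)
```

For a freshly written text file, `reports_mid_file_hole` is expected to return False on a conforming file system; a True result for such a file is the symptom described in this report.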


Information forwarded to bug-grep <at> gnu.org:
bug#27666; Package grep. (Wed, 12 Jul 2017 14:23:01 GMT) Full text and rfc822 format available.

Message #11 received at 27666 <at> debbugs.gnu.org (full text, mbox):

From: Moyard John <John.Moyard <at> cnes.fr>
To: Eric Blake <eblake <at> redhat.com>, "27666 <at> debbugs.gnu.org"
 <27666 <at> debbugs.gnu.org>
Subject: RE: bug#27666: [grep on GPFS filesystem] SEEK_HOLE problem
Date: Wed, 12 Jul 2017 14:21:50 +0000
Hi,

This is the kind of answer I obtained from the file system development team:
---
(close(2) manpage reference)
A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes.  It is not common for a filesystem to flush the buffers when the stream is closed.  If you need to be sure that the data is physically stored, use fsync(2).  (It will depend on the disk hardware at this point).
---

So running 'sync' prior to grep should solve the problem.
I have not tried it yet.
Another workaround I found is to enable grep's '--binary-files=text' option.

Best regards,
john
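
The two workarounds mentioned in this message can be sketched as follows. This is an illustration under assumptions, not a confirmed fix: the thread does not establish that syncing avoids the GPFS behaviour, the helper names `write_and_sync` and `grep_as_text` are hypothetical, and `--binary-files=text` is a GNU grep option that other grep implementations may not accept.

```python
import os
import subprocess


def write_and_sync(path, data):
    """Write bytes to path and ask the kernel to push them to storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's buffer into the kernel
        os.fsync(f.fileno())  # request that the kernel write it out


def grep_as_text(pattern, path):
    """Run GNU grep on path, forcing it to treat the file as text.

    Returns the matching lines as a list of strings.
    """
    result = subprocess.run(
        ["grep", "--binary-files=text", "-e", pattern, "--", path],
        capture_output=True,
        text=True,
        check=False,  # exit status 1 only means "no match"
    )
    return result.stdout.splitlines()
```

With these helpers, a script would call `write_and_sync` where it produces the file and `grep_as_text` where it searches it; either one alone corresponds to one of the two workarounds.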





Information forwarded to bug-grep <at> gnu.org:
bug#27666; Package grep. (Thu, 13 Jul 2017 09:14:01 GMT) Full text and rfc822 format available.

Message #14 received at 27666 <at> debbugs.gnu.org (full text, mbox):

From: Moyard John <John.Moyard <at> cnes.fr>
To: Eric Blake <eblake <at> redhat.com>, "27666 <at> debbugs.gnu.org"
 <27666 <at> debbugs.gnu.org>
Subject: RE: bug#27666: [grep on GPFS filesystem] SEEK_HOLE problem
Date: Thu, 13 Jul 2017 09:13:37 +0000
Hi,

I forgot to mention that enabling grep's '--binary-files=text' option or adding a synchronization step before each grep is not a sustainable solution across all my shell scripts.
That is why I was asking about another way to identify a binary file instead of using 'lseek(SEEK_HOLE)': do you think that would be possible?

Best regards
john





Information forwarded to bug-grep <at> gnu.org:
bug#27666; Package grep. (Thu, 13 Jul 2017 20:44:01 GMT) Full text and rfc822 format available.

Message #17 received at 27666 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Moyard John <John.Moyard <at> cnes.fr>, Eric Blake <eblake <at> redhat.com>,
 "27666 <at> debbugs.gnu.org" <27666 <at> debbugs.gnu.org>
Subject: Re: bug#27666: [grep on GPFS filesystem] SEEK_HOLE problem
Date: Thu, 13 Jul 2017 13:43:49 -0700
On 07/13/2017 02:13 AM, Moyard John wrote:
> That is why I was asking about another way to identify a binary file instead of using 'lseek(SEEK_HOLE)': do you think that would be possible?

If there is a reasonable (i.e., cheap) way for grep to determine that 
SEEK_HOLE is buggy for the current file, I suppose grep could do that. 
Do you know of any such method?

Really, the bug here is in the file system, not in grep. Have you filed 
a bug with the GPFS maintainers?





Information forwarded to bug-grep <at> gnu.org:
bug#27666; Package grep. (Tue, 18 Jul 2017 11:24:01 GMT) Full text and rfc822 format available.

Message #20 received at 27666 <at> debbugs.gnu.org (full text, mbox):

From: Moyard John <John.Moyard <at> cnes.fr>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Eric Blake <eblake <at> redhat.com>,
 "27666 <at> debbugs.gnu.org" <27666 <at> debbugs.gnu.org>
Subject: RE: bug#27666: [grep on GPFS filesystem] SEEK_HOLE problem
Date: Tue, 18 Jul 2017 11:23:24 +0000
The GPFS maintainers gave me an answer that cites the close(2) manpage: nothing will be done.
On the web, the same problem has been reported for ZFS, and perhaps for NFS v4:
https://utcc.utoronto.ca/~cks/space/blog/linux/GrepBinaryFileReason
https://github.com/zfsonlinux/zfs/issues/6050
https://lists.gnu.org/archive/html/bug-grep/2012-07/msg00022.html
I will not file a bug with each file system's maintainers: I would expect the same answer.
Or perhaps I would be given this extract from the lseek(2) manpage, http://man7.org/linux/man-pages/man2/lseek.2.html :
    However, a filesystem is not obliged to report holes, so
    these operations are not a guaranteed mechanism for mapping the
    storage space actually allocated to a file
So it is not a bug in the file system.

So, is it possible for you to change the way grep tests for a binary file?

john



Information forwarded to bug-grep <at> gnu.org:
bug#27666; Package grep. (Tue, 18 Jul 2017 11:31:02 GMT) Full text and rfc822 format available.

Message #23 received at 27666 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Moyard John <John.Moyard <at> cnes.fr>, Paul Eggert <eggert <at> cs.ucla.edu>,
 "27666 <at> debbugs.gnu.org" <27666 <at> debbugs.gnu.org>
Subject: Re: bug#27666: [grep on GPFS filesystem] SEEK_HOLE problem
Date: Tue, 18 Jul 2017 06:30:37 -0500
On 07/18/2017 06:23 AM, Moyard John wrote:
> The GPFS maintainers gave me an answer that cites the close(2) manpage: nothing will be done.
> On the web, the same problem has been reported for ZFS, and perhaps for NFS v4:
> https://utcc.utoronto.ca/~cks/space/blog/linux/GrepBinaryFileReason
> https://github.com/zfsonlinux/zfs/issues/6050
> https://lists.gnu.org/archive/html/bug-grep/2012-07/msg00022.html
> I will not file a bug with each file system's maintainers: I would expect the same answer.
> Or perhaps I would be given this extract from the lseek(2) manpage, http://man7.org/linux/man-pages/man2/lseek.2.html :
>     However, a filesystem is not obliged to report holes, so
>     these operations are not a guaranteed mechanism for mapping the
>     storage space actually allocated to a file
> So it is not a bug in the file system.

A file system is not obliged to report holes, but IS obliged to NOT
report holes if a read() on that range will not see zeroes.  I still
think GPFS has a bug.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
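
The rule stated in this message (a reported hole must read back as zeroes) can be checked from user space. The sketch below is only an illustration: the function name `nonzero_holes` is hypothetical, it assumes a platform where Python exposes `os.SEEK_HOLE` and `os.SEEK_DATA`, and the check is inherently racy if another process is still writing the file.

```python
import errno
import os


def nonzero_holes(path, chunk=1 << 16):
    """Return (start, end) ranges reported as holes that contain non-NUL bytes.

    An empty list means every reported hole read back as zeroes.
    """
    bad = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            hole = os.lseek(fd, offset, os.SEEK_HOLE)
            if hole >= size:
                break  # only the implicit hole at end-of-file is left
            try:
                data = os.lseek(fd, hole, os.SEEK_DATA)
            except OSError as err:
                if err.errno != errno.ENXIO:
                    raise
                data = size  # the hole extends to end-of-file
            data = min(data, size)
            pos = hole
            while pos < data:
                buf = os.pread(fd, min(chunk, data - pos), pos)
                if not buf:
                    break
                if buf.count(0) != len(buf):
                    bad.append((hole, data))
                    break
                pos += len(buf)
            # Always move forward, even if the file system answers oddly.
            offset = max(data, hole + 1)
    finally:
        os.close(fd)
    return bad
```

A non-empty result for a file that is no longer being written would be evidence of the kind of file system bug argued for in this thread.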


Information forwarded to bug-grep <at> gnu.org:
bug#27666; Package grep. (Wed, 19 Jul 2017 10:48:01 GMT) Full text and rfc822 format available.

Message #26 received at 27666 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Moyard John <John.Moyard <at> cnes.fr>, Eric Blake <eblake <at> redhat.com>,
 "27666 <at> debbugs.gnu.org" <27666 <at> debbugs.gnu.org>
Subject: Re: bug#27666: [grep on GPFS filesystem] SEEK_HOLE problem
Date: Wed, 19 Jul 2017 03:47:36 -0700
Moyard John wrote:
> The GPFS maintainers gave me an answer that cites the close(2) manpage: nothing will be done.

Sorry, I don't follow (why is close(2) involved?).

Is your correspondence with the GPFS maintainers public? It sounds like they do 
not understand the issue.

Anyway, as Eric said, GPFS is clearly buggy. True, a file system is not obliged 
to report holes. But if it reports a hole, the hole must contain NUL bytes.

> On the web, the same problem has been reported for ZFS, and perhaps for NFS v4:
>   https://utcc.utoronto.ca/~cks/space/blog/linux/GrepBinaryFileReason
> https://github.com/zfsonlinux/zfs/issues/6050

These URLs talk about a ZFS-on-Linux bug that has been fixed, apparently.  Good.

> https://lists.gnu.org/archive/html/bug-grep/2012-07/msg00022.html

This is the inverse issue, which doesn't cause the problem you mentioned.

> I will not file a bug with each file system's maintainers: I would expect the same answer.

I don't see why. Only GPFS has the problem, as far as we know. And this is 
probably just a communication problem with its developers.

> So, is it possible for you to change the way grep tests for a binary file?

Programs other than 'grep' use SEEK_HOLE. Even if we changed 'grep' to stop 
using SEEK_HOLE, the other programs would still be broken on GPFS. Plus, 'grep' 
would likely be slower everywhere, just to work around the bug on GPFS.

Really, GPFS needs to be fixed. If GPFS can't support SEEK_HOLE correctly, it 
should simply have lseek with SEEK_HOLE go to end-of-file; that will work with 
'grep' (albeit more slowly), and is the documented way that SEEK_HOLE is 
supposed to work on file systems that cannot support SEEK_HOLE directly.




Information forwarded to bug-grep <at> gnu.org:
bug#27666; Package grep. (Thu, 20 Jul 2017 09:04:02 GMT) Full text and rfc822 format available.

Message #29 received at 27666 <at> debbugs.gnu.org (full text, mbox):

From: Moyard John <John.Moyard <at> cnes.fr>
To: Paul Eggert <eggert <at> cs.ucla.edu>, Eric Blake <eblake <at> redhat.com>,
 "27666 <at> debbugs.gnu.org" <27666 <at> debbugs.gnu.org>
Subject: RE: bug#27666: [grep on GPFS filesystem] SEEK_HOLE problem
Date: Thu, 20 Jul 2017 09:03:11 +0000
Thank you very much for your detailed answer.
"close(2)" is involved because the test case written to reproduce the problem uses a cp (originally Fortran code that made a copy), followed by a grep.

I clearly understand your point of view about:
     reporting holes and NUL bytes
     GPFS incompatibility with other programs/commands that could use SEEK_HOLE
I took a quick look at this last point and have not yet found any system command that uses it.
Do you have an example of another command that uses SEEK_HOLE?

From the POSIX point of view, the lseek(2) manpage states this:
     SEEK_DATA and SEEK_HOLE are nonstandard extensions also present in Solaris, FreeBSD, and DragonFly BSD.
     They are proposed for inclusion in the next POSIX revision (Issue 8).
Do you have any information about it?
Is there a way to build 'grep' so that it avoids the SEEK_HOLE test?
I am just trying to get a grep command whose default behavior follows the POSIX standard.




Information forwarded to bug-grep <at> gnu.org:
bug#27666; Package grep. (Thu, 20 Jul 2017 12:47:01 GMT) Full text and rfc822 format available.

Message #32 received at 27666 <at> debbugs.gnu.org (full text, mbox):

From: Eric Blake <eblake <at> redhat.com>
To: Moyard John <John.Moyard <at> cnes.fr>, Paul Eggert <eggert <at> cs.ucla.edu>,
 "27666 <at> debbugs.gnu.org" <27666 <at> debbugs.gnu.org>
Subject: Re: bug#27666: [grep on GPFS filesystem] SEEK_HOLE problem
Date: Thu, 20 Jul 2017 07:46:26 -0500
On 07/20/2017 04:03 AM, Moyard John wrote:
> Thank you very much for your detailed answer.
> "close(2)" is involved because the test case written to reproduce the problem uses a cp (originally Fortran code that made a copy), followed by a grep.
> 
> I clearly understand your point of view about:
>      reporting holes and NUL bytes
>      GPFS incompatibility with other programs/commands that could use SEEK_HOLE
> I took a quick look at this last point and have not yet found any system command that uses it.
> Do you have an example of another command that uses SEEK_HOLE?

More and more commands are starting to make optimizations based on
SEEK_HOLE.  cp, tar, diff, grep, etc.  Programs like qemu-img REQUIRE a
working SEEK_HOLE for efficiently managing sparse virtual machine disk
images.

> 
> From the POSIX point of view, the lseek(2) manpage states this:
>      SEEK_DATA and SEEK_HOLE are nonstandard extensions also present in Solaris, FreeBSD, and DragonFly BSD.
>      They are proposed for inclusion in the next POSIX revision (Issue 8).
> Do you have any information about it?

Here's the proposed POSIX wording:
http://austingroupbugs.net/view.php?id=415

Requiring close() to occur before SEEK_HOLE is accurate is a bug in GPFS
(if any other process can read() non-zero data but lseek(SEEK_HOLE)
still claims that section of the file is a hole, then the file system is
buggy, per the wording POSIX will be adding).


> Is there a way to build 'grep' so that it avoids the SEEK_HOLE test?

No. Avoiding a buggy SEEK_HOLE in grep won't fix all the other programs
(like cp, tar, diff) that are also negatively impacted by the buggy
SEEK_HOLE.  Fix the GPFS bug, and then all of the user-space apps will
no longer be impacted by the bug.

[By the way, top-posting is frowned on for technical lists].  I agree
with Paul's conclusion:

> Really, GPFS needs to be fixed. If GPFS can't support SEEK_HOLE correctly, it should simply have lseek with SEEK_HOLE go to end-of-file; that will work with 'grep' (albeit more slowly), and is the documented way that SEEK_HOLE is supposed to work on file systems that cannot support SEEK_HOLE directly.
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
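
As an illustration of the SEEK_HOLE-based optimization that this message attributes to tools such as cp and tar, here is a minimal sketch of a sparse-aware copy. It is not the code of any of those tools: the function name `sparse_copy` is hypothetical, and it assumes a platform where Python exposes `os.SEEK_DATA` and `os.SEEK_HOLE`.

```python
import errno
import os


def sparse_copy(src_path, dst_path, chunk=1 << 16):
    """Copy src to dst, skipping ranges the file system reports as holes."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src).st_size
        offset = 0
        while offset < size:
            try:
                data = os.lseek(src, offset, os.SEEK_DATA)
            except OSError as err:
                if err.errno == errno.ENXIO:
                    break  # nothing but a trailing hole remains
                raise
            hole = os.lseek(src, data, os.SEEK_HOLE)
            pos = data
            while pos < hole:
                buf = os.pread(src, min(chunk, hole - pos), pos)
                if not buf:
                    break
                # pwrite may write fewer bytes; advance by what was written.
                pos += os.pwrite(dst, buf, pos)
            offset = max(hole, data + 1)
        # Extend the destination so a trailing hole is preserved.
        os.ftruncate(dst, size)
    finally:
        os.close(src)
        os.close(dst)
```

The sketch shows why a wrong answer from the file system matters beyond grep: if a range holding real data is reported as a hole, a copy written this way silently replaces that data with zeroes.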


Reply sent to Paul Eggert <eggert <at> cs.ucla.edu>:
You have taken responsibility. (Tue, 31 Dec 2019 19:16:02 GMT) Full text and rfc822 format available.

Notification sent to Moyard John <John.Moyard <at> cnes.fr>:
bug acknowledged by developer. (Tue, 31 Dec 2019 19:16:02 GMT) Full text and rfc822 format available.

Message #37 received at 27666-done <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Moyard John <John.Moyard <at> cnes.fr>
Cc: 27666-done <at> debbugs.gnu.org, Eric Blake <eblake <at> redhat.com>
Subject: Re: bug#27666: [grep on GPFS filesystem] SEEK_HOLE problem
Date: Tue, 31 Dec 2019 11:15:30 -0800
The GPFS SEEK_HOLE bug appears to have been fixed by IBM a few years ago, as
reported here:

https://www.spectrumscale.org/pipermail/gpfsug-discuss/2018-February/004595.html
https://www.spectrumscale.org/pipermail/gpfsug-discuss/2018-February/004596.html
https://www-01.ibm.com/support/docview.wss?uid=isg1IV87385

so I am closing the grep bug report.




bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 29 Jan 2020 12:24:06 GMT) Full text and rfc822 format available.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.