GNU bug report logs - #41687
regex search for indexed files

Package: grep

Reported by: Peng Yu <pengyu.ut <at> gmail.com>

Date: Wed, 3 Jun 2020 14:28:02 UTC

Severity: normal

Tags: notabug

Done: Assaf Gordon <assafgordon <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 41687 in the body.
You can then email your comments to 41687 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-grep <at> gnu.org:
bug#41687; Package grep. (Wed, 03 Jun 2020 14:28:02 GMT)

Acknowledgement sent to Peng Yu <pengyu.ut <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Wed, 03 Jun 2020 14:28:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Peng Yu <pengyu.ut <at> gmail.com>
To: bug-grep <bug-grep <at> gnu.org>
Subject: regex search for indexed files
Date: Wed, 3 Jun 2020 09:27:52 -0500

Hi,

grep can do regex search but it needs to scan each file. When the
number of files is large, it can be slow.

Is there an alternative tool that can do regex search in the indexed
files (including .docx .pdf and other commonly used file formats that
can be converted to text) so that the search can be fast?

I found the following, but it is too old and doesn't support formats like pdf and docx.

https://github.com/google/codesearch

-- 
Regards,
Peng

Information forwarded to bug-grep <at> gnu.org:
bug#41687; Package grep. (Sun, 07 Jun 2020 04:46:02 GMT)

Message #8 received at 41687 <at> debbugs.gnu.org:

From: Assaf Gordon <assafgordon <at> gmail.com>
To: Peng Yu <pengyu.ut <at> gmail.com>, 41687 <at> debbugs.gnu.org
Cc: control <at> debbugs.gnu.org
Subject: Re: bug#41687: regex search for indexed files
Date: Sat, 6 Jun 2020 22:45:03 -0600

tag 41687 notabug
close 41687
stop

Hello,

On 2020-06-03 8:27 a.m., Peng Yu wrote:
> grep can do regex search but it needs to scan each file. When the
> number of files is large, it can be slow.
> 
> Is there an alternative tool that can do regex search in the indexed
> files (including .docx .pdf and other commonly used file formats that
> can be converted to text) so that the search can be fast?

It seems you are mixing several questions together.

1. If you want "grep" to search only a specific set of files,
use the "--include" or "--exclude" options.
Or better yet, use find+xargs+grep.

2. If you want to search in non-text files, use appropriate programs
that understand the file format (e.g. "pdfgrep")
or programs that can convert the custom format to text (e.g. "antiword" 
and "wv").

3. You've mentioned "indexed files" - if you're looking for a program
that scans files and indexes them, and then allows you to search the 
index, look for "Desktop search" programs, e.g. 
https://en.wikipedia.org/wiki/List_of_search_engines#Desktop_search_engines
https://en.wikipedia.org/wiki/Recoll
https://en.wikipedia.org/wiki/Tracker_(search_software)
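
As an illustration of point 1 above, here is a minimal sketch of both approaches. The directory layout, file names, and the pattern '^error:' are hypothetical, made up only for this example; it assumes a grep that supports "--include" (GNU grep does) and an xargs that supports "-0".

```shell
# Hypothetical layout, created only for this demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/docs" "$demo/src"
printf 'error: disk full\n' > "$demo/docs/a.txt"
printf 'all good\n'         > "$demo/docs/b.txt"
printf 'error: in log\n'    > "$demo/src/c.log"

# Option A: let grep recurse, restricted to *.txt files by --include.
# -l lists matching file names only; -E selects extended regex syntax.
include_hits=$(grep -r -l -E --include='*.txt' '^error:' "$demo" | sort)

# Option B: let find choose the files and xargs hand them to grep.
# -print0 / -0 keep file names containing spaces or newlines intact.
find_hits=$(find "$demo" -type f -name '*.txt' -print0 |
    xargs -0 grep -l -E '^error:' | sort)

printf 'grep --include: %s\n' "$include_hits"
printf 'find + xargs:   %s\n' "$find_hits"

# Remove the demonstration files.
rm -rf "$demo"
```

Both variables should end up naming the same single file, docs/a.txt: the .log file is skipped by the name filter and b.txt does not match the pattern. The find variant is the more flexible of the two, since find can also select files by size, age, or depth before grep ever opens them.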

---

Lastly, for all of these topics, a simple internet search would have
given you the above results. PLEASE respect everyone's time by first
searching for answers yourself before posting questions on a public
mailing list.

---

Since this is not a bug in grep, I'm marking this as "closed".

regards,
 - assaf

Added tag(s) notabug. Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Sun, 07 Jun 2020 04:46:02 GMT)

bug closed, send any further explanations to 41687 <at> debbugs.gnu.org and Peng Yu <pengyu.ut <at> gmail.com> Request was from Assaf Gordon <assafgordon <at> gmail.com> to control <at> debbugs.gnu.org. (Sun, 07 Jun 2020 04:46:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 05 Jul 2020 11:24:05 GMT)

This bug report was last modified 3 years and 289 days ago.

GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.