GNU bug report logs - #51727
add an optional flag to -P to disable JIT

Previous Next

Package: grep;

Reported by: Carlo Marcelo Arenas Belón <carenas <at> gmail.com>

Date: Tue, 9 Nov 2021 19:05:02 UTC

Severity: wishlist

To reply to this bug, email your comments to 51727 AT debbugs.gnu.org.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to bug-grep <at> gnu.org:
bug#51727; Package grep. (Tue, 09 Nov 2021 19:05:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Carlo Marcelo Arenas Belón <carenas <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-grep <at> gnu.org. (Tue, 09 Nov 2021 19:05:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Carlo Marcelo Arenas Belón <carenas <at> gmail.com>
To: bug-grep <at> gnu.org
Subject: add an optional flag to -P to disable JIT
Date: Tue, 9 Nov 2021 11:04:31 -0800
[Message part 1 (text/plain, inline)]
Severity: wishlist

There are times, when the expression is too simple or will not be used too
often to justify the extra time in -P that is required for JIT compilation.

Make it simpler for users to pass flags to the PCRE backend, and start
with a flag to disable JIT (enabled by default)
[0001-pcre-add-a-flag-to-disable-JIT.patch (text/x-patch, inline)]
From caeca5e806fe1b2e368833f05bb4cfb75763d1b3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Carlo=20Marcelo=20Arenas=20Bel=C3=B3n?= <carenas <at> gmail.com>
Date: Sat, 16 Oct 2021 01:38:11 -0700
Subject: [PATCH] pcre: add a flag to disable JIT
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Mainly useful for performance testing.

Signed-off-by: Carlo Marcelo Arenas Belón <carenas <at> gmail.com>
---
 doc/grep.texi     |  4 +++-
 src/grep.c        | 13 +++++++++++--
 src/grep.h        |  1 +
 src/pcresearch.c  |  6 ++++--
 tests/Makefile.am |  1 +
 tests/pcre-nojit  | 22 ++++++++++++++++++++++
 6 files changed, 42 insertions(+), 5 deletions(-)
 create mode 100755 tests/pcre-nojit

diff --git a/doc/grep.texi b/doc/grep.texi
index e5b9fd8..8fb41ac 100644
--- a/doc/grep.texi
+++ b/doc/grep.texi
@@ -1138,7 +1138,7 @@ Interpret patterns as fixed strings, not regular expressions.
 (@option{-F} is specified by POSIX.)
 
 @item -P
-@itemx --perl-regexp
+@itemx --perl-regexp[=@var{FLAG}]
 @opindex -P
 @opindex --perl-regexp
 @cindex matching Perl-compatible regular expressions
@@ -1146,6 +1146,8 @@ Interpret patterns as Perl-compatible regular expressions (PCREs).
 PCRE support is here to stay, but consider this option experimental when
 combined with the @option{-z} (@option{--null-data}) option, and note that
 @samp{grep@ -P} may warn of unimplemented features.
+The optional flag 'no-jit' could be used to disable JIT, and only use the
+slower PCRE's interpreter.
 @xref{Other Options}.
 
 @end table
diff --git a/src/grep.c b/src/grep.c
index a55194c..44e21b7 100644
--- a/src/grep.c
+++ b/src/grep.c
@@ -508,7 +508,7 @@ static struct option const long_options[] =
   {"extended-regexp", no_argument, NULL, 'E'},
   {"fixed-regexp",    no_argument, NULL, 'F'},
   {"fixed-strings",   no_argument, NULL, 'F'},
-  {"perl-regexp",     no_argument, NULL, 'P'},
+  {"perl-regexp", optional_argument, NULL, 'P'},
   {"after-context", required_argument, NULL, 'A'},
   {"before-context", required_argument, NULL, 'B'},
   {"binary-files", required_argument, NULL, BINARY_FILES_OPTION},
@@ -563,6 +563,7 @@ bool match_icase;
 bool match_words;
 bool match_lines;
 char eolbyte;
+bool pcre_jit = true;
 
 /* For error messages. */
 /* The input file name, or (if standard input) null or a --label argument.  */
@@ -1987,7 +1988,8 @@ Pattern selection and interpretation:\n"), getprogname ());
   -E, --extended-regexp     PATTERNS are extended regular expressions\n\
   -F, --fixed-strings       PATTERNS are strings\n\
   -G, --basic-regexp        PATTERNS are basic regular expressions\n\
-  -P, --perl-regexp         PATTERNS are Perl regular expressions\n"));
+  -P, --perl-regexp[=FLAG]  PATTERNS are Perl regular expressions\n\
+                            FLAG is 'no-jit' (JIT enabled by default)\n"));
   /* -X is deliberately undocumented.  */
       printf (_("\
   -e, --regexp=PATTERNS     use PATTERNS for matching\n\
@@ -2545,6 +2547,13 @@ main (int argc, char **argv)
 
       case 'P':
         matcher = setmatcher ("perl", matcher);
+        if (optarg)
+          {
+            if (STREQ (optarg, "no-jit"))
+              pcre_jit = false;
+            else
+              die (EXIT_TROUBLE, 0, _("unknown PCRE flag"));
+          }
         break;
 
       case 'G':
diff --git a/src/grep.h b/src/grep.h
index 04c15dd..263e98c 100644
--- a/src/grep.h
+++ b/src/grep.h
@@ -29,6 +29,7 @@ extern bool match_icase;	/* -i */
 extern bool match_words;	/* -w */
 extern bool match_lines;	/* -x */
 extern char eolbyte;		/* -z */
+extern bool pcre_jit;		/* --perl-regexp=no-jit */
 
 extern char const *pattern_file_name (idx_t, idx_t *);
 
diff --git a/src/pcresearch.c b/src/pcresearch.c
index 09f92c8..988d753 100644
--- a/src/pcresearch.c
+++ b/src/pcresearch.c
@@ -180,7 +180,9 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
   if (!pc->cre)
     die (EXIT_TROUBLE, 0, "%s", ep);
 
-  int pcre_study_flags = PCRE_STUDY_EXTRA_NEEDED | PCRE_STUDY_JIT_COMPILE;
+  int pcre_study_flags = PCRE_STUDY_EXTRA_NEEDED;
+  if (pcre_jit)
+    pcre_study_flags |= PCRE_STUDY_JIT_COMPILE;
   pc->extra = pcre_study (pc->cre, pcre_study_flags, &ep);
   if (ep)
     die (EXIT_TROUBLE, 0, "%s", ep);
@@ -191,7 +193,7 @@ Pcompile (char *pattern, idx_t size, reg_syntax_t ignored, bool exact)
 
   /* The PCRE documentation says that a 32 KiB stack is the default.  */
   if (e)
-    pc->jit_stack_size = 32 << 10;
+    pc->jit_stack_size = (pcre_jit) ? 32 << 10 : 0;
 #endif
 
   free (re);
diff --git a/tests/Makefile.am b/tests/Makefile.am
index c84cdc0..cd83e00 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -135,6 +135,7 @@ TESTS =						\
   null-byte					\
   options					\
   pcre						\
+  pcre-nojit					\
   pcre-abort					\
   pcre-context					\
   pcre-count					\
diff --git a/tests/pcre-nojit b/tests/pcre-nojit
new file mode 100755
index 0000000..e752f33
--- /dev/null
+++ b/tests/pcre-nojit
@@ -0,0 +1,22 @@
+#! /bin/sh
+# Simple PCRE tests with JIT disabled.
+#
+# Copyright (C) 2001, 2006, 2009-2021 Free Software Foundation, Inc.
+#
+# Copying and distribution of this file, with or without modification,
+# are permitted in any medium without royalty provided the copyright
+# notice and this notice are preserved.
+
+. "${srcdir=.}/init.sh"; path_prepend_ ../src
+require_pcre_
+
+fail=0
+
+echo | grep --perl-regex=no-jit '\s*$' || fail=1
+echo | grep -z --perl-regex=no-jit '\s$' || fail=1
+echo '.ab' | returns_ 1 grep --perl-regex=no-jit -wx ab || fail=1
+echo x | grep --perl-regex=no-jit -z '[^a]' || fail=1
+printf 'x\n\0' | returns_ 1 grep -z --perl-regex=no-jit 'x$' || fail=1
+printf 'a\nb\0' | grep -zx --perl-regex=no-jit a && fail=1
+
+Exit $fail
-- 
2.34.0.rc1.349.g8f33748433


Information forwarded to bug-grep <at> gnu.org:
bug#51727; Package grep. (Wed, 10 Nov 2021 00:42:01 GMT) Full text and rfc822 format available.

Message #8 received at 51727 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Carlo Marcelo Arenas Belón <carenas <at> gmail.com>
Cc: 51727 <at> debbugs.gnu.org
Subject: Re: bug#51727: add an optional flag to -P to disable JIT
Date: Tue, 9 Nov 2021 16:40:56 -0800
On 11/9/21 11:04, Carlo Marcelo Arenas Belón wrote:
> Severity: wishlist
> 
> There are times, when the expression is too simple or will not be used too
> often to justify the extra time in -P that is required for JIT compilation.

How much extra time are we talking about? I would expect users would 
bother thinking about this flag only when it significantly helps 
performance, which sorta implies large inputs, which means the 
expression will be used often, which means users won't want to use nojit.

Anyway, if we do this sort of thing I suggest waiting for PCRE2 and 
doing it then. Also, our flags should use the same spelling as PCRE2 
(which suggests that we use 'no-jit' rather than 'nojit'). And what 
other PCRE2 flags might users want to specify?




Information forwarded to bug-grep <at> gnu.org:
bug#51727; Package grep. (Wed, 10 Nov 2021 12:12:02 GMT) Full text and rfc822 format available.

Message #11 received at 51727 <at> debbugs.gnu.org (full text, mbox):

From: Carlo Arenas <carenas <at> gmail.com>
To: Paul Eggert <eggert <at> cs.ucla.edu>
Cc: 51727 <at> debbugs.gnu.org
Subject: Re: bug#51727: add an optional flag to -P to disable JIT
Date: Wed, 10 Nov 2021 04:11:00 -0800
On Tue, Nov 9, 2021 at 4:40 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:
>
> On 11/9/21 11:04, Carlo Marcelo Arenas Belón wrote:
> > Severity: wishlist
> >
> > There are times, when the expression is too simple or will not be used too
> > often to justify the extra time in -P that is required for JIT compilation.
>
> How much extra time are we talking about? I would expect users would
> bother thinking about this flag only when it significantly helps
> performance, which sorta implies large inputs, which means the
> expression will be used often, which means users won't want to use nojit.

good point; I have to admit its main use, for me at least, was to
actually see how much time I am saving by using jit, and to avoid
hitting buggy jit code paths I might have introduced myself ;), which
is what the commit message kind of implies.

The main point though, was to allow the user the flexibility to decide
for themselves, from any option, while keeping a reasonable default
(which is why keeping JIT enabled by default was done here as well).

> Anyway, if we do this sort of thing I suggest waiting for PCRE2 and
> doing it then. Also, our flags should use the same spelling as PCRE2
> (which suggests that we use 'no-jit' rather than 'nojit').

fair enough, I will make sure to use no-jit if I didn't already.

> And what
> other PCRE2 flags might users want to specify?

a couple that I think might be interesting, is to enable extended
mode, so that users that have really complex expressions can write
them as multiline strings with comments, and one to tell PCRE to skip
the UTF-8 validation if your content is know to be safe, which could
increase performance by more than ~10% if I recall correctly from what
we did in git (which was more aggressive as well and probably didn't
get the results I would expect from grep)

Carlo




Information forwarded to bug-grep <at> gnu.org:
bug#51727; Package grep. (Fri, 12 Nov 2021 23:28:02 GMT) Full text and rfc822 format available.

Message #14 received at 51727 <at> debbugs.gnu.org (full text, mbox):

From: Paul Eggert <eggert <at> cs.ucla.edu>
To: Carlo Arenas <carenas <at> gmail.com>
Cc: 51727 <at> debbugs.gnu.org
Subject: Re: bug#51727: add an optional flag to -P to disable JIT
Date: Fri, 12 Nov 2021 15:27:48 -0800
On 11/10/21 04:11, Carlo Arenas wrote:
> On Tue, Nov 9, 2021 at 4:40 PM Paul Eggert <eggert <at> cs.ucla.edu> wrote:

> its main use, for me at least, was to
> actually see how much time I am saving by using jit, and to avoid
> hitting buggy jit code paths I might have introduced myself ;), which
> is what the commit message kind of implies.

This sounds esoteric enough that we needn't support/document it for 
'grep' users. (Their lives are complicated enough already....)


> enable extended
> mode, so that users that have really complex expressions can write
> them as multiline strings with comments,

This wouldn't be just a PCRE thing; even ordinary BREs and EREs could 
benefit from being able to have multiline regexps, in which a newline 
means "|". The GNU regular expression compiler supports this. Also, we'd 
need to disable grep's ordinary activity of reordering and removing 
duplicates in regular-expression patterns (in the current syntax where a 
newline always starts a new pattern). So I expect this wouldn't be a new 
-P suboption; it'd be a more-general option that applies to all regexp 
syntax options (though it'd be a no-op for -F).


> and one to tell PCRE to skip
> the UTF-8 validation if your content is know to be safe

... and let grep dump core (or worse) otherwise? I'm not sure I would 
like to to head in that direction....




This bug report was last modified 3 years and 11 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.