GNU bug report logs - #36237
Support (rx (and (regexp EXPR) (regexp-quote EXPR)))

Previous Next

Package: emacs;

Reported by: Noam Postavsky <npostavs <at> gmail.com>

Date: Sat, 15 Jun 2019 23:44:02 UTC

Severity: wishlist

Tags: fixed, patch

Merged with 6985

Fixed in version 27.1

Done: Noam Postavsky <npostavs <at> gmail.com>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 36237 in the body.
You can then email your comments to 36237 AT debbugs.gnu.org in the normal way.

Toggle the display of automated, internal messages from the tracker.

View this report as an mbox folder, status mbox, maintainer mbox


Report forwarded to kevin.legouguec <at> gmail.com, monnier <at> iro.umontreal.ca, bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Sat, 15 Jun 2019 23:44:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Noam Postavsky <npostavs <at> gmail.com>:
New bug report received and forwarded. Copy sent to kevin.legouguec <at> gmail.com, monnier <at> iro.umontreal.ca, bug-gnu-emacs <at> gnu.org. (Sat, 15 Jun 2019 23:44:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sat, 15 Jun 2019 19:43:30 -0400
[Message part 1 (text/plain, inline)]
X-Debbugs-CC: Kévin Le Gouguec <kevin.legouguec <at> gmail.com>, Stefan Monnier <monnier <at> iro.umontreal.ca>
Severity: wishlist
Tags: patch

Currently, if you want to construct a regexp which includes a runtime
values using rx, there are two options:

- Use the (eval FORM) subform.  But if using using the rx macro, FORM is
  evaluated at macroexpansion time, which is awkward.  If using
  rx-to-string, then FORM can't access the lexical environment, which is
  also awkward.
- Build a list at runtime and pass to rx-to-string.  This requires the
  whole rx translation infrastructure at runtime, which is sad.

The patch below allows the rx macro to generate a concat expression
instead of just a plain string.  So the example from
https://debbugs.gnu.org/35564#53 would become

    (let ((start (max 0 (1- pos)))
          (char (string (aref command pos)))) ; need string for `regexp-quote'.
      (and (string-match
            (rx (or (seq (or bos blank)
                         (group-n 1 (regexp-quote char))
                         (or eos blank))
                    (seq ?` (group-n 1 (regexp-quote char)) ?`)))
            command start)
           (= pos (match-beginning 1))))

The rx call in the above macroexpands into:

    (concat "\\(?:\\`\\|[[:blank:]]\\)" "\\(?" "1" ":"
            (regexp-quote char)
            "\\)" "\\(?:\\'\\|[[:blank:]]\\)" "\\|" "`" "\\(?" "1" ":"
            (regexp-quote char)
            "\\)" "`")

Which will be optimal once we apply the patch from #14769 "optimize
`concat's literals".

[0001-Support-rx-and-regexp-EXPR-regexp-quote-EXPR.patch (text/x-diff, inline)]
From 6b6c6d8997d02236a4e53ccbe1f6a4b362d9b86c Mon Sep 17 00:00:00 2001
From: Noam Postavsky <npostavs <at> gmail.com>
Date: Fri, 14 Jun 2019 08:43:17 -0400
Subject: [PATCH] Support (rx (and (regexp EXPR) (regexp-quote EXPR)))

* lisp/emacs-lisp/rx.el (rx-regexp): Allow non-string forms.
(rx-constituents): Add regexp-quote constituent, which is like a plain
STRING form, but allows arbitrary lisp expressions.
(rx-regexp-quote): New function.
(rx-compile-to-lisp): New variable.
(rx-subforms): New helper function for handling subforms, including
non-constant case.
(rx-group-if, rx-and, rx-or, rx-=, rx->=, rx-repeat, rx-submatch)
(rx-submatch-n, rx-kleene, rx-atomic-p): Use it to handle non-constant
subforms.
(rx): Document new form, wrap non-constant forms with concat call.
* test/lisp/emacs-lisp/rx-tests.el (rx-tests--match): New macro.
(rx-nonstring-expr, rx-nonstring-expr-non-greedy): New tests.
* etc/NEWS: Announce changes.
---
 etc/NEWS                         |   6 ++
 lisp/emacs-lisp/rx.el            | 189 +++++++++++++++++++++++++--------------
 test/lisp/emacs-lisp/rx-tests.el |  41 +++++++++
 3 files changed, 171 insertions(+), 65 deletions(-)

diff --git a/etc/NEWS b/etc/NEWS
index 723f0a0fb0..bce755a211 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1380,12 +1380,18 @@ when given in a string.  Previously, '(any "\x80-\xff")' would match
 characters U+0080...U+00FF.  Now the expression matches raw bytes in
 the 128...255 range, as expected.
 
+---
 *** The rx 'or' and 'seq' forms no longer require any arguments.
 (or) produces a regexp that never matches anything, while (seq)
 matches the empty string, each being an identity for the operation.
 This also works for their aliases: '|' for 'or'; ':', 'and' and
 'sequence' for 'seq'.
 
+---
+*** 'regexp' and new 'regexp-quote' accept arbirtray lisp as arguments.
+In this case, 'rx' will generate code which produces a regexp string
+at runtime, instead of a constant string.
+
 ** Frames
 
 +++
diff --git a/lisp/emacs-lisp/rx.el b/lisp/emacs-lisp/rx.el
index 8ef78fd69e..0b7765322b 100644
--- a/lisp/emacs-lisp/rx.el
+++ b/lisp/emacs-lisp/rx.el
@@ -47,9 +47,11 @@
 
 ;; Rx translates a sexp notation for regular expressions into the
 ;; usual string notation.  The translation can be done at compile-time
-;; by using the `rx' macro.  It can be done at run-time by calling
-;; function `rx-to-string'.  See the documentation of `rx' for a
-;; complete description of the sexp notation.
+;; by using the `rx' macro.  The `regexp' and `regexp-quote' accept
+;; non-constant expressions, in which case `rx' will translate to a
+;; `concat' expression.  Translation can be done fully at run-time by
+;; calling function `rx-to-string'.  See the documentation of `rx' for
+;; a complete description of the sexp notation.
 ;;
 ;; Some examples of string regexps and their sexp counterparts:
 ;;
@@ -78,8 +80,8 @@
 ;;	    (+ (? ?\n)) (any " \t"))
 ;;
 ;; (concat "^\\(?:" something-else "\\)")
-;; (rx (and line-start (eval something-else))), statically or
-;; (rx-to-string '(and line-start ,something-else)), dynamically.
+;; (rx (and line-start (regexp something-else))), statically or
+;; (rx-to-string `(and line-start ,something-else)), dynamically.
 ;;
 ;; (regexp-opt '(STRING1 STRING2 ...))
 ;; (rx (or STRING1 STRING2 ...)), or in other words, `or' automatically
@@ -176,6 +178,7 @@ (defvar rx-constituents              ;Not `const' because some modes extend it.
     (not-syntax		. (rx-not-syntax 1 1)) ; sregex
     (category		. (rx-category 1 1 rx-check-category))
     (eval		. (rx-eval 1 1))
+    (regexp-quote	. (rx-regexp-quote 1 1 stringp))
     (regexp		. (rx-regexp 1 1 stringp))
     (regex		. regexp)       ; sregex
     (digit		. "[[:digit:]]")
@@ -302,6 +305,10 @@ (defvar rx-greedy-flag t
   "Non-nil means produce greedy regular expressions for `zero-or-one',
 `zero-or-more', and `one-or-more'.  Dynamically bound.")
 
+(defvar rx-compile-to-lisp nil
+  "Nil means return a regexp as a string.
+Non-nil means we may return a lisp form which produces a
+string (used for `rx' macro).")
 
 (defun rx-info (op head)
   "Return parsing/code generation info for OP.
@@ -344,7 +351,7 @@ (defun rx-check (form)
 	       (> nargs max-args))
       (error "rx form `%s' accepts at most %d args"
 	     (car form) max-args))
-    (when (not (null type-pred))
+    (when type-pred
       (dolist (sub-form (cdr form))
 	(unless (funcall type-pred sub-form)
 	  (error "rx form `%s' requires args satisfying `%s'"
@@ -360,19 +367,21 @@ (defun rx-group-if (regexp group)
    ;; for concatenation
    ((eq group ':)
     (if (rx-atomic-p
-	 (if (string-match
-	      "\\(?:[?*+]\\??\\|\\\\{[0-9]*,?[0-9]*\\\\}\\)\\'" regexp)
-	     (substring regexp 0 (match-beginning 0))
-	   regexp))
-	(setq group nil)))
+         (if (and (stringp regexp)
+                  (string-match
+                   "\\(?:[?*+]\\??\\|\\\\{[0-9]*,?[0-9]*\\\\}\\)\\'" regexp))
+             (substring regexp 0 (match-beginning 0))
+           regexp))
+        (setq group nil)))
    ;; for OR
    ((eq group '|) (setq group nil))
    ;; do anyway
    ((eq group t))
    ((rx-atomic-p regexp t) (setq group nil)))
-  (if group
-      (concat "\\(?:" regexp "\\)")
-    regexp))
+  (cond ((and group (stringp regexp))
+         (concat "\\(?:" regexp "\\)"))
+        (group `("\\(?:" ,@regexp "\\)"))
+        (t regexp)))
 
 
 (defvar rx-parent)
@@ -384,7 +393,7 @@ (defun rx-and (form)
 FORM is of the form `(and FORM1 ...)'."
   (rx-check form)
   (rx-group-if
-   (mapconcat (lambda (x) (rx-form x ':)) (cdr form) nil)
+   (rx-subforms (cdr form) ':)
    (and (memq rx-parent '(* t)) rx-parent)))
 
 
@@ -396,7 +405,7 @@ (defun rx-or (form)
     ((null (cdr form)) regexp-unmatchable)
     ((cl-every #'stringp (cdr form))
      (regexp-opt (cdr form) nil t))
-    (t (mapconcat (lambda (x) (rx-form x '|)) (cdr form) "\\|")))
+    (t (rx-subforms (cdr form) '| "\\|")))
    (and (memq rx-parent '(: * t)) rx-parent)))
 
 
@@ -669,7 +678,10 @@ (defun rx-= (form)
   (unless (and (integerp (nth 1 form))
 	       (> (nth 1 form) 0))
     (error "rx `=' requires positive integer first arg"))
-  (format "%s\\{%d\\}" (rx-form (nth 2 form) '*) (nth 1 form)))
+  (let ((subform (rx-form (nth 2 form) '*)))
+    (if (stringp subform)
+        (format "%s\\{%d\\}" subform (nth 1 form))
+      `(,@subform ,(format "\\{%d\\}" (nth 1 form))))))
 
 
 (defun rx->= (form)
@@ -679,7 +691,10 @@ (defun rx->= (form)
   (unless (and (integerp (nth 1 form))
 	       (> (nth 1 form) 0))
     (error "rx `>=' requires positive integer first arg"))
-  (format "%s\\{%d,\\}" (rx-form (nth 2 form) '*) (nth 1 form)))
+  (let ((subform (rx-form (nth 2 form) '*)))
+    (if (stringp subform)
+        (format "%s\\{%d,\\}" subform (nth 1 form))
+      `(,@subform ,(format "\\{%d,\\}" (nth 1 form))))))
 
 
 (defun rx-** (form)
@@ -700,7 +715,10 @@ (defun rx-repeat (form)
 	 (unless (and (integerp (nth 1 form))
 		      (> (nth 1 form) 0))
 	   (error "rx `repeat' requires positive integer first arg"))
-	 (format "%s\\{%d\\}" (rx-form (nth 2 form) '*) (nth 1 form)))
+         (let ((subform (rx-form (nth 2 form) '*)))
+           (if (stringp subform)
+               (format "%s\\{%d\\}" subform (nth 1 form))
+             `(,@subform ,(format "\\{%d\\}" (nth 1 form))))))
 	((or (not (integerp (nth 2 form)))
 	     (< (nth 2 form) 0)
 	     (not (integerp (nth 1 form)))
@@ -708,30 +726,26 @@ (defun rx-repeat (form)
 	     (< (nth 2 form) (nth 1 form)))
 	 (error "rx `repeat' range error"))
 	(t
-	 (format "%s\\{%d,%d\\}" (rx-form (nth 3 form) '*)
-		 (nth 1 form) (nth 2 form)))))
+         (let ((subform (rx-form (nth 3 form) '*)))
+           (if (stringp subform)
+               (format "%s\\{%d,%d\\}" subform (nth 1 form) (nth 2 form))
+             `(,@subform ,(format "\\{%d,%d\\}" (nth 1 form) (nth 2 form))))))))
 
 
 (defun rx-submatch (form)
   "Parse and produce code from FORM, which is `(submatch ...)'."
-  (concat "\\("
-          (if (= 2 (length form))
-              ;; Only one sub-form.
-              (rx-form (cadr form))
-            ;; Several sub-forms implicitly concatenated.
-            (mapconcat (lambda (re) (rx-form re ':)) (cdr form) nil))
-          "\\)"))
+  (let ((subforms (rx-subforms (cdr form) ':)))
+    (if (stringp subforms)
+        (concat "\\(" subforms "\\)")
+      `("\\(" ,@subforms "\\)"))))
 
 (defun rx-submatch-n (form)
   "Parse and produce code from FORM, which is `(submatch-n N ...)'."
-  (let ((n (nth 1 form)))
-    (concat "\\(?" (number-to-string n) ":"
-	    (if (= 3 (length form))
-		;; Only one sub-form.
-		(rx-form (nth 2 form))
-	      ;; Several sub-forms implicitly concatenated.
-	      (mapconcat (lambda (re) (rx-form re ':)) (cddr form) nil))
-	    "\\)")))
+  (let ((n (nth 1 form))
+        (subforms (rx-subforms (cddr form) ':)))
+    (if (stringp subforms)
+        (concat "\\(?" (number-to-string n) ":" subforms "\\)")
+      `("\\(?" ,(number-to-string n) ":" ,@subforms "\\)"))))
 
 (defun rx-backref (form)
   "Parse and produce code from FORM, which is `(backref N)'."
@@ -759,9 +773,12 @@ (defun rx-kleene (form)
 		      (t "?")))
 	(op (cond ((memq (car form) '(* *? 0+ zero-or-more)) "*")
 		  ((memq (car form) '(+ +? 1+ one-or-more))  "+")
-		  (t "?"))))
+                  (t "?")))
+        (subform (rx-form (cadr form) '*)))
     (rx-group-if
-     (concat (rx-form (cadr form) '*) op suffix)
+     (if (stringp subform)
+         (concat subform op suffix)
+       `(,@subform ,(concat op suffix)))
      (and (memq rx-parent '(t *)) rx-parent))))
 
 
@@ -789,15 +806,18 @@ (defun rx-atomic-p (r &optional lax)
 be detected without much effort.  A guarantee of no false
 negatives would require a theoretic specification of the set
 of all atomic regexps."
-  (let ((l (length r)))
-    (cond
-     ((<= l 1))
-     ((= l 2) (= (aref r 0) ?\\))
-     ((= l 3) (string-match "\\`\\(?:\\\\[cCsS_]\\|\\[[^^]\\]\\)" r))
-     ((null lax)
+  (if (and rx-compile-to-lisp
+           (not (stringp r)))
+      nil ;; Runtime value, we must assume non-atomic.
+    (let ((l (length r)))
       (cond
-       ((string-match "\\`\\[\\^?]?\\(?:\\[:[a-z]+:]\\|[^]]\\)*]\\'" r))
-       ((string-match "\\`\\\\(\\(?:[^\\]\\|\\\\[^)]\\)*\\\\)\\'" r)))))))
+       ((<= l 1))
+       ((= l 2) (= (aref r 0) ?\\))
+       ((= l 3) (string-match "\\`\\(?:\\\\[cCsS_]\\|\\[[^^]\\]\\)" r))
+       ((null lax)
+        (cond
+         ((string-match "\\`\\[\\^?]?\\(?:\\[:[a-z]+:]\\|[^]]\\)*]\\'" r))
+         ((string-match "\\`\\\\(\\(?:[^\\]\\|\\\\[^)]\\)*\\\\)\\'" r))))))))
 
 
 (defun rx-syntax (form)
@@ -853,9 +873,23 @@ (defun rx-greedy (form)
 
 (defun rx-regexp (form)
   "Parse and produce code from FORM, which is `(regexp STRING)'."
-  (rx-check form)
-  (rx-group-if (cadr form) rx-parent))
-
+  (cond ((stringp form)
+         (rx-group-if (cadr form) rx-parent))
+        (rx-compile-to-lisp
+         ;; Always group non string forms, since we can't be sure they
+         ;; are atomic.
+         (rx-group-if (cdr form) t))
+        (t (rx-check form))))
+
+(defun rx-regexp-quote (form)
+  "Parse and produce code from FORM, which is `(regexp-quote STRING-EXP)'."
+  (cond ((stringp form)
+         ;; This is allowed(?), but makes little sense, you could just
+         ;; use STRING directly.
+         (rx-group-if (regexp-quote (cadr form)) rx-parent))
+        (rx-compile-to-lisp
+         (rx-group-if (list form) rx-parent))
+        (t (rx-check form))))
 
 (defun rx-form (form &optional parent)
   "Parse and produce code for regular expression FORM.
@@ -886,6 +920,27 @@ (defun rx-form (form &optional parent)
      (t
       (error "rx syntax error at `%s'" form)))))
 
+(defun rx-subforms (subforms &optional parent regexp-op)
+  (let ((listify (lambda (x)
+                   (if (listp x) (copy-sequence x)
+                     (list x))))
+        (subregexps (cond ((cdr subforms)
+                           (mapcar (lambda (x) (rx-form x parent)) subforms))
+                          (subforms
+                           ;; Single form, no need for grouping.
+                           (list (rx-form (car subforms))))
+                          ;; Zero forms.
+                          (t ""))))
+    (cond ((or (not rx-compile-to-lisp)
+               (cl-every #'stringp subregexps))
+           (mapconcat #'identity subregexps regexp-op))
+          (regexp-op
+           (nconc (funcall listify (car subregexps))
+                  (cl-mapcan (lambda (x)
+                               (cons regexp-op (funcall listify x)))
+                             (cdr subregexps))))
+          (t (cl-mapcan listify subregexps)))))
+
 
 ;;;###autoload
 (defun rx-to-string (form &optional no-group)
@@ -901,8 +956,12 @@ (defmacro rx (&rest regexps)
 REGEXPS is a non-empty sequence of forms of the sort listed below.
 
 Note that `rx' is a Lisp macro; when used in a Lisp program being
-compiled, the translation is performed by the compiler.
-See `rx-to-string' for how to do such a translation at run-time.
+compiled, the translation is performed by the compiler.  The
+`regexp-quote' and `regexp' accept forms that will evaluate to
+strings, in addition to constant strings.  If REGEXPS include
+such forms, then the result is an expression which returns a
+regexp string, rather than a regexp string directly.  See
+`rx-to-string' for performing translation completely at run-time.
 
 The following are valid subforms of regular expressions in sexp
 notation.
@@ -910,6 +969,10 @@ (defmacro rx (&rest regexps)
 STRING
      matches string STRING literally.
 
+`(regexp-quote STRING)'
+     matches STRING literally, where STRING is any lisp
+     expression that evaluates to a string.
+
 CHAR
      matches character CHAR literally.
 
@@ -1208,12 +1271,16 @@ (defmacro rx (&rest regexps)
 
 `(regexp REGEXP)'
      include REGEXP in string notation in the result."
-  (cond ((null regexps)
-	 (error "No regexp"))
-	((cdr regexps)
-	 (rx-to-string `(and ,@regexps) t))
-	(t
-	 (rx-to-string (car regexps) t))))
+  (let* ((rx-compile-to-lisp t)
+         (re (cond ((null regexps)
+                    (error "No regexp"))
+                   ((cdr regexps)
+                    (rx-to-string `(and ,@regexps) t))
+                   (t
+                    (rx-to-string (car regexps) t)))))
+    (if (stringp re)
+        re
+      `(concat ,@re))))
 
 
 (pcase-defmacro rx (&rest regexps)
@@ -1275,14 +1342,6 @@ (pcase-defmacro rx (&rest regexps)
                      for var in vars
                      collect `(app (match-string ,i) ,var)))))
 
-;; ;; sregex.el replacement
-
-;; ;;;###autoload (provide 'sregex)
-;; ;;;###autoload (autoload 'sregex "rx")
-;; (defalias 'sregex 'rx-to-string)
-;; ;;;###autoload (autoload 'sregexq "rx" nil nil 'macro)
-;; (defalias 'sregexq 'rx)
-
 (provide 'rx)
 
 ;;; rx.el ends here
diff --git a/test/lisp/emacs-lisp/rx-tests.el b/test/lisp/emacs-lisp/rx-tests.el
index 6f392d616d..d457f6919d 100644
--- a/test/lisp/emacs-lisp/rx-tests.el
+++ b/test/lisp/emacs-lisp/rx-tests.el
@@ -115,5 +115,46 @@ (ert-deftest rx-seq ()
   ;; Test zero-argument `seq'.
   (should (equal (rx (seq)) "")))
 
+(defmacro rx-tests--match (regexp string &optional match)
+  (macroexp-let2 nil strexp string
+    `(ert-info ((format "Matching %S to %S" ',regexp ,strexp))
+       (should (string-match ,regexp ,strexp))
+       ,@(when match
+           `((should (equal (match-string 0 ,strexp) ,match)))))))
+
+(ert-deftest rx-nonstring-expr ()
+  (let ((bee "b")
+        (vowel "[aeiou]"))
+    (rx-tests--match (rx "a" (regexp-quote bee) "c") "abc")
+    (rx-tests--match (rx "a" (regexp bee) "c") "abc")
+    (rx-tests--match (rx "a" (or (regexp bee) "xy") "c") "abc")
+    (rx-tests--match (rx "a" (or "xy" (regexp bee)) "c") "abc")
+    (should-not (string-match (rx (or (regexp bee) "xy")) ""))
+    (rx-tests--match (rx "a" (= 3 (regexp bee)) "c") "abbbc")
+    (rx-tests--match (rx "x" (= 3 (regexp vowel)) "z") "xeoez")
+    (should-not (string-match (rx "x" (= 3 (regexp vowel)) "z") "xe[]z"))
+    (rx-tests--match (rx "x" (= 3 (regexp-quote vowel)) "z")
+                     "x[aeiou][aeiou][aeiou]z")
+    (rx-tests--match (rx "x" (repeat 1 (regexp vowel)) "z") "xaz")
+    (rx-tests--match (rx "x" (repeat 1 2 (regexp vowel)) "z") "xaz")
+    (rx-tests--match (rx "x" (repeat 1 2 (regexp vowel)) "z") "xauz")
+    (rx-tests--match (rx "x" (>= 1 (regexp vowel)) "z") "xaiiz")
+    (rx-tests--match (rx "x" (** 1 2 (regexp vowel)) "z") "xaiz")
+    (rx-tests--match (rx "x" (group (regexp vowel)) "z") "xaz")
+    (rx-tests--match (rx "x" (group-n 1 (regexp vowel)) "z") "xaz")
+    (rx-tests--match (rx "x" (? (regexp vowel)) "z") "xz")))
+
+(ert-deftest rx-nonstring-expr-non-greedy ()
+  "`rx's greediness can't affect runtime regexp parts."
+  (let ((ad-min "[ad]*?")
+        (ad-max "[ad]*")
+        (ad "[ad]"))
+    (rx-tests--match (rx "c" (regexp ad-min) "a") "cdaaada" "cda")
+    (rx-tests--match (rx "c" (regexp ad-max) "a") "cdaaada" "cdaaada")
+    (rx-tests--match (rx "c" (minimal-match (regexp ad-max)) "a") "cdaaada" "cdaaada")
+    (rx-tests--match (rx "c" (maximal-match (regexp ad-min)) "a") "cdaaada" "cda")
+    (rx-tests--match (rx "c" (minimal-match (0+ (regexp ad))) "a") "cdaaada" "cda")
+    (rx-tests--match (rx "c" (maximal-match (0+ (regexp ad))) "a") "cdaaada" "cdaaada")))
+
 (provide 'rx-tests)
 ;; rx-tests.el ends here.
-- 
2.11.0


Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Sun, 16 Jun 2019 00:04:02 GMT) Full text and rfc822 format available.

Message #8 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Michael Heerdegen <michael_heerdegen <at> web.de>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: 36237 <at> debbugs.gnu.org, stefan monnier <monnier <at> iro.umontreal.ca>,
 kévin le gouguec <kevin.legouguec <at> gmail.com>
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sun, 16 Jun 2019 02:03:40 +0200
Noam Postavsky <npostavs <at> gmail.com> writes:

> +---
> +*** 'regexp' and new 'regexp-quote' accept arbirtray lisp as arguments.
                                              ^^^^^^^^^ (typo)

Cool.

I only wondered whether it wouldn't have been better if `eval' would
have been upgraded - but that wouldn't be able to regexp-quote when I
wanted it, right?

Does it make sense to keep `eval'?  There are so many rx-form types,
would be good if we could make it obsolete (in some sense).

Michael.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Sun, 16 Jun 2019 00:29:01 GMT) Full text and rfc822 format available.

Message #11 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: Michael Heerdegen <michael_heerdegen <at> web.de>
Cc: 36237 <at> debbugs.gnu.org, stefan monnier <monnier <at> iro.umontreal.ca>,
 kévin le gouguec <kevin.legouguec <at> gmail.com>
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sat, 15 Jun 2019 20:28:24 -0400
Michael Heerdegen <michael_heerdegen <at> web.de> writes:

>> +*** 'regexp' and new 'regexp-quote' accept arbirtray lisp as arguments.
>                                               ^^^^^^^^^ (typo)

Oops.

> I only wondered whether it wouldn't have been better if `eval' would
> have been upgraded

(rx (eval EXPR)) vs (rx (regexp-quote EXPR)) have different semantics
with respect to scoping and evaluation time, so I think that would be a
problem with respect to backwards compatibility.

> Does it make sense to keep `eval'?  There are so many rx-form types,
> would be good if we could make it obsolete (in some sense).

Yeah, I think it would make sense to obsolete `eval'.  Also `and': it
sounds like it should be the logical-and counterpart to `or', but it's
just a synonym for `seq' (which is a much clearer name).  I used `and'
in the bug title just for punning reasons :)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Sun, 16 Jun 2019 10:05:01 GMT) Full text and rfc822 format available.

Message #14 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Noam Postavsky <npostavs <at> gmail.com>, 36237 <at> debbugs.gnu.org
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, kevin.legouguec <at> gmail.com
Subject: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sun, 16 Jun 2019 12:03:53 +0200
Thank you! Definitely agree with the need for something like this, although I haven't read the patch in detail yet.
Some notes:

- A more suitable name is needed; `regexp-quote' makes no sense at all in the context of rx. What about `literal'? For simplicity, I'm proceeding with that name below.

- The docs probably need to be explicit about the differences between `literal' and `eval' w.r.t. evaluation time and environment.

- What is now the correct way of including a compile-time regexp expression, such as a defconst? (regexp (eval-when-compile EXPR))? Still a mouthful, but perhaps outside the scope of this bug.

- Thanks for mentioning bug#14769; I'll give it a go.

- I like rx a lot and use it extensively, but the implementation... could be improved (as you no doubt noticed). And so I have: I'm sitting on a full rewrite, code-named `ry'. It's shorter, much cleaner, and about twice as fast (usually more). The only thing still missing is compatibility with the old crusty `rx-constituents' extension mechanism.

The plan was to replace rx with ry entirely when complete. I'll see what it would take to add `literal'.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Sun, 16 Jun 2019 11:36:02 GMT) Full text and rfc822 format available.

Message #17 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>, 36237 <at> debbugs.gnu.org,
 Noam Postavsky <npostavs <at> gmail.com>, kevin.legouguec <at> gmail.com
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sun, 16 Jun 2019 07:34:49 -0400
First, thanks Noam, this is a very welcome improvement.
[ BTW, please use "--" for rx-compile-to-lisp since I believe it's
  internal.  ]

What do regexp and regexp-quote do in rx-to-string?

> - A more suitable name is needed; `regexp-quote' makes no sense at all in
> the context of rx. What about `literal'? For simplicity, I'm proceeding with
> that name below.

I'll let others figure that one out.

> - What is now the correct way of including a compile-time regexp expression,
> such as a defconst? (regexp (eval-when-compile EXPR))? Still a mouthful, but
> perhaps outside the scope of this bug.

FWIW, I have the impression that in most cases where this could be
useful, a better solution would be to provide something like
`rx-defmacro` and/or `rx-macrolet`.


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Sun, 16 Jun 2019 12:27:02 GMT) Full text and rfc822 format available.

Message #20 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>,
 Mattias Engdegård <mattiase <at> acm.org>, 36237 <at> debbugs.gnu.org,
 kevin.legouguec <at> gmail.com
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sun, 16 Jun 2019 08:25:46 -0400
> [ BTW, please use "--" for rx-compile-to-lisp since I believe it's
>   internal.  ]

I'm not sure that it is, e.g., python-rx might bind it too (if it
weren't for the fact that python.el needs to maintain backwards
compatibility with older Emacs).

> What do regexp and regexp-quote do in rx-to-string?

regexp does exactly what it did before, i.e., it accepts only a constant
string.  Likewise regexp-quote accepts only a constant string, which
makes it pointless to use in rx-to-string (just use a plain STRING
directly), but I didn't disallow it.

Hmm, I think I had meant to update rx-to-string's docstring, but forgot
about it.

>> - A more suitable name is needed; `regexp-quote' makes no sense at all in
>> the context of rx. What about `literal'? For simplicity, I'm proceeding with
>> that name below.
>
> I'll let others figure that one out.

Probably `literal' makes sense.  I originally used regexp-quote, just
because I was thinking of it as a short form of

    (rx (regexp (regexp-quote EXPR)))

>> - What is now the correct way of including a compile-time regexp expression,
>> such as a defconst? (regexp (eval-when-compile EXPR))? Still a mouthful, but
>> perhaps outside the scope of this bug.

Oh, hmm.  That might be a reason to keep using `eval'.

> FWIW, I have the impression that in most cases where this could be
> useful, a better solution would be to provide something like
> `rx-defmacro` and/or `rx-macrolet`.

I guess that could replace the "old crusty" rx-constituents thing too.

>> I have: I'm sitting on a full rewrite, code-named `ry'. It's shorter,
>> much cleaner, and about twice as fast (usually more). The only thing
>> still missing is compatibility with the old crusty `rx-constituents'
>> extension mechanism.
>> 
>> The plan was to replace rx with ry entirely when complete.

How far away is this?  Would it make sense to delay this bug until "ry"
comes in?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Sun, 16 Jun 2019 12:36:01 GMT) Full text and rfc822 format available.

Message #23 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>,
 Mattias Engdegård <mattiase <at> acm.org>,
 36237 <at> debbugs.gnu.org, kevin.legouguec <at> gmail.com
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sun, 16 Jun 2019 08:35:01 -0400
> I'm not sure that it is, e.g., python-rx might bind it too (if it
> weren't for the fact that python.el needs to maintain backwards
> compatibility with older Emacs).

I think that would still be "python-rx using internals of rx".
And that could be fixed if we can rewrite it with rx-macrolet, right?

> regexp does exactly what it did before, i.e., it accepts only a constant
> string.  Likewise regexp-quote accepts only a constant string, which
> makes it pointless to use in rx-to-string (just use a plain STRING
> directly), but I didn't disallow it.

Good, thanks.

>>> - What is now the correct way of including a compile-time regexp expression,
>>> such as a defconst? (regexp (eval-when-compile EXPR))? Still a mouthful, but
>>> perhaps outside the scope of this bug.
> Oh, hmm.  That might be a reason to keep using `eval'.

It could make people reluctant to change, yes, but that still wouldn't
be a valid reason in my book.

>> FWIW, I have the impression that in most cases where this could be
>> useful, a better solution would be to provide something like
>> `rx-defmacro` and/or `rx-macrolet`.
> I guess that could replace the "old crusty" rx-constituents thing too.

That's the idea, yes.


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Sun, 16 Jun 2019 19:52:02 GMT) Full text and rfc822 format available.

Message #26 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>,
 Mattias Engdegård <mattiase <at> acm.org>,
 36237 <at> debbugs.gnu.org, kevin.legouguec <at> gmail.com
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sun, 16 Jun 2019 15:50:57 -0400
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

>> I'm not sure that it is, e.g., python-rx might bind it too (if it
>> weren't for the fact that python.el needs to maintain backwards
>> compatibility with older Emacs).
>
> I think that would still be "python-rx using internals of rx".
> And that could be fixed if we can rewrite it with rx-macrolet, right?

Yeah, I can buy that.

Mattias, how far is this "ry" thing from being finished?  Would it make
sense to land it before adding this new feature?





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Sun, 16 Jun 2019 20:06:01 GMT) Full text and rfc822 format available.

Message #29 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>,
 Mattias Engdegård <mattiase <at> acm.org>,
 36237 <at> debbugs.gnu.org, kevin.legouguec <at> gmail.com
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sun, 16 Jun 2019 16:04:59 -0400
>>> I'm not sure that it is, e.g., python-rx might bind it too (if it
>>> weren't for the fact that python.el needs to maintain backwards
>>> compatibility with older Emacs).
>> I think that would still be "python-rx using internals of rx".
>> And that could be fixed if we can rewrite it with rx-macrolet, right?
> Yeah, I can buy that.

In this case I think it's OK to use "--" to signal that it's an internal
name, and then use this internal name in python-rx.


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Sun, 16 Jun 2019 20:26:02 GMT) Full text and rfc822 format available.

Message #32 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Noam Postavsky <npostavs <at> gmail.com>, Stefan Monnier
 <monnier <at> iro.umontreal.ca>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>,
 Mattias Engdegård <mattiase <at> acm.org>,
 36237 <at> debbugs.gnu.org, kevin.legouguec <at> gmail.com
Subject: RE: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sun, 16 Jun 2019 13:25:00 -0700 (PDT)
> how far is this "ry" thing from being finished?

Please excuse this interruption from someone not
really following this thread.  Naive question:
Are you really thinking about using `ry' as the
user-visible name - e.g. as a replacement for,
or alternative to, `rx'?  (Until Noam posed that
question I was thinking it was tongue-in-cheek.)

I kinda hope not (but again, I'm ignorant of the
context).  If this is meant to stand for a regexp
thingie, and you want an alternative to `rx',
please try for something that suggests "regexp"
or "regular expression" - `rx2', `rex', `rgx',
`rg' ... - something suggestive.

(We've had 45+ years of `C' as the successor to
`B'.  Surely we can do better. ;-))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Sun, 16 Jun 2019 20:35:02 GMT) Full text and rfc822 format available.

Message #35 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>, 36237 <at> debbugs.gnu.org,
 kevin.legouguec <at> gmail.com, Noam Postavsky <npostavs <at> gmail.com>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sun, 16 Jun 2019 22:34:09 +0200
16 juni 2019 kl. 22.25 skrev Drew Adams <drew.adams <at> oracle.com>:
> 
>> how far is this "ry" thing from being finished?

Not far, I hope. A few days, maybe, depending on available time.

> Are you really thinking about using `ry' as the
> user-visible name - e.g. as a replacement for,
> or alternative to, `rx'?

No, it's just a working title, to avoid clashes with the current rx. It will be renamed to `rx' when done.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Sun, 16 Jun 2019 21:11:01 GMT) Full text and rfc822 format available.

Message #38 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>, 36237 <at> debbugs.gnu.org,
 kevin.legouguec <at> gmail.com, Noam Postavsky <npostavs <at> gmail.com>,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: RE: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sun, 16 Jun 2019 14:09:52 -0700 (PDT)
> > Are you really thinking about using `ry' as the
> > user-visible name - e.g. as a replacement for,
> > or alternative to, `rx'?
> 
> No, it's just a working title, to avoid clashes with the current rx. It
> will be renamed to `rx' when done.

Great; thx.  Sorry for the noise.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Mon, 17 Jun 2019 22:14:03 GMT) Full text and rfc822 format available.

Message #41 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>,
 Mattias Engdegård <mattiase <at> acm.org>,
 Noam Postavsky <npostavs <at> gmail.com>, Stefan Monnier <monnier <at> iro.umontreal.ca>,
 kevin.legouguec <at> gmail.com, 36237 <at> debbugs.gnu.org
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Mon, 17 Jun 2019 23:57:53 +0300
> I kinda hope not (but again, I'm ignorant of the
> context).  If this is meant to stand for a regexp
> thingie, and you want an alternative to `rx',
> please try for something that suggests "regexp"
> or "regular expression" - `rx2', `rex', `rgx',
> `rg' ... - something suggestive.
>
> (We've had 45+ years of `C' as the successor to
> `B'.  Surely we can do better. ;-))

rx++




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Tue, 18 Jun 2019 19:47:01 GMT) Full text and rfc822 format available.

Message #44 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>, 36237 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, kevin.legouguec <at> gmail.com
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Tue, 18 Jun 2019 21:45:57 +0200
16 juni 2019 kl. 14.25 skrev Noam Postavsky <npostavs <at> gmail.com>:

>>> The plan was to replace rx with ry entirely when complete.
> 
> How far away is this?  Would it make sense to delay this bug until "ry"
> comes in?

The function-complete ry has been put at https://gitlab.com/mattiase/ry for the time being. It should now be entirely compatible, including support for `rx-constituents'. Your proposed `literal' was also added, which was instructive; I needed to know how it would fit in.

Bootstrapping Emacs with it (ry renamed to rx, of course) works, including python-mode which uses `rx-constituents' for extension. It also has fewer of the known bugs of the current rx, and a much more extensive test suite.

There are two things that perhaps need to be resolved before replacing the current rx:

1. Is there externally developed elisp code that makes use of internal rx functions and variables, perhaps for extension purposes, and do we need to worry about breaking it? The code is old enough not to delineate internal and public symbols clearly (no double-dash), which means that people may have gone and used all sort of things.

2. What would a good extension mechanism look like, and should one be put into place right away, so that we can point to a decent replacement for the internal toys that users relied upon?

Noam, unless the consensus is that ry is unequivocally as good or better than rx, you could just as well apply your patch (suitably fixed up). Even if later replaced, there is nothing fundamentally wrong with the design; let's not hold it hostage.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Wed, 19 Jun 2019 01:36:02 GMT) Full text and rfc822 format available.

Message #47 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>, 36237 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, kevin.legouguec <at> gmail.com
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Tue, 18 Jun 2019 21:34:52 -0400
[Message part 1 (text/plain, inline)]
Mattias Engdegård <mattiase <at> acm.org> writes:

> The function-complete ry has been put at
> https://gitlab.com/mattiase/ry for the time being. It should now be
> entirely compatible, including support for `rx-constituents'. Your
> proposed `literal' was also added, which was instructive; I needed to
> know how it would fit in.

Cool, I'll take a look.

> Noam, unless the consensus is that ry is unequivocally as good or
> better than rx, you could just as well apply your patch (suitably
> fixed up). Even if later replaced, there is nothing fundamentally
> wrong with the design; let's not hold it hostage.

Sure.  Here's the patch with regexp-quote change to literal, and
rx--compile-to-lisp renamed.  I'll wait a bit more and push this weekend
if there are no more comments.

[0001-Support-rx-and-regexp-EXPR-literal-EXPR-Bug-36237.patch (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Wed, 19 Jun 2019 15:44:01 GMT) Full text and rfc822 format available.

Message #50 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>, 36237 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, kevin.legouguec <at> gmail.com
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Wed, 19 Jun 2019 17:42:57 +0200
19 juni 2019 kl. 03.34 skrev Noam Postavsky <npostavs <at> gmail.com>:
> 
> Sure.  Here's the patch with regexp-quote change to literal, and
> rx--compile-to-lisp renamed.  I'll wait a bit more and push this weekend
> if there are no more comments.

Thank you. Some comments:

-;; (rx (and line-start (eval something-else))), statically or
-;; (rx-to-string '(and line-start ,something-else)), dynamically.
+;; (rx (and line-start (regexp something-else))), statically or
+;; (rx-to-string `(and line-start ,something-else)), dynamically.

With your patch, the rx-to-string example should no longer be recommended, but eval is still of interest for compile-time substitution. What about:

;; (rx (and line-start (eval something-else))), statically or
;; (rx (and line-start (regexp something-else))), dynamically.

(extra points if you change `and' into `seq')

+  (cond ((stringp form)
+         (rx-group-if (cadr form) rx-parent))
+        (rx--compile-to-lisp
+         ;; Always group non string forms, since we can't be sure they

"non-string forms"

+(defun rx-literal (form)
+  "Parse and produce code from FORM, which is `(literal STRING-EXP)'."
+  (cond ((stringp form)
+         ;; This is allowed(?), but makes little sense, you could just
+         ;; use STRING directly.

Yes, I did the same in ry. Maybe `literal' should be disallowed entirely in rx-to-string, since it's more likely to be a misunderstanding on the user's part.
 
+(defun rx-subforms (subforms &optional parent regexp-op)

REGEXP-OP is perhaps better named SEPARATOR?

+                  (cl-mapcan (lambda (x)
+                               (cons regexp-op (funcall listify x)))
+                             (cdr subregexps))))
+          (t (cl-mapcan listify subregexps)))))

Any reason for using cl-mapcan instead of straight mapcan?
Not that it matters much.

+NO-GROUP non-nil means don't put shy groups around the result.
+Note that unlike for the `rx' macro, subforms `literal' and
+`regexp' will not accept non-string arguments (so (literal
+STRING) becomes just a more verbose version of STRING)."

Try not breaking the line inside (literal STRING).






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Thu, 20 Jun 2019 00:30:02 GMT) Full text and rfc822 format available.

Message #53 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>, 36237 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, kevin.legouguec <at> gmail.com
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Wed, 19 Jun 2019 20:29:44 -0400
[Message part 1 (text/plain, inline)]
Mattias Engdegård <mattiase <at> acm.org> writes:

> +;; (rx (and line-start (regexp something-else))), statically or
> +;; (rx-to-string `(and line-start ,something-else)), dynamically.
>
> With your patch, the rx-to-string example should no longer be
> recommended, but eval is still of interest for compile-time
> substitution. What about:
>
> ;; (rx (and line-start (eval something-else))), statically or
> ;; (rx (and line-start (regexp something-else))), dynamically.

Not sure that we really want to get into the subtleties of static eval
in the intro examples.  I'm thinking we just drop the rx-to-string
example, without replacement.

> +         ;; Always group non string forms, since we can't be sure they
>
> "non-string forms"

Right.

> +(defun rx-literal (form)
> +  "Parse and produce code from FORM, which is `(literal STRING-EXP)'."
> +  (cond ((stringp form)
> +         ;; This is allowed(?), but makes little sense, you could just
> +         ;; use STRING directly.
>
> Yes, I did the same in ry. Maybe `literal' should be disallowed
> entirely in rx-to-string, since it's more likely to be a
> misunderstanding on the user's part.

I think disallowing it could potentially be annoying during development,
e.g., building incrementally in re-builder.

> +(defun rx-subforms (subforms &optional parent regexp-op)
>
> REGEXP-OP is perhaps better named SEPARATOR?

Yeah, especially since it's just the one "\\|" operator.

> +                  (cl-mapcan (lambda (x)
> +                               (cons regexp-op (funcall listify x)))
> +                             (cdr subregexps))))
> +          (t (cl-mapcan listify subregexps)))))
>
> Any reason for using cl-mapcan instead of straight mapcan?
> Not that it matters much.

I, um, didn't realize mapcan was builtin (when I saw mapcan elsewhere I
just assumed it was from cl.el).  In my defence, it's new since 26.1 :p

(In addition to the above two points, I've renamed this function to
rx--subforms, and re-arranged the code a bit)

> +`regexp' will not accept non-string arguments (so (literal
> +STRING) becomes just a more verbose version of STRING)."
>
> Try not breaking the line inside (literal STRING).

Right.

[0001-Support-rx-and-regexp-EXPR-literal-EXPR-Bug-36237.patch (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Thu, 20 Jun 2019 10:27:01 GMT) Full text and rfc822 format available.

Message #56 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>, 36237 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, kevin.legouguec <at> gmail.com
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Thu, 20 Jun 2019 12:26:21 +0200
20 juni 2019 kl. 02.29 skrev Noam Postavsky <npostavs <at> gmail.com>:
> 
> <0001-Support-rx-and-regexp-EXPR-literal-EXPR-Bug-36237.patch>

Things I didn't notice earlier:

-;; (rx (and line-start (eval something-else))), statically or
-;; (rx-to-string '(and line-start ,something-else)), dynamically.
+;; (rx (seq line-start (regexp something-else)))

You can actually drop the `seq' form entirely, since it's implicit in `rx'.
It was only needed for `rx-to-string' which is now gone.

+`(literal STRING)'
+     matches STRING literally, where STRING is any lisp
+     expression that evaluates to a string.

It's better to name the metavariable EXPR, STRING-EXPR or LISP-EXPR to make it clear that it's an arbitrary lisp expression, especially since STRING is used for a constant string just above.

The same goes for `regexp', since it can now be a lisp expression; this should be mentioned in the describing paragraph, using a similar phrasing. The `literal' item should probably be moved next to `regexp', since they are closely related.

The paragraph on `eval' uses FORM, which is too generic; presumably it should become EXPR too (but not STRING-EXPR, since it can be any rx value).





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Sat, 22 Jun 2019 22:07:01 GMT) Full text and rfc822 format available.

Message #59 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>, 36237 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, kevin.legouguec <at> gmail.com
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sat, 22 Jun 2019 18:05:58 -0400
[Message part 1 (text/plain, inline)]
Mattias Engdegård <mattiase <at> acm.org> writes:

> -;; (rx (and line-start (eval something-else))), statically or
> -;; (rx-to-string '(and line-start ,something-else)), dynamically.
> +;; (rx (seq line-start (regexp something-else)))
>
> You can actually drop the `seq' form entirely, since it's implicit in `rx'.
> It was only needed for `rx-to-string' which is now gone.

Yeah, that applies to most of the examples actually.  Updated (and I
found a couple of mistakes in them).

> +`(literal STRING)'
> +     matches STRING literally, where STRING is any lisp
> +     expression that evaluates to a string.
>
> It's better to name the metavariable EXPR, STRING-EXPR or LISP-EXPR to
> make it clear that it's an arbitrary lisp expression, especially since
> STRING is used for a constant string just above.

Sure.

> The same goes for `regexp', since it can now be a lisp expression;
> this should be mentioned in the describing paragraph, using a similar
> phrasing. The `literal' item should probably be moved next to
> `regexp', since they are closely related.

Yeah, I wasn't entirely sure whether `literal' should be considered more
related to `regexp' or STRING.  I guess since I've added a mention of
`literal' and `regexp' in the paragraphs above it makes sense to put
them at the end together.

> The paragraph on `eval' uses FORM, which is too generic

No, it's not generic, see (info "(elisp) Intro Eval"):

       A Lisp object that is intended for evaluation is called a "form" or
    "expression"(1).

[0001-Support-rx-and-regexp-EXPR-literal-EXPR-Bug-36237.patch (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Sun, 23 Jun 2019 11:10:02 GMT) Full text and rfc822 format available.

Message #62 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>, 36237 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, kevin.legouguec <at> gmail.com
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sun, 23 Jun 2019 13:09:46 +0200
23 juni 2019 kl. 00.05 skrev Noam Postavsky <npostavs <at> gmail.com>:
> 
> Yeah, that applies to most of the examples actually.  Updated (and I
> found a couple of mistakes in them).

Very good, thank you! I double-checked them with xr and only found one error (see below).

>> The paragraph on `eval' uses FORM, which is too generic
> 
> No, it's not generic, see (info "(elisp) Intro Eval"):
> 
>       A Lisp object that is intended for evaluation is called a "form" or
>    "expression"(1).

You are entirely correct, of course; what I meant is that the docs frequently use "form" for the `rx' whatchamacallits even though they aren't Lisp expressions. The terminology is a mess; use whatever you find understandable.

>squash! Support (rx (and (regexp EXPR) (literal EXPR))) (Bug#36237)

Remnants of rebase editing?

 ;; "[ \t\n]*:\\([^:]+\\|$\\)"
-;; (rx (and (zero-or-more (in " \t\n")) ":"
-;;          (submatch (or line-end (one-or-more (not (any ?:)))))))
+;; (rx (* (in " \t\n")) ":"
+;;     (submatch (or line-end (+ (not (in ?:))))))

The correct translation of the `or'-pattern is

  (or (+ (not (any ":"))) eol)

since the order of the branches matters. Maybe it's the regexp string that should be the other way around; hard to tell without any context.

 ;; "^;;\\s-*\n\\|^\n"
-;; (rx (or (and line-start ";;" (0+ space) ?\n)
-;;         (and line-start ?\n)))
+;; (rx (or (seq line-start ";;" (0+ space) ?\n)
+;;         (seq line-start ?\n)))

This should be correct. The regexp compiler translates `[[:space:]]` and `\s-` to different bytecodes, but as far as I can tell they end up having identical semantics in the end. Same goes for `[[:word:]]' vs `\sw' (alias `\w'), and so on.

+`(literal STRING-EXPR)'
+     matches STRING-EXPR literally, where STRING-EXPR is any lisp
+     expression that evaluates to a string.
+
+`(regexp REGEXP-EXPR)'
+     include REGEXP-EXPR in string notation in the result, where
+     REGEXP-EXPR is any lisp expression that evaluates a string
+     containing a valid regexp.

Missed "to" after "evaluate"?

I'm happy with the patch after the obvious fixes.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Sun, 23 Jun 2019 14:46:02 GMT) Full text and rfc822 format available.

Message #65 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Noam Postavsky <npostavs <at> gmail.com>, Mattias Engdegård
 <mattiase <at> acm.org>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>, 36237 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, kevin.legouguec <at> gmail.com
Subject: RE: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sun, 23 Jun 2019 07:45:26 -0700 (PDT)
Minor:

 "at run-time" -> "at run time"

Emacs docs seem to use "runtime" as
adjective and "run time" as noun, which
is fairly conventional.  Sometimes,
outside Emacs, "runtime" is used for both.






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Sun, 23 Jun 2019 15:47:02 GMT) Full text and rfc822 format available.

Message #68 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>, 36237 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, kevin.legouguec <at> gmail.com
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sun, 23 Jun 2019 11:46:04 -0400
[Message part 1 (text/plain, inline)]
Mattias Engdegård <mattiase <at> acm.org> writes:
>
> what I meant is that the docs frequently use "form" for the `rx'
> whatchamacallits even though they aren't Lisp expressions. The
> terminology is a mess; use whatever you find understandable.

Well, the rest of the rx docstring uses SEXP for the `rx'
whatchamacallits, so I think leaving it as (eval FORM) should be fine.
And hopefully we'll be able to deprecate eval soon enough so it won't
matter too much.

>>squash! Support (rx (and (regexp EXPR) (literal EXPR))) (Bug#36237)
>
> Remnants of rebase editing?

Oops, yes.  I wish git would comment out the "squash!..." line
automatically.

> since the order of the branches matters. Maybe it's the regexp string
> that should be the other way around; hard to tell without any context.

Yeah, I switched the regexp string instead, on the grounds that
otherwise "$" would almost never match (except at end of buffer) since
[^:] already matches \n.

> +`(regexp REGEXP-EXPR)'
> +     include REGEXP-EXPR in string notation in the result, where
> +     REGEXP-EXPR is any lisp expression that evaluates a string
> +     containing a valid regexp.
>
> Missed "to" after "evaluate"?

Oops.

> I'm happy with the patch after the obvious fixes.

I'll wait a few more days in case something else comes up.

Drew Adams <drew.adams <at> oracle.com> writes:

> Minor:
>
>  "at run-time" -> "at run time"
>
> Emacs docs seem to use "runtime" as
> adjective and "run time" as noun, which
> is fairly conventional.  Sometimes,
> outside Emacs, "runtime" is used for both.

The elisp manual has a couple of "run-time" as well, but more cases of
"run time" so I went with that.

[0001-Support-rx-and-regexp-EXPR-literal-EXPR-Bug-36237.patch (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Mon, 24 Jun 2019 03:51:01 GMT) Full text and rfc822 format available.

Message #71 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Noam Postavsky <npostavs <at> gmail.com>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>,
 Mattias Engdegård <mattiase <at> acm.org>,
 36237 <at> debbugs.gnu.org, kevin.legouguec <at> gmail.com
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Sun, 23 Jun 2019 23:50:00 -0400
> Well, the rest of the rx docstring uses SEXP for the `rx'
> whatchamacallits,

FWIW, I think `RX` would make sense for them.


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Mon, 24 Jun 2019 10:54:02 GMT) Full text and rfc822 format available.

Message #74 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Mattias Engdegård <mattiase <at> acm.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>, 36237 <at> debbugs.gnu.org,
 Noam Postavsky <npostavs <at> gmail.com>, kevin.legouguec <at> gmail.com
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Mon, 24 Jun 2019 12:52:57 +0200
24 juni 2019 kl. 05.50 skrev Stefan Monnier <monnier <at> iro.umontreal.ca>:
> 
>> Well, the rest of the rx docstring uses SEXP for the `rx'
>> whatchamacallits,
> 
> FWIW, I think `RX` would make sense for them.

Agreed; ry uses RX in its doc strings.





Merged 6985 36237. Request was from Mattias Engdegård <mattiase <at> acm.org> to control <at> debbugs.gnu.org. (Tue, 25 Jun 2019 11:11:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Wed, 26 Jun 2019 02:09:02 GMT) Full text and rfc822 format available.

Message #79 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: Mattias Engdegård <mattiase <at> acm.org>
Cc: Michael Heerdegen <michael_heerdegen <at> web.de>, 36237 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>, kevin.legouguec <at> gmail.com
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Tue, 25 Jun 2019 22:07:53 -0400
tags 36237 fixed
close 36237 27.1
quit

> I'll wait a few more days in case something else comes up.

Pushed to master.

b59ffd2290 2019-06-25T22:00:03-04:00 "Support (rx (and (regexp EXPR) (literal EXPR))) (Bug#36237)"
https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=b59ffd2290ff744ca4e7cc2748ba6b66fb2f99f1





Added tag(s) fixed. Request was from Noam Postavsky <npostavs <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 26 Jun 2019 02:09:03 GMT) Full text and rfc822 format available.

bug marked as fixed in version 27.1, send any further explanations to 36237 <at> debbugs.gnu.org and Noam Postavsky <npostavs <at> gmail.com> Request was from Noam Postavsky <npostavs <at> gmail.com> to control <at> debbugs.gnu.org. (Wed, 26 Jun 2019 02:09:03 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Wed, 26 Jun 2019 12:25:01 GMT) Full text and rfc822 format available.

Message #86 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Wed, 26 Jun 2019 13:23:53 +0100
On Tue 25 Jun 2019, Noam Postavsky wrote:

> tags 36237 fixed
> close 36237 27.1
> quit
>
>> I'll wait a few more days in case something else comes up.
>
> Pushed to master.
>
> b59ffd2290 2019-06-25T22:00:03-04:00 "Support (rx (and (regexp EXPR) (literal EXPR))) (Bug#36237)"
> https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=b59ffd2290ff744ca4e7cc2748ba6b66fb2f99f1

With "emacs -Q" from emacs-26:

  (setq foo (rx (group (or "abc" "def"))))
  => "\\(\\(?:abc\\|def\\)\\)"

  (setq bar (rx-to-string `(regexp ,foo)))
  => "\\(?:\\(\\(?:abc\\|def\\)\\)\\)"

  (setq bar (rx (regexp foo)))
  => error "rx form ‘regexp’ requires args satisfying ‘stringp’"

With "emacs -Q from master:

  (setq foo (rx (group (or "abc" "def"))))
  => "\\(\\(?:abc\\|def\\)\\)"

  (setq bar (rx-to-string `(regexp ,foo)))
  => ("\\(?:" "\\)")       ; unexpected result

  (setq bar (rx (regexp foo)))
  => "\\(?:\\(\\(?:abc\\|def\\)\\)\\)"

Please fix this regression.

    AndyM





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Wed, 26 Jun 2019 12:57:05 GMT) Full text and rfc822 format available.

Message #89 received at 36237 <at> debbugs.gnu.org (full text, mbox):

From: Noam Postavsky <npostavs <at> gmail.com>
To: Andy Moreton <andrewjmoreton <at> gmail.com>
Cc: 36237 <at> debbugs.gnu.org
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Wed, 26 Jun 2019 08:56:11 -0400
Andy Moreton <andrewjmoreton <at> gmail.com> writes:

>   (setq bar (rx-to-string `(regexp ,foo)))
>   => ("\\(?:" "\\)")       ; unexpected result

Oops, should be fixed now.

9233865b70 2019-06-26T08:50:27-04:00 "Fix (rx-to-string (and (literal STR) (regexp STR)) regression"
https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=9233865b7005831e63755eb84ae7da060f878a55





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#36237; Package emacs. (Wed, 26 Jun 2019 13:09:01 GMT) Full text and rfc822 format available.

Message #92 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Andy Moreton <andrewjmoreton <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: bug#36237: Support (rx (and (regexp EXPR) (regexp-quote EXPR)))
Date: Wed, 26 Jun 2019 14:08:47 +0100
On Wed 26 Jun 2019, Noam Postavsky wrote:

> Andy Moreton <andrewjmoreton <at> gmail.com> writes:
>
>>   (setq bar (rx-to-string `(regexp ,foo)))
>>   => ("\\(?:" "\\)")       ; unexpected result
>
> Oops, should be fixed now.
>
> 9233865b70 2019-06-26T08:50:27-04:00 "Fix (rx-to-string (and (literal STR) (regexp STR)) regression"
> https://git.savannah.gnu.org/cgit/emacs.git/commit/?id=9233865b7005831e63755eb84ae7da060f878a55

Thanks Noam, that's all back to normal again.

    AndyM





bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 25 Jul 2019 11:24:09 GMT) Full text and rfc822 format available.

This bug report was last modified 4 years and 286 days ago.

Previous Next


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.