GNU bug report logs - #64939
30.0.50; The default auto-mode-interpreter-regexp does not match env with flags


Package: emacs;

Reported by: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>

Date: Sat, 29 Jul 2023 20:30:02 UTC

Severity: normal

Found in version 30.0.50

Done: Eli Zaretskii <eliz <at> gnu.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 64939 in the body.
You can then email your comments to 64939 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Sat, 29 Jul 2023 20:30:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sat, 29 Jul 2023 20:30:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 30.0.50; The default auto-mode-interpreter-regexp does not match
 env with flags
Date: Sat, 29 Jul 2023 22:08:19 +0200
A file without an extension will load ruby-mode if the first line is:

   #!/usr/bin/env ruby

but not when the first line is:

   #!/usr/bin/env -S ruby -e 'puts 123'

Is there any reason why the latter should not be matched by the default
`auto-mode-interpreter-regexp' value in 'files.el'?


A more useful example I stumbled on today while working on a language
server after adding:

`(add-to-list 'interpreter-mode-alist '("elixir" . elixir-ts-mode))'


   #!/usr/bin/env -S elixir --erl "-kernel standard_io_encoding latin1"

   Node.start(:"next-ls-#{System.system_time()}", :shortnames)
   ....



Wilhelm




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Sat, 29 Jul 2023 21:45:01 GMT) Full text and rfc822 format available.

Message #8 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org
Subject: Re: 30.0.50; The default auto-mode-interpreter-regexp does not
 match env with flags
Date: Sat, 29 Jul 2023 23:38:07 +0200
> A file without an extension will load ruby-mode if the first 
> line is:
>
>    #!/usr/bin/env ruby
>
> but not when the first line is:
>
>    #!/usr/bin/env -S ruby -e 'puts 123'
>
> Is there any reason why the latter should not be matched by the
> default
> `auto-mode-interpreter-regexp' value in 'files.el'?
>
>
> A more useful example I stumbled on today while working on a 
> language
> server after adding:
>
> `(add-to-list 'interpreter-mode-alist '("elixir" 
> . elixir-ts-mode))'
>
>
>    #!/usr/bin/env -S elixir --erl "-kernel standard_io_encoding
>      latin1"
>
>    Node.start(:"next-ls-#{System.system_time()}", :shortnames)
>    ....
>
>
>
> Wilhelm

This is a very naive solution to the above, but I am probably missing
some knowledge here, and it will break for anyone setting the variable
to something custom.

modified   lisp/files.el
@@ -3243,7 +3243,7 @@ inhibit-local-variables-p

(defvar auto-mode-interpreter-regexp
  (purecopy "#![ \t]?\\([^ \t\n]*\
-/bin/env[ \t]\\)?\\([^ \t\n]+\\)")
+/bin/env[ \t]\\)?\\(-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)?\\([^ \t\n]+\\)")
  "Regexp matching interpreters, for file mode determination.
This regular expression is matched against the first line of a file
to determine the file's mode in `set-auto-mode'.  If it matches, the file
@@ -3445,7 +3445,7 @@ set-auto-mode
	 (setq mode (save-excursion
		      (goto-char (point-min))
		      (if (looking-at auto-mode-interpreter-regexp)
-			  (match-string 2))))
+			  (match-string 3))))
	 ;; Map interpreter name to a mode, signaling we're done at the
	 ;; same time.
	 (setq done (assoc-default

modified   lisp/progmodes/sh-script.el
@@ -1481,7 +1481,7 @@ sh--guess-shell
  (cond ((save-excursion
           (goto-char (point-min))
           (looking-at auto-mode-interpreter-regexp))
-         (match-string 2))
+         (match-string 3))
        ((not buffer-file-name) sh-shell-file)
        ;; Checks that use `buffer-file-name' follow.
        ((string-match "\\.m?spec\\'" buffer-file-name) "rpm")




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Sun, 30 Jul 2023 04:54:01 GMT) Full text and rfc822 format available.

Message #11 received at 64939 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
Cc: 64939 <at> debbugs.gnu.org
Subject: Re: bug#64939: 30.0.50;
 The default auto-mode-interpreter-regexp does not match env with flags
Date: Sun, 30 Jul 2023 07:53:42 +0300
> From: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
> Date: Sat, 29 Jul 2023 22:08:19 +0200
> 
> 
> A file without an extension will load ruby-mode if the first line 
> is:
> 
>     #!/usr/bin/env ruby
> 
> but not when the first line is:
> 
>     #!/usr/bin/env -S ruby -e 'puts 123'
> 
> Is there any reason why the latter should not be matched by the 
> default
> `auto-mode-interpreter-regexp' value in 'files.el'?

That line _is_ matched by auto-mode-interpreter-regexp:

  (string-match-p auto-mode-interpreter-regexp "#!/usr/bin/env -S ruby -e 'puts 123'") => 0

The problem is how to find the name of the interpreter if the text
after "/usr/bin/env" includes more than one word?  Once we start using
command-line switches and their arguments, and take into consideration
that many GNU/Linux programs can freely intersperse options and
non-option arguments on the command line in any order, where does this
end?
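A minimal sketch of this point, assuming the pre-patch default value shown in the diff earlier in this report. The names `demo-old-interpreter-regexp' and `demo-old-interpreter-candidate' are hypothetical and exist only for illustration; `set-auto-mode' itself uses `looking-at' in the buffer rather than `string-match' on a string.

```elisp
;; Pre-patch default, copied from the "-" side of the diff in this report.
(defvar demo-old-interpreter-regexp
  "#![ \t]?\\([^ \t\n]*/bin/env[ \t]\\)?\\([^ \t\n]+\\)"
  "Hypothetical copy of the old `auto-mode-interpreter-regexp' value.")

(defun demo-old-interpreter-candidate (line)
  "Return the text captured by group 2 of the old regexp in LINE, or nil."
  (when (string-match demo-old-interpreter-regexp line)
    (match-string 2 line)))

;; The regexp matches both lines; only the captured word differs.
(demo-old-interpreter-candidate "#!/usr/bin/env ruby")
;; => "ruby"
(demo-old-interpreter-candidate "#!/usr/bin/env -S ruby -e 'puts 123'")
;; => "-S"
```

The second result is what gets looked up in `interpreter-mode-alist', which is why no mode is selected even though the regexp matches.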

> A more useful example I stumbled on today while working on a 
> language
> server after adding:
> 
> `(add-to-list 'interpreter-mode-alist '("elixir" 
> . elixir-ts-mode))'
> 
> 
>     #!/usr/bin/env -S elixir --erl "-kernel standard_io_encoding 
>     latin1"

How about making a script that invokes elixir with those arguments,
and then augment interpreter-mode-alist to name that script instead?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Sun, 30 Jul 2023 05:05:02 GMT) Full text and rfc822 format available.

Message #14 received at 64939 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
Cc: 64939 <at> debbugs.gnu.org
Subject: Re: bug#64939: 30.0.50;
 The default auto-mode-interpreter-regexp does not match env with flags
Date: Sun, 30 Jul 2023 08:04:53 +0300
> From: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
> Date: Sat, 29 Jul 2023 23:38:07 +0200
> 
> This is a very naive solution to the above, but I am probably
> missing some knowledge here and will break for anyone setting the
> var to something custom.

Feel free to make this change locally, but I don't see how this can be
general enough for us to install it as the default value.

For starters, 'env' can be invoked with several options, not just with
one.  Also, some 'env' options accept arguments, and how do we know if
the word that follows "env -OPTION" is the command to check against
interpreter-mode-alist or an argument of an option?
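As a rough illustration of the first objection, the sketch below applies the regexp from the proposed patch (with the interpreter moved to group 3) to a shebang line carrying two switches. The names `demo-proposed-interpreter-regexp' and `demo-proposed-interpreter-candidate' are hypothetical; the shebang lines are made-up inputs, not taken from a real project.

```elisp
;; Regexp from the "+" side of the proposed patch: one optional switch
;; is captured as group 2, and the interpreter becomes group 3.
(defvar demo-proposed-interpreter-regexp
  (concat "#![ \t]?"
          "\\([^ \t\n]*/bin/env[ \t]\\)?"
          "\\(-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)?"
          "\\([^ \t\n]+\\)")
  "Hypothetical copy of the regexp proposed in the patch.")

(defun demo-proposed-interpreter-candidate (line)
  "Return the text captured by group 3 of the proposed regexp in LINE."
  (when (string-match demo-proposed-interpreter-regexp line)
    (match-string 3 line)))

;; A single switch is skipped as intended.
(demo-proposed-interpreter-candidate "#!/usr/bin/env -S ruby -e 'puts 123'")
;; => "ruby"

;; A second switch is taken for the interpreter name.
(demo-proposed-interpreter-candidate "#!/usr/bin/env -S -u FOO bash")
;; => "-u"
```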

IOW, I don't think this is a problem for a regexp-based solution.  If
we want to support such complex shebang lines (btw, does the Posix or
GNU/Linux shell support them?), we should analyze the text after "env"
to find the candidate interpreter.  Not sure whether even that will
provide a robust solution.

Btw, can't you satisfy your needs via file-local variables?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Sun, 30 Jul 2023 09:38:02 GMT) Full text and rfc822 format available.

Message #17 received at 64939 <at> debbugs.gnu.org (full text, mbox):

From: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 64939 <at> debbugs.gnu.org
Subject: Re: bug#64939: 30.0.50; The default auto-mode-interpreter-regexp
 does not match env with flags
Date: Sun, 30 Jul 2023 10:28:14 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
>> Date: Sat, 29 Jul 2023 22:08:19 +0200
>> 
>> 
>> A file without an extension will load ruby-mode if the first 
>> line 
>> is:
>> 
>>     #!/usr/bin/env ruby
>> 
>> but not when the first line is:
>> 
>>     #!/usr/bin/env -S ruby -e 'puts 123'
>> 
>> Is there any reason why the latter should not be matched by the 
>> default
>> `auto-mode-interpreter-regexp' value in 'files.el'?
>
> That line _is_ matched by auto-mode-interpreter-regexp:
>
>   (string-match-p auto-mode-interpreter-regexp "#!/usr/bin/env -S ruby -e 'puts 123'") => 0
>
> The problem is how to find the name of the interpreter if the 
> text
> after "/usr/bin/env" includes more than one word?  Once we start 
> using
> command-line switches and their arguments, and take into 
> consideration
> that many GNU/Linux programs can freely intersperse options and
> non-option arguments on the command line in any order, where 
> does this
> end?
>

I am hoping there is some way of effectively matching command-line
switches for '/usr/bin/env', but it sounds like it is perhaps too
complex?

This probably ends with a match on some common command-line variations
for the '/usr/bin/env' program.  It is not complete now, so making it
guess slightly better is perhaps appropriate?

>> A more useful example I stumbled on today while working on a 
>> language
>> server after adding:
>> 
>> `(add-to-list 'interpreter-mode-alist '("elixir" 
>> . elixir-ts-mode))'
>> 
>> 
>>     #!/usr/bin/env -S elixir --erl "-kernel 
>>     standard_io_encoding 
>>     latin1"
>
> How about making a script that invokes elixir with those 
> arguments,
> and then augment interpreter-mode-alist to name that script 
> instead?

That is an option, but then I have to convince the maintainers of
all the packages I work on to make this change, versus my editor
handling this.  Even locally changing the regexp means that the match
index has changed, so I can't see this working without a patch.

If this is perhaps too much work and/or uncertainty for negligible
benefit to users, then it won't cause distress on my side :).

Wilhelm




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Sun, 30 Jul 2023 09:45:01 GMT) Full text and rfc822 format available.

Message #20 received at 64939 <at> debbugs.gnu.org (full text, mbox):

From: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 64939 <at> debbugs.gnu.org
Subject: Re: bug#64939: 30.0.50; The default auto-mode-interpreter-regexp
 does not match env with flags
Date: Sun, 30 Jul 2023 11:38:06 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
>> Date: Sat, 29 Jul 2023 23:38:07 +0200
>> 
>> This is a very naive solution to the above, but I am probably
>> missing some knowledge here and will break for anyone setting 
>> the
>> var to something custom.
>
> Feel free to make this change locally, but I don't see how this 
> can be
> general enough for us to install it as the default value.
>

The problem is that even with a local change, the match group 2 is
hard coded.

> For starters, 'env' can be invoked with several options, not 
> just with
> one.  Also, some 'env' options accept arguments, and how do we 
> know if
> the word that follows "env -OPTION" is the command to check 
> against
> interpreter-mode-alist or an argument of an option?
>

Understood, it is perhaps a bit complex.

> IOW, I don't think this is a problem for a regexp-based 
> solution.  If
> we want to support such complex shebang lines (btw, does the 
> Posix or
> GNU/Linux shell support them?), we should analyze the text after 
> "env"
> to find the candidate interpreter.  Not sure whether even that 
> will
> provide a robust solution.
>

I can perhaps have a look to see whether there is something concrete
about how this can be interpreted.

> Btw, can't you satisfy your needs via file-local variables?

Not without convincing the project maintainers to add Emacs-specific
lines to the code, which I don't really think is appropriate in some
cases.

It is not a common occurrence to run into this issue, but I thought it
strange that it does not work as expected.  If I am the only one, then
I am happy to close this and just keep a local patch.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Sun, 30 Jul 2023 10:04:01 GMT) Full text and rfc822 format available.

Message #23 received at 64939 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
Cc: 64939 <at> debbugs.gnu.org
Subject: Re: bug#64939: 30.0.50; The default auto-mode-interpreter-regexp
 does not match env with flags
Date: Sun, 30 Jul 2023 13:03:27 +0300
> From: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
> Cc: 64939 <at> debbugs.gnu.org
> Date: Sun, 30 Jul 2023 10:28:14 +0200
> 
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > That line _is_ matched by auto-mode-interpreter-regexp:
> >
> >   (string-match-p auto-mode-interpreter-regexp "#!/usr/bin/env -S ruby -e 'puts 123'") => 0
> >
> > The problem is how to find the name of the interpreter if the text
> > after "/usr/bin/env" includes more than one word?  Once we start
> > using command-line switches and their arguments, and take into
> > consideration that many GNU/Linux programs can freely intersperse
> > options and non-option arguments on the command line in any order,
> > where does this end?
> 
> I am hoping there is some way of effectively matching command-line
> switches for '/usr/bin/env', but sounds like it is perhaps too 
> complex?

How can we do that without incurring non-trivial maintenance costs?
'env' is being actively developed; e.g., the old version I have where
I'm typing this doesn't even have the -S option.  Are we supposed to
track every new command-line option added to 'env', and update our
code accordingly?  That is even impractical, because someone could use
2-year old Emacs with Coreutils released just yesterday.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Sun, 30 Jul 2023 10:05:02 GMT) Full text and rfc822 format available.

Message #26 received at 64939 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
Cc: 64939 <at> debbugs.gnu.org
Subject: Re: bug#64939: 30.0.50; The default auto-mode-interpreter-regexp
 does not match env with flags
Date: Sun, 30 Jul 2023 13:04:42 +0300
> From: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
> Cc: 64939 <at> debbugs.gnu.org
> Date: Sun, 30 Jul 2023 11:38:06 +0200
> 
> > Btw, can't you satisfy your needs via file-local variables?
> 
> Not without convincing the project maintainers to add Emacs specific
> lines to the code, which I don't really think is appropriate in some
> cases.

Why not?  Many projects add such variables, both for Emacs and for
Vim.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Mon, 31 Jul 2023 07:11:01 GMT) Full text and rfc822 format available.

Message #29 received at 64939 <at> debbugs.gnu.org (full text, mbox):

From: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 64939 <at> debbugs.gnu.org
Subject: Re: bug#64939: 30.0.50; The default auto-mode-interpreter-regexp
 does not match env with flags
Date: Sun, 30 Jul 2023 12:27:38 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
>> Cc: 64939 <at> debbugs.gnu.org
>> Date: Sun, 30 Jul 2023 10:28:14 +0200
>> 
>> 
>> Eli Zaretskii <eliz <at> gnu.org> writes:
>> 
>> > That line _is_ matched by auto-mode-interpreter-regexp:
>> >
>> >   (string-match-p auto-mode-interpreter-regexp "#!/usr/bin/env -S ruby -e 'puts 123'") => 0
>> >
>> > The problem is how to find the name of the interpreter if the 
>> > text
>> > after "/usr/bin/env" includes more than one word?  Once we 
>> > start
>> > using command-line switches and their arguments, and take 
>> > into
>> > consideration that many GNU/Linux programs can freely 
>> > intersperse
>> > options and non-option arguments on the command line in any 
>> > order,
>> > where does this end?
>> 
>> I am hoping there is some way of effectively matching 
>> command-line
>> switches for '/usr/bin/env', but sounds like it is perhaps too 
>> complex?
>
> How can we do that without incurring non-trivial maintenance 
> costs?
> 'env' is being actively developed; e.g., the old version I have 
> where
> I'm typing this doesn't even have the -S option.  Are we 
> supposed to
> track every new command-line option added to 'env', and update 
> our
> code accordingly?  That is even impractical, because someone 
> could use
> 2-year old Emacs with Coreutils released just yesterday.

No, I don't think we should track every command-line option added, but
just allow matching command-line options for env (maybe this is the
non-trivial part).  I don't understand why it would have an impact on
someone who is using a 5+ year old version of Coreutils.

If it were possible for a user to add configuration so that
set-auto-mode handles script files like the ones mentioned above, I
think this would be less of an issue.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Mon, 31 Jul 2023 07:17:01 GMT) Full text and rfc822 format available.

Message #32 received at 64939 <at> debbugs.gnu.org (full text, mbox):

From: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 64939 <at> debbugs.gnu.org
Subject: Re: bug#64939: 30.0.50; The default auto-mode-interpreter-regexp
 does not match env with flags
Date: Mon, 31 Jul 2023 09:11:49 +0200
Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
>> Cc: 64939 <at> debbugs.gnu.org
>> Date: Sun, 30 Jul 2023 11:38:06 +0200
>> 
>> > Btw, can't you satisfy your needs via file-local variables?
>> 
>> Not without convincing the project maintainers to add Emacs 
>> specific
>> lines to the code, which I don't really think is appropriate in 
>> some
>> cases.
>
> Why not?  Many projects add such variables, both for Emacs and 
> for
> Vim.

Sure, but it's certainly not common on projects I work on, where the
vast majority of the contributors don't care about Emacs.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Mon, 31 Jul 2023 17:46:02 GMT) Full text and rfc822 format available.

Message #35 received at 64939 <at> debbugs.gnu.org (full text, mbox):

From: Juri Linkov <juri <at> linkov.net>
To: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
Cc: 64939 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#64939: 30.0.50; The default auto-mode-interpreter-regexp
 does not match env with flags
Date: Mon, 31 Jul 2023 20:38:00 +0300
> The problem is that even with a local change, the match group 2 is hard
> coded.

You could use a shy group \(?: ... \)
in your customized value of 'auto-mode-interpreter-regexp'
that doesn't change the match index from 2 to 3.
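A minimal sketch of that suggestion, assuming the switch-skipping fragment from the patch earlier in this report is simply turned into a shy group. The names `demo-shy-interpreter-regexp' and `demo-shy-interpreter-candidate' are hypothetical.

```elisp
;; Same pattern as the proposed patch, but the switch is wrapped in a
;; shy group \(?: ... \), which is not numbered.  The interpreter
;; therefore stays in group 2, as the callers of the regexp expect.
(defvar demo-shy-interpreter-regexp
  (concat "#![ \t]?"
          "\\([^ \t\n]*/bin/env[ \t]\\)?"           ; group 1: env prefix
          "\\(?:-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)?"  ; shy group: one switch
          "\\([^ \t\n]+\\)")                        ; group 2: interpreter
  "Hypothetical customized value using a shy group.")

(defun demo-shy-interpreter-candidate (line)
  "Return the text captured by group 2 of the shy-group regexp in LINE."
  (when (string-match demo-shy-interpreter-regexp line)
    (match-string 2 line)))

(demo-shy-interpreter-candidate "#!/usr/bin/env ruby")
;; => "ruby"
(demo-shy-interpreter-candidate "#!/usr/bin/env -S ruby -e 'puts 123'")
;; => "ruby"
```

With a value of this shape, code that hard-codes `(match-string 2)' keeps working without any change.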




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Tue, 01 Aug 2023 06:22:02 GMT) Full text and rfc822 format available.

Message #38 received at 64939 <at> debbugs.gnu.org (full text, mbox):

From: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
To: Juri Linkov <juri <at> linkov.net>
Cc: 64939 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#64939: 30.0.50; The default auto-mode-interpreter-regexp
 does not match env with flags
Date: Tue, 01 Aug 2023 08:20:49 +0200
Juri Linkov <juri <at> linkov.net> writes:

>> The problem is that even with a local change, the match group 2 
>> is hard
>> coded.
>
> You could use a shy group \(?: ... \)
> in your customized value of 'auto-mode-interpreter-regexp'
> that doesn't change the match index from 2 to 3.


Fantastic, thanks.  I was not aware that existed. 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Thu, 01 Feb 2024 05:19:02 GMT) Full text and rfc822 format available.

Message #41 received at 64939 <at> debbugs.gnu.org (full text, mbox):

From: Malcolm Cook <malcolm.cook <at> gmail.com>
To: 64939 <at> debbugs.gnu.org
Subject: bug#64939
Date: Wed, 31 Jan 2024 13:52:09 -0600
I find allowing the shy regexp to match zero or more times
(using a '*' instead of '?') solves not only the use case of
including -S as an option, but also can support other options to env.

I have this now in my init.el:

(setq auto-mode-interpreter-regexp
;; support -S and other options to /bin/env.  See
;; https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939
  (purecopy "#![ \t]?\\([^ \t\n]*\
/bin/env[ \t]\\)?\\(?:-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)*\\([^ \t\n]+\\)"))

Hooray?

Why not patch lisp/files.el accordingly?

~ Malcolm Cook

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Thu, 01 Feb 2024 18:54:01 GMT) Full text and rfc822 format available.

Message #44 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Malcolm Cook <malcolm.cook <at> gmail.com>
To: bug-gnu-emacs <at> gnu.org, 64939 <at> debbugs.gnu.org
Subject: bug#64939
Date: Thu, 1 Feb 2024 12:52:39 -0600
Regarding [1] allowing emacs to recognize shebang lines containing
calls to /bin/env with options (such as -S as allowed in new core
utils [2])...

I prefer allowing the proposed "shy" regexp to match zero or more
times (using a '*' instead of '?').

To wit, I have this now in my init.el:

(setq auto-mode-interpreter-regexp
      ;; Support shebang line calling `/bin/env` with `-S` (and/or
      ;; other options).
      ;; cf. https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939
      (purecopy "#![ \t]?\\([^ \t\n]*\
/bin/env[ \t]\\)?\\(?:-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)*\\([^ \t\n]+\\)"))
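A quick check of this value on a few made-up shebang lines, as a sketch only. The names `demo-star-interpreter-regexp' and `demo-star-interpreter-candidate' are hypothetical, and the regexp is written with a space inside the final `[^ \t\n]' bracket, which is assumed to be the intended form.

```elisp
;; The shy group is repeated with `*', so any number of words that look
;; like switches are skipped before the interpreter is captured.
(defvar demo-star-interpreter-regexp
  (concat "#![ \t]?"
          "\\([^ \t\n]*/bin/env[ \t]\\)?"
          "\\(?:-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)*"
          "\\([^ \t\n]+\\)")
  "Hypothetical copy of the zero-or-more-switches regexp.")

(defun demo-star-interpreter-candidate (line)
  "Return the text captured by group 2 of the `*' variant in LINE."
  (when (string-match demo-star-interpreter-regexp line)
    (match-string 2 line)))

;; Several argument-less switches are skipped.
(demo-star-interpreter-candidate "#!/usr/bin/env -S -i bash")
;; => "bash"

;; A switch that takes a separate argument is not understood: the
;; argument FOO of -u is returned instead of the interpreter.
(demo-star-interpreter-candidate "#!/usr/bin/env -S -u FOO bash")
;; => "FOO"
```

The second case is the ambiguity raised earlier in the thread: a regexp alone cannot tell an option argument from the command.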

[1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939
[2] https://www.gnu.org/software/coreutils/manual/html_node/env-invocation.html#env-invocation

YMMV?

~ Malcolm Cook




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Thu, 01 Feb 2024 18:54:02 GMT) Full text and rfc822 format available.

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Sat, 10 Feb 2024 08:56:01 GMT) Full text and rfc822 format available.

Message #50 received at 64939 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Malcolm Cook <malcolm.cook <at> gmail.com>,
 Kévin Le Gouguec <kevin.legouguec <at> gmail.com>
Cc: 64939 <at> debbugs.gnu.org
Subject: Re: bug#64939:
Date: Sat, 10 Feb 2024 10:27:10 +0200
> From: Malcolm Cook <malcolm.cook <at> gmail.com>
> Date: Thu, 1 Feb 2024 12:52:39 -0600
> 
> Regarding [1] allowing emacs to recognize shebang lines containing
> calls to /bin/env with options (such as -S as allowed in new core
> utils [2])...
> 
> I prefer allowing the proposed "shy" regexp to match zero or more
> times (using a '*' instead of '?').
> 
> To wit, I have this now in my init.el:
> 
> (setq auto-mode-interpreter-regexp
>       ;; Support shebang line calling `/bin/env` with `-S` (and/or
>       ;; other options).
>       ;; cf. https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939
>       (purecopy "#![ \t]?\\([^ \t\n]*\
> /bin/env[ \t]\\)?\\(?:-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)*\\([^ \t\n]+\\)"))
> 
> [1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939
> [2] https://www.gnu.org/software/coreutils/manual/html_node/env-invocation.html#env-invocation
> 
> YMMV?

Kevin, any comments about the proposals in this bug report?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Sat, 10 Feb 2024 10:24:02 GMT) Full text and rfc822 format available.

Message #53 received at 64939 <at> debbugs.gnu.org (full text, mbox):

From: Kévin Le Gouguec <kevin.legouguec <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>,
 Malcolm Cook <malcolm.cook <at> gmail.com>, 64939 <at> debbugs.gnu.org
Subject: Re: bug#64939: 30.0.50; The default auto-mode-interpreter-regexp
 does not match env with flags
Date: Sat, 10 Feb 2024 11:23:01 +0100
Thanks for the CC, this report had completely slipped past my notice
when I worked on bug#66902, and so did Malcolm's follow-ups.

Boldly adding Wilhelm as well, since I am not 100% sure Debbugs sends a
copy of every message in a report to their OP.

Comments below.

Eli Zaretskii <eliz <at> gnu.org> writes:

>> From: Malcolm Cook <malcolm.cook <at> gmail.com>
>> Date: Thu, 1 Feb 2024 12:52:39 -0600
>> 
>> Regarding [1] allowing emacs to recognize shebang lines containing
>> calls to /bin/env with options (such as -S as allowed in new core
>> utils [2])...
>> 
>> I prefer allowing the proposed "shy" regexp to match zero or more
>> times (using a '*' instead of '?').
>> 
>> To wit, I have this now in my init.el:
>> 
>> (setq auto-mode-interpreter-regexp
>>       ;; Support shebang line calling `/bin/env` with `-S` (and/or
>>       ;; other options).
>>       ;; cf. https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939
>>       (purecopy "#![ \t]?\\([^ \t\n]*\
>> /bin/env[ \t]\\)?\\(?:-\\{1,2\\}[a-zA-Z1-9=]+[ \t]+\\)*\\([^ \t\n]+\\)"))

IIUC this would be a more lax variant of what we installed for
bug#66902, can you confirm Malcolm?  This is what the current regexp
looks like on the master branch:

  (purecopy
   (concat
    "#![ \t]*"
    ;; Optional group 1: env(1) invocation.
    "\\("
    "[^ \t\n]*/bin/env[ \t]*"
    "\\(?:-S[ \t]*\\|--split-string\\(?:=\\|[ \t]*\\)\\)?"
    "\\)?"
    ;; Group 2: interpreter.
    "\\([^ \t\n]+\\)"))

And the corresponding test cases:

(ert-deftest files-tests-auto-mode-interpreter ()
  "Test that `set-auto-mode' deduces correct modes from shebangs."
  (files-tests--check-shebang "#!/bin/bash" 'sh-mode)
  (files-tests--check-shebang "#!/usr/bin/env bash" 'sh-mode)
  (files-tests--check-shebang "#!/usr/bin/env python" 'python-base-mode)
  (files-tests--check-shebang "#!/usr/bin/env python3" 'python-base-mode)
  (files-tests--check-shebang "#!/usr/bin/env -S awk -v FS=\"\\t\" -v OFS=\"\\t\" -f" 'awk-mode)
  (files-tests--check-shebang "#!/usr/bin/env -S make -f" 'makefile-mode)
  (files-tests--check-shebang "#!/usr/bin/make -f" 'makefile-mode))

Is this Good Enough™ for your purposes (Malcolm, Wilhelm), or should we
sophisticate the regexp further?  FWIW, in no particular order:

(a) env(1) does seem to support mixing up arbitrary options with -S¹, so
    in principle it would make sense to support that;

(b) Eli did not seem too fond of the regexp hammer², so I don't know
    which direction we'd want to go between maximally correct (accept
    all arguments, _as long as_ -S|--split-string is in there) or good
    enough (just skip over --everything --that --looks --like -a
    --switch).

(c) FWIW the "maximally correct" regexp might not be _that_ ugly, since
    "-[v]S[OPTION]" must be the *first* token after env; in other words
    no need to support --some-option --split-string --more-options.
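To make point (a) concrete, here is a small sketch that applies the master-branch regexp quoted above to a few shebang lines. The names `demo-master-interpreter-regexp' and `demo-master-interpreter-candidate' are hypothetical, and `purecopy' is left out because it does not affect matching.

```elisp
;; Regexp as quoted above from the master branch at the time.
(defvar demo-master-interpreter-regexp
  (concat
   "#![ \t]*"
   ;; Optional group 1: env(1) invocation.
   "\\("
   "[^ \t\n]*/bin/env[ \t]*"
   "\\(?:-S[ \t]*\\|--split-string\\(?:=\\|[ \t]*\\)\\)?"
   "\\)?"
   ;; Group 2: interpreter.
   "\\([^ \t\n]+\\)")
  "Hypothetical copy of the master-branch regexp quoted in this message.")

(defun demo-master-interpreter-candidate (line)
  "Return the text captured by group 2 of the master regexp in LINE."
  (when (string-match demo-master-interpreter-regexp line)
    (match-string 2 line)))

;; Plain -S and --split-string= are recognized.
(demo-master-interpreter-candidate "#!/usr/bin/env -S make -f")
;; => "make"
(demo-master-interpreter-candidate "#!/usr/bin/env --split-string=ruby -e 'puts 123'")
;; => "ruby"

;; -S combined with another short option is not.
(demo-master-interpreter-candidate "#!/usr/bin/env -vS -uFOOBAR bash -eux")
;; => "-vS"
```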

>> [1] https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939
>> [2] https://www.gnu.org/software/coreutils/manual/html_node/env-invocation.html#env-invocation
>> 
>> YMMV?
>
> Kevin, any comments about the proposals in this bug report?

Comments above; footnotes below.  Again, thanks for the heads up.

¹ $ cat demo.sh
  #!/usr/bin/env -vS -uFOOBAR bash -eux

  echo hi
  echo $FOOBAR
  echo bye
  $ FOOBAR=totally-set ./demo.sh
  split -S:  ‘ -uFOOBAR bash -eux’
   into:    ‘-uFOOBAR’
       &    ‘bash’
       &    ‘-eux’
  unset:    FOOBAR
  executing: bash
     arg[0]= ‘bash’
     arg[1]= ‘-eux’
     arg[2]= ‘./demo.sh’
  + echo hi
  hi
  ./demo.sh: line 4: FOOBAR: unbound variable

² https://debbugs.gnu.org/cgi/bugreport.cgi?bug=64939#14




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Sat, 10 Feb 2024 17:09:01 GMT) Full text and rfc822 format available.

Message #56 received at 64939 <at> debbugs.gnu.org (full text, mbox):

From: Kévin Le Gouguec <kevin.legouguec <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>,
 Malcolm Cook <malcolm.cook <at> gmail.com>, 64939 <at> debbugs.gnu.org
Subject: Re: bug#64939: 30.0.50; The default auto-mode-interpreter-regexp
 does not match env with flags
Date: Sat, 10 Feb 2024 18:08:18 +0100
[Message part 1 (text/plain, inline)]
Kévin Le Gouguec <kevin.legouguec <at> gmail.com> writes:

> Is this Good Enough™ for your purposes (Malcolm, Wilhelm), or should we
> sophisticate the regexp further?  FWIW, in no particular order:
>
> (a) env(1) does seem to support mixing up arbitrary options with -S¹, so
>     in principle it would make sense to support that;
>
> (b) Eli did not seem too fond of the regexp hammer², so I don't know
>     which direction we'd want to go between maximally correct (accept
>     all arguments, _as long as_ -S|--split-string is in there) or good
>     enough (just skip over --everything --that --looks --like -a
>     --switch).
>
> (c) FWIW the "maximally correct" regexp might not be _that_ ugly, since
>     "-[v]S[OPTION]" must be the *first* token after env; in other words
>     no need to support --some-option --split-string --more-options.

Well, sorry, couldn't resist.  How do the attached patches look?  The
new testcases should tell the whole story.

('make && make -C test files-tests' seems none the worse for wear)

[0001-Refine-shebang-tests-bug-64939.patch (text/x-patch, attachment)]
[0002-Support-more-complex-env-invocations-in-shebang-line.patch (text/x-patch, attachment)]
[0003-Support-shebang-lines-with-amended-environment.patch (text/x-patch, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Sat, 10 Feb 2024 17:25:01 GMT) Full text and rfc822 format available.

Message #59 received at 64939 <at> debbugs.gnu.org (full text, mbox):

From: Malcolm Cook <malcolm.cook <at> gmail.com>
To: Kévin Le Gouguec <kevin.legouguec <at> gmail.com>
Cc: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>, 64939 <at> debbugs.gnu.org,
 Eli Zaretskii <eliz <at> gnu.org>
Subject: Re: bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does
 not match env with flags
Date: Sat, 10 Feb 2024 11:23:11 -0600
Hooray - thanks - all seems perfect to me

On Sat, Feb 10, 2024 at 11:08 AM Kévin Le Gouguec
<kevin.legouguec <at> gmail.com> wrote:
>
> Kévin Le Gouguec <kevin.legouguec <at> gmail.com> writes:
>
> > Is this Good Enough™ for your purposes (Malcolm, Wilhelm), or should we
> > sophisticate the regexp further?  FWIW, in no particular order:
> >
> > (a) env(1) does seem to support mixing up arbitrary options with -S¹, so
> >     in principle it would make sense to support that;
> >
> > (b) Eli did not seem too fond of the regexp hammer², so I don't know
> >     which direction we'd want to go between maximally correct (accept
> >     all arguments, _as long as_ -S|--split-string is in there) or good
> >     enough (just skip over --everything --that --looks --like -a
> >     --switch).
> >
> > (c) FWIW the "maximally correct" regexp might not be _that_ ugly, since
> >     "-[v]S[OPTION]" must be the *first* token after env; in other words
> >     no need to support --some-option --split-string --more-options.
>
> Well, sorry, couldn't resist.  How do the attached patches look?  The
> new testcases should tell the whole story.
>
> ('make && make -C test files-tests' seems none the worse for wear)
>


-- 
~ Malcolm Cook




Reply sent to Eli Zaretskii <eliz <at> gnu.org>:
You have taken responsibility. (Sat, 17 Feb 2024 08:34:02 GMT) Full text and rfc822 format available.

Notification sent to Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>:
bug acknowledged by developer. (Sat, 17 Feb 2024 08:34:02 GMT) Full text and rfc822 format available.

Message #64 received at 64939-done <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Malcolm Cook <malcolm.cook <at> gmail.com>
Cc: wkirschbaum <at> gmail.com, 64939-done <at> debbugs.gnu.org,
 kevin.legouguec <at> gmail.com
Subject: Re: bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does
 not match env with flags
Date: Sat, 17 Feb 2024 10:33:26 +0200
> From: Malcolm Cook <malcolm.cook <at> gmail.com>
> Date: Sat, 10 Feb 2024 11:23:11 -0600
> Cc: Eli Zaretskii <eliz <at> gnu.org>, Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>, 64939 <at> debbugs.gnu.org
> 
> Hooray - thanks - all seems perfect to me

Thanks, so I've now installed these changes on the master branch, and
I'm therefore closing this bug.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#64939; Package emacs. (Wed, 28 Feb 2024 18:11:01 GMT) Full text and rfc822 format available.

Message #67 received at 64939-done <at> debbugs.gnu.org (full text, mbox):

From: Wilhelm Kirschbaum <wkirschbaum <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Malcolm Cook <malcolm.cook <at> gmail.com>, 64939-done <at> debbugs.gnu.org,
 kevin.legouguec <at> gmail.com
Subject: Re: bug#64939: 30.0.50; The default auto-mode-interpreter-regexp does
 not match env with flags
Date: Wed, 28 Feb 2024 19:57:26 +0200
> Thanks, so I've now installed these changes on the master branch, and
> I'm therefore closing this bug.
>

Thank you, this works! ( I am not sure if it's fine for me to reply to a
closed bug, but just wanted to express my gratitude :) ).

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 28 Mar 2024 11:24:08 GMT) Full text and rfc822 format available.



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.