GNU bug report logs - #51832
Piping unicode text in `shell-command'


Package: emacs;

Reported by: Tor Kringeland <tor.a.s.kringeland <at> ntnu.no>

Date: Sun, 14 Nov 2021 07:06:02 UTC

Severity: normal

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 51832 in the body.
You can then email your comments to 51832 AT debbugs.gnu.org in the normal way.


Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 07:06:02 GMT)

Acknowledgement sent to Tor Kringeland <tor.a.s.kringeland <at> ntnu.no>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sun, 14 Nov 2021 07:06:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Tor Kringeland <tor.a.s.kringeland <at> ntnu.no>
To: bug-gnu-emacs <at> gnu.org
Subject: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 04:10:10 +0100
Running

  (shell-command "echo -n '悟' | pbcopy")

or

  (shell-command "echo -n 'øøøø' | pbcopy")

fills the clipboard with `ÊÇü' and `√∏', respectively, while if I run
the same commands in a terminal emulator outside Emacs I get back the
original input.  The same happens if I run the same shell commands in
`eshell'.  This happens when I run a recent build of Emacs 29 with `-Q'
on macOS Catalina.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 07:28:02 GMT)

Message #8 received at 51832 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Tor Kringeland <tor.a.s.kringeland <at> ntnu.no>
Cc: 51832 <at> debbugs.gnu.org
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 09:26:48 +0200
> From: Tor Kringeland <tor.a.s.kringeland <at> ntnu.no>
> Date: Sun, 14 Nov 2021 04:10:10 +0100
> 
> Running
> 
>   (shell-command "echo -n '悟' | pbcopy")
> 
> or
> 
>   (shell-command "echo -n 'øøøø' | pbcopy")
> 
> fills the clipboard with `ÊÇü' and `√∏', respectively, while if I run
> the same commands in a terminal emulator outside Emacs I get back the
> original input.  The same happens if I run the same shell commands in
> `eshell'.  This happens when I run a recent build of Emacs 29 with `-Q'
> on macOS Catalina.

Please be specific about the "recent build" part: which commit are you
using?  There were some problems with the clipboard that were recently
fixed.

Also, do older versions of Emacs behave differently with that command?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 07:55:01 GMT)

Message #11 received at 51832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: Tor Kringeland <tor.a.s.kringeland <at> ntnu.no>, 51832 <at> debbugs.gnu.org
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 08:53:51 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

>> Running
>> 
>>   (shell-command "echo -n '悟' | pbcopy")
>> 
>> or
>> 
>>   (shell-command "echo -n 'øøøø' | pbcopy")
>> 
>> fills the clipboard with `ÊÇü' and `√∏', respectively, while if I run
>> the same commands in a terminal emulator outside Emacs I get back the
>> original input.  The same happens if I run the same shell commands in
>> `eshell'.  This happens when I run a recent build of Emacs 29 with `-Q'
>> on macOS Catalina.
>
> Please be specific about the "recent build" part: which commit are you
> using?

I'm seeing the same issue with the current tree on Macos.

> There were some problems with the clipboard that were recently fixed.

This doesn't involve Emacs' interactions with the clipboard, though --
the pbcopy command is what's putting things on the clipboard.  But
pbcopy's apparently misinterpreting the bytes it's getting over the pipe
somehow, which is surprising, because I assumed shell-command just sent
the entire string to a shell for execution.  (But I haven't read the
code.)
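A minimal sketch of how "misinterpreting the bytes" could produce exactly the reported output, assuming (the thread does not establish this) that pbcopy decodes its input as Mac OS Roman when it has no usable locale. The UTF-8 bytes of `悟` are E6 82 9F, and read one byte at a time as Mac OS Roman they are `Ê`, `Ç` and `ü`; the bytes of `ø` (C3 B8) likewise become `√` and `∏`. The table is deliberately partial, and `macroman_to_utf8` is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Partial Mac OS Roman table: only the high bytes that occur in the two
   strings from this report.  Each entry maps a byte to the UTF-8
   encoding of the character Mac OS Roman assigns to it.  */
struct macroman_entry {
    unsigned char byte;
    const char *utf8;
};

static const struct macroman_entry macroman_partial[] = {
    { 0x82, "\xC3\x87" },     /* U+00C7 LATIN CAPITAL LETTER C WITH CEDILLA */
    { 0x9F, "\xC3\xBC" },     /* U+00FC LATIN SMALL LETTER U WITH DIAERESIS */
    { 0xB8, "\xE2\x88\x8F" }, /* U+220F N-ARY PRODUCT */
    { 0xC3, "\xE2\x88\x9A" }, /* U+221A SQUARE ROOT */
    { 0xE6, "\xC3\x8A" },     /* U+00CA LATIN CAPITAL LETTER E WITH CIRCUMFLEX */
};

/* Reinterpret each byte of the NUL-terminated string `in` as Mac OS Roman
   and write the result, re-encoded as UTF-8, to `out`.  Returns 0 on
   success, or -1 if a byte is missing from the partial table or `out`
   is too small (the contents of `out` are then unspecified).  */
static int macroman_to_utf8(const char *in, char *out, size_t outsize)
{
    size_t used = 0;
    const size_t n = sizeof macroman_partial / sizeof macroman_partial[0];

    assert(in != NULL && out != NULL && outsize > 0);
    for (const unsigned char *p = (const unsigned char *)in; *p != 0; p++) {
        char ascii[2] = { 0, 0 };
        const char *piece = NULL;

        if (*p < 0x80) {          /* ASCII is identical in both encodings */
            ascii[0] = (char)*p;
            piece = ascii;
        } else {
            for (size_t i = 0; i < n && piece == NULL; i++)
                if (macroman_partial[i].byte == *p)
                    piece = macroman_partial[i].utf8;
        }
        if (piece == NULL)
            return -1;

        size_t len = strlen(piece);
        if (used + len + 1 > outsize)
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

Given the bytes E6 82 9F, it leaves the UTF-8 spelling of `ÊÇü` in the output buffer. Any byte outside the small table makes it return -1 instead of guessing.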

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 08:15:01 GMT)

Message #14 received at 51832 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: tor.a.s.kringeland <at> ntnu.no, 51832 <at> debbugs.gnu.org
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 10:13:47 +0200
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: Tor Kringeland <tor.a.s.kringeland <at> ntnu.no>,  51832 <at> debbugs.gnu.org
> Date: Sun, 14 Nov 2021 08:53:51 +0100
> 
> >>   (shell-command "echo -n '悟' | pbcopy")
> >> 
> >> or
> >> 
> >>   (shell-command "echo -n 'øøøø' | pbcopy")
> >> 
> >> fills the clipboard with `ÊÇü' and `√∏', respectively, while if I run
> >> the same commands in a terminal emulator outside Emacs I get back the
> >> original input.  The same happens if I run the same shell commands in
> >> `eshell'.  This happens when I run a recent build of Emacs 29 with `-Q'
> >> on macOS Catalina.
> >
> > Please be specific about the "recent build" part: which commit are you
> > using?
> 
> I'm seeing the same issue with the current tree on Macos.
> 
> > There were some problems with the clipboard that were recently fixed.
> 
> This doesn't involve Emacs' interactions with the clipboard, though --
> the pbcopy command is what's putting things on the clipboard.  But
> pbcopy's apparently misinterpreting the bytes it's getting over the pipe
> somehow, which is surprising, because I assumed shell-command just sent
> the entire string to a shell for execution.  (But I haven't read the
> code.)

It could be useful to replace the pipe with redirection to a file, and
see what you get when invoking the command from Emacs and from a shell
prompt outside Emacs.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 08:19:02 GMT)

Message #17 received at 51832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: tor.a.s.kringeland <at> ntnu.no, 51832 <at> debbugs.gnu.org,
 Alan Third <alan <at> idiocy.org>
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 09:18:08 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

> It could be useful to replace the pipe with redirection to a file, and
> see what you get when invoking the command from Emacs and from a shell
> prompt outside Emacs.

Good point.  I tried that now (with "| cat > /tmp/" to get a pipe in
there), and the contents that were written to file were correct utf-8.

Mysterious.  Could the problem be in pbcopy -- that's assuming something
about the coding system when run from inside Emacs somehow?  That
doesn't sound very likely, but...

I've added Alan to the CCs; perhaps he has some insights here.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 08:26:02 GMT)

Message #20 received at 51832 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: tor.a.s.kringeland <at> ntnu.no, 51832 <at> debbugs.gnu.org, alan <at> idiocy.org
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 10:25:29 +0200
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: tor.a.s.kringeland <at> ntnu.no,  51832 <at> debbugs.gnu.org, Alan Third
>  <alan <at> idiocy.org>
> Date: Sun, 14 Nov 2021 09:18:08 +0100
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > It could be useful to replace the pipe with redirection to a file, and
> > see what you get when invoking the command from Emacs and from a shell
> > prompt outside Emacs.
> 
> Good point.  I tried that now (with "| cat > /tmp/" to get a pipe in
> there), and the contents that were written to file were correct utf-8.
> 
> Mysterious.  Could the problem be in pbcopy -- that's assuming something
> about the coding system when run from inside Emacs somehow?  That
> doesn't sound very likely, but...

Maybe we set some locale-related environment variable, and that
confuses pbcopy when it is run from Emacs?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 09:20:02 GMT)

Message #23 received at 51832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: tor.a.s.kringeland <at> ntnu.no, 51832 <at> debbugs.gnu.org, alan <at> idiocy.org
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 10:19:13 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

> Maybe we set some locale-related environment variable, and that
> confuses pbcopy when it is run from Emacs?

I've now followed the call tree, and we end up doing:

(call-process-region (point) (point) shell-file-name nil
                     (current-buffer) nil shell-command-switch
                     "echo foo😀bar | pbcopy")

And that fails, too.  So it's not something that shell-command sets up
(if it's a locale-related thing).

Hm...  Oh!  I thought the original report said that this worked if run
under M-x shell.  But it doesn't -- I get the same garbled selection.
(And it works fine in a shell outside Emacs.)

So it could indeed be a locale setting in Emacs that's making pbcopy do
the wrong thing. 





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 09:34:01 GMT)

Message #26 received at 51832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: tor.a.s.kringeland <at> ntnu.no, 51832 <at> debbugs.gnu.org, alan <at> idiocy.org
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 10:32:58 +0100
It's a bug in...  the locale settings.  Testing in the console,

  echo fóo | LANG=en_US.utf-8 pbcopy

works fine, but

  echo fóo | LANG=en_NO.utf-8 pbcopy

doesn't.  And that's the setting in Emacs for me.  It's correct that I
am in Norway and that I'm using the English locale, but there's no such
locale as en_NO.utf-8.

Didn't Emacs on Macos recently get some locale-related changes?






Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 09:47:02 GMT)

Message #29 received at 51832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: tor.a.s.kringeland <at> ntnu.no, 51832 <at> debbugs.gnu.org, alan <at> idiocy.org
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 10:46:05 +0100
Lars Ingebrigtsen <larsi <at> gnus.org> writes:

> doesn't.  And that's the setting in Emacs for me.  It's correct that I
> am in Norway and that I'm using the English locale, but there's no such
> locale as en_NO.utf-8.
>
> Didn't Emacs on Macos recently get some locale-related changes?

It's this code, I guess, from 2016, so it's not recent:

  NSLocale *locale = [NSLocale currentLocale];

  NSTRACE ("ns_init_locale");

  @try
    {
      /* It seems macOS should probably use UTF-8 everywhere.
         'localeIdentifier' does not specify the encoding, and I can't
         find any way to get the OS to tell us which encoding to use,
         so hard-code '.UTF-8'.  */
      NSString *localeID = [NSString stringWithFormat:@"%@.UTF-8",
                                     [locale localeIdentifier]];

      /* Set LANG to locale, but not if LANG is already set.  */
      setenv("LANG", [localeID UTF8String], 0);
    }

And...  it's a Macos bug?  Googling a bit seems to say that this does
indeed return invalid locale identifiers -- just language glued together
with the country, resulting in identifiers that don't match any
locales the OS knows about.

So...  I don't know what to do about that.  Is there a way to check that
the identifier is valid?
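One possible answer, sketched on the assumption that the C library's own locale list is the authority: `setlocale(LC_ALL, name)` returns a null pointer for a locale it cannot provide, and in that case it leaves the current locale unchanged. The helper name `locale_name_is_valid` is hypothetical. `setlocale` modifies process-wide state, so the sketch saves the previous setting and restores it afterwards. POSIX `newlocale`/`freelocale` can make the same check without touching the global locale.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if the C library accepts `name` as an LC_ALL locale, else 0.
   setlocale() changes process-wide state, so the previous locale is
   saved first and restored before returning.  */
static int locale_name_is_valid(const char *name)
{
    const char *current;
    char *saved;
    int ok;

    assert(name != NULL);
    if (name[0] == '\0')
        return 0;   /* "" means "take it from the environment", not a name */

    current = setlocale(LC_ALL, NULL);
    if (current == NULL)
        return 0;
    /* The returned string may be overwritten by the next call; copy it.  */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return 0;
    strcpy(saved, current);

    ok = setlocale(LC_ALL, name) != NULL;   /* NULL: unknown locale */

    setlocale(LC_ALL, saved);               /* put the old locale back */
    free(saved);
    return ok;
}
```

On a system without such a locale, `locale_name_is_valid("en_NO.UTF-8")` would return 0. The portable names `"C"` and `"POSIX"` are always accepted.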





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 10:33:01 GMT)

Message #32 received at 51832 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: tor.a.s.kringeland <at> ntnu.no, 51832 <at> debbugs.gnu.org, alan <at> idiocy.org
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 12:31:57 +0200
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: tor.a.s.kringeland <at> ntnu.no,  51832 <at> debbugs.gnu.org,  alan <at> idiocy.org
> Date: Sun, 14 Nov 2021 10:46:05 +0100
> 
>   NSLocale *locale = [NSLocale currentLocale];
> 
>   NSTRACE ("ns_init_locale");
> 
>   @try
>     {
>       /* It seems macOS should probably use UTF-8 everywhere.
>          'localeIdentifier' does not specify the encoding, and I can't
>          find any way to get the OS to tell us which encoding to use,
>          so hard-code '.UTF-8'.  */
>       NSString *localeID = [NSString stringWithFormat:@"%@.UTF-8",
>                                      [locale localeIdentifier]];
> 
>       /* Set LANG to locale, but not if LANG is already set.  */
>       setenv("LANG", [localeID UTF8String], 0);
>     }
> 
> And...  it's a Macos bug?  Googling a bit seems to say that this does
> indeed return invalid locale identifiers -- just language glued together
> with the country, resulting in identifiers that don't match any
> locales the OS knows about.
> 
> So...  I don't know what to do about that.  Is there a way to check that
> the identifier is valid?

I asked once why we push LANG into the environment, instead of calling
setlocale, which would only affect Emacs.  I don't think I saw an
answer to that question, or did I miss it?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 10:42:01 GMT)

Message #35 received at 51832 <at> debbugs.gnu.org:

From: Philipp <p.stephani2 <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 51832 <at> debbugs.gnu.org, Lars Ingebrigtsen <larsi <at> gnus.org>,
 tor.a.s.kringeland <at> ntnu.no, alan <at> idiocy.org
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 11:41:38 +0100

> Am 14.11.2021 um 11:31 schrieb Eli Zaretskii <eliz <at> gnu.org>:
> 
>> From: Lars Ingebrigtsen <larsi <at> gnus.org>
>> Cc: tor.a.s.kringeland <at> ntnu.no,  51832 <at> debbugs.gnu.org,  alan <at> idiocy.org
>> Date: Sun, 14 Nov 2021 10:46:05 +0100
>> 
>>  NSLocale *locale = [NSLocale currentLocale];
>> 
>>  NSTRACE ("ns_init_locale");
>> 
>>  @try
>>    {
>>      /* It seems macOS should probably use UTF-8 everywhere.
>>         'localeIdentifier' does not specify the encoding, and I can't
>>         find any way to get the OS to tell us which encoding to use,
>>         so hard-code '.UTF-8'.  */
>>      NSString *localeID = [NSString stringWithFormat:@"%@.UTF-8",
>>                                     [locale localeIdentifier]];
>> 
>>      /* Set LANG to locale, but not if LANG is already set.  */
>>      setenv("LANG", [localeID UTF8String], 0);
>>    }
>> 
>> And...  it's a Macos bug?  Googling a bit seems to say that this does
>> indeed return invalid locale identifiers -- just language glued together
>> with the country, resulting in identifiers that don't match any
>> locales the OS knows about.
>> 
>> So...  I don't know what to do about that.  Is there a way to check that
>> the identifier is valid?
> 
> I asked once why we push LANG into the environment, instead of calling
> setlocale, which would only affect Emacs.  I don't think I saw an
> answer to that question, or did I miss it?
> 

AIUI the intention is that this should affect subprocesses started from Emacs.  At least that's how I interpret the comment

/* macOS doesn't set any environment variables for the locale when run
   from the GUI. Get the locale from the OS and set LANG.  */





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 10:57:02 GMT)

Message #38 received at 51832 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Philipp <p.stephani2 <at> gmail.com>
Cc: 51832 <at> debbugs.gnu.org, larsi <at> gnus.org, tor.a.s.kringeland <at> ntnu.no,
 alan <at> idiocy.org
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 12:56:14 +0200
> From: Philipp <p.stephani2 <at> gmail.com>
> Date: Sun, 14 Nov 2021 11:41:38 +0100
> Cc: Lars Ingebrigtsen <larsi <at> gnus.org>,
>  tor.a.s.kringeland <at> ntnu.no,
>  51832 <at> debbugs.gnu.org,
>  alan <at> idiocy.org
> 
> > I asked once why we push LANG into the environment, instead of calling
> > setlocale, which would only affect Emacs.  I don't think I saw an
> > answer to that question, or did I miss it?
> > 
> 
> AIUI the intention is that this should affect subprocesses started from Emacs.  At least that's how I interpret the comment
> 
> /* macOS doesn't set any environment variables for the locale when run
>    from the GUI. Get the locale from the OS and set LANG.  */

Why is that needed?

And if it is needed, how come we are setting LANG to an invalid locale
and the system somehow sets it to the correct locale?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 11:21:01 GMT)

Message #41 received at 51832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 51832 <at> debbugs.gnu.org, Philipp <p.stephani2 <at> gmail.com>,
 tor.a.s.kringeland <at> ntnu.no, alan <at> idiocy.org
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 12:20:03 +0100
Eli Zaretskii <eliz <at> gnu.org> writes:

> And if it is needed, how come we are setting LANG to an invalid locale
> and the system somehow sets it to the correct locale?

LANG outside of Emacs is "" for me on Macos.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 11:49:01 GMT)

Message #44 received at 51832 <at> debbugs.gnu.org:

From: Philipp <p.stephani2 <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 51832 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 tor.a.s.kringeland <at> ntnu.no, alan <at> idiocy.org
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 12:48:21 +0100

> Am 14.11.2021 um 12:20 schrieb Lars Ingebrigtsen <larsi <at> gnus.org>:
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
>> And if it is needed, how come we are setting LANG to an invalid locale
>> and the system somehow sets it to the correct locale?
> 
> LANG outside of Emacs is "" for me on Macos.

For reference, on my Monterey system, only the following variables are initially set when launching Emacs from Finder:

__CF_USER_TEXT_ENCODING=0x1F5:0x0:0x3
__CFBundleIdentifier=org.gnu.Emacs
COMMAND_MODE=unix2003
DISPLAY=/private/tmp/com.apple.launchd.[...]/org.macosforge.xquartz:0
HOME=/Users/p
LOGNAME=p
PATH=/usr/bin:/bin:/usr/sbin:/sbin
SHELL=/opt/homebrew/bin/bash
SSH_AUTH_SOCK=/private/tmp/com.apple.launchd.[...]/Listeners
TMPDIR=/var/folders/hw/[...]/T/
USER=p
XPC_FLAGS=0x0
XPC_SERVICE_NAME=application.org.gnu.Emacs.[...]





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 12:18:01 GMT)

Message #47 received at 51832 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 51832 <at> debbugs.gnu.org, p.stephani2 <at> gmail.com, tor.a.s.kringeland <at> ntnu.no,
 alan <at> idiocy.org
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 14:16:49 +0200
> From: Lars Ingebrigtsen <larsi <at> gnus.org>
> Cc: Philipp <p.stephani2 <at> gmail.com>,  tor.a.s.kringeland <at> ntnu.no,
>   51832 <at> debbugs.gnu.org,  alan <at> idiocy.org
> Date: Sun, 14 Nov 2021 12:20:03 +0100
> 
> Eli Zaretskii <eliz <at> gnu.org> writes:
> 
> > And if it is needed, how come we are setting LANG to an invalid locale
> > and the system somehow sets it to the correct locale?
> 
> LANG outside of Emacs is "" for me on Macos.

And that doesn't work when running applications from inside Emacs?  If
it does work, why do we set LANG in Emacs?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 12:32:02 GMT)

Message #50 received at 51832 <at> debbugs.gnu.org:

From: Alan Third <alan <at> idiocy.org>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 51832 <at> debbugs.gnu.org, Philipp <p.stephani2 <at> gmail.com>, larsi <at> gnus.org,
 tor.a.s.kringeland <at> ntnu.no
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 12:31:36 +0000
On Sun, Nov 14, 2021 at 12:56:14PM +0200, Eli Zaretskii wrote:
> > From: Philipp <p.stephani2 <at> gmail.com>
> > Date: Sun, 14 Nov 2021 11:41:38 +0100
> > Cc: Lars Ingebrigtsen <larsi <at> gnus.org>,
> >  tor.a.s.kringeland <at> ntnu.no,
> >  51832 <at> debbugs.gnu.org,
> >  alan <at> idiocy.org
> > 
> > > I asked once why we push LANG into the environment, instead of calling
> > > setlocale, which would only affect Emacs.  I don't think I saw an
> > > answer to that question, or did I miss it?
> > > 
> > 
> > AIUI the intention is that this should affect subprocesses started from Emacs.  At least that's how I interpret the comment
> > 
> > /* macOS doesn't set any environment variables for the locale when run
> >    from the GUI. Get the locale from the OS and set LANG.  */
> 
> Why is that needed?
> 
> And if it is needed, how come we are setting LANG to an invalid locale
> and the system somehow sets it to the correct locale?

macOS itself doesn't set any locale-related environment variables; any
application that runs UNIX-style commands is expected to set them
itself. The UNIX commands don't pick up the locale from the system on
their own; they rely on the environment variables.

In other words, as with anything UNIXy on macOS, it's a badly thought
out mess.
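For illustration: a C program usually adopts the user's locale by calling `setlocale(LC_ALL, "")`, which consults `LC_ALL`, then the per-category `LC_*` variables, then `LANG`, and nothing else. The sketch below is a hypothetical `locale_from_lang` that assumes a POSIX system for `setenv`/`unsetenv`. It clears the overriding variables, sets only `LANG`, and reports what the C library made of it. If `LANG` names a locale the system doesn't have, the call returns a null pointer. The thread does not show whether pbcopy itself goes through `setlocale`.

```c
#define _POSIX_C_SOURCE 200809L   /* setenv() and unsetenv() */
#include <assert.h>
#include <locale.h>
#include <stdlib.h>

/* Hypothetical demo: make LANG the only locale variable in this process's
   environment and let the C library derive the locale from it, as a
   command-line tool calling setlocale(LC_ALL, "") would.  Returns the
   resulting locale name, or NULL if the environment named a locale the
   system does not have (the current locale is then left unchanged).  */
static const char *locale_from_lang(const char *lang)
{
    /* Any of these, if set, would take precedence over LANG.  The last six
       are glibc extensions; unsetting them is harmless elsewhere.  */
    static const char *const overrides[] = {
        "LC_ALL", "LC_COLLATE", "LC_CTYPE", "LC_MESSAGES", "LC_MONETARY",
        "LC_NUMERIC", "LC_TIME", "LC_ADDRESS", "LC_IDENTIFICATION",
        "LC_MEASUREMENT", "LC_NAME", "LC_PAPER", "LC_TELEPHONE",
    };

    for (size_t i = 0; i < sizeof overrides / sizeof overrides[0]; i++)
        unsetenv(overrides[i]);
    if (lang != NULL)
        setenv("LANG", lang, 1);
    else
        unsetenv("LANG");
    return setlocale(LC_ALL, "");
}
```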

It seems suspicious to me that we've had this code since Emacs 26, but
only in the last few weeks we've had two complaints about it. Having
dug out my Mac, I can't convince it to show any of the errors that have
been reported, so I suspect either the latest version of macOS has
made the locale handling much more strict or has removed a lot of
locales.

I've attached a patch that may do something towards preventing this
problem but ultimately this is a convenience to give a best guess at
choosing the correct dictionary, date format, etc. If we can't easily
fix it then we can drop it and tell people to set it in their init.el
themselves.

-- 
Alan Third
[0001-Only-set-LANG-if-the-ID-is-valid.patch (text/x-diff, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 13:42:02 GMT)

Message #53 received at 51832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Alan Third <alan <at> idiocy.org>
Cc: 51832 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 Philipp <p.stephani2 <at> gmail.com>, tor.a.s.kringeland <at> ntnu.no
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 14:41:38 +0100
Alan Third <alan <at> idiocy.org> writes:

> I've attached a patch that may do something towards preventing this
> problem but ultimately this is a convenience to give a best guess at
> choosing the correct dictionary, date format, etc. If we can't easily
> fix it then we can drop it and tell people to set it in their init.el
> themselves.

That didn't fix the issue for me, I'm afraid -- with that patch, LANG is
still the invalid en_NO.UTF-8 for me.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 14:24:01 GMT)

Message #56 received at 51832 <at> debbugs.gnu.org:

From: Philipp <p.stephani2 <at> gmail.com>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 51832 <at> debbugs.gnu.org, Alan Third <alan <at> idiocy.org>,
 Eli Zaretskii <eliz <at> gnu.org>, tor.a.s.kringeland <at> ntnu.no
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 15:23:03 +0100

> Am 14.11.2021 um 14:41 schrieb Lars Ingebrigtsen <larsi <at> gnus.org>:
> 
> Alan Third <alan <at> idiocy.org> writes:
> 
>> I've attached a patch that may do something towards preventing this
>> problem but ultimately this is a convenience to give a best guess at
>> choosing the correct dictionary, date format, etc. If we can't easily
>> fix it then we can drop it and tell people to set it in their init.el
>> themselves.
> 
> That didn't fix the issue for me, I'm afraid -- with that patch, LANG is
> still the invalid en_NO.UTF-8 for me.

Maybe we should add similar logic as iTerm2 (https://github.com/gnachman/iTerm2/blob/79aff4d59fd591e7628649bcabe5f27541740bf6/sources/PTYSession.m#L7107): create the locale identifier from language code and country code instead of the current locale identifier, and use setlocale (or better, newlocale) to check whether it's valid, and fall back to en_US.UTF-8 otherwise?



Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 14:29:01 GMT)

Message #59 received at 51832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Philipp <p.stephani2 <at> gmail.com>
Cc: 51832 <at> debbugs.gnu.org, Alan Third <alan <at> idiocy.org>,
 Eli Zaretskii <eliz <at> gnu.org>, tor.a.s.kringeland <at> ntnu.no
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 15:28:02 +0100
Philipp <p.stephani2 <at> gmail.com> writes:

>> That didn't fix the issue for me, I'm afraid -- with that patch, LANG is
>> still the invalid en_NO.UTF-8 for me.
>
> Maybe we should add similar logic as iTerm2
> (https://github.com/gnachman/iTerm2/blob/79aff4d59fd591e7628649bcabe5f27541740bf6/sources/PTYSession.m#L7107):
> create the locale identifier from language code and country code
> instead of the current locale identifier,

I think that's what Macos is returning -- it's just concatenating
those two codes to get a locale identifier.  (Which is wrong, of
course.)

> and use setlocale (or better, newlocale) to check whether it's valid,

Yes, that sounds good.

> and fall back to en_US.UTF-8 otherwise?

Hm...  I'd rather just leave LANG unset in that case -- it'll probably
lead to fewer glitches, I think.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 15:02:01 GMT)

Message #62 received at 51832 <at> debbugs.gnu.org:

From: Daniel Martín <mardani29 <at> yahoo.es>
To: Philipp <p.stephani2 <at> gmail.com>
Cc: Lars Ingebrigtsen <larsi <at> gnus.org>, 51832 <at> debbugs.gnu.org,
 tor.a.s.kringeland <at> ntnu.no, Alan Third <alan <at> idiocy.org>
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 16:01:17 +0100
Philipp <p.stephani2 <at> gmail.com> writes:

>> Am 14.11.2021 um 14:41 schrieb Lars Ingebrigtsen <larsi <at> gnus.org>:
>> 
>> Alan Third <alan <at> idiocy.org> writes:
>> 
>>> I've attached a patch that may do something towards preventing this
>>> problem but ultimately this is a convenience to give a best guess at
>>> choosing the correct dictionary, date format, etc. If we can't easily
>>> fix it then we can drop it and tell people to set it in their init.el
>>> themselves.
>> 
>> That didn't fix the issue for me, I'm afraid -- with that patch, LANG is
>> still the invalid en_NO.UTF-8 for me.
>
> Maybe we should add similar logic as iTerm2
> (https://github.com/gnachman/iTerm2/blob/79aff4d59fd591e7628649bcabe5f27541740bf6/sources/PTYSession.m#L7107):
> create the locale identifier from language code and country code
> instead of the current locale identifier, and use setlocale (or
> better, newlocale) to check whether it's valid, and fall back to
> en_US.UTF-8 otherwise?

Native macOS Terminal also has similar logic that calls setlocale.  It
tries to setlocale on LC_ALL (first argument 0) with these locale
identifiers in turn, until one of them succeeds:

- "localeIdentifier.UTF-8"
- "languageCode_countryCode.UTF-8"
- "languageCode_countryCode"

So they seem to give preference to [[NSLocale currentLocale]
localeIdentifier] and only use "languageCode_countryCode" as fallback.
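Here is a sketch in C of the candidate order described above. If no candidate is accepted, it leaves `LANG` unset, which is the preference Lars states earlier in the thread. The identifier, language and country strings stand in for what `NSLocale` reports and are passed as plain parameters. `build_candidates`, `first_valid_locale` and `CANDIDATE_MAX` are hypothetical names, and this is not necessarily how the fix for this bug was written.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CANDIDATE_MAX 64   /* hypothetical size limit for one locale name */

/* Fill `out` with the three names tried, in order, as described for
   Terminal.  The inputs stand in for NSLocale's localeIdentifier,
   languageCode and countryCode.  */
static void build_candidates(const char *identifier, const char *language,
                             const char *country, char out[3][CANDIDATE_MAX])
{
    snprintf(out[0], CANDIDATE_MAX, "%s.UTF-8", identifier);
    snprintf(out[1], CANDIDATE_MAX, "%s_%s.UTF-8", language, country);
    snprintf(out[2], CANDIDATE_MAX, "%s_%s", language, country);
}

/* Return the index of the first name that setlocale(LC_ALL, ...) accepts,
   or -1 if none is accepted (the caller then leaves LANG unset).  The
   process-wide locale is restored before returning.  */
static int first_valid_locale(const char *const names[], size_t n)
{
    const char *current = setlocale(LC_ALL, NULL);
    char *saved;
    int found = -1;

    if (current == NULL || (saved = malloc(strlen(current) + 1)) == NULL)
        return -1;
    strcpy(saved, current);

    for (size_t i = 0; i < n && found < 0; i++) {
        assert(names[i] != NULL);
        /* Skip "": it would mean "read the environment", not a name.  */
        if (names[i][0] != '\0' && setlocale(LC_ALL, names[i]) != NULL)
            found = (int)i;
    }

    setlocale(LC_ALL, saved);
    free(saved);
    return found;
}
```

For `en_NO`, the first two candidates come out identical. That fits the observation that the identifier is just the language and country glued together. A caller would then run `setenv("LANG", names[i], 0)` only for a non-negative index. This keeps the existing rule of not overriding a `LANG` that is already set.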




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 15:21:02 GMT)

Message #65 received at 51832 <at> debbugs.gnu.org:

From: Alan Third <alan <at> idiocy.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 51832 <at> debbugs.gnu.org, Philipp <p.stephani2 <at> gmail.com>,
 Eli Zaretskii <eliz <at> gnu.org>, tor.a.s.kringeland <at> ntnu.no
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 15:20:14 +0000
On Sun, Nov 14, 2021 at 03:28:02PM +0100, Lars Ingebrigtsen wrote:
> Philipp <p.stephani2 <at> gmail.com> writes:
> 
> >> That didn't fix the issue for me, I'm afraid -- with that patch, LANG is
> >> still the invalid en_NO.UTF-8 for me.
> >
> > Maybe we should add similar logic as iTerm2

I tried to find how iTerm2 does it. Your search-fu is better than
mine, apparently. :)

> > (https://github.com/gnachman/iTerm2/blob/79aff4d59fd591e7628649bcabe5f27541740bf6/sources/PTYSession.m#L7107):
> > create the locale identifier from language code and country code
> > instead of the current locale identifier,
> 
> I think that's what Macos is returning -- it's just concatenating
> those two codes to get a locale identifier.  (Which is wrong, of
> course.)

Yeah, I don't think there's any advantage to building them up
manually.

> > and use setlocale (or better, newlocale) to check whether it's valid,
> 
> Yes, that sounds good.
> 
> > and fall back to en_US.UTF-8 otherwise?
> 
> Hm...  I'd rather just leave LANG unset in that case -- it'll probably
> lead to fewer glitches, I think.

I proposed something similar before:

https://debbugs.gnu.org/cgi/bugreport.cgi?bug=51321#90

but it didn't look like we needed it then. We know better now.

New patch attached.
[v2-0001-Only-set-LANG-if-the-ID-is-valid.patch (text/plain, attachment)]

Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Sun, 14 Nov 2021 15:30:02 GMT)

Message #68 received at 51832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Alan Third <alan <at> idiocy.org>
Cc: 51832 <at> debbugs.gnu.org, Philipp <p.stephani2 <at> gmail.com>,
 Eli Zaretskii <eliz <at> gnu.org>, tor.a.s.kringeland <at> ntnu.no
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Sun, 14 Nov 2021 16:29:09 +0100
Alan Third <alan <at> idiocy.org> writes:

> New patch attached.

Yup; that fixes the issue here -- LANG is unset in Emacs, and I can now
pipe in non-ASCII into pbcopy successfully.





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Tue, 16 Nov 2021 20:53:01 GMT)

Message #71 received at 51832 <at> debbugs.gnu.org:

From: Alan Third <alan <at> idiocy.org>
To: Lars Ingebrigtsen <larsi <at> gnus.org>
Cc: 51832 <at> debbugs.gnu.org, Philipp <p.stephani2 <at> gmail.com>,
 Eli Zaretskii <eliz <at> gnu.org>, tor.a.s.kringeland <at> ntnu.no
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Tue, 16 Nov 2021 20:52:46 +0000
On Sun, Nov 14, 2021 at 04:29:09PM +0100, Lars Ingebrigtsen wrote:
> Alan Third <alan <at> idiocy.org> writes:
> 
> > New patch attached.
> 
> Yup; that fixes the issue here -- LANG is unset in Emacs, and I can now
> pipe in non-ASCII into pbcopy successfully.

Thanks. I've pushed to master.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#51832; Package emacs. (Tue, 20 Sep 2022 13:25:01 GMT)

Message #74 received at 51832 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Alan Third <alan <at> idiocy.org>
Cc: 51832 <at> debbugs.gnu.org, Eli Zaretskii <eliz <at> gnu.org>,
 Philipp <p.stephani2 <at> gmail.com>, tor.a.s.kringeland <at> ntnu.no
Subject: Re: bug#51832: Piping unicode text in `shell-command'
Date: Tue, 20 Sep 2022 15:24:32 +0200
Alan Third <alan <at> idiocy.org> writes:

>> Yup; that fixes the issue here -- LANG is unset in Emacs, and I can now
>> pipe in non-ASCII into pbcopy successfully.
>
> Thanks. I've pushed to master.

The bug report was left open, so I'm closing it now.  (I only lightly
skimmed this long bug report thread -- if there were other issues here
that need fixing, please respond to the debbugs address, and we'll
reopen.  Or even better -- open a new bug report.)




bug closed, send any further explanations to 51832 <at> debbugs.gnu.org and Tor Kringeland <tor.a.s.kringeland <at> ntnu.no> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Tue, 20 Sep 2022 13:25:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 19 Oct 2022 11:24:06 GMT)



GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.