GNU bug report logs - #29872
26.0.90; `man' output encoding, hyphen chars

Package: emacs;

Reported by: Drew Adams <drew.adams <at> oracle.com>

Date: Thu, 28 Dec 2017 02:00:02 UTC

Severity: minor

Found in version 26.0.90

Fixed in version 26.1

Done: Stefan Kangas <stefan <at> marxist.se>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 29872 in the body.
You can then email your comments to 29872 AT debbugs.gnu.org in the normal way.

Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Thu, 28 Dec 2017 02:00:02 GMT) Full text and rfc822 format available.

Acknowledgement sent to Drew Adams <drew.adams <at> oracle.com>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Thu, 28 Dec 2017 02:00:02 GMT) Full text and rfc822 format available.

Message #5 received at submit <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: bug-gnu-emacs <at> gnu.org
Subject: 26.0.90; `man' output encoding, hyphen chars
Date: Wed, 27 Dec 2017 17:59:20 -0800 (PST)
1. emacs -Q
2. load library cygwin-mount.el
3. load library setup-cygwin.el
4. M-x man RET find RET

The two libraries are on Emacs Wiki:
https://www.emacswiki.org/emacs?action=elisp-area;context=0

The `man' output has (presumably) end-of-line hyphens that are not shown
as such, perhaps because of the buffer encoding.  This is what is
present, for example:

 This  manual page documents the GNU version of find.  GNU find searches
 the directory tree rooted at each given file  name  by  evaluating  the
 given  expression  from left to right, according to the rules of preceâ€
 dence (see section OPERATORS), until the outcome  is  known  (the  left

(`report-emacs-bug' would not let me send this message with the above
text, so I've removed the character at the end of the line (after `â€'),
which is displayed as \220 (but a single character), so I could send the
msg.)

After "prece" there are these characters (from `C-u C-x ='):

LATIN SMALL LETTER A WITH CIRCUMFLEX
EURO SIGN
3fff90 (displayed as \220)

The charset is apparently windows-1252 (from `C-u C-x =').

Is this an Emacs bug, or is something different needed in, say,
setup-cygwin.el?  Currently, this code is in that library:

;;; Use Unix-style line endings.
;; Per Eli Z. http://debbugs.gnu.org/cgi/bugreport.cgi?bug=21780#40:
;;
;; $$$$$$ (setq-default buffer-file-coding-system 'undecided-unix)
(setq-default buffer-file-coding-system
              (coding-system-change-eol-conversion
                (default-value 'buffer-file-coding-system)
                'unix))
and this:

(setq process-coding-system-alist
      (cons '("bash" . (raw-text-dos . raw-text-unix))
             process-coding-system-alist))

I do not see this problem prior to this pretest build (e.g. with Emacs
24.5 or 25.3.1).  I do see the problem also with this Emacs 27 build:

 GNU Emacs 27.0.50 (build 4, x86_64-w64-mingw32) of 2017-12-21


In GNU Emacs 26.0.90 (build 3, x86_64-w64-mingw32)
 of 2017-10-13 built on LAPHROAIG
Repository revision: 906224eba147bdfc0514090064e8e8f53160f1d4
Windowing system distributor 'Microsoft Corp.', version 6.1.7601
Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.
Loading d:/usr/drew/drews-lisp-20/contrib/cygwin-mount.el (source)...done
Loading d:/usr/drew/drews-lisp-20/setup-cygwin.el (source)...done
Invoking man find in the background
find man page formatted
Type C-x 1 to delete the help window, C-M-v to scroll help.
Char: â (226, #o342, #xe2, file #xE2) point=477 of 75478 (1%) column=77
mwheel-scroll: Beginning of buffer [3 times]
Configured using:
 'configure --without-dbus --host=x86_64-w64-mingw32
 --without-compress-install 'CFLAGS=-O2 -static -g3''

Configured features:
XPM JPEG TIFF GIF PNG RSVG SOUND NOTIFY ACL GNUTLS LIBXML2 ZLIB
TOOLKIT_SCROLL_BARS

Important settings:
  value of $LANG: ENU
  locale-coding-system: cp1252

Major mode: Man

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message rmc puny seq byte-opt gv
bytecomp byte-compile cconv format-spec rfc822 mml mml-sec
password-cache epa derived epg epg-config gnus-util rmail rmail-loaddefs
mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils
mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr
mail-utils cl-seq pp wid-edit descr-text help-mode tabify imenu man
easymenu cl-loaddefs cl-lib setup-cygwin cygwin-mount ange-ftp comint
ansi-color ring dired dired-loaddefs elec-pair time-date mule-util
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel dos-w32 ls-lisp disp-table term/w32-win w32-win w32-vars
term/common-win tool-bar dnd fontset image regexp-opt fringe
tabulated-list replace newcomment text-mode elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow isearch timer select
scroll-bar mouse jit-lock font-lock syntax facemenu font-core
term/tty-colors frame cl-generic cham georgian utf-8-lang misc-lang
vietnamese tibetan thai tai-viet lao korean japanese eucjp-ms cp51932
hebrew greek romanian slovak czech european ethiopic indian cyrillic
chinese composite charscript charprop case-table epa-hook jka-cmpr-hook
help simple abbrev obarray minibuffer cl-preloaded nadvice loaddefs
button faces cus-face macroexp files text-properties overlay sha1 md5
base64 format env code-pages mule custom widget hashtable-print-readable
backquote w32notify w32 multi-tty make-network-process emacs)

Memory information:
((conses 16 159308 20379)
 (symbols 56 33242 1)
 (miscs 48 101 150)
 (strings 32 88693 1858)
 (string-bytes 1 2015171)
 (vectors 16 22521)
 (vector-slots 8 1548032 220308)
 (floats 8 57 295)
 (intervals 56 17867 9)
 (buffers 992 14))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Thu, 28 Dec 2017 03:35:01 GMT) Full text and rfc822 format available.

Message #8 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 29872 <at> debbugs.gnu.org
Subject: Re: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Thu, 28 Dec 2017 05:34:44 +0200
> Date: Wed, 27 Dec 2017 17:59:20 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
> 
> 1. emacs -Q
> 2. load library cygwin-mount.el
> 3. load library setup-cygwin.el
> 4. M-x man RET find RET
> 
> The two libraries are on Emacs Wiki:
> https://www.emacswiki.org/emacs?action=elisp-area;context=0

But that's not all, because one also needs to install the 'man'
command and its database of man pages, right?  And those which you
have are from Cygwin, right?

> I do not see this problem prior to this pretest build (e.g. with Emacs
> 24.5 or 25.3.1).  I do see the problem also with this Emacs 27 build:

What do you see in Emacs 25.3 instead of that passage?  Can you show
the corresponding part of the man-page buffer displayed by that Emacs
version?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Thu, 28 Dec 2017 23:53:01 GMT) Full text and rfc822 format available.

Message #11 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 29872 <at> debbugs.gnu.org
Subject: RE: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Thu, 28 Dec 2017 15:52:30 -0800 (PST)
> But that's not all, because one also needs to install the 'man'
> command and its database of man pages, right?  And those which you
> have are from Cygwin, right?

I mentioned that I use Cygwin.  (I can add that it is not
a recent version.)  But I did nothing special "to install
the 'man' command and its database of man pages".  I just
installed Cygwin (years ago).

> > I do not see this problem prior to this pretest build
> > (e.g. with Emacs 24.5 or 25.3.1).  I do see the problem
> > also with this Emacs 27 build:
> 
> What do you see in Emacs 25.3 instead of that passage?  Can you show
> the corresponding part of the man-page buffer displayed by that Emacs
> version?

In Emacs 25.3.1 and prior I see a hyphen character
instead.  I thought that was clear.

 This  manual page documents the GNU version of find.  GNU find searches
 the directory tree rooted at each given file  name  by  evaluating  the
 given  expression  from left to right, according to the rules of prece-
 dence (see section OPERATORS), until the outcome  is  known  (the  left

`C-u C-x =' before that hyphen says this:

  name: HYPHEN-MINUS
  general-category: Pd (Punctuation, Dash)
  decomposition: (45) ('-')

(BTW, I showed only that one hyphen occurrence, but all of
the "hyphens" show the same problem in Emacs 26 and later.)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Fri, 29 Dec 2017 09:50:02 GMT) Full text and rfc822 format available.

Message #14 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 29872 <at> debbugs.gnu.org
Subject: Re: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Fri, 29 Dec 2017 11:48:55 +0200
> Date: Thu, 28 Dec 2017 15:52:30 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 29872 <at> debbugs.gnu.org
> 
> In Emacs 25.3.1 and prior I see a hyphen character
> instead.  I thought that was clear.

It wasn't clear because you never said that explicitly, and because
"hyphen" is ambiguous in this context: it could refer to ASCII '-'
character or to non-ASCII '­' or to non-ASCII '‐'.  They are all
different characters, and potentially hint at different problems.

>  This  manual page documents the GNU version of find.  GNU find searches
>  the directory tree rooted at each given file  name  by  evaluating  the
>  given  expression  from left to right, according to the rules of prece-
>  dence (see section OPERATORS), until the outcome  is  known  (the  left
> 
> `C-u C-x =' before that hyphen says this:
> 
>   name: HYPHEN-MINUS
>   general-category: Pd (Punctuation, Dash)
>   decomposition: (45) ('-')
> 
> (BTW, I showed only that one hyphen occurrence, but all of
> the "hyphens" show the same problem in Emacs 26 and later.)

OK.  If you manually load man.el from Emacs 25, does the problem go
away?

Also, if you invoke the 'man' command from the Bash shell and redirect
its output to a file, what characters/bytes do you see in place of
those hyphens in the file created by that command?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Fri, 29 Dec 2017 17:22:01 GMT) Full text and rfc822 format available.

Message #17 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 29872 <at> debbugs.gnu.org
Subject: RE: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Fri, 29 Dec 2017 09:21:42 -0800 (PST)
> OK.  If you manually load man.el from Emacs 25, does the problem go
> away?

No, I see the same problem.

> Also, if you invoke the 'man' command from the Bash shell and redirect
> its output to a file, what characters/bytes do you see in place of
> those hyphens in the file created by that command?

I do `M-x shell' and see this in buffer `*shell*':

 bash: cannot set terminal process group (-1): Inappropriate ioctl for device
 bash: no job control in this shell
 >

(Perhaps you have an idea about that?  If not, OK.)

But doing what you said anyway (`man find > foo'), the output
is correct; the "hyphens" are actually hyphens:

  name: HYPHEN
  general-category: Pd (Punctuation, Dash)
  decomposition: (8208) ('‐')




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Fri, 29 Dec 2017 18:56:02 GMT) Full text and rfc822 format available.

Message #20 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 29872 <at> debbugs.gnu.org
Subject: Re: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Fri, 29 Dec 2017 20:54:56 +0200
> Date: Fri, 29 Dec 2017 09:21:42 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 29872 <at> debbugs.gnu.org
> 
> > OK.  If you manually load man.el from Emacs 25, does the problem go
> > away?
> 
> No, I see the same problem.
> 
> > Also, if you invoke the 'man' command from the Bash shell and redirect
> > its output to a file, what characters/bytes do you see in place of
> > those hyphens in the file created by that command?
> 
> I do `M-x shell' and see this in buffer `*shell*':
> 
>  bash: cannot set terminal process group (-1): Inappropriate ioctl for device
>  bash: no job control in this shell
>  >
> 
> (Perhaps you have an idea about that?  If not, OK.)
> 
> But doing what you said anyway (`man find > foo'), the output
> is correct; the "hyphens" are actually hyphens:
> 
>   name: HYPHEN
>   general-category: Pd (Punctuation, Dash)
>   decomposition: (8208) ('‐')

Then I think you will have to set up process-coding-system-alist such
that it reads output from 'man' with utf-8 decoding.  I don't know
what changed since Emacs 25, but if 'man' produces UTF-8 encoded
hyphens by default, Emacs needs to be told about that.
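
The byte-level reasoning behind this diagnosis can be checked from Emacs
Lisp.  What follows is a minimal sketch, assuming only the built-in
functions `encode-coding-string' and `decode-coding-string'; the variable
names are illustrative.  U+2010 HYPHEN occupies three bytes in UTF-8, and
decoding those bytes as cp1252 should yield three separate characters
rather than one hyphen, which is consistent with the "â", EURO SIGN and
raw byte 3fff90 reported in the original message.

```elisp
;; U+2010 HYPHEN, the character reported for the correct 'man' output.
(let* ((hyphen (string #x2010))
       ;; Its UTF-8 encoding is the three bytes \342\200\220 (#xE2 #x80 #x90).
       (utf8-bytes (encode-coding-string hyphen 'utf-8))
       ;; Decoding with the matching coding system recovers the hyphen.
       (as-utf-8 (decode-coding-string utf8-bytes 'utf-8))
       ;; Decoding as cp1252 (the locale coding system in the report)
       ;; treats each byte as a character of its own.
       (as-cp1252 (decode-coding-string utf8-bytes 'cp1252)))
  (list (length utf8-bytes)                  ; number of bytes
        (string= as-utf-8 hyphen)            ; does the round trip succeed?
        ;; Code points of the mis-decoded characters, in hex.
        (mapcar (lambda (c) (format "#x%x" c)) as-cp1252)))
```

Evaluating the form with `C-x C-e' or in `M-x ielm' shows the byte count,
whether the UTF-8 round trip succeeds, and the code points produced by the
cp1252 decoding.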




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Sat, 30 Dec 2017 00:37:01 GMT) Full text and rfc822 format available.

Message #23 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 29872 <at> debbugs.gnu.org
Subject: RE: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Fri, 29 Dec 2017 16:35:50 -0800 (PST)
> > I do `M-x shell' and see this in buffer `*shell*':
> >
> >  bash: cannot set terminal process group (-1): Inappropriate ioctl for device
> >  bash: no job control in this shell
> >
> > (Perhaps you have an idea about that?  If not, OK.)
> >
> > But doing what you said anyway (`man find > foo'), the output
> > is correct; the "hyphens" are actually hyphens:
> >
> >   name: HYPHEN
> >   general-category: Pd (Punctuation, Dash)
> >   decomposition: (8208) ('‐')
> 
> Then I think you will have to set up process-coding-system-alist such
> that it reads output from 'man' with utf-8 decoding.  I don't know
> what changed since Emacs 25, but if 'man' produces UTF-8 encoded
> hyphens by default, Emacs needs to be told about that.

I have no idea in what way to change
`process-coding-system-alist'.  The doc string and
(elisp) `Default Coding System' give me no hint that
I can recognize.

I don't even know whether it is the VAL (which is
`(raw-text-dos . raw-text-unix)') that is incorrect
or it is the PATTERN (which is "bash") that is
incorrect, or both.

And it is not even a user option.

_Someone_ should be able to tell "what changed
since Emacs 25", hopefully.  This is a (presumably
incompatible) change in Emacs, not something a user
has provoked, AFAICS.

Suddenly something that (still) works in all prior
Emacs releases (as far back as Emacs 20, at least)
no longer works.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Sat, 30 Dec 2017 00:41:02 GMT) Full text and rfc822 format available.

Message #26 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 29872 <at> debbugs.gnu.org
Subject: RE: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Fri, 29 Dec 2017 16:39:51 -0800 (PST)
> Suddenly something that (still) works in all prior
> Emacs releases (as far back as Emacs 20, at least)
> no longer works.

Sorry, my bad.  At least as far back as Emacs 22.
Emacs 20 shows the same problem.  There the output
shows this, where the end of the line is displayed
as `\342\200\220'.

 given  expression  from left to right, according to the rules of prece‐




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Sat, 30 Dec 2017 08:14:02 GMT) Full text and rfc822 format available.

Message #29 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 29872 <at> debbugs.gnu.org
Subject: Re: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Sat, 30 Dec 2017 10:13:18 +0200
> Date: Fri, 29 Dec 2017 16:35:50 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 29872 <at> debbugs.gnu.org
> 
> > Then I think you will have to set up process-coding-system-alist such
> > that it reads output from 'man' with utf-8 decoding.  I don't know
> > what changed since Emacs 25, but if 'man' produces UTF-8 encoded
> > hyphens by default, Emacs needs to be told about that.
> 
> I have no idea in what way to change
> `process-coding-system-alist'.  The doc string and
> (elisp) `Default Coding System' give me no hint that
> I can recognize.

You said you had this stuff set up for Bash:

  (setq process-coding-system-alist
        (cons '("bash" . (raw-text-dos . raw-text-unix))
               process-coding-system-alist))

so I assumed you knew how to do that for another program.

> I don't even know whether it is the VAL (which is
> `(raw-text-dos . raw-text-unix)') that is incorrect
> or it is the PATTERN (which is "bash") that is
> incorrect, or both.

If you don't have problems with Bash, then its existing association in
the alist, as set by those cygwin-* libraries, is fine for you.  (The
latest Cygwin uses UTF-8 by default, so if your Bash is fairly recent,
I'd suggest changing the above as well, to use utf-8 instead of
raw-text.  If your Bash is old, then you probably don't need to
bother.)

For 'man', try this:

  (setq process-coding-system-alist
        (cons '("man" . (utf-8-dos . utf-8-unix))
               process-coding-system-alist))

> And it is not even a user option.

I'm surprised you find this customization so impenetrable.  It's an
alist which is IMO clearly documented, and you already have an example
for Bash.  So what exactly is the difficulty in customizing it for a
given program, given the existing documentation?
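
For readers unsure how an entry is matched: each element of
`process-coding-system-alist' has the form (PATTERN . VAL), where PATTERN
is a regexp matched against the program name and VAL here is a pair
(DECODING . ENCODING).  The sketch below uses the built-in
`find-operation-coding-system' to ask Emacs which pair it would choose
for a program named "man".  The `let' binding is only there so that the
global value is left untouched.  Note that a matching entry takes effect
only when the caller has not bound `coding-system-for-read', so this
lookup alone does not prove that a given library honors the entry.

```elisp
(let ((process-coding-system-alist
       (cons '("man" . (utf-8-dos . utf-8-unix))
             process-coding-system-alist)))
  ;; The arguments after the operation symbol mirror those of
  ;; `start-process': NAME BUFFER PROGRAM &rest ARGS.  The PROGRAM
  ;; argument ("man") is what the alist patterns are matched against.
  (find-operation-coding-system 'start-process "man" nil "man" "find"))
```

With the entry above at the head of the alist, the form should return the
pair (utf-8-dos . utf-8-unix).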




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Sat, 30 Dec 2017 08:16:01 GMT) Full text and rfc822 format available.

Message #32 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 29872 <at> debbugs.gnu.org
Subject: Re: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Sat, 30 Dec 2017 10:15:13 +0200
> Date: Fri, 29 Dec 2017 16:39:51 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 29872 <at> debbugs.gnu.org
> 
> > Suddenly something that (still) works in all prior
> > Emacs releases (as far back as Emacs 20, at least)
> > no longer works.
> 
> Sorry, my bad.  At least as far back as Emacs 22.
> Emacs 20 shows the same problem.  There the output
> shows this, where the end of the line is displayed
> as `\342\200\220'.
> 
>  given  expression  from left to right, according to the rules of prece‐

Thanks, this is good to know, but then how did you obtain the text
with an ASCII hyphen that you said earlier you saw in older versions
of Emacs?

In any case, the above means the decoding of the stuff produced by
the Cygwin 'man' you have installed was incorrect since long ago.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Sat, 30 Dec 2017 17:05:02 GMT) Full text and rfc822 format available.

Message #35 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 29872 <at> debbugs.gnu.org
Subject: RE: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Sat, 30 Dec 2017 09:03:47 -0800 (PST)
> > > Suddenly something that (still) works in all prior
> > > Emacs releases (as far back as Emacs 20, at least)
> > > no longer works.
> >
> > Sorry, my bad.  At least as far back as Emacs 22.
> >
> > Emacs 20 shows the same problem.  There the output
> > shows this, where the end of the line is displayed
> > as `\342\200\220'.
> 
> Thanks, this is good to know, but then how did you obtain the text
> with an ASCII hyphen that you said earlier you saw in older versions
> of Emacs?

Sorry, I don't understand your question.  The output is
correct in all prior Emacs releases, as far back as Emacs
22 (not 20).

But it is incorrect in Emacs 26.0.90 and later, as I said.
(And it is incorrect in Emacs 20, in the way I showed.)

What part of this is not yet clear to you?  The output
is correct, with a normal hyphen, with Emacs 22 through
25.3.1.  The output is incorrect for Emacs 26 and 27.
(And the output is incorrect for Emacs 20.)

> In any case, the above means the decoding of the stuff produced by
> the Cygwin 'man' you have installed was incorrect since long ago.

I don't understand what you are saying.  As I said, the
output looks correct since Emacs 22, until Emacs 26.
It has not been "incorrect since long ago".  It is
incorrect since Emacs 26 (pretest).




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Sat, 30 Dec 2017 17:05:02 GMT) Full text and rfc822 format available.

Message #38 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Eli Zaretskii <eliz <at> gnu.org>, Drew Adams <drew.adams <at> oracle.com>
Cc: 29872 <at> debbugs.gnu.org
Subject: RE: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Sat, 30 Dec 2017 09:03:51 -0800 (PST)
> > I don't even know whether it is the VAL (which is
> > `(raw-text-dos . raw-text-unix)') that is incorrect
> > or it is the PATTERN (which is "bash") that is
> > incorrect, or both.
> 
> If you don't have problems with Bash, then its existing association in
> the alist, as set by those cygwin-* libraries, is fine for you.  (The
> latest Cygwin uses UTF-8 by default, so if your Bash is fairly recent,
> I'd suggest changing the above as well, to use utf-8 instead of
> raw-text.  If your Bash is old, then you probably don't need to
> bother.)

My bash is from my Cygwin installation, which is old.

I mentioned here the one (known) problem that I have, which
is noted in the Commentary of `setup-cygwin.el', as follows.
I'm guessing that it is unrelated to the problem reported
for this bug, but I don't know that.

;;  NOTE:
;;
;;   When using precompiled GNU Emacs (all versions, at least 20-25)
;;   with a Cygwin installation with Cygwin1.dll version 1.7.11-1, you
;;   have trouble running bash in emacs. On `M-x shell` you get:
;;
;;    bash: cannot set terminal process group (-1):
;;          Inappropriate ioctl for device
;;    bash: no job control in this shell
;;
;;   This shell then is rather useless, because apart from the missing
;;   job control some commands called in that shell just hang.
;;
;;   People on the Cygwin mailing list have apparently suggested that
;;   it is a GNU Emacs problem.  This issue is still not resolved yet.
;;
;;   Workarounds some people have tried:
;;
;;   * Use Cygwin Emacs (package emacs-w32 uses the windows GUI, there
;;     are also X11 and console packages)
;;
;;   * Don't upgrade Cygwin above Cygwin1.dll, version 1.7.9.
;;
;;   See also https://www.emacswiki.org/emacs/NTEmacsWithCygwin.

That Emacs Wiki page has more info about this particular
problem.  I'm (obviously) no expert on this.

> For 'man', try this:
>   (setq process-coding-system-alist
>         (cons '("man" . (utf-8-dos . utf-8-unix))
>                process-coding-system-alist))

Thanks; I tried that.  It did not change the result -
the same problem as reported for this bug report.

Again I did `emacs -Q', loaded cygwin-mount.el then
setup-cygwin.el, then evaluated the above code to add
a `man' entry to `process-coding-system-alist', then
did `M-x man RET find RET'.  I again see the "preceâ€"
with the problematic "hyphen".

process-coding-system-alist is a variable defined in
‘C source code’.
Its value is
(("man" utf-8-dos . utf-8-unix)
 ("bash" raw-text-dos . raw-text-unix)
 ("[pP][lL][iI][nN][kK]" . #1=(undecided-dos . undecided-dos))
 ("[cC][mM][dD][pP][rR][oO][xX][yY]" . #1#))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Sat, 30 Dec 2017 18:21:02 GMT) Full text and rfc822 format available.

Message #41 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 29872 <at> debbugs.gnu.org
Subject: Re: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Sat, 30 Dec 2017 20:20:16 +0200
> Date: Sat, 30 Dec 2017 09:03:51 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 29872 <at> debbugs.gnu.org
> 
> > For 'man', try this:
> >   (setq process-coding-system-alist
> >         (cons '("man" . (utf-8-dos . utf-8-unix))
> >                process-coding-system-alist))
> 
> Thanks; I tried that.  It did not change the result -
> the same problem as reported for this bug report.
> 
> Again I did `emacs -Q', loaded cygwin-mount.el then
> setup-cygwin.el, then evaluated the above code to add
> a `man' entry to `process-coding-system-alist', then
> did `M-x man RET find RET'.  I again see the "preceâ€"
> with the problematic "hyphen".
> 
> process-coding-system-alist is a variable defined in
> ‘C source code’.
> Its value is
> (("man" utf-8-dos . utf-8-unix)
>  ("bash" raw-text-dos . raw-text-unix)
>  ("[pP][lL][iI][nN][kK]" . #1=(undecided-dos . undecided-dos))
>  ("[cC][mM][dD][pP][rR][oO][xX][yY]" . #1#))

What about the below, does that work?

  (let ((locale-coding-system 'utf-8))
    (man "find"))




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Sat, 30 Dec 2017 18:23:02 GMT) Full text and rfc822 format available.

Message #44 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 29872 <at> debbugs.gnu.org
Subject: Re: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Sat, 30 Dec 2017 20:22:00 +0200
> Date: Sat, 30 Dec 2017 09:03:47 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 29872 <at> debbugs.gnu.org
> 
> > > > Suddenly something that (still) works in all prior
> > > > Emacs releases (as far back as Emacs 20, at least)
> > > > no longer works.
> > >
> > > Sorry, my bad.  At least as far back as Emacs 22.
> > >
> > > Emacs 20 shows the same problem.  There the output
> > > shows this, where the end of the line is displayed
> > > as `\342\200\220'.
> > 
> > Thanks, this is good to know, but then how did you obtain the text
> > with an ASCII hyphen that you said earlier you saw in older versions
> > of Emacs?
> 
> Sorry, I don't understand your question.  The output is
> correct in all prior Emacs releases, as far back as Emacs
> 22 (not 20).

Well, AFAICT, it could only have worked by sheer luck.

> But it is incorrect in Emacs 26.0.90 and later, as I said.
> (And it is incorrect in Emacs 20, in the way I showed.)
> 
> What part of this is not yet clear to you?  The output
> is correct, with a normal hyphen, with Emacs 22 through
> 25.3.1.  The output is incorrect for Emacs 26 and 27.
> (And the output is incorrect for Emacs 20.)
> 
> > In any case, the above means the decoding of the stuff produced by
> > the Cygwin 'man' you have installed was incorrect since long ago.
> 
> I don't understand what you are saying.  As I said, the
> output looks correct since Emacs 22, until Emacs 26.
> It has not been "incorrect since long ago".  It is
> incorrect since Emacs 26 (pretest).

Forget it, I just became confused about what did and what didn't work
before and after Emacs 22.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Sat, 30 Dec 2017 18:29:01 GMT) Full text and rfc822 format available.

Message #47 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: drew.adams <at> oracle.com
Cc: 29872 <at> debbugs.gnu.org
Subject: Re: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Sat, 30 Dec 2017 20:28:28 +0200
> Date: Sat, 30 Dec 2017 20:20:16 +0200
> From: Eli Zaretskii <eliz <at> gnu.org>
> Cc: 29872 <at> debbugs.gnu.org
> 
> What about the below, does that work?
> 
>   (let ((locale-coding-system 'utf-8))
>     (man "find"))

One other thing to try: set GROFF_TYPESETTER=ascii in the environment,
then invoke "M-x man" normally.  This method assumes that you don't
have a find.1 file in the man/cat1 directory; if you do, delete it
first, so that running "man" the first time after that will recreate
it.
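
The environment variable can also be set from inside a running Emacs
session.  This is a minimal sketch using the built-in `setenv' and
`getenv'; whether the installed 'man' and groff honor the variable is
exactly what the suggestion above is meant to test, so no particular
result is assumed here.

```elisp
;; Set the variable in Emacs's own environment.  Subprocesses started
;; afterwards, including the one run by `M-x man', inherit it.
(setenv "GROFF_TYPESETTER" "ascii")

;; Check the value that child processes will see.
(getenv "GROFF_TYPESETTER")
```

To undo the change, evaluate (setenv "GROFF_TYPESETTER" nil), which
removes the variable from the environment Emacs passes to subprocesses.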




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Sat, 30 Dec 2017 23:12:02 GMT) Full text and rfc822 format available.

Message #50 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 29872 <at> debbugs.gnu.org
Subject: RE: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Sat, 30 Dec 2017 15:11:05 -0800 (PST)
> What about the below, does that work?
>   (let ((locale-coding-system 'utf-8))
>     (man "find"))

Yes!  With `emacs -Q', loading the two files mentioned,
and then evaluating that sexp, the `man' output is correct:
hyphens appear as they should - that is:

  name: HYPHEN
  general-category: Pd (Punctuation, Dash)
  decomposition: (8208) ('‐')

What should I then change in, say, `setup-cygwin.el',
to make that happen?  (Or does something need to be
changed in Emacs itself?)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Sun, 31 Dec 2017 16:25:02 GMT) Full text and rfc822 format available.

Message #53 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 29872 <at> debbugs.gnu.org
Subject: Re: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Sun, 31 Dec 2017 18:24:05 +0200
> Date: Sat, 30 Dec 2017 15:11:05 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 29872 <at> debbugs.gnu.org
> 
> > What about the below, does that work?
> >   (let ((locale-coding-system 'utf-8))
> >     (man "find"))
> 
> Yes!  With `emacs -Q', loading the two files mentioned,
> and then evaluating that sexp, the `man' output is correct:
> hyphens appear as they should - that is:
> 
>   name: HYPHEN
>   general-category: Pd (Punctuation, Dash)
>   decomposition: (8208) ('‐')
> 
> What should I then change in, say, `setup-cygwin.el',
> to make that happen?  (Or does something need to be
> changed in Emacs itself?)

For Emacs 26, I've just committed a change that introduces a new
defcustom, Man-coding-system, which you can customize to utf-8 to get
the correct behavior in your case.  For older versions of Emacs, you
will need to use a separate command that invokes 'man' as shown above,
because man.el unconditionally uses locale-coding-system for that, and
locale-coding-system on MS-Windows can never be UTF-8.
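
The two cases described above can be sketched in an init file as follows.
The command name `my-man-utf-8' is hypothetical and not part of Emacs or
of setup-cygwin.el.  The sketch assumes that `Man-coding-system' is
defined by man.el in versions that include the change, and that older
versions consult `locale-coding-system' when the 'man' process is
started, as the `let' form tested earlier in this thread suggests.

```elisp
(require 'man)

(defun my-man-utf-8 (topic)
  "Run `man' on TOPIC, decoding its output as UTF-8.
Hypothetical helper for Emacs versions without `Man-coding-system'."
  (interactive "sManual entry (UTF-8): ")
  ;; `locale-coding-system' is a dynamically bound variable, so this
  ;; binding is visible to man.el while it starts the process.
  (let ((locale-coding-system 'utf-8))
    (man topic)))

(if (boundp 'Man-coding-system)
    ;; Versions that provide the new option: set it directly.
    (setq Man-coding-system 'utf-8)
  ;; Older versions: use the wrapper command in place of `man'.
  (message "Use M-x my-man-utf-8 for UTF-8 man pages"))
```

Testing `boundp' after `(require 'man)' keeps the same snippet usable
across versions without hard-coding a version number.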




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Sun, 31 Dec 2017 17:35:01 GMT) Full text and rfc822 format available.

Message #56 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Drew Adams <drew.adams <at> oracle.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 29872 <at> debbugs.gnu.org
Subject: RE: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Sun, 31 Dec 2017 09:34:42 -0800 (PST)
> > > What about the below, does that work?
> > >   (let ((locale-coding-system 'utf-8))
> > >     (man "find"))
> >
> > Yes!  With `emacs -Q', loading the two files mentioned,
> > and then evaluating that sexp, the `man' output is correct:
> > hyphens appear as they should - that is:
> >
> >   name: HYPHEN
> >   general-category: Pd (Punctuation, Dash)
> >   decomposition: (8208) ('‐')
> >
> > What should I then change in, say, `setup-cygwin.el',
> > to make that happen?  (Or does something need to be
> > changed in Emacs itself?)
> 
> For Emacs 26, I've just committed a change that introduces a new
> defcustom, Man-coding-system, which you can customize to utf-8 to get
> the correct behavior in your case.  For older versions of Emacs, you
> will need to use a separate command that invokes 'man' as shown above,
> because man.el unconditionally uses locale-coding-system for that, and
> locale-coding-system on MS-Windows can never be UTF-8.

Thank you.  I assume that you'll mention this in NEWS.
I wonder whether it is something that I should set in
`setup-cygwin.el' or just tell users there, in a comment,
that they will need to customize it.

Leaving aside, for the moment, arguments about whether
code should mess with user options, can you say when it
is appropriate, typically, for a Windows user to customize
that option?  Does it have to do with whether Cygwin is
used, for instance (with a non-Cygwin Emacs)?  Or does
the option default value depend on the platform perhaps? 




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Sun, 31 Dec 2017 18:54:02 GMT) Full text and rfc822 format available.

Message #59 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Eli Zaretskii <eliz <at> gnu.org>
To: Drew Adams <drew.adams <at> oracle.com>
Cc: 29872 <at> debbugs.gnu.org
Subject: Re: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Sun, 31 Dec 2017 20:53:13 +0200
> Date: Sun, 31 Dec 2017 09:34:42 -0800 (PST)
> From: Drew Adams <drew.adams <at> oracle.com>
> Cc: 29872 <at> debbugs.gnu.org
> 
> > For Emacs 26, I've just committed a change that introduces a new
> > defcustom, Man-coding-system, which you can customize to utf-8 to get
> > the correct behavior in your case.  For older versions of Emacs, you
> > will need to use a separate command that invokes 'man' as shown above,
> > because man.el unconditionally uses locale-coding-system for that, and
> > locale-coding-system on MS-Windows can never be UTF-8.
> 
> Thank you.  I assume that you'll mention this in NEWS.

I'm not sure it's NEWS-worthy.  The situations in which this variable
is useful are quite obscure (see below), and since the code in its
present form has existed for a long time without anyone complaining,
we can assume they are rare.  And mentioning this variable won't
enhance its discoverability, since no one will know to look it up
when/if they bump into this problem.

> I wonder whether it is something that I should set in
> `setup-cygwin.el' or just tell users there, in a comment,
> that they will need to customize it.

Probably the latter.

> Leaving aside, for the moment, arguments about whether
> code should mess with user options, can you say when it
> is appropriate, typically, for a Windows user to customize
> that option?

On Windows, probably only when using a Cygwin 'man' from a native
Windows Emacs.  On some other system, only if the man pages were
pre-formatted on another system with an incompatible locale setting,
or if Groff was forced to produce UTF-8 encoded man pages when those
man pages were formatted.

> Does it have to do with whether Cygwin is used, for instance (with a
> non-Cygwin Emacs)?  Or does the option default value depend on the
> platform perhaps?

On almost any platform with correct setup, the default of using
locale-coding-system for decoding the man pages should work correctly,
so the default should not depend on the platform.  This new variable
is a fire escape for those rare cases where the default doesn't work.
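
As a hedged illustration of the Windows case described above, a user's init file might set the option only in a native MS-Windows Emacs. The sketch assumes that the `man' found on the PATH is the Cygwin one and that it emits UTF-8; neither is checked by the code.

```elisp
;; `system-type' is `windows-nt' in a native MS-Windows Emacs and
;; `cygwin' in a Cygwin build, so this leaves a Cygwin Emacs (and every
;; other platform) on the default of decoding with `locale-coding-system'.
(when (eq system-type 'windows-nt)
  ;; Assumption: the 'man' program in use is Cygwin's and outputs UTF-8.
  (setq Man-coding-system 'utf-8))
```

This belongs in a user's own configuration rather than in a library such as `setup-cygwin.el', in line with the advice above to document the option there instead of setting it.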




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#29872; Package emacs. (Sat, 28 Sep 2019 23:27:01 GMT) Full text and rfc822 format available.

Message #62 received at 29872 <at> debbugs.gnu.org (full text, mbox):

From: Stefan Kangas <stefan <at> marxist.se>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 29872 <at> debbugs.gnu.org, Drew Adams <drew.adams <at> oracle.com>
Subject: Re: bug#29872: 26.0.90; `man' output encoding, hyphen chars
Date: Sun, 29 Sep 2019 01:26:07 +0200
fixed 29872 26.1
close 29872
quit

Eli Zaretskii <eliz <at> gnu.org> writes:

>> Date: Sat, 30 Dec 2017 15:11:05 -0800 (PST)
>> From: Drew Adams <drew.adams <at> oracle.com>
>> Cc: 29872 <at> debbugs.gnu.org
>>
>> > What about the below, does that work?
>> >   (let ((locale-coding-system 'utf-8))
>> >     (man "find"))
>>
>> Yes!  With `emacs -Q', loading the two files mentioned,
>> and then evaluating that sexp, the `man' output is correct:
>> hyphens appear as they should - that is:
>>
>>   name: HYPHEN
>>   general-category: Pd (Punctuation, Dash)
>>   decomposition: (8208) ('‐')
>>
>> What should I then change in, say, `setup-cygwin.el',
>> to make that happen?  (Or does something need to be
>> changed in Emacs itself?)
>
> For Emacs 26, I've just committed a change that introduces a new
> defcustom, Man-coding-system, which you can customize to utf-8 to get
> the correct behavior in your case.  For older versions of Emacs, you
> will need to use a separate command that invokes 'man' as shown above,
> because man.el unconditionally uses locale-coding-system for that, and
> locale-coding-system on MS-Windows can never be UTF-8.

It seems like this was:

commit 39ca289a7a33d514c2a46f005db4e7173fb7e9f5
Author: Eli Zaretskii <eliz <at> gnu.org>
Date:   Sun Dec 31 18:20:12 2017 +0200

    Allow customization of decoding of "man" command

    * lisp/man.el (Man-coding-system): New defcustom.
    (Man-start-calling): Use it, and also pay attention to user
    overriding coding-system-for-read.  (Bug#29872)
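
The precedence that this commit message describes can be sketched roughly as follows. This is an illustration based on the commit message and on the statement earlier in the thread that man.el falls back to `locale-coding-system'; it is not the actual code of `Man-start-calling', and the function name `my-man-decoding-system' is hypothetical.

```elisp
(defun my-man-decoding-system ()
  "Return the coding system `man' output would presumably be decoded with.
A user override of `coding-system-for-read' comes first, then
`Man-coding-system' if it is set, then `locale-coding-system'."
  (or coding-system-for-read
      ;; `bound-and-true-p' also copes with Emacs versions (or sessions)
      ;; in which man.el has not defined the variable.
      (bound-and-true-p Man-coding-system)
      locale-coding-system))
```

Evaluating `(my-man-decoding-system)' in a session shows which of the three sources would apply there.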

Since no one has indicated otherwise in this thread, I'm going to assume
that this issue is now fixed and close this bug report.  If that is
incorrect, please reopen it.

Best regards,
Stefan Kangas




bug Marked as fixed in versions 26.1. Request was from Stefan Kangas <stefan <at> marxist.se> to control <at> debbugs.gnu.org. (Sat, 28 Sep 2019 23:29:03 GMT) Full text and rfc822 format available.

bug closed, send any further explanations to 29872 <at> debbugs.gnu.org and Drew Adams <drew.adams <at> oracle.com> Request was from Stefan Kangas <stefan <at> marxist.se> to control <at> debbugs.gnu.org. (Sat, 28 Sep 2019 23:29:03 GMT) Full text and rfc822 format available.

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Sun, 27 Oct 2019 11:24:12 GMT) Full text and rfc822 format available.



