GNU bug report logs - #40702
28.0.50; (what-cursor-position) barfs on non-ASCII char

Package: emacs;

Reported by: Dima Kogan <dima <at> secretsauce.net>

Date: Sat, 18 Apr 2020 21:37:01 UTC

Severity: normal

Tags: fixed

Found in version 28.0.50

Fixed in version 28.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#40702; Package emacs. (Sat, 18 Apr 2020 21:37:01 GMT)

Acknowledgement sent to Dima Kogan <dima <at> secretsauce.net>:
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Sat, 18 Apr 2020 21:37:01 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: Dima Kogan <dima <at> secretsauce.net>
To: bug-gnu-emacs <at> gnu.org
Subject: 28.0.50; (what-cursor-position) barfs on non-ASCII char
Date: Sat, 18 Apr 2020 14:27:39 -0700
Hi. I'm using a very recent build of emacs from git. I see this:

1. emacs -Q
   Fresh emacs. Opens in the *scratch* buffer

2. C-x 8 ' e
   i.e. insert some non-ASCII character. Opening any buffer with such
   characters works too

3. Left
   Move the point to this character

4. C-x =
   (what-cursor-position) to ask emacs to tell us about this character.
   I see this:

   cl--assertion-failed: Assertion failed: (not (multibyte-string-p str))

Thanks!




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40702; Package emacs. (Sat, 18 Apr 2020 21:54:01 GMT)

Message #8 received at 40702 <at> debbugs.gnu.org:

From: Štěpán Němec <stepnem <at> gmail.com>
To: Dima Kogan <dima <at> secretsauce.net>
Cc: 40702 <at> debbugs.gnu.org
Subject: Re: bug#40702: 28.0.50; (what-cursor-position) barfs on non-ASCII char
Date: Sat, 18 Apr 2020 23:53:47 +0200
On Sat, 18 Apr 2020 14:27:39 -0700
Dima Kogan wrote:

> Hi. I'm using a very recent build of emacs from git. I see this:
>
> 1. emacs -Q
>    Fresh emacs. Opens in the *scratch* buffer
>
> 2. C-x 8 ' e
>    i.e. insert some non-ASCII character. Opening any buffer with such
>    characters works too
>
> 3. Left
>    Move the point to this character
>
> 4. C-x =
>    (what-cursor-position) to ask emacs to tell us about this character.
>    I see this:
>
>    cl--assertion-failed: Assertion failed: (not (multibyte-string-p str))

I can't reproduce this on current master (d890e5b73a Fix misnamed
variable breaking GNUstep)

GNU Emacs 28.0.50 (build 26, x86_64-pc-linux-gnu, GTK+ Version 3.24.17, cairo version 1.17.3)

-- 
Štěpán




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40702; Package emacs. (Sat, 18 Apr 2020 22:23:02 GMT)

Message #11 received at 40702 <at> debbugs.gnu.org:

From: Dima Kogan <dima <at> secretsauce.net>
To: Štěpán Němec <stepnem <at> gmail.com>
Cc: 40702 <at> debbugs.gnu.org
Subject: Re: bug#40702: 28.0.50; (what-cursor-position) barfs on non-ASCII char
Date: Sat, 18 Apr 2020 15:22:13 -0700
Štěpán Němec <stepnem <at> gmail.com> writes:

> I can't reproduce this on current master

Thanks for checking. It's very consistent on my end. I poked at it a
little bit just now.

I see that buffer-file-coding-system is nil

It ends up evaluating

  (encoded-string-description "é" nil)

which looks at the value of

  (multibyte-string-p "é")

[ The string above is supposed to be a single Unicode character; my
  email may mangle it, I don't know ]

On my install this evaluates to t, which is causing the error. Which of
these shouldn't be happening? For the record, it used to work for me.
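
The distinction the assertion checks can be seen outside of
`what-cursor-position'. The following is only a minimal sketch, not code
from the thread: the function name `bug40702-string-kinds' is
hypothetical, and `utf-8' is chosen here purely for illustration. The
character é is written as the ASCII-only escape "\u00e9" so that the
snippet does not depend on how the file itself is decoded.

```elisp
;; Hypothetical helper, for illustration only.
(defun bug40702-string-kinds ()
  "Return (TEXT-MULTIBYTE-P BYTES-MULTIBYTE-P BYTE-LIST) for the char é."
  (let* ((text "\u00e9")                            ; text as typed: multibyte
         (bytes (encode-coding-string text 'utf-8))) ; encoded form: unibyte
    (list (multibyte-string-p text)
          (multibyte-string-p bytes)
          ;; The individual bytes of the encoded string.
          (string-to-list bytes))))
```

A string that still holds characters is multibyte, while the result of
encoding it is a unibyte string of bytes. The second kind is what
`encoded-string-description' asserts that it receives.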




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40702; Package emacs. (Sun, 19 Apr 2020 13:02:02 GMT)

Message #14 received at 40702 <at> debbugs.gnu.org:

From: Štěpán Němec <stepnem <at> gmail.com>
To: Dima Kogan <dima <at> secretsauce.net>
Cc: 40702 <at> debbugs.gnu.org
Subject: Re: bug#40702: 28.0.50; (what-cursor-position) barfs on non-ASCII char
Date: Sun, 19 Apr 2020 15:02:24 +0200
On Sat, 18 Apr 2020 15:22:13 -0700
Dima Kogan wrote:

> Thanks for checking. It's very consistent on my end. I poked at it a
> little bit just now.
>
> I see that buffer-file-coding-system is nil
>
> It ends up evaluating
>
>   (encoded-string-description "é" nil)
>
> which looks at the value of
>
>   (multibyte-string-p "é")
>
> [ The string above is supposed to be a single Unicode character; my
>   email may mangle it, I don't know ]
>
> On my install this evaluates to t, which is causing the error. Which of
> these shouldn't be happening? For the record, it used to work for me.

I'm not sure I'll be able to help you given my lack of familiarity with
this and related code, but can you at least post the full backtrace?

Looking at `what-cursor-position', apparently due to your
`buffer-file-coding-system' being nil (which seems a bit strange to me:
is even your (default-value 'buffer-file-coding-system) nil?) the
multibyte string isn't properly encoded and instead passed directly to
`encoded-string-description', leading to the error.

That said, there haven't been any relevant recent changes to
`what-cursor-position'.

In any case, I think more info is needed: backtrace, system/environment.

-- 
Štěpán




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40702; Package emacs. (Sun, 19 Apr 2020 15:23:01 GMT)

Message #17 received at 40702 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Štěpán Němec <stepnem <at> gmail.com>
Cc: dima <at> secretsauce.net, 40702 <at> debbugs.gnu.org
Subject: Re: bug#40702: 28.0.50; (what-cursor-position) barfs on non-ASCII char
Date: Sun, 19 Apr 2020 18:22:30 +0300
> From: Štěpán Němec
>  <stepnem <at> gmail.com>
> Date: Sun, 19 Apr 2020 15:02:24 +0200
> Cc: 40702 <at> debbugs.gnu.org
> 
> Looking at `what-cursor-position', apparently due to your
> `buffer-file-coding-system' being nil (which seems a bit strange to me:
> is even your (default-value 'buffer-file-coding-system) nil?)

buffer-file-coding-system being nil means 'no-conversion'.  You can
easily simulate that yourself, by an explicit setq, and you will then
get the error described in the report.
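
One way to try the simulation described above from Lisp is sketched
here. It is an assumption-laden illustration, not code from the thread:
the function name `bug40702-simulate-nil-coding' is hypothetical, and
the sketch sets both the buffer-local and the default value to nil
because `what-cursor-position' falls back to the default value when the
local one is nil. The default is restored afterwards.

```elisp
;; Hypothetical helper, for illustration only.
(defun bug40702-simulate-nil-coding ()
  "Call `what-cursor-position' on é with a nil `buffer-file-coding-system'.
Return the symbol `no-error', or the error symbol that was signaled."
  (let ((saved (default-value 'buffer-file-coding-system)))
    (unwind-protect
        (with-temp-buffer
          ;; Make both the default and the buffer-local value nil.
          (setq-default buffer-file-coding-system nil)
          (setq buffer-file-coding-system nil)
          (insert "\u00e9")             ; the character é
          (goto-char (point-min))       ; put point on that character
          (condition-case err
              (progn (what-cursor-position) 'no-error)
            (error (car err))))
      ;; Always restore the user's default, even after an error.
      (setq-default buffer-file-coding-system saved))))
```

On a build that has the regression, the returned symbol would be
expected to be `cl-assertion-failed'. On a build without it, the call
should simply return `no-error'.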

> the multibyte string isn't properly encoded and instead passed
> directly to `encoded-string-description', leading to the error.

Emacs 26.3 doesn't signal an error in this case, so I think this is a
regression we should fix.

> That said, there haven't been any relevant recent changes to
> `what-cursor-position'.
> 
> In any case, I think more info is needed: backtrace, system/environment.

Here's a backtrace:

  Debugger entered--Lisp error: (cl-assertion-failed ((not (multibyte-string-p str)) nil))
    cl--assertion-failed((not (multibyte-string-p str)))
    encoded-string-description(#("é" 0 1 (charset unicode)) nil)
    describe-char(146)
    what-cursor-position((4))
    funcall-interactively(what-cursor-position (4))
    call-interactively(what-cursor-position nil nil)
    command-execute(what-cursor-position)




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40702; Package emacs. (Sun, 19 Apr 2020 16:18:01 GMT)

Message #20 received at 40702 <at> debbugs.gnu.org:

From: Štěpán Němec <stepnem <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dima <at> secretsauce.net, 40702 <at> debbugs.gnu.org,
 Stefan Monnier <monnier <at> iro.umontreal.ca>
Subject: Re: bug#40702: 28.0.50; (what-cursor-position) barfs on non-ASCII char
Date: Sun, 19 Apr 2020 18:18:13 +0200
On Sun, 19 Apr 2020 18:22:30 +0300
Eli Zaretskii wrote:

>> Looking at `what-cursor-position', apparently due to your
>> `buffer-file-coding-system' being nil (which seems a bit strange to me:
>> is even your (default-value 'buffer-file-coding-system) nil?)
>
> buffer-file-coding-system being nil means 'no-conversion'.  You can
> easily simulate that yourself, by an explicit setq, and you will then
> get the error described in the report.

Indeed, thanks, the meaning of `nil' is described in the doc string. I
was more surprised that it ever ends up being nil by default, but that's
probably because I have very little understanding of how the Emacs
coding setup works.

>> the multibyte string isn't properly encoded and instead passed
>> directly to `encoded-string-description', leading to the error.
>
> Emacs 26.3 doesn't signal an error in this case, so I think this is a
> regression we should fix.
>
>> That said, there haven't been any relevant recent changes to
>> `what-cursor-position'.
>> 
>> In any case, I think more info is needed: backtrace, system/environment.
>
> Here's a backtrace:
>
>   Debugger entered--Lisp error: (cl-assertion-failed ((not (multibyte-string-p str)) nil))
>     cl--assertion-failed((not (multibyte-string-p str)))
>     encoded-string-description(#("é" 0 1 (charset unicode)) nil)
>     describe-char(146)
>     what-cursor-position((4))
>     funcall-interactively(what-cursor-position (4))
>     call-interactively(what-cursor-position nil nil)
>     command-execute(what-cursor-position)

Thanks. I was looking at all the wrong places. The problem was simply
introduced by the addition of the assert in

2019-05-28T20:59:35-04:00!monnier <at> iro.umontreal.ca
146486f8a6 (* mule-cmds.el (encoded-string-description): Require unibyte string as input)
https://git.sv.gnu.org/cgit/emacs.git/commit/?id=146486f8a6

Removing the assertion reverts to the Emacs 26 behaviour.

Unfortunately there is no explanation regarding the change. Maybe Stefan
could provide some insight?

-- 
Štěpán




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40702; Package emacs. (Sun, 19 Apr 2020 16:45:01 GMT)

Message #23 received at 40702 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Dima Kogan <dima <at> secretsauce.net>
Cc: Štěpán Němec <stepnem <at> gmail.com>,
 40702 <at> debbugs.gnu.org
Subject: Re: bug#40702: 28.0.50; (what-cursor-position) barfs on non-ASCII char
Date: Sun, 19 Apr 2020 12:44:33 -0400
>> I can't reproduce this on current master
> Thanks for checking. It's very consistent on my end. I poked at it a
> little bit just now.
> I see that buffer-file-coding-system is nil

It would be worth looking into how/why you get a nil value here.

> It ends up evaluating
>   (encoded-string-description "é" nil)

This seems to point to a bug in `encode-coding-char`:

    M-: (encode-coding-char ?\é nil) RET

returns "é" which is not a unibyte string and hence is not a valid
encoded string.  Note that

    M-: (encode-coding-char ?\é 'no-conversion) RET

does not suffer from the same problem.  This comes from
`encode-coding-string` which also returns a multibyte string when its
coding arg is nil.
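
The same comparison can be made directly on `encode-coding-string'. This
is a small sketch only: the function name `bug40702-encode-results' is
hypothetical. The result of the nil case is deliberately not hard-coded,
since the observation above describes the behaviour at the time and it
may differ between Emacs versions.

```elisp
;; Hypothetical helper, for illustration only.
(defun bug40702-encode-results ()
  "Return (NIL-CASE-MULTIBYTE-P NO-CONVERSION-CASE-MULTIBYTE-P) for é."
  (let ((text "\u00e9"))                ; the character é
    (list
     ;; A nil coding system: reported above to return a multibyte string.
     (multibyte-string-p (encode-coding-string text nil))
     ;; An explicit `no-conversion': the result is a unibyte string.
     (multibyte-string-p (encode-coding-string text 'no-conversion)))))
```

If the first element is t, the "encoded" string is still multibyte and
would trip the assertion in `encoded-string-description'.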

I'm not sure if `encode-coding-string/char` should accept a nil argument
nor how it should treat it, so maybe it's a bug in `what-cursor-position`
which should not pass a nil argument here.  So maybe the patch below
is a good fix?


        Stefan


diff --git a/lisp/simple.el b/lisp/simple.el
index 8bc84a9dfa..e5180119e8 100644
--- a/lisp/simple.el
+++ b/lisp/simple.el
@@ -1470,7 +1470,11 @@ what-cursor-position
 	    encoded encoding-msg display-prop under-display)
 	(if (or (not coding)
 		(eq (coding-system-type coding) t))
-	    (setq coding (default-value 'buffer-file-coding-system)))
+	    (setq coding (or (default-value 'buffer-file-coding-system)
+                             ;; A nil value of `buffer-file-coding-system'
+                             ;; means "no conversion" which means each byte
+                             ;; is a char and vice versa.
+                             'binary)))
 	(if (eq (char-charset char) 'eight-bit)
 	    (setq encoding-msg
 		  (format "(%d, #o%o, #x%x%s, raw-byte)" char char char char-name-fmt))
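
Read in isolation, the fallback that the patch adds amounts to the
selection logic sketched below. This is not the code in simple.el: the
function name `bug40702-effective-coding' and its two parameters are
hypothetical, and DEFAULT-CODING stands in for the value of
(default-value 'buffer-file-coding-system) so that the logic can be
tried without changing any global state.

```elisp
;; Hypothetical helper, for illustration only.
(defun bug40702-effective-coding (coding default-coding)
  "Return the coding system to use for describing a character.
CODING is the buffer's coding system and DEFAULT-CODING the default one.
If neither is usable, fall back to `binary', as the patch does."
  (if (or (not coding)
          (eq (coding-system-type coding) t))
      ;; No usable buffer value: try the default, then `binary'.
      (or default-coding 'binary)
    coding))
```

For example, (bug40702-effective-coding nil nil) returns `binary', while
(bug40702-effective-coding 'utf-8 nil) returns `utf-8'. With that
fallback in place, a nil coding system is no longer passed on to the
encoding functions.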





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40702; Package emacs. (Sun, 19 Apr 2020 16:51:01 GMT)

Message #26 received at 40702 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: Štěpán Němec <stepnem <at> gmail.com>
Cc: dima <at> secretsauce.net, 40702 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#40702: 28.0.50; (what-cursor-position) barfs on non-ASCII char
Date: Sun, 19 Apr 2020 19:50:10 +0300
> From: Štěpán Němec <stepnem <at> gmail.com>
> Cc: dima <at> secretsauce.net,  40702 <at> debbugs.gnu.org, Stefan Monnier
>  <monnier <at> iro.umontreal.ca>
> Date: Sun, 19 Apr 2020 18:18:13 +0200
> 
> >   Debugger entered--Lisp error: (cl-assertion-failed ((not (multibyte-string-p str)) nil))
> >     cl--assertion-failed((not (multibyte-string-p str)))
> >     encoded-string-description(#("é" 0 1 (charset unicode)) nil)
> >     describe-char(146)
> >     what-cursor-position((4))
> >     funcall-interactively(what-cursor-position (4))
> >     call-interactively(what-cursor-position nil nil)
> >     command-execute(what-cursor-position)
> 
> Thanks. I was looking at all the wrong places. The problem was simply
> introduced by the addition of the assert in
> 
> 2019-05-28T20:59:35-04:00!monnier <at> iro.umontreal.ca
> 146486f8a6 (* mule-cmds.el (encoded-string-description): Require unibyte string as input)
> https://git.sv.gnu.org/cgit/emacs.git/commit/?id=146486f8a6
> 
> Removing the assertion reverts to the Emacs 26 behaviour.
> 
> Unfortunately there is no explanation regarding the change. Maybe Stefan
> could provide some insight?

Could the discussion below provide such an explanation?

  https://lists.gnu.org/archive/html/emacs-devel/2019-05/msg00949.html




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40702; Package emacs. (Sun, 19 Apr 2020 19:40:02 GMT)

Message #29 received at 40702 <at> debbugs.gnu.org:

From: Štěpán Němec <stepnem <at> gmail.com>
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: dima <at> secretsauce.net, 40702 <at> debbugs.gnu.org, monnier <at> iro.umontreal.ca
Subject: Re: bug#40702: 28.0.50; (what-cursor-position) barfs on non-ASCII char
Date: Sun, 19 Apr 2020 21:39:48 +0200
On Sun, 19 Apr 2020 19:50:10 +0300
Eli Zaretskii wrote:

>> 2019-05-28T20:59:35-04:00!monnier <at> iro.umontreal.ca
>> 146486f8a6 (* mule-cmds.el (encoded-string-description): Require unibyte string as input)
>> https://git.sv.gnu.org/cgit/emacs.git/commit/?id=146486f8a6
>> 
>> Removing the assertion reverts to the Emacs 26 behaviour.
>> 
>> Unfortunately there is no explanation regarding the change. Maybe Stefan
>> could provide some insight?
>
> Could the discussion below provide such an explanation?
>
>   https://lists.gnu.org/archive/html/emacs-devel/2019-05/msg00949.html

Yes, and also a lot of other useful context/reference information.

Thanks!

-- 
Štěpán




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40702; Package emacs. (Mon, 20 Apr 2020 04:18:02 GMT)

Message #32 received at 40702 <at> debbugs.gnu.org:

From: Dima Kogan <dima <at> secretsauce.net>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Štěpán Němec <stepnem <at> gmail.com>,
 40702 <at> debbugs.gnu.org
Subject: Re: bug#40702: 28.0.50; (what-cursor-position) barfs on non-ASCII char
Date: Sun, 19 Apr 2020 21:16:51 -0700
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

>> I see that buffer-file-coding-system is nil
>
> It would be worth looking into how/why you get a nil value here.

Any suggestions about how to do that? For the record, unicode stuff
seems to work in general, this bug excepted. Would you expect stuff to
break with nil here?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40702; Package emacs. (Mon, 20 Apr 2020 13:28:02 GMT)

Message #35 received at 40702 <at> debbugs.gnu.org:

From: Stefan Monnier <monnier <at> iro.umontreal.ca>
To: Dima Kogan <dima <at> secretsauce.net>
Cc: Štěpán Němec <stepnem <at> gmail.com>,
 40702 <at> debbugs.gnu.org
Subject: Re: bug#40702: 28.0.50; (what-cursor-position) barfs on non-ASCII char
Date: Mon, 20 Apr 2020 09:27:27 -0400
>>> I see that buffer-file-coding-system is nil
>> It would be worth looking into how/why you get a nil value here.
> Any suggestions about how to do that?

If you get that in the scratch buffer in `emacs -Q`, then I'd guess it
depends on the locale setting.


        Stefan





Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40702; Package emacs. (Mon, 20 Apr 2020 21:45:01 GMT)

Message #38 received at 40702 <at> debbugs.gnu.org:

From: Dima Kogan <dima <at> secretsauce.net>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Štěpán Němec <stepnem <at> gmail.com>,
 40702 <at> debbugs.gnu.org
Subject: Re: bug#40702: 28.0.50; (what-cursor-position) barfs on non-ASCII char
Date: Mon, 20 Apr 2020 14:44:16 -0700
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

> If you get that in the scratch buffer in `emacs -Q`, then I'd guess it
> depends on the locale setting.

  $ locale

  LANG=C
  LANGUAGE=
  LC_CTYPE="C"
  LC_NUMERIC="C"
  LC_TIME="C"
  LC_COLLATE="C"
  LC_MONETARY="C"
  LC_MESSAGES="C"
  LC_PAPER="C"
  LC_NAME="C"
  LC_ADDRESS="C"
  LC_TELEPHONE="C"
  LC_MEASUREMENT="C"
  LC_IDENTIFICATION="C"
  LC_ALL=C

I happen to live in an English-speaking country, so generally doing
everything in ASCII works ok. Is there anything to "fix" here?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#40702; Package emacs. (Wed, 30 Sep 2020 03:46:02 GMT)

Message #41 received at 40702 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: Stefan Monnier <monnier <at> iro.umontreal.ca>
Cc: Štěpán Němec <stepnem <at> gmail.com>,
 Dima Kogan <dima <at> secretsauce.net>, 40702 <at> debbugs.gnu.org
Subject: Re: bug#40702: 28.0.50; (what-cursor-position) barfs on non-ASCII char
Date: Wed, 30 Sep 2020 05:45:05 +0200
Stefan Monnier <monnier <at> iro.umontreal.ca> writes:

> I'm not sure if `encode-coding-string/char` should accept a nil argument
> nor how it should treat it, so maybe it's a bug in `what-cursor-position`
> which should not pass a nil argument here.  So maybe the patch below
> is a good fix?

With

LANG=C LANGUAGE= LC_CTYPE="C" LC_NUMERIC="C" LC_TIME="C" LC_COLLATE="C" LC_MONETARY="C" LC_MESSAGES="C" LC_PAPER="C" LC_NAME="C" LC_ADDRESS="C" LC_TELEPHONE="C" LC_MEASUREMENT="C" LC_IDENTIFICATION="C" LC_ALL=C ./src/emacs -geometry -0+0 -Q  

I can reproduce the bug Dima is seeing, and Stefan's patch fixes the
problem, and seems otherwise unproblematic, so I've pushed it to Emacs
28.

There may be other, more general problems when running under the "C"
locale, but...

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Added tag(s) fixed. Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Wed, 30 Sep 2020 03:46:02 GMT)

bug marked as fixed in version 28.1, send any further explanations to 40702 <at> debbugs.gnu.org and Dima Kogan <dima <at> secretsauce.net> Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Wed, 30 Sep 2020 03:46:02 GMT)

bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Wed, 28 Oct 2020 11:24:03 GMT)

This bug report was last modified 3 years and 178 days ago.


GNU bug tracking system
Copyright (C) 1999 Darren O. Benham, 1997,2003 nCipher Corporation Ltd, 1994-97 Ian Jackson.