GNU bug report logs - #31062
26.0.91; warning on UTF-8 encoding of unibyte text

Package: emacs

Reported by: charles <at> aurox.ch (Charles A. Roelli)

Date: Wed, 4 Apr 2018 18:27:01 UTC

Severity: wishlist

Found in version 26.0.91

Fixed in version 28.1

Done: Lars Ingebrigtsen <larsi <at> gnus.org>

Bug is archived. No further changes may be made.

To add a comment to this bug, you must first unarchive it, by sending
a message to control AT debbugs.gnu.org, with unarchive 31062 in the body.
You can then email your comments to 31062 AT debbugs.gnu.org in the normal way.



Report forwarded to bug-gnu-emacs <at> gnu.org:
bug#31062; Package emacs. (Wed, 04 Apr 2018 18:27:02 GMT)

Acknowledgement sent to charles <at> aurox.ch (Charles A. Roelli):
New bug report received and forwarded. Copy sent to bug-gnu-emacs <at> gnu.org. (Wed, 04 Apr 2018 18:27:02 GMT)

Message #5 received at submit <at> debbugs.gnu.org:

From: charles <at> aurox.ch (Charles A. Roelli)
To: bug-gnu-emacs <at> gnu.org
Subject: 26.0.91; warning on UTF-8 encoding of unibyte text
Date: Wed, 04 Apr 2018 20:26:52 +0200

(This test case assumes a locale-coding-system of utf-8-unix, and
LANG: en_GB.UTF-8 or anything similar.)

emacs -q
C-x b test RET
M-: (insert-byte 195 1) RET
M-: (insert-byte 188 1) RET	> buffer text should look like \303\274
C-x C-s /tmp/foo RET		> the path is irrelevant

There's this warning:

These default coding systems were tried to encode text
in the buffer ‘test’:
  (utf-8-unix (1 . 4194243) (2 . 4194236))
However, each of them encountered characters it couldn’t encode:
  utf-8-unix cannot encode these: \303 \274

Is the text "(1 . 4194243) (2 . 4194236)" useful here?  It looks like
it's there by accident.  If it is helpful, could someone please
explain what it means?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#31062; Package emacs. (Wed, 04 Apr 2018 19:21:02 GMT)

Message #8 received at 31062 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: charles <at> aurox.ch (Charles A. Roelli)
Cc: 31062 <at> debbugs.gnu.org
Subject: Re: bug#31062: 26.0.91; warning on UTF-8 encoding of unibyte text
Date: Wed, 04 Apr 2018 22:20:55 +0300

> Date: Wed, 04 Apr 2018 20:26:52 +0200
> From: charles <at> aurox.ch (Charles A. Roelli)
> 
> emacs -q
> C-x b test RET
> M-: (insert-byte 195 1) RET
> M-: (insert-byte 188 1) RET	> buffer text should look like \303\274
> C-x C-s /tmp/foo RET		> the path is irrelevant
> 
> There's this warning:
> 
> These default coding systems were tried to encode text
> in the buffer ‘test’:
>   (utf-8-unix (1 . 4194243) (2 . 4194236))
> However, each of them encountered characters it couldn’t encode:
>   utf-8-unix cannot encode these: \303 \274
> 
> Is the text "(1 . 4194243) (2 . 4194236)" useful here?

It shows the positions and the codepoints of the offending characters,
and the coding-system that was tried.
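
To make the numbers concrete: the codepoints in the warning are how Emacs represents raw 8-bit bytes inside a multibyte buffer, namely as characters in the range 0x3FFF80..0x3FFFFF (the byte value plus 0x3FFF00). The following is a minimal sketch in Python, not Emacs code, and its function names are invented for illustration. Inside Emacs, `multibyte-char-to-unibyte` performs the reverse conversion.

```python
# Sketch only: mirrors the arithmetic Emacs uses for raw-byte characters.
RAW_BYTE_OFFSET = 0x3FFF00  # raw byte 0x80..0xFF -> char 0x3FFF80..0x3FFFFF


def raw_byte_to_char(byte):
    """Return the character code Emacs uses for a raw 8-bit byte."""
    if not 0x80 <= byte <= 0xFF:
        raise ValueError("not a raw 8-bit byte: %r" % (byte,))
    return byte + RAW_BYTE_OFFSET


def char_to_raw_byte(char):
    """Return the raw byte behind one of those character codes."""
    if not 0x3FFF80 <= char <= 0x3FFFFF:
        raise ValueError("not a raw-byte character: %r" % (char,))
    return char - RAW_BYTE_OFFSET


def describe(entry):
    """Turn (CODING-SYSTEM (POS . CHAR) ...) into (pos, byte, octal) rows."""
    _coding_system, *pairs = entry
    rows = []
    for pos, char in pairs:
        byte = char_to_raw_byte(char)
        rows.append((pos, byte, "\\%o" % byte))
    return rows


# The entry from the warning, written as a Python tuple.
entry = ("utf-8-unix", (1, 4194243), (2, 4194236))
for row in describe(entry):
    print("position %d: byte %d, shown as %s" % row)
```

Applied to the entry from the warning, this reports bytes 195 and 188 at positions 1 and 2, which are the \303 and \274 shown in the last line of the warning.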




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#31062; Package emacs. (Thu, 05 Apr 2018 18:27:02 GMT)

Message #11 received at 31062 <at> debbugs.gnu.org:

From: charles <at> aurox.ch (Charles A. Roelli)
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 31062 <at> debbugs.gnu.org
Subject: Re: bug#31062: 26.0.91; warning on UTF-8 encoding of unibyte text
Date: Thu, 05 Apr 2018 20:27:15 +0200

> Date: Wed, 04 Apr 2018 22:20:55 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
> 
> > There's this warning:
> > 
> > These default coding systems were tried to encode text
> > in the buffer ‘test’:
> >   (utf-8-unix (1 . 4194243) (2 . 4194236))
> > However, each of them encountered characters it couldn’t encode:
> >   utf-8-unix cannot encode these: \303 \274
> > 
> > Is the text "(1 . 4194243) (2 . 4194236)" useful here?
> 
> It shows the positions and the codepoints of the offending characters,
> and the coding-system that was tried.

Thank you for clarifying.  Could we write something like,

> These default coding systems were tried to encode text in the buffer
> 'test', but failed for the listed (POSITION . CODEPOINT) elements:

to make that clear to the user?




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#31062; Package emacs. (Thu, 05 Apr 2018 18:48:02 GMT)

Message #14 received at 31062 <at> debbugs.gnu.org:

From: Eli Zaretskii <eliz <at> gnu.org>
To: charles <at> aurox.ch (Charles A. Roelli)
Cc: 31062 <at> debbugs.gnu.org
Subject: Re: bug#31062: 26.0.91; warning on UTF-8 encoding of unibyte text
Date: Thu, 05 Apr 2018 21:47:11 +0300

> Date: Thu, 05 Apr 2018 20:27:15 +0200
> From: charles <at> aurox.ch (Charles A. Roelli)
> CC: 31062 <at> debbugs.gnu.org
> 
> > These default coding systems were tried to encode text in the buffer
> > 'test', but failed for the listed (POSITION . CODEPOINT) elements:
> 
> to make that clear to the user?

Feel free to suggest a patch, but the list includes the coding-systems
tried, not just positions and codepoints.




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#31062; Package emacs. (Sun, 08 Apr 2018 09:49:01 GMT)

Message #17 received at 31062 <at> debbugs.gnu.org:

From: charles <at> aurox.ch (Charles A. Roelli)
To: Eli Zaretskii <eliz <at> gnu.org>
Cc: 31062 <at> debbugs.gnu.org
Subject: Re: bug#31062: 26.0.91; warning on UTF-8 encoding of unibyte text
Date: Sun, 08 Apr 2018 11:49:12 +0200

> Date: Thu, 05 Apr 2018 21:47:11 +0300
> From: Eli Zaretskii <eliz <at> gnu.org>
>
> > > These default coding systems were tried to encode text in the buffer
> > > 'test', but failed for the listed (POSITION . CODEPOINT) elements:
> > 
> > to make that clear to the user?
> 
> Feel free to suggest a patch, but the list includes the coding-systems
> tried, not just positions and codepoints.

That's true, but after looking at the code of
select-safe-coding-system-interactively, it seems that the "rejected"
list is also printed in the same run as "unsafe", and "rejected" is
indeed a list of coding systems.

	    (insert
	     "These default coding systems were tried to encode"
	     (if (stringp from)
		 (concat " \"" (if (> (length from) 10)
				   (concat (substring from 0 10) "...\"")
				 (concat from "\"")))
	       (format-message " text\nin the buffer `%s'" bufname))
	     ":\n")
	    (let ((pos (point))
		  (fill-prefix "  "))
	      (dolist (x (append rejected unsafe)) ; <- "rejected" printed here
		(princ "  ") (princ x))
	      (insert "\n")
	      (fill-region-as-paragraph pos (point)))

Strangely, the "rejected" list is then printed again, if it's non-nil:

	    (when rejected
	      (insert "These safely encode the text in the buffer,
but are not recommended for encoding text in this context,
e.g., for sending an email message.\n ")
	      (dolist (x rejected)
		(princ " ") (princ x))
	      (insert "\n"))

One solution might be to print the "rejected" list only in this second
form, and in the first form to explain more clearly what the elements
of the "unsafe" list mean.
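
The split proposed here can be sketched as follows. This is a hypothetical illustration in Python, not the code of select-safe-coding-system-interactively and not the change that was eventually made in Emacs; `format_warning` and its argument shapes are assumptions. The wording is taken from the messages in this thread.

```python
def format_warning(bufname, rejected, unsafe):
    """Build the warning text with "unsafe" and "rejected" kept apart.

    unsafe:   list of (coding_system, (pos, char), ...) tuples
    rejected: list of coding-system names that encode the text safely
              but are not recommended in this context
    """
    lines = [
        "These default coding systems were tried to encode text in the buffer",
        "'%s', but failed for the listed (POSITION . CODEPOINT) elements:" % bufname,
    ]
    # First form: only the coding systems that hit unencodable characters.
    for coding_system, *pairs in unsafe:
        cells = " ".join("(%d . %d)" % (pos, char) for pos, char in pairs)
        lines.append("  %s: %s" % (coding_system, cells))
    # Second form: the rejected list is printed once, and only if non-empty.
    if rejected:
        lines.append("These safely encode the text in the buffer,")
        lines.append("but are not recommended for encoding text in this context,")
        lines.append("e.g., for sending an email message.")
        lines.append("  " + " ".join(rejected))
    return "\n".join(lines)


# The case from the original report: nothing rejected, one unsafe entry.
print(format_warning("test", [], [("utf-8-unix", (1, 4194243), (2, 4194236))]))
```

With an empty "rejected" list, as in the original report, only the first form appears, and each coding system is followed by its own (POSITION . CODEPOINT) pairs.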




Information forwarded to bug-gnu-emacs <at> gnu.org:
bug#31062; Package emacs. (Thu, 02 Sep 2021 08:53:01 GMT)

Message #20 received at 31062 <at> debbugs.gnu.org:

From: Lars Ingebrigtsen <larsi <at> gnus.org>
To: charles <at> aurox.ch (Charles A. Roelli)
Cc: 31062 <at> debbugs.gnu.org
Subject: Re: bug#31062: 26.0.91; warning on UTF-8 encoding of unibyte text
Date: Thu, 02 Sep 2021 10:52:35 +0200

charles <at> aurox.ch (Charles A. Roelli) writes:

> (This test case assumes a locale-coding-system of utf-8-unix, and
> LANG: en_GB.UTF-8 or anything similar.)
>
> emacs -q
> C-x b test RET
> M-: (insert-byte 195 1) RET
> M-: (insert-byte 188 1) RET	> buffer text should look like \303\274
> C-x C-s /tmp/foo RET		> the path is irrelevant
>
> There's this warning:
>
> These default coding systems were tried to encode text
> in the buffer ‘test’:
>   (utf-8-unix (1 . 4194243) (2 . 4194236))
> However, each of them encountered characters it couldn’t encode:
>   utf-8-unix cannot encode these: \303 \274
>
> Is the text "(1 . 4194243) (2 . 4194236)" useful here?  It looks like
> it's there by accident.  If it is helpful, could someone please
> explain what it means?

In Emacs 28, I've now made this warning more readable (and informative)
by formatting it as a table and saying what the data means.

-- 
(domestic pets only, the antidote for overdose, milk.)
   bloggy blog: http://lars.ingebrigtsen.no




Bug marked as fixed in version 28.1; send any further explanations to 31062 <at> debbugs.gnu.org and charles <at> aurox.ch (Charles A. Roelli). Request was from Lars Ingebrigtsen <larsi <at> gnus.org> to control <at> debbugs.gnu.org. (Thu, 02 Sep 2021 08:53:02 GMT)

Bug archived. Request was from Debbugs Internal Request <help-debbugs <at> gnu.org> to internal_control <at> debbugs.gnu.org. (Thu, 30 Sep 2021 11:24:06 GMT)

This bug report was last modified 2 years and 180 days ago.

